
CNCF [Cloud Native Computing Foundation]

How Fast Can Your Model Composition Run in Serverless Inference?

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore the challenges of efficient multi-model composition and inference in serverless Kubernetes environments, and the solutions proposed for them, in this conference talk. Learn how integrating BentoML with Dragonfly addresses slow deployment times, high operational costs, and scalability issues when serving interconnected suites of ML models. A case study of a retrieval-augmented generation (RAG) application that combines an LLM, an embedding model, and an OCR model shows how the models are packaged efficiently and distributed quickly through Dragonfly's peer-to-peer (P2P) network. See how open-source technologies such as JuiceFS and vLLM bring deployment time down to 40 seconds, and how the approach serves as a scalable blueprint for deploying multi-model compositions. The talk also covers the complexities common to AI applications that depend on multiple interconnected models.

Syllabus

How Fast Can Your Model Composition Run in Serverless Inference? - Fog Dong, BentoML & Wenbo Qi

Taught by

CNCF [Cloud Native Computing Foundation]

