Overview
Explore cutting-edge advances in LLM inference performance in this 36-minute conference talk by Cade Daniel and Zhuohan Li. Dive into vLLM, an open-source engine developed at UC Berkeley that has transformed LLM inference and serving. Learn about key performance-enhancing techniques such as PagedAttention and continuous batching. Discover recent innovations in vLLM, including speculative decoding, prefix caching, disaggregated prefill, and multi-accelerator support. Gain insights from industry case studies and get a glimpse of vLLM's future roadmap. Understand how vLLM's focus on production readiness and extensibility has led to new systems insights and widespread community adoption, making it a state-of-the-art, accelerator-agnostic solution for LLM inference.
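To give a feel for why continuous batching (one of the techniques the talk covers) helps, here is a minimal toy simulation, not vLLM's actual scheduler: with static batching a batch occupies the GPU until its longest request finishes, while continuous batching evicts finished requests each decoding step and admits waiting ones immediately. All names and numbers below are illustrative assumptions.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    # Fixed batches: a batch runs until its LONGEST request is done,
    # so short requests idle alongside long ones.
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    # Each step every active request emits one token; finished requests
    # leave the batch and waiting requests join at the next step.
    waiting = deque(lengths)
    active = []
    steps = 0
    while waiting or active:
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # A request with 1 token left finishes this step and is evicted.
        active = [r - 1 for r in active if r > 1]
        steps += 1
    return steps

lengths = [8, 2, 2, 2]  # tokens each request still needs to generate
print(static_batching_steps(lengths, batch_size=2))      # 10 steps
print(continuous_batching_steps(lengths, batch_size=2))  # 8 steps
```

With two slots and 14 total tokens, the ideal is 7 steps; continuous batching gets to 8 here while static batching needs 10, because the three short requests never have to wait out the long one.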
Syllabus
Accelerating LLM Inference with vLLM
Taught by
Databricks