Overview
Explore cutting-edge advances in LLM inference performance in this 36-minute conference talk by Cade Daniel and Zhuohan Li. Dive into vLLM, an open-source engine developed at UC Berkeley that has transformed LLM inference and serving. Learn about key performance-enhancing techniques such as PagedAttention and continuous batching. Discover recent innovations in vLLM, including speculative decoding, prefix caching, disaggregated prefill, and multi-accelerator support. Gain insights from industry case studies and get a glimpse of vLLM's future roadmap. Understand how vLLM's focus on production readiness and extensibility has led to new systems insights and widespread community adoption, making it a state-of-the-art, accelerator-agnostic solution for LLM inference.
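To give a feel for why continuous batching (one of the techniques the talk covers) helps, here is a minimal toy simulation, not vLLM's actual scheduler: with static batching a batch occupies the GPU until its longest request finishes, while continuous batching evicts finished requests each decoding step and admits waiting ones immediately. All names and numbers below are illustrative assumptions.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    # Fixed batches: a batch runs until its LONGEST request is done,
    # so short requests idle alongside long ones.
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    # Each step every active request emits one token; finished requests
    # leave the batch and waiting requests join at the next step.
    waiting = deque(lengths)
    active = []
    steps = 0
    while waiting or active:
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # A request with 1 token left finishes this step and is evicted.
        active = [r - 1 for r in active if r > 1]
        steps += 1
    return steps

lengths = [8, 2, 2, 2]  # tokens each request still needs to generate
print(static_batching_steps(lengths, batch_size=2))      # 10 steps
print(continuous_batching_steps(lengths, batch_size=2))  # 8 steps
```

With two slots and 14 total tokens, the ideal is 7 steps; continuous batching gets to 8 here while static batching needs 10, because the three short requests never have to wait out the long one.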
Syllabus
Accelerating LLM Inference with vLLM
Taught by
Databricks