Llumnix - Dynamic Scheduling for Large Language Model Serving

USENIX via YouTube

Overview

This 16-minute conference talk from OSDI '24 presents Llumnix, a system for large language model (LLM) inference serving that addresses the challenges of heterogeneous and unpredictable requests. Llumnix reschedules requests at runtime across multiple model instances, similar to context switching in modern operating systems, to improve load balancing, resource utilization, and request prioritization. The talk covers the live migration mechanism that moves requests and their in-memory states between instances, and explains how the dynamic scheduling policy unifies multiple rescheduling scenarios. It also reports Llumnix's results compared to existing LLM serving systems: significant reductions in tail latencies, faster high-priority requests, and potential cost savings. An open-source implementation is available.
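To make the rescheduling idea concrete, here is a minimal Python sketch of one load-balancing step across model instances. It is an illustration under stated assumptions, not Llumnix's actual policy or API: the names `Request`, `Instance`, `rebalance_once`, the `kv_blocks` memory unit, and the `threshold` parameter are all hypothetical, and moving an item between two lists stands in for the live migration of a request and its in-memory state.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    req_id: str
    kv_blocks: int  # hypothetical unit for the request's in-memory state


@dataclass
class Instance:
    name: str
    capacity: int  # total blocks available on this instance (assumed)
    running: list = field(default_factory=list)

    def load(self) -> float:
        """Fraction of capacity held by running requests."""
        return sum(r.kv_blocks for r in self.running) / self.capacity


def rebalance_once(instances, threshold=0.2):
    """Move one request from the most-loaded to the least-loaded instance.

    Returns (req_id, source_name, destination_name), or None when the load
    gap is within `threshold` or the request would not fit.
    """
    src = max(instances, key=Instance.load)
    dst = min(instances, key=Instance.load)
    if src is dst or not src.running:
        return None
    if src.load() - dst.load() <= threshold:
        return None
    # Pick the smallest request so the simulated migration stays cheap.
    req = min(src.running, key=lambda r: r.kv_blocks)
    if dst.load() + req.kv_blocks / dst.capacity > 1.0:
        return None
    # Stand-in for live migration of the request and its state.
    src.running.remove(req)
    dst.running.append(req)
    return (req.req_id, src.name, dst.name)


if __name__ == "__main__":
    a = Instance("a", 100, [Request("r1", 40), Request("r2", 30), Request("r3", 10)])
    b = Instance("b", 100, [Request("r4", 10)])
    # Keep migrating until the instances are within the threshold.
    while (move := rebalance_once([a, b])) is not None:
        print("migrated", move)
```

A real serving system would also weigh request priority and migration cost when choosing what to move; this sketch only considers memory load.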

Syllabus

OSDI '24 - Llumnix: Dynamic Scheduling for Large Language Model Serving

Taught by

USENIX
