Efficiently Serving Large Language Models - Optimizing Performance and Resource Management
Centre for Networked Intelligence, IISc via YouTube
Overview
Explore the challenges and solutions in efficiently serving Large Language Models (LLMs) in this technical talk by Dr. Ashish Panwar, Senior Researcher at Microsoft Research India. Gain insight into why LLM deployment requires multiple GPUs per replica despite low resource utilization, and discover recent research from Microsoft addressing these efficiency challenges. Learn about solutions such as Sarathi-Serve [OSDI'24] and vAttention [ASPLOS'25], which tackle fundamental scheduling and memory-management issues in LLM serving systems. Understand the current landscape of LLM deployment across applications such as chatbots, search, and code assistants, while diving into the technical complexities of making these systems more resource-efficient and cost-effective.
Syllabus
Time: 5:00 PM - PM IST
Taught by
Centre for Networked Intelligence, IISc