Efficiently Serving Large Language Models - Optimizing Performance and Resource Management
Centre for Networked Intelligence, IISc via YouTube
Overview
Explore the challenges of efficiently serving Large Language Models (LLMs), and recent solutions to them, in this technical talk by Dr. Ashish Panwar, Senior Researcher at Microsoft Research India. The talk examines why LLM deployment requires multiple GPUs per replica despite low resource utilization, and presents recent Microsoft research that addresses these efficiency problems. This includes Sarathi-Serve [OSDI'24] and vAttention [ASPLOS'25], which tackle fundamental scheduling and memory-management issues in LLM serving systems. The talk also surveys the current landscape of LLM deployment across applications such as chatbots, search, and code assistants, and explains the technical complexities of making these systems more resource-efficient and cost-effective.
Syllabus
Time: 5:00 PM - PM IST
Taught by
Centre for Networked Intelligence, IISc