SGLang - An Efficient Open-Source Framework for Large-Scale LLM Serving

Anyscale via YouTube

Overview

This 28-minute conference talk from Ray Summit 2025 covers SGLang's architecture for large-scale LLM serving and how the framework came to power production workloads at major companies. It introduces the core features that distinguish SGLang: a lightweight execution engine, optimized KV cache handling, and flexible serving abstractions built for both high-throughput batch processing and low-latency interactive applications. The talk then goes into the performance techniques behind SGLang's advantage over traditional serving stacks: scheduling strategies, memory management improvements, parallel execution paths, and kernel optimizations for modern accelerators. It also draws on real-world production deployments to show how companies use SGLang to support rapid model iteration, scale cost-efficiently, and keep high-volume traffic stable. It closes with the roadmap: upcoming performance enhancements, deeper hardware integration, and new features aimed at simplifying large-scale LLM serving for enterprises and open-source users alike.

Syllabus

SGLang: An Efficient Open-Source Framework for Large-Scale LLM Serving | Ray Summit 2025

Taught by

Anyscale
