SGLang - An Efficient Open-Source Framework for Large-Scale LLM Serving

Overview

Explore SGLang's high-performance architecture for large-scale LLM serving in this 28-minute conference talk from Ray Summit 2025. Learn how SGLang became a widely used framework for production workloads at major companies. Discover the core features the talk highlights: a lightweight execution engine, optimized KV cache handling, and flexible serving abstractions built for both high-throughput batch processing and low-latency interactive applications. Dive into the performance optimization techniques presented as giving SGLang its edge over traditional serving stacks, covering scheduling strategies, memory management improvements, parallel execution paths, and kernel optimizations for modern accelerators. Gain insights from real-world production deployments and see how companies use SGLang to iterate on models quickly, scale cost-efficiently, and keep high-volume traffic stable. Examine the roadmap for SGLang, including upcoming performance enhancements, deeper hardware integration, and new features aimed at simplifying large-scale LLM serving for enterprises and open-source users alike.