
Introduction to LLM Serving with SGLang

AI Engineer via YouTube

Overview

Learn how to serve large language models like DeepSeek and Qwen at state-of-the-art speeds using SGLang, an open-source fast serving framework for LLMs and VLMs that generates trillions of tokens daily at companies like xAI, AMD, and Meituan. Discover when SGLang is the appropriate tool for LLM workloads and gain hands-on experience deploying and optimizing your first model with this powerful framework. Explore the advantages of SGLang over other serving frameworks like vLLM, Ollama, and TensorRT-LLM through practical demonstrations and expert guidance.

Master the deployment process, optimization techniques, and best practices for achieving maximum performance when serving large language models in production environments. Understand the technical architecture behind SGLang's high-speed token generation capabilities and learn how to leverage its features for various AI engineering use cases.

Gain insights from Philip Kiely, who leads Developer Relations at Baseten, and Yineng Zhang, a core SGLang developer and Software Engineer on Baseten's Model Performance team, as they share real-world experience and practical tips for implementing SGLang in your AI infrastructure.
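To give a sense of what deploying a first model with SGLang involves, a minimal sketch of the quick-start flow looks roughly like the following; the model path and port are illustrative choices, and a CUDA-capable GPU is assumed (consult the SGLang documentation for the exact install extras for your hardware):

```shell
# Install SGLang with its serving dependencies (assumes a CUDA GPU environment)
pip install "sglang[all]"

# Launch an OpenAI-compatible inference server for an example model.
# The model path (Qwen/Qwen2.5-7B-Instruct) and port 30000 are illustrative.
python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-7B-Instruct \
  --port 30000
```

Once the server is up, it can be queried like any OpenAI-compatible endpoint, e.g. with `curl http://localhost:30000/v1/models` or the standard OpenAI client libraries pointed at that base URL.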

Syllabus

Introduction to LLM serving with SGLang - Philip Kiely and Yineng Zhang, Baseten

Taught by

AI Engineer

