Overview
Learn how to serve large language models such as DeepSeek and Qwen at state-of-the-art speeds using SGLang, an open-source serving framework for LLMs and VLMs that generates trillions of tokens daily at companies like xAI, AMD, and Meituan. Find out when SGLang is the right tool for an LLM workload, and get hands-on experience deploying and optimizing your first model with it. Through practical demonstrations, explore the advantages of SGLang over other serving frameworks such as vLLM, Ollama, and TensorRT-LLM. Cover the deployment process, optimization techniques, and best practices for serving large language models in production, along with the technical architecture behind SGLang's high-speed token generation and how to apply its features to AI engineering use cases. The session is presented by Philip Kiely, who leads Developer Relations at Baseten, and Yineng Zhang, a core SGLang developer and Software Engineer on Baseten's Model Performance team, who share real-world experience and practical tips for running SGLang in your AI infrastructure.
Syllabus
Introduction to LLM serving with SGLang - Philip Kiely and Yineng Zhang, Baseten
Taught by
AI Engineer