Introduction to LLM Serving with SGLang

Learn how to serve large language models such as DeepSeek and Qwen at state-of-the-art speeds using SGLang, an open-source serving framework for LLMs and VLMs that is used to generate trillions of tokens daily at companies like xAI, AMD, and Meituan. Discover when SGLang is the right tool for an LLM workload, and gain hands-on experience deploying and optimizing your first model with it. Explore the advantages of SGLang over other serving frameworks such as vLLM, Ollama, and TensorRT-LLM through practical demonstrations. Understand the technical architecture behind SGLang's high-speed token generation, learn how to apply its features to different AI engineering use cases, and master the optimization techniques and best practices for serving large language models in production.

Gain insights from Philip Kiely, who leads Developer Relations at Baseten, and Yineng Zhang, a core SGLang developer and Software Engineer on Baseten's Model Performance team, as they share real-world experience and practical tips for running SGLang in your AI infrastructure.