Overview
This conference talk from Ray Summit 2025 presents the new TPU backend in vLLM. Google's Manoj Krishnan and Brittany Rockwell demonstrate how this unified backend accelerates large-scale inference for both PyTorch and JAX models through a single consolidated codepath. The backend preserves vLLM's ease of use and portability, enables seamless transitions between hardware types, and introduces TPU capabilities designed for XL-scale model deployments. Topics include disaggregated serving for flexible resource allocation, parallelism strategies optimized for Mixture-of-Experts (MoE) models, Pallas kernels tuned to maximize TPU performance, and enhanced multimodal support for large heterogeneous model architectures. The talk also covers architectural details, performance optimizations, and practical deployment patterns for teams operating frontier-scale AI models.
Syllabus
vLLM TPU: A new unified-backend supporting Pytorch and JAX natively on TPU | Ray Summit 2025
Taught by
Anyscale