vLLM TPU - A New Unified Backend Supporting PyTorch and JAX Natively on TPU

Explore the new TPU backend in vLLM in this conference talk from Ray Summit 2025, where Google's Manoj Krishnan and Brittany Rockwell show how the unified backend accelerates large-scale inference for both PyTorch and JAX models through a single codepath. Discover how the backend keeps vLLM's ease of use and portability, so that teams can move between hardware types, while adding TPU capabilities designed for XL-scale model deployments. Learn about disaggregated serving for flexible resource allocation, parallelism strategies optimized for Mixture-of-Experts (MoE) models, optimized Pallas kernels for TPU performance, and enhanced multimodal support for large heterogeneous model architectures. Gain insight into the architectural details, performance optimizations, and practical deployment patterns relevant to teams operating frontier-scale AI models.