
vLLM TPU - A New Unified Backend Supporting PyTorch and JAX Natively on TPU

Anyscale via YouTube

Overview

Explore the new TPU backend in vLLM in this conference talk from Ray Summit 2025, where Google's Manoj Krishnan and Brittany Rockwell demonstrate how this unified backend accelerates large-scale inference across both PyTorch and JAX models under a single consolidated codepath. Discover how the backend maintains vLLM's signature ease of use and portability while enabling seamless transitions between hardware types and introducing new TPU capabilities designed for XL-scale model deployments. Learn about disaggregated serving for flexible resource allocation, advanced parallelism strategies optimized for Mixture-of-Experts (MoE) models, highly optimized Pallas kernels that maximize TPU performance, and enhanced multimodal support tailored to large heterogeneous model architectures. Gain insight into the architectural details, performance optimizations, and practical deployment patterns that position this new TPU backend as a powerful option for teams operating frontier-scale AI models.

Syllabus

vLLM TPU: A New Unified Backend Supporting PyTorch and JAX Natively on TPU | Ray Summit 2025

Taught by

Anyscale
