Overview
Explore the emerging open source technology stack for AI compute workloads in this 19-minute conference talk, which examines how Kubernetes, Ray, PyTorch, and vLLM work together to meet the computational demands of modern AI applications. Learn about the challenges companies face when productionizing AI: the need for scale across compute and data, unprecedented heterogeneity across workloads and hardware accelerators, and a fragmented, rapidly evolving software landscape.

Discover three key trends shaping AI development: the shift to multimodal data processing, the rise of agentic AI and multi-agent systems, and the growing complexity of post-training and reinforcement learning workflows. Understand how each framework in the stack serves a specific role and how the frameworks operate together to bridge the gap between AI applications and hardware infrastructure.

Examine real-world case studies from Pinterest, Uber, and Roblox that demonstrate practical implementations of this stack. Gain insight into the layered architecture spanning the training/inference, distributed compute, and orchestration layers, and explore the future direction of AI infrastructure development.
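The distributed-compute layer described above centers on the scatter-gather pattern: split work (e.g. data shards) across many workers, then collect the results. This is the core abstraction Ray generalizes across a cluster. A minimal, stdlib-only sketch of that pattern, using `concurrent.futures` as a stand-in for Ray's remote tasks (the function names here are illustrative, not Ray's API):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a compute-heavy task that Ray would schedule as a
# remote function across cluster nodes (roughly, an @ray.remote task).
def preprocess(shard: list) -> int:
    # Toy "preprocessing": reduce the shard to a single value.
    return sum(shard)

def run_pipeline(shards):
    # Scatter: submit one task per data shard to the worker pool,
    # analogous to fanning out remote calls in Ray.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(preprocess, s) for s in shards]
        # Gather: block until every result is ready, analogous to
        # collecting futures with ray.get().
        return [f.result() for f in futures]

if __name__ == "__main__":
    shards = [[1, 2], [3, 4], [5, 6]]
    print(run_pipeline(shards))  # -> [3, 7, 11]
```

In Ray the same shape scales past one machine: tasks are scheduled onto whichever nodes (and accelerators) have capacity, which is what lets the stack handle the heterogeneous workloads the talk describes.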
Syllabus
0:00 - Introduction to Ray and Anyscale
1:27 - Early Adoption and Growth of Ray
2:59 - Trend 1: Shift to Multimodal Data Processing
5:42 - Trend 2: Rise of Agentic AI and Multi-Agent Systems
7:48 - Trend 3: Post-Training and Reinforcement Learning Complexity
12:00 - Bridging the Gap Between AI Applications and Hardware
13:00 - Layered Tech Stack: Training/Inference, Distributed Compute, Orchestration
17:08 - Future of AI Tech Stack and Conference Invitation
Taught by
Weights & Biases