Overview
Learn how Roblox built a modern machine learning platform using Ray to train 3D foundation models at scale in this 31-minute conference talk from Ray Summit 2025. Discover the platform architecture that integrates KubeRay with Istio and Kubeflow to support authentication, multi-tenancy, and secure workflow orchestration. Explore Roblox's open-source contributions, including the new KubeRay dashboard for improved iterative development, along with innovations like peer-to-peer Docker image distribution, lazy image pulling, and scaling Ray jobs across multiple Kubernetes clusters. Understand the challenges of applying Ray to large-scale foundation model training, including high-volume LLM batch labeling jobs, leveraging Ray Data at scale, and supporting demanding distributed workloads. Follow Roblox's journey from MPI-based distributed training to adopting Ray Train as their default framework, gaining critical features like observability and fault tolerance while migrating the majority of their training use cases. Through real-world implementation examples and architectural decisions, gain practical insights into building production ML platforms on Ray, modernizing distributed training infrastructure, and supporting large-scale foundation model development in complex, multi-team environments.
Syllabus
How Roblox Trains 3D Foundation Models with Ray | Ray Summit 2025
Taught by
Anyscale