Overview
This 31-minute conference talk from Ray Summit 2025 covers how Uber uses Ray to scale machine learning model training across heterogeneous compute environments. It describes Uber's architectural strategies for large-scale training of LLMs, recommendation systems, and other high-capacity models on Ray's unified distributed computing framework. Topics include multi-cloud training across diverse compute providers for flexibility and resource availability; disaster-recovery-ready designs that keep production-grade ML workloads running through cloud outages or regional failures; cross-environment portability between cloud and on-premise clusters; and performance optimization, including GPU/accelerator efficiency, system-level tuning, and effective patterns for large distributed jobs. The talk also explains how Ray has become integral to Uber's rapidly expanding machine learning infrastructure by standardizing training workflows at scale, and offers practical lessons applicable to enterprise ML systems of any size.
Syllabus
Inside Uber: Scaling Model Training with Ray | Ray Summit 2025
Taught by
Anyscale