Overview
Explore how Robinhood scaled its machine learning platform to support large-model and large-dataset training through distributed training with KubeRay in this 32-minute conference talk from Ray Summit 2025. Learn from Lanting Chiang and Robert Macy as they detail Robinhood's journey from the limits of single-node training to the distributed training capabilities needed for future model development. Discover the evaluation process and architectural decisions that led to adopting KubeRay for large-scale distributed training, including how Ray was integrated into Robinhood's existing ML training stack. Understand the platform-level abstractions Robinhood built to make distributed training accessible to internal teams, and examine how the company's Kubernetes environment influenced its choice between native KubeRay components and alternative solutions. Gain practical insights into integrating Ray into a production ML platform, including lessons learned, architectural best practices, and strategies for enabling distributed training at scale in enterprise environments.
Syllabus
Ray @ Robinhood: Distributed ML Training with KubeRay | Ray Summit 2025
Taught by
Anyscale