Overview
Explore how Robinhood scaled its machine learning platform to train large models on large datasets by adopting distributed training with KubeRay, in this 32-minute conference talk from Ray Summit 2025. Lanting Chiang and Robert Macy describe Robinhood's move from the limitations of single-node training to the distributed training capabilities needed for future model development. Discover the evaluation process and architectural decisions that led Robinhood to adopt KubeRay for large-scale distributed training, including how Ray was integrated into the existing ML training stack. Understand the platform-level abstractions Robinhood built to make distributed training seamless and accessible for internal teams, and examine how its particular Kubernetes environment shaped the choice between native KubeRay components and alternative solutions. Gain practical insights into integrating Ray into a production ML platform, including lessons learned, architectural best practices, and strategies for enabling distributed training at scale in real-world enterprise environments.
Syllabus
Ray @ Robinhood: Distributed ML Training with KubeRay | Ray Summit 2025
Taught by
Anyscale