Overview
Explore how Robinhood scaled its machine learning platform to support large-model and large-dataset training with KubeRay in this 32-minute conference talk from Ray Summit 2025. Learn from Lanting Chiang and Robert Macy as they detail Robinhood's journey from the limits of single-node training to the distributed training capabilities needed for future model development. Discover the evaluation process and architectural decisions that led to adopting KubeRay for large-scale distributed training, including how Ray was integrated into the existing ML training stack. Understand the platform-level abstractions Robinhood built to make distributed training accessible to internal teams, and examine how the company's Kubernetes environment influenced the choice between native KubeRay components and alternative solutions. Gain practical insights into integrating Ray into a production ML platform, including lessons learned, architectural best practices, and strategies for enabling distributed training at scale in enterprise environments.
Syllabus
Ray @ Robinhood: Distributed ML Training with KubeRay | Ray Summit 2025
Taught by
Anyscale