NVIDIA's Framework for Scalable Data Curation

Anyscale via YouTube

Overview

Learn how NVIDIA and Roblox built a modern ML platform on Ray to train large-scale 3D foundation models in this 30-minute conference talk from Ray Summit 2025. Discover the platform's architecture, including how KubeRay is integrated with Istio and Kubeflow to support authentication, multi-tenancy, and secure orchestration. Explore infrastructure innovations such as peer-to-peer Docker image distribution, lazy image pulling, and scaling Ray jobs across multiple clusters, as well as the newly open-sourced KubeRay dashboard for faster iterative development.

Examine the challenges of applying Ray to foundation-model training workloads, including orchestrating massive LLM batch-labeling jobs, using Ray Data at scale, and supporting large distributed pipelines across heterogeneous compute. Understand how Roblox moved from MPI-based distributed training to Ray Train as its default framework, which improved reliability and simplified operations while adding critical capabilities such as observability and fault tolerance. Gain practical insights into building production-grade Ray platforms, modernizing distributed training workflows, and supporting multimodal foundation model development at enterprise scale.

Syllabus

NVIDIA’s Framework for Scalable Data Curation | Ray Summit 2025

Taught by

Anyscale
