CNCF [Cloud Native Computing Foundation]

Fast and Furious - Practice in Horizon Robotics on Large-scale End-to-end Model Training

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn how Horizon Robotics approaches large-scale end-to-end model training for autonomous driving in this 30-minute conference talk from CNCF. The talk covers how the company efficiently trains and deploys advanced perception models such as Sparse4D, combining cloud-native technologies and deep learning algorithms with chip design expertise. It examines the challenges of managing massive video datasets and large numbers of small files while sustaining high-performance training across more than 2,000 GPUs on RDMA infrastructure, and of quickly identifying failure types and diagnosing issues at that scale. It then describes Horizon Robotics' strategies for running these jobs on Kubernetes: distributed data caching, network topology awareness, and job affinity scheduling. It also covers restoring interrupted training jobs by replacing failed machines with backups, which improves task resilience. Finally, the speakers share practical experience with CNCF projects in production autonomous driving model training, including Volcano for job scheduling, Fluid for data orchestration, and NPD (Node Problem Detector) for cluster health monitoring.

Syllabus

Fast and Furious: Practice in Horizon Robotics on Large-scale End-to-e... Chen Yangxue & Zhihao Xu

Taught by

CNCF [Cloud Native Computing Foundation]
