From Scaling to Observability - Solving Key Challenges for Distributed ML with Ray

Data Council via YouTube

Overview

This 28-minute conference talk from Data Council examines the observability challenges that arise when distributed machine learning training is scaled across thousands of nodes with Ray. Nikita Vemuri, a software engineer at Anyscale, shares practical experience tracking large volumes of system data in multi-node environments. The talk covers strategies for correlating information across a cluster and for designing observability stacks that surface relevant insights while preserving data privacy. It is aimed at practitioners who run large-scale ML workloads or build monitoring systems for distributed training.

Taught by

Data Council