Kubernetes Deep Dive - Elevating ML Workload Monitoring to Art

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This conference talk covers monitoring ML workloads on Kubernetes. It explores strategies for optimizing AI/ML workloads by combining node health assurance with advanced monitoring techniques. Topics include AWS Neuron's integration for problem detection, the deployment of Neuron Monitor for improved observability, and the diagnosis and resolution of real-world issues in AI/ML clusters using detection and recovery mechanisms. The talk also covers using tools such as the Kubernetes Node Problem Detector, Prometheus, Grafana, and Amazon CloudWatch for performance analytics, with the goal of resilient and transparent Kubernetes environments for AI/ML applications.

Syllabus

Kubernetes Deep Dive: Elevating ML Workload Monitoring to Art - Ziwen Ning & Geeta Gharpure

Taught by

CNCF [Cloud Native Computing Foundation]
