Zero Downtime ML Deployments

Conf42 via YouTube

Overview

This conference talk from Conf42 SRE 2025 explains how to achieve zero downtime in machine learning deployments. It examines the gap between ML engineers and SREs, why traditional observability approaches fall short for ML systems, and strategies for detecting silent failures and data drift. It then covers how to implement effective ML monitoring, the types of data drift and their operational impact, and best practices for ML observability. The talk closes with tools and techniques for keeping a service available while deploying ML models, along with practical takeaways for running reliable ML systems in production.

Syllabus

00:00 Introduction to Zero Downtime ML Observability
01:07 Understanding the Gap Between ML Engineers and SREs
02:43 Challenges in Traditional Observability for ML Systems
04:45 Addressing Silent Failures and Data Drifts
08:05 Implementing Effective ML Monitoring Systems
10:37 Types of Data Drifts and Their Impact
16:50 Best Practices for ML Observability
18:52 Tools and Techniques for Zero Downtime
20:31 Conclusion and Final Thoughts

Taught by

Conf42
