Overview
Explore how AutoML is changing Site Reliability Engineering (SRE) practice at enterprise scale in this 19-minute conference talk from Conf42 SRE 2025. The talk opens with the challenges facing traditional operations teams and shows how modern SRE practice integrates automated machine learning to address reliability concerns at scale. It explains why enterprises are increasingly adopting SRE and walks through specific use cases for AutoML, including automated alert tuning and self-healing systems that reduce manual intervention. It then covers data acquisition and model training for SRE applications, a practical implementation framework for adopting AutoML-driven SRE, the key operational metrics for measuring success, and next steps for scaling. The final sections address common technical and organizational implementation challenges with their solutions, and survey domain-specific applications of AutoML across industry verticals and infrastructure types.
Syllabus
00:00 Introduction to AutoML in SRE
00:37 Challenges in Traditional Operations
01:36 Modern SRE and AutoML Integration
02:23 Why Enterprises Need SRE
03:36 Use Cases of AutoML in SRE
05:35 Automated Alert Tuning and Self-Healing Systems
07:07 Data Acquisition and Model Training
09:53 Practical Implementation Framework
11:55 Key Operational Metrics and Next Steps
13:34 Implementation Challenges and Solutions
15:53 Domain-Specific Applications of AutoML
18:02 Conclusion
Taught by
Conf42