Dashboards and Dragons: Crafting SLOs to Tame the AI Platform Chaos
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores how to scale Kubernetes platforms using Service Level Indicators (SLIs), Service Level Objectives (SLOs), and observability dashboards. Bloomberg engineers share how they manage multi-cluster platform complexity across cloud, on-premises, and hybrid environments. The talk covers practical strategies for defining meaningful metrics, designing actionable dashboards, and maintaining platform reliability at scale, drawing on real-life lessons focused on keeping AI workloads running smoothly even during chaotic conditions. The observability design practices presented can be applied to your own infrastructure challenges.
Syllabus
Dashboards & Dragons: Crafting SLOs To Tame the AI Platform Cha... Alexa Griffith & Ankita Chaudhari
Taught by
CNCF [Cloud Native Computing Foundation]