Watching the Watchers: How We Do Continuous Reliability at Grafana Labs
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn about continuous reliability practices at Grafana Labs in this technical conference talk, which covers the real-world challenges of keeping observability tools reliable and how they were solved. Explore how the company tracked down a costly mystery that exceeded $100,000, scaled Mimir clusters to handle 1.3 billion time series, and optimized Loki clusters to process 324 TB of logs per day. Get a look at the internal monitoring dashboards used for Grafana Cloud, and hear lessons learned from production incidents and system failures. Through a candid discussion of past challenges and current improvements, understand the practical side of implementing observability at scale and maintaining reliability in complex microservices-based systems.
Syllabus
Watching the Watchers: How We Do Continuous Reliability at Grafana Labs - Nicole van der Hoeven
Taught by
CNCF [Cloud Native Computing Foundation]