OTel-y Oops: Learning From Our Observability Blunders
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores how Akamai Technologies accidentally created observability chaos by sending 300 million time series to its stack. The result was cascading failures in its OpenTelemetry agents and significant challenges for its Observability SREs. Learn from their key mistakes and discover best practices for scaling observability in complex systems, avoiding common pitfalls, and building resilient monitoring pipelines with OpenTelemetry, VictoriaMetrics, Loki, and Tempo. Gain actionable insights on optimizing your observability strategy without overwhelming your systems or your team, and see how over-ambitious instrumentation combined with a lack of foresight can lead to major problems in production environments.
Syllabus
OTel-y Oops: Learning From Our Observability Blunders - Joe Stephenson & Rodney Karemba
Taught by
CNCF [Cloud Native Computing Foundation]