OTel-y Oops: Learning From Our Observability Blunders
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores how Akamai Technologies accidentally created observability chaos by sending 300 million time series to its stack, which caused cascading failures in its OpenTelemetry agents and significant challenges for its Observability SREs. Learn from the team's key mistakes and discover best practices for scaling observability in complex systems, avoiding common pitfalls, and building resilient monitoring pipelines with OpenTelemetry, VictoriaMetrics, Loki, and Tempo. Gain actionable insights into optimizing your observability strategy without overwhelming your systems or team, and see how over-ambitious instrumentation combined with a lack of foresight can lead to major problems in production environments.
Syllabus
OTel-y Oops: Learning From Our Observability Blunders - Joe Stephenson & Rodney Karemba
Taught by
CNCF [Cloud Native Computing Foundation]