Retrofitting OTEL Collectors and Prometheus - How To Overcome Scale and Design Limitations
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to overcome scale and design limitations when running OpenTelemetry (OTEL) Collectors and Prometheus in large enterprise environments by retrofitting them rather than replacing them. Explore real-world challenges faced at eBay, where span traffic is spread across multiple Kubernetes clusters and regions. The service graph connector needs all spans of a trace routed globally to a single collector, and that collector needs substantial memory to hold spans for traces of varying duration. Prometheus, in turn, has no native support for long-term retention of exemplars, which creates operational constraints. Examine an approach that uses an internal ClickHouse-based trace store to feed spans to the service graph connector sustainably for metrics generation, and a ClickHouse-based solution for long-term exemplar retention. Gain insight into practical workarounds that avoid two common pitfalls, extensive customization and abandoning the project altogether, when standard OTEL and Prometheus configurations do not match an organization's design requirements or scaling needs.
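To illustrate why the service graph connector needs every span of a trace in one place, here is a minimal Python sketch of the general idea: an edge between two services is derived by pairing a server span with its parent client span. This is a simplification for illustration, not the connector's actual implementation; the `Span` type, the `service_graph_edges` function, and the sample service names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical, stripped-down span record (assumption, not an OTEL type).
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    service: str
    kind: str  # "client" or "server"


def service_graph_edges(spans: Iterable[Span]) -> Counter:
    """Count caller -> callee edges by pairing each server span
    with the client span that is its parent."""
    spans = list(spans)
    clients = {(s.trace_id, s.span_id): s for s in spans if s.kind == "client"}
    edges: Counter = Counter()
    for s in spans:
        if s.kind != "server" or s.parent_span_id is None:
            continue
        parent = clients.get((s.trace_id, s.parent_span_id))
        if parent is not None:  # an edge needs both halves of the pair
            edges[(parent.service, s.service)] += 1
    return edges


# One trace: frontend -> checkout -> payments (sample data, invented).
trace = [
    Span("t1", "a", None, "frontend", "server"),
    Span("t1", "b", "a", "frontend", "client"),
    Span("t1", "c", "b", "checkout", "server"),
    Span("t1", "d", "c", "checkout", "client"),
    Span("t1", "e", "d", "payments", "server"),
]

# All spans seen by a single collector: both edges are found.
all_in_one = service_graph_edges(trace)

# Same trace split across two collectors (e.g. by cluster): the
# frontend -> checkout pair is separated, so that edge is lost.
collector_1 = service_graph_edges(trace[:2])
collector_2 = service_graph_edges(trace[2:])
split_total = collector_1 + collector_2
```

In this sketch the single collector finds both edges, while the split setup finds only `checkout -> payments`. Holding unmatched spans until their counterpart arrives is also what drives the memory cost mentioned above.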
Syllabus
Retrofitting OTEL Collectors & Prometheus - How To Overcome Scale... Vijay Samuel & Sandeep Raveesh
Taught by
CNCF [Cloud Native Computing Foundation]