Thanos Receiver Deep Dive - Stability and Incident Management
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Dive deep into Thanos Receiver in this conference talk. Explore the challenges of tuning the stability of metric receivers in Thanos, a system that can ingest metrics via remote write from multiple sources simultaneously, along with the solutions that address them. Learn from real-world incidents and how they have influenced current approaches to running metric receivers in Kubernetes. Discover strategies for achieving a stable setup that withstands scheduled rollouts and node restarts, and gain insight into attempts at making receivers self-healing. Examine a surprising failure mode that affected multiple hard tenants, and understand its implications for system reliability. Come away with practical knowledge for optimizing Thanos Receiver performance and stability in cloud-native environments.
Syllabus
Thanos Receiver Deep Dive - Joel Verezhak, Open Systems
Taught by
CNCF [Cloud Native Computing Foundation]