Fail Open, Fail Fast - Improving Envoy Resilience in Latency-Critical Systems
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to improve Envoy proxy resilience in latency-critical distributed storage systems through this 15-minute conference talk from the Cloud Native Computing Foundation. Discover practical approaches to network latency issues that cause dropped requests, error messages, and inconsistent logs in large-scale systems. Explore how intermittent "no such bucket" errors, which emerged during peak loads because of rate-limiting service timeouts, were identified and resolved. Understand tuning strategies for Envoy's fail-open versus fail-closed behavior and examine the architectural trade-offs of co-locating critical services to reduce cross-load-balancer latency. Gain insights into moving from k6-based load testing to Envoy's Nighthawk tool for simulating high-traffic conditions and identifying reliability bottlenecks. Learn techniques for tagging requests with gRPC metadata to improve observability, along with methods for rolling out changes without disrupting production traffic. Pick up practical debugging tips, common pitfalls, and actionable insights for maintaining system reliability in demanding environments.
Syllabus
Fail Open, Fail Fast: Improving Envoy Resilience in Latency-Critical Systems
Taught by
CNCF [Cloud Native Computing Foundation]