Fix First, Investigate Later - When an eBPF Rollout Brought Down Our Network
CNCF [Cloud Native Computing Foundation] via YouTube
Power BI Fundamentals - Create visualizations and dashboards from scratch
Get 35% Off CFI Certifications - Code CFI35
Overview
Coursera Spring Sale
40% Off Coursera Plus Annual!
Grab it
Learn how to handle critical production network outages through a real-world incident where an unexpected eBPF monitoring tool deployment caused severe packet loss and network performance degradation. Discover the emergency response strategies used when production networks suddenly dropped from 800MB/s to 250MB/s throughput due to a silently deployed Retina DaemonSet running eBPF programs. Explore the technical challenges of dealing with self-healing infrastructure that prevented standard removal procedures and understand the creative solution of building a mutation webhook to intercept and neutralize the problematic deployment. Gain insights into post-incident investigation techniques to understand how eBPF observability tools can impact network performance and learn best practices for managing unexpected infrastructure changes in cloud-native environments.
Syllabus
Fix First, Investigate Later: When an eBPF Rollout Brought Down Our... Zain Malik & Grzegorz Głąb
Taught by
CNCF [Cloud Native Computing Foundation]