Handling Traffic Spikes Through Automated Prioritized Load Shedding

Learn how to implement automated prioritized load shedding systems to handle traffic spikes effectively in this 21-minute conference talk from KubeCon + CloudNativeCon. Discover Netflix's approach to preventing random call failures during high-volume traffic periods by implementing intelligent load shedding mechanisms that prioritize critical requests over low-priority ones. Explore the key considerations for determining when to engage load shedding, proper request prioritization strategies, and how to operate clusters with fewer resources while minimizing overload risks. Gain insights into Netflix's automated system that tunes, tests, and deploys customized load shedding configurations across gRPC service clusters. Understand cluster buffer reasoning, automation techniques for achieving optimal buffer levels, system utilization signals that work (and those that don't), and how these methodologies enable running both fewer and smaller clusters while maintaining service reliability.
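
To make the core idea concrete, here is a minimal Python sketch of prioritized load shedding: as a utilization signal rises, the lowest-priority requests are rejected first and critical requests are never shed. This is an illustration only, not Netflix's implementation. The priority tiers, the threshold values, and the names `Priority`, `SHED_THRESHOLDS`, and `should_shed` are all assumptions made for the example, and the utilization signal is taken to be a single number between 0.0 and 1.0.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical request tiers; a lower value means a more important request."""
    CRITICAL = 0
    DEGRADED = 1
    BEST_EFFORT = 2
    BULK = 3


# Assumed utilization levels (0.0 to 1.0) at which each tier starts being shed.
# CRITICAL has no entry, so it is never shed in this sketch.
SHED_THRESHOLDS = {
    Priority.BULK: 0.60,
    Priority.BEST_EFFORT: 0.75,
    Priority.DEGRADED: 0.90,
}


def should_shed(priority: Priority, utilization: float) -> bool:
    """Return True if a request of this priority should be rejected
    at the given utilization level."""
    threshold = SHED_THRESHOLDS.get(priority)
    return threshold is not None and utilization >= threshold


if __name__ == "__main__":
    # Show which tiers are still admitted as utilization climbs.
    for utilization in (0.50, 0.65, 0.80, 0.95):
        admitted = [p.name for p in Priority if not should_shed(p, utilization)]
        print(f"utilization={utilization:.2f} admitted={admitted}")
```

In a real service the check would typically run early in request handling (for example, in a server-side interceptor), and the thresholds would be tuned per cluster rather than hard-coded, which is the part the talk describes automating.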