Overview
Explore the critical but often misunderstood aspects of error handling in high-scale distributed systems through this 22-minute conference talk from DevOpsDays Tel Aviv. Discover why short timeouts are crucial for system resilience yet easy to misuse, and learn how poorly implemented retry mechanisms can trigger system-wide failures. Examine a postmortem-style analysis of production incidents based on actual system problems. Learn actionable strategies for designing retry logic, preventing service overload, and avoiding cascading failures that can bring down entire infrastructures. Come away with an understanding of common pitfalls and system design insights to apply when implementing timeouts and retries in your applications.
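To make the retry discussion concrete, here is a minimal sketch of one widely used pattern: a bounded number of attempts, a short per-call timeout, and capped exponential backoff with random jitter so that many clients do not retry in lockstep against a struggling service. It is not taken from the talk; the names `TransientError`, `backoff_delays`, and `call_with_retries`, along with all default values, are hypothetical and chosen only for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delays(max_attempts, base=0.1, cap=2.0, rng=random.random):
    """Yield the wait before each retry: capped exponential backoff with full jitter."""
    for attempt in range(max_attempts - 1):
        # The ceiling doubles each attempt up to `cap`; jitter picks a point below it.
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(operation, max_attempts=3, timeout=0.5, base=0.1, cap=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(timeout) up to max_attempts times, backing off between tries.

    `operation` is assumed to enforce `timeout` itself and to raise
    TransientError for failures that may succeed on a later attempt.
    """
    delays = backoff_delays(max_attempts, base, cap, rng)
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(timeout)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up rather than add load to a failing service
            sleep(next(delays))
```

Bounding the attempts matters as much as the backoff: if every layer of a call chain retries without limit, one slow dependency can multiply traffic at exactly the moment it is least able to handle it. Errors that are not marked transient propagate immediately and are never retried.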
Syllabus
Dancing with Failure - The Art of Timeouts & Retries, Alon Nativ
Taught by
DevOpsDays Tel Aviv