Learn about FiDe, a failure detection system for crash failures in datacenter environments, in this 15-minute conference presentation from USENIX ATC '25. Discover how this fully reliable failure detector overcomes the limitations of traditional timeout-based approaches, which suffer from unpredictable interaction times caused by contention for network and processor resources. Explore the ground-up design that lets FiDe detect remote process crashes in less than 30 microseconds, 7.2 times faster than the current state of the art, while maintaining extremely high reliability through stable end-to-end process interactions. Examine how FiDe's much lower worst-case crash detection times enable new classes of algorithms that improve coordination services even during failure-free operation. Understand the design and implementation of two novel FiDe-based consensus protocols and their integration into key-value stores and synchronization services, with throughput improvements of up to 2.23× and latency reduced to as low as 0.46×. Gain insights into how this work addresses fundamental challenges in distributed fault-tolerant services and applications, particularly modern microsecond-scale services that depend on rapid failure detection for good performance.