Learn how to automatically derive semantic checkers from system tests to detect silent failures in production distributed systems in this 18-minute conference talk from OSDI '25. Silent failures are semantic violations that occur without explicit errors, and detecting them has traditionally required extensive domain knowledge and manual effort. See findings from a large-scale study of existing system test cases, and understand how the T2C framework uses static and dynamic analysis to transform and generalize tests into runtime checkers. Examine its application to four major distributed systems, where the researchers derived tens to hundreds of checkers that detected 15 of 20 real-world silent failures while maintaining low runtime overhead. Gain insight into how this automated approach can improve the reliability and correctness of production distributed systems by reusing existing test infrastructure for runtime monitoring.
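To make the idea of turning a test into a runtime checker concrete, here is a minimal Python sketch. It is not T2C's implementation or API: the toy `KVStore`, the `drop_keys` bug switch, `checker_put_then_get`, and `MonitoredStore` are all hypothetical names invented for illustration, and the generalization is done by hand rather than by static and dynamic analysis. It only shows the general shape: a test assertion with hard-coded inputs becomes a parameterized check that runs against live operations.

```python
from typing import Dict, List, Optional, Set


class KVStore:
    """Toy store (hypothetical). `drop_keys` simulates a bug that loses writes."""

    def __init__(self, drop_keys: Optional[Set[str]] = None) -> None:
        self._data: Dict[str, str] = {}
        self._drop_keys: Set[str] = drop_keys or set()

    def put(self, key: str, value: str) -> None:
        if key in self._drop_keys:
            return  # silent failure: no exception and no error message
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def test_put_then_get() -> None:
    """Original system test: fixed inputs and a single assertion."""
    store = KVStore()
    store.put("k1", "v1")
    assert store.get("k1") == "v1"


def checker_put_then_get(store: KVStore, key: str, value: str) -> bool:
    """Generalized assertion: the constants "k1" and "v1" from the test
    become parameters bound to whatever was just written at runtime."""
    return store.get(key) == value


class MonitoredStore:
    """Wraps a store and runs the checker after every put."""

    def __init__(self, store: KVStore) -> None:
        self._store = store
        self.violations: List[str] = []

    def put(self, key: str, value: str) -> None:
        self._store.put(key, value)
        # The checker's precondition is "a put just completed".
        if not checker_put_then_get(self._store, key, value):
            self.violations.append(key)

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)
```

In this sketch, wrapping a store created with `KVStore(drop_keys={"bad"})` and calling `put("bad", "2")` raises no error, but the key is recorded in `violations`. That is the kind of silent semantic violation the talk targets.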