This talk by Dr. Vipul Harsh explores failure diagnosis in networked systems. Learn how Root Cause Analysis (RCA) can help mitigate the significant downtime and service-level agreement (SLA) violations that cause financial losses for distributed services, datacenter networks, and cloud environments. Discover the two major challenges in designing effective RCA solutions: modeling system behavior accurately from the available telemetry, and building inference algorithms that correctly identify root causes. Dr. Harsh presents practical solutions that apply powerful reasoning techniques to extract insights from monitoring data, and he discusses why a holistic, systems-level approach to failure diagnosis matters. Dr. Harsh earned his Ph.D. from UIUC and currently works as a post-doc at Conviva Networks. His research on networked and distributed systems spans failure diagnosis, datacenter topology, distributed monitoring, and parallel algorithms.