Explore how tracing uncovers half-truths in Slack's CI infrastructure in this 23-minute conference talk from Strange Loop. Discover why traditional monitoring tools such as logs and metrics were insufficient for debugging CI system failures. Learn how traces made it possible to understand faults that span interconnected systems such as GitHub Enterprise (GHE), Checkpoint, and Cypress. Gain insights into shared tooling for high-dimensionality event traces using SlackTrace and SpanEvents, and how this tooling sped up diagnosing code problems and debugging complex system interactions. Follow the journey from the early incidents that motivated investment in internal tooling to improvements in performance and resiliency across Slack's infrastructure. Topics include developer productivity, span event structure, shared dimensions, use cases, fuzzy service boundaries, incident command systems, and testing changes.