Overview
In this 32-minute talk from Ray Summit 2025, Anyscale engineers Nikita Vemuri and Mengjin Yan demonstrate new tools for debugging, optimizing, and understanding distributed AI workloads built on Ray. As Ray establishes itself as the standard framework for distributed AI applications, users increasingly run into the inherent complexity of large-scale, multi-node systems. Diagnosing resource bottlenecks, task failures, and memory pressure in these systems calls for purpose-built, contextual observability tooling.

The talk covers major observability improvements on Anyscale, including scalable, persistent dashboard views for Ray Core, Ray Train, and Ray Data. It also explains the security architecture behind these dashboards, which keeps all data within your own cloud environment. The speakers then introduce the open-source Ray Export API. This API lets you persist and analyze the same events shown in Ray dashboards and integrate them into your own monitoring or analytics systems. A live demonstration shows how to use these tools to debug real-world issues, from out-of-memory errors to inefficient resource utilization, making Ray workloads more transparent, reliable, and easier to optimize.
Syllabus
Ray Observability Upgrades: Debug, Optimize, and Scale Faster | Ray Summit 2025
Taught by
Anyscale