

The Path to Zero Flap - Reinventing Optical Reliability for Scalable AI Clusters

Open Compute Project via YouTube

Overview

Learn how to eliminate the optical link failures that disrupt AI training in large-scale GPU clusters in this conference presentation from the Open Compute Project. Discover the reliability challenges facing AI networks as they scale toward millions of GPUs, where commodity pluggable optics can reach failure rates of up to 30,000 FITs (failures per billion device-hours) and a single unstable optical link can trigger link flaps that derail an entire training run. Explore the "Zero Flap" optical solution developed by Credo and Oracle, which combines host-side and line-side telemetry, in-band messaging, event logging, and predictive analytics to prevent unplanned optical failures. Understand the technical architecture being contributed to OCP's new optical reliability workstream, examine the solution's tradeoffs, and review initial results on the path toward more stable large-scale AI infrastructure. The talk is presented jointly by Oracle's Senior Principal Network Engineer and Credo's President and CEO.
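To see why a per-component rate like 30,000 FITs matters at cluster scale, it helps to do the arithmetic. A FIT is one failure per 10^9 device-hours, so a fleet's expected failure rate is roughly the per-device FIT multiplied by the device count, divided by 10^9. The sketch below is an illustration only, not material from the talk: it assumes independent devices with a constant failure rate (a Poisson model), treats every FIT event as link-disrupting, and uses a hypothetical fleet of 100,000 pluggable optics.

```python
import math

# FIT (Failures In Time) = expected failures per 1e9 device-hours.
DEVICE_HOURS_PER_FIT = 1e9


def fleet_failures_per_hour(fit_per_device: float, num_devices: int) -> float:
    """Expected failures per hour across num_devices independent devices."""
    return fit_per_device * num_devices / DEVICE_HOURS_PER_FIT


def fleet_mtbf_hours(fit_per_device: float, num_devices: int) -> float:
    """Mean time between failures for the whole fleet, in hours."""
    return 1.0 / fleet_failures_per_hour(fit_per_device, num_devices)


def prob_any_failure(fit_per_device: float, num_devices: int, hours: float) -> float:
    """Probability of at least one failure during a run of `hours`,
    assuming failures arrive as a Poisson process (constant rate)."""
    rate = fleet_failures_per_hour(fit_per_device, num_devices)
    return 1.0 - math.exp(-rate * hours)


if __name__ == "__main__":
    fit = 30_000        # upper bound quoted in the talk summary
    optics = 100_000    # hypothetical number of optics in one cluster
    print(f"failures/hour: {fleet_failures_per_hour(fit, optics):.1f}")
    print(f"fleet MTBF: {fleet_mtbf_hours(fit, optics) * 60:.0f} minutes")
    print(f"P(>=1 failure in a 24 h run): {prob_any_failure(fit, optics, 24):.6f}")
```

Under these assumptions the hypothetical fleet sees about three optical failures per hour, which is why a multi-day training run is almost certain to hit at least one; real flap rates depend on failure modes the FIT figure alone does not capture.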

Syllabus

The Path to Zero Flap - Reinventing Optical Reliability for Scalable AI Clusters presented by C

Taught by

Open Compute Project

