Linux Foundation

AI/ML Networking Challenges - The Fast and the Finicky

Linux Foundation via YouTube

Overview

Explore the networking challenges that affect AI/ML job performance and reliability in this 21-minute conference talk from the Linux Foundation's Open Source Summit. Like high-performance race cars, AI/ML workloads depend on optimized network fabrics such as RoCE (RDMA over Converged Ethernet) and InfiniBand to reach peak speed and efficiency. Learn about key networking issues, including NIC flapping (a network interface repeatedly going down and coming back up), which reduces reliability, and limited visibility at the RDMA queue pair level, which hampers troubleshooting. Like debris on a race track, these problems cause slowdowns, disruptions, and costly rollbacks to earlier checkpoints that directly reduce ROI. Practical demonstrations show how networking problems affect AI/ML job completion times and overall system performance. You will also learn the network fabric optimization and monitoring techniques AI/ML engineers need to keep their workloads running at full speed, much as a pit crew keeps a race car performing at its best.

Syllabus

AI/ML Networking Challenges: The Fast and the Finicky! - Lerna Ekmekcioglu, Clockwork Systems

Taught by

Linux Foundation
