Overview
Explore GPU cloud infrastructure optimization in this technical conference talk on hardware-level considerations for AI systems. Learn how to fine-tune various machine learning models on an H100 cluster, with detailed analysis of critical components such as the Pod scheduler, device plugin, GPU/NUMA topology, and the RoCE/NCCL stack. See first-hand experimental results showing how device operator configurations in nodes relate to model performance, focusing on the CNN, RNN, and Transformer models from MLPerf. Understand the often-overlooked hardware aspects of AI infrastructure that can significantly affect distributed machine learning performance and efficiency.
Syllabus
Optimize Your AI Cloud Infrastructure: A Hardware Perspective - Liang Yan, CoreWeave
Taught by
Linux Foundation