Is Your GPU Really Working Efficiently in the Data Center? N Ways to Improve GPU Usage
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores strategies for making GPUs in data centers work more efficiently, drawing on real-world production practice. Learn how to raise the Model FLOPs Utilization (MFU) of AI accelerators when training Large Language Models (LLMs) with billions of parameters on large-scale Kubernetes clusters, using techniques such as model parallelism, switch-affinity scheduling, and checkpoint optimization. See how overall GPU utilization can be improved through GPU sharing, through hybrid training-inference deployments that absorb tidal (peak-and-trough) traffic patterns, and through node grouping matched to application requirements. Finally, understand the problem of applications that monopolize GPUs while leaving them underutilized, and explore ways to keep AI devices working efficiently around the clock.
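MFU compares the FLOPs a training job actually achieves with the hardware's theoretical peak. The sketch below is not taken from the talk; it uses the common approximation of about 6 FLOPs per parameter per token for dense transformer training (forward plus backward pass, ignoring attention FLOPs). The function name and all input numbers are hypothetical and purely illustrative. The 312 TFLOPS figure is roughly the dense BF16 peak of an NVIDIA A100.

```python
def model_flops_utilization(num_params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Estimate MFU for dense transformer training (hypothetical helper).

    Assumes ~6 FLOPs per parameter per token (forward + backward),
    which ignores attention FLOPs and any activation recomputation.
    """
    if min(num_params, tokens_per_second, num_gpus, peak_flops_per_gpu) <= 0:
        raise ValueError("all inputs must be positive")
    # FLOPs per second the model math actually requires at the observed throughput
    achieved_flops = 6 * num_params * tokens_per_second
    # Theoretical maximum FLOPs per second across all accelerators in the job
    peak_flops = num_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Illustrative numbers only: a 1B-parameter model on 8 GPUs at 312 TFLOPS peak each
mfu = model_flops_utilization(
    num_params=1_000_000_000,
    tokens_per_second=208_000,
    num_gpus=8,
    peak_flops_per_gpu=312_000_000_000_000,
)
print(f"MFU = {mfu:.0%}")  # prints "MFU = 50%"
```

Because MFU counts only the FLOPs the model mathematically requires, time lost to communication stalls, pipeline bubbles, or checkpoint writes lowers it directly. That is why scheduling and checkpoint optimizations can raise MFU without changing the model itself.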
Syllabus
Is Your GPU Really Working Efficiently in the Data Center? N Ways to... - Xiao Zhang & Wu Ying Jun
Taught by
CNCF [Cloud Native Computing Foundation]