Overview
This conference talk features Greg Diamos, Founder of MLCommons, sharing his experience building ScalarLM, a framework that unifies training and inference workloads for large language models on AMD MI300X GPUs. Discover how ScalarLM leverages the MI300X's high memory bandwidth and compute density to achieve superior performance. Learn about innovative memory management techniques, dynamic kernel fusion approaches, and custom CDNA3 architecture optimizations that enable efficient scaling from single-GPU deployments to multi-node clusters. Explore the challenges encountered during development, including HIP programming model adaptations and workload-specific tuning, while gaining insights into quantitative performance comparisons against existing frameworks. The talk is valuable for researchers and engineers working to optimize LLM workloads across diverse hardware platforms. Greg brings extensive expertise as a founder of MLPerf™, the industry-standard benchmark for deep learning performance, and from his work at Baidu's Silicon Valley AI Lab, where he co-invented the framework behind the first 1,000-GPU CUDA training cluster.
Syllabus
Pretraining on AMD MI300X using ScalarLM
Taught by
MLOps World: Machine Learning in Production