Overview
In this conference talk, Greg Diamos, a founder of MLCommons, shares his experience building ScalarLM, a framework that unifies training and inference workloads for large language models on AMD MI300X GPUs. Discover how ScalarLM uses the MI300X's high memory bandwidth and compute density to improve performance. Learn about the memory management techniques, dynamic kernel fusion approaches, and custom CDNA3 architecture optimizations that enable efficient scaling from single-GPU deployments to multi-node clusters. Explore the challenges encountered during development, including HIP programming model adaptations and workload-specific tuning, and see quantitative performance comparisons against existing frameworks. The talk is aimed at researchers and engineers working to optimize LLM workloads across diverse hardware platforms. Greg brings extensive expertise as a founder of MLPerf™, the industry-standard benchmark for deep learning performance, and from his work at Baidu's Silicon Valley AI Lab, where he co-invented the framework for the first 1,000-GPU CUDA training cluster.
Syllabus
Pretraining on AMD MI300X using ScalarLM
Taught by
MLOps World: Machine Learning in Production