Pretraining on AMD MI300X using ScalarLM

This conference talk features Greg Diamos, Founder of MLCommons, sharing his experience building ScalarLM, a framework that unifies training and inference workloads for large language models on AMD MI300X GPUs. Discover how ScalarLM uses the MI300X's high memory bandwidth and compute density to achieve superior performance. Learn about memory management techniques, dynamic kernel fusion, and optimizations tailored to the CDNA3 architecture that enable efficient scaling from single-GPU deployments to multi-node clusters. Explore the challenges encountered during development, including adapting to the HIP programming model and tuning for specific workloads, and see quantitative performance comparisons against existing frameworks. The talk is valuable for researchers and engineers working to optimize LLM workloads across diverse hardware platforms.

Greg brings extensive expertise as a founder of MLPerf™, the industry-standard benchmark for deep learning performance, and from his work at Baidu's Silicon Valley AI Lab, where he co-invented the framework for the first 1,000-GPU CUDA training cluster.