Scaling Large Language Models - Getting Started with Large-Scale Parallel Training of LLMs
MLOps World: Machine Learning in Production via YouTube
Overview
Learn to implement large-scale parallel training strategies for billion-parameter language models in this hands-on workshop. Explore the fundamental parallelization techniques of data, tensor, and pipeline parallelism, and discover how to compose them for training massive LLMs whose memory footprint exceeds the capacity of a single GPU, or even a handful of GPUs. Master strategic sharding of data and parameters across devices, efficient collective communication operations for synchronizing gradients and activations, and recent LLM-specific techniques such as context parallelism. Engage in live coding exercises and practical implementations to build each strategy from first principles, understand its trade-offs, and optimize communication patterns and memory usage for maximum training throughput across distributed hardware. Gain insights from an independent machine learning researcher with extensive experience advising startups and large companies, whose research has been cited nearly 2,000 times and has won awards including a best paper award at NeurIPS 2022.
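To give a flavor of the first-principles exercises described above, here is a minimal sketch of the simplest of the three techniques, data parallelism, using PyTorch's torch.distributed package: each process holds a full model replica, trains on its own shard of the batch, and averages gradients with an all-reduce. The model, dimensions, and loss are illustrative placeholders, not taken from the workshop materials.

    # Minimal data-parallel sketch: one process per GPU, launched with
    # torchrun. Model and sizes are illustrative, not from the workshop.
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn

    def main():
        # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        device = torch.device(f"cuda:{local_rank}")

        # Pure data parallelism: every rank holds a full model replica.
        model = nn.Linear(1024, 1024).to(device)

        # Each rank trains on a different shard of the global batch
        # (random data here stands in for a real per-rank data loader).
        x = torch.randn(8, 1024, device=device)
        loss = model(x).square().mean()
        loss.backward()

        # All-reduce sums gradients across ranks; dividing by the world
        # size averages them so the replicas stay synchronized.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= dist.get_world_size()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Run with, for example, torchrun --nproc_per_node=4 train.py on a 4-GPU machine. Production code would wrap the model in DistributedDataParallel, which overlaps this gradient all-reduce with the backward pass; spelling out the collective by hand, as the workshop does, makes the communication cost explicit.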
Syllabus
Scaling Large Language Models: Getting Started with Large-Scale Parallel Training of LLMs
Taught by
MLOps World: Machine Learning in Production