Scaling Large Language Models - Getting Started with Large-Scale Parallel Training of LLMs
MLOps World: Machine Learning in Production via YouTube
Overview
Learn to implement large-scale parallel training strategies for billion-parameter language models in this hands-on workshop. Explore the fundamental parallelization techniques (data, tensor, and pipeline parallelism) and learn how to compose them to train LLMs that are too large to fit in the memory of one or a few GPUs. Topics include sharding data and parameters across devices, using collective communication operations to synchronize gradients and activations, and recent LLM-specific techniques such as context parallelism. Through live coding exercises, build each strategy from first principles, understand its trade-offs, and tune communication patterns and memory usage for training throughput across distributed hardware. The instructor is an independent machine learning researcher with extensive experience advising startups and large companies, whose research has been cited nearly 2000 times and has won awards including best paper at NeurIPS 2022.
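As a rough illustration of two of the techniques named above, the following is a minimal single-process sketch, not material from the workshop itself. It simulates "devices" as list entries in plain Python: data parallelism shards the batch and averages per-device gradients with a simulated all-reduce, and tensor parallelism shards a weight matrix along its output dimension and concatenates the partial results with a simulated all-gather. All function names, the toy linear model, and the example data are assumptions made for illustration; a real implementation would use a distributed framework and actual collective communication.

```python
# Single-process simulation: each "device" is just an entry in a Python list.
# All names and data below are hypothetical and chosen for illustration only.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mse_grad(w, examples):
    """Gradient of mean((w.x - y)^2) over `examples` with respect to w."""
    grad = [0.0] * len(w)
    for x, y in examples:
        err = dot(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(examples)
    return grad

def all_reduce_mean(per_device):
    """Simulated all-reduce: every device receives the element-wise mean."""
    n = len(per_device)
    mean = [sum(vals) / n for vals in zip(*per_device)]
    return [list(mean) for _ in range(n)]

def data_parallel_grad(w, examples, num_devices):
    """Data parallelism: shard the batch, compute local grads, then average."""
    assert len(examples) % num_devices == 0, "sketch assumes equal-size shards"
    size = len(examples) // num_devices
    local = [mse_grad(w, examples[d * size:(d + 1) * size])
             for d in range(num_devices)]
    return all_reduce_mean(local)[0]  # every device holds the same result

def tensor_parallel_matvec(W, x, num_devices):
    """Tensor parallelism: shard W's rows (output features) across devices."""
    assert len(W) % num_devices == 0, "sketch assumes evenly divisible rows"
    rows = len(W) // num_devices
    shards = [W[d * rows:(d + 1) * rows] for d in range(num_devices)]
    partial = [[dot(row, x) for row in shard] for shard in shards]
    return [v for p in partial for v in p]  # simulated all-gather (concat)

# Toy inputs (made up for this sketch).
w = [1.0, -2.0]
examples = [([1.0, 2.0], 0.5), ([0.0, 1.0], -1.0),
            ([3.0, 1.0], 2.0), ([2.0, 2.0], 0.0)]
W = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
x = [3.0, 4.0]

full_grad = mse_grad(w, examples)
dp_grad = data_parallel_grad(w, examples, num_devices=2)
full_out = [dot(row, x) for row in W]
tp_out = tensor_parallel_matvec(W, x, num_devices=2)
```

With equal-size shards, the averaged per-device gradient matches the full-batch gradient, and the concatenated partial outputs match the unsharded matrix-vector product. The point of the sketch is that both strategies change where the work and memory live, not the mathematical result.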
Syllabus
Scaling Large Language Models: Getting Started with Large-Scale Parallel Training of LLMs
Taught by
MLOps World: Machine Learning in Production