Scaling Large Language Models - Getting Started with Large-Scale Parallel Training of LLMs
MLOps World: Machine Learning in Production via YouTube
Overview
Learn to implement large-scale parallel training strategies for billion-parameter language models in this hands-on workshop. Explore fundamental parallelization techniques including data, tensor, and pipeline parallelism, and discover how to compose them effectively for training massive LLMs when a single GPU, or even a handful, lacks the memory capacity to hold them. Master strategic sharding of data and parameters across devices, efficient collective communication operations for synchronizing gradients and activations, and recent LLM-specific techniques such as context parallelism. Engage in live coding exercises and practical implementations to build each strategy from first principles, understand its trade-offs, and optimize communication patterns and memory usage for maximum training throughput across distributed hardware. Gain insights from an independent machine learning researcher with extensive experience advising startups and large companies, whose research has been cited nearly 2,000 times and has won awards, including a best paper award at NeurIPS 2022.
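To make the ideas concrete before the workshop itself, here is a minimal sketch of the simplest of these strategies, data parallelism with gradient all-reduce. The workshop's own code and framework are not shown on this page; this sketch assumes PyTorch's torch.distributed with one GPU per process, launched via torchrun.

import torch
import torch.distributed as dist
import torch.nn as nn

def train_step(model, batch, optimizer, world_size):
    loss = model(batch).mean()  # toy loss; a real LLM would use cross-entropy over tokens
    loss.backward()
    # Data parallelism: each rank computed gradients on its own data shard.
    # An all-reduce sums the gradients across ranks; dividing by world_size
    # averages them, keeping every model replica identical after the step.
    for param in model.parameters():
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        param.grad /= world_size
    optimizer.step()
    optimizer.zero_grad()

def main():
    dist.init_process_group("nccl")  # torchrun supplies RANK/WORLD_SIZE/MASTER_ADDR
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    torch.cuda.set_device(rank)  # assumes a single node with one GPU per rank

    model = nn.Linear(1024, 1024).cuda()  # stand-in for a transformer block
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Each rank draws a different shard of the data (random tensors here).
    for step in range(10):
        batch = torch.randn(8, 1024, device="cuda")
        train_step(model, batch, optimizer, world_size)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Run with, for example, torchrun --nproc_per_node=4 train.py. Note that every rank still holds a full copy of the model, so data parallelism alone speeds up training but does not make a too-large model fit; tensor and pipeline parallelism shard the parameters themselves, which is what the workshop builds up to.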
Syllabus
Scaling Large Language Models: Getting Started with Large-Scale Parallel Training of LLMs
Taught by
MLOps World: Machine Learning in Production