Overview
Explore high-performance distributed model training in this AWS re:Invent 2024 conference session on Amazon SageMaker. Discover advanced parallelization techniques, communication optimizations, and efficient checkpointing strategies for distributing training workloads across hundreds or thousands of GPUs. Learn how to handle foundation models whose billions or trillions of parameters exceed the capacity of a single GPU, while reducing model training time and costs by up to 20%. Dive into the infrastructure requirements for scaling distributed training, and see how integrating SageMaker training capabilities can optimize the total cost of foundation model development.
Syllabus
AWS re:Invent 2024 - High performance distributed model training with Amazon SageMaker (AIM380)
Taught by
AWS Events