High Performance Distributed Model Training with Amazon SageMaker

AWS Events via YouTube

Overview

Explore high-performance distributed model training in this AWS re:Invent 2024 conference session on Amazon SageMaker. The session covers advanced parallelization techniques, communication optimizations, and efficient checkpointing strategies for distributing training workloads across hundreds or thousands of GPUs. Learn how to train foundation models whose billions or trillions of parameters exceed the capacity of a single GPU, while reducing training time and cost by up to 20%. The session also examines the infrastructure required to scale distributed training and shows how to combine SageMaker training capabilities to optimize the total cost of foundation model development.

Syllabus

AWS re:Invent 2024 - High performance distributed model training with Amazon SageMaker (AIM380)

Taught by

AWS Events
