Overview
Join this webinar to learn how to scale machine learning model training from single-GPU setups to massive distributed clusters using PyTorch and Ray.

Explore the fundamentals of distributed training, starting with what distributed training is and when it becomes necessary for your machine learning projects. Learn Distributed Data Parallel (DDP) techniques, then advance to more sophisticated methods, including ZeRO-1, ZeRO-2, ZeRO-3, and Fully Sharded Data Parallel (FSDP), for optimizing memory usage and training efficiency.

Get introduced to Ray, a powerful distributed computing framework, and see how Ray Train enables seamless model training at scale. Practice implementing distributed training by building a scalable training pipeline with Ray Train and PyTorch, and gain practical insight into how Ray integrates with Anyscale to accelerate AI development workflows.

Walk away with hands-on experience, a reusable project foundation, and the knowledge to implement distributed training in your own machine learning initiatives.
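To give a feel for why the ZeRO stages matter, here is a small illustrative sketch (not webinar material) that estimates per-GPU memory for model states under each stage. It assumes the standard mixed-precision Adam accounting from the ZeRO paper: 2 bytes for fp16 parameters, 2 bytes for fp16 gradients, and 12 bytes of fp32 optimizer state per parameter, with each successive stage sharding one more of these across the data-parallel group.

```python
def zero_stage_memory_gb(num_params, num_gpus, stage=0):
    """Estimate per-GPU memory (GB) for model states under ZeRO.

    Assumes mixed-precision Adam: 2 bytes (fp16 params) + 2 bytes
    (fp16 grads) + 12 bytes (fp32 master params, momentum, variance)
    per parameter. stage=0 is plain data parallelism (e.g. DDP).
    """
    p, g, o = 2.0, 2.0, 12.0
    if stage >= 1:      # ZeRO-1: shard optimizer states
        o /= num_gpus
    if stage >= 2:      # ZeRO-2: also shard gradients
        g /= num_gpus
    if stage >= 3:      # ZeRO-3 / FSDP full sharding: also shard params
        p /= num_gpus
    return num_params * (p + g + o) / 1e9

# Example: a 7.5B-parameter model on 64 GPUs.
for s in range(4):
    print(f"ZeRO-{s}: {zero_stage_memory_gb(7.5e9, 64, s):.1f} GB per GPU")
```

With these assumptions, plain data parallelism needs about 120 GB of model-state memory per GPU for a 7.5B-parameter model, while ZeRO-3/FSDP brings it down to under 2 GB — which is why full sharding makes large models trainable on commodity accelerators.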
Syllabus
Webinar: Getting Started with Distributed Training at Scale
Taught by
Anyscale