Overview
Explore a 15-minute conference talk from USENIX NSDI '23 that introduces Bamboo, a distributed system designed to significantly reduce the cost of training large Deep Neural Network (DNN) models. Learn how Bamboo runs training on preemptible instances and adds redundant computations to the training pipeline so that training stays resilient and efficient despite frequent preemptions. Discover how this approach delivers a 3.7× improvement in training throughput over traditional checkpointing techniques and a 2.4× reduction in cost compared to using on-demand instances. Gain insights into the challenges of training increasingly large DNN models and the solutions proposed to make this process more affordable for organizations and research labs of all sizes.
Syllabus
NSDI '23 - Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs
Taught by
USENIX