Overview
Learn about an industrial-scale system for accelerating deep learning model training on Huawei Ascend chips in this 13-minute conference presentation from USENIX ATC '25. Discover how researchers from Nanjing University, Peng Cheng Laboratory, Huawei, Shandong University, and Peking University developed Hermes, a system that addresses the challenges of optimizing training efficiency for large-scale deep learning models. Explore the three core components of their solution: a lightweight profiling approach that captures sporadic performance fluctuations during extended training sessions, a hierarchical bottleneck analysis framework that comprehensively and accurately identifies performance issues among numerous influencing factors, and an optimization advisor that guides the selection of effective optimization strategies. Examine real-world experimental results, including a 3.05× speedup for PanGu-α, a 1.91× speedup for MobileNetV1, and a 1.19× speedup for Mixture of Experts (MoE) models. The work draws on three years of practical experience and 135 typical optimization cases on the Ascend hardware architecture.
Syllabus
USENIX ATC '25 - Accelerating Model Training on Ascend Chips: An Industrial System for Profiling...
Taught by
USENIX