Accelerating Model Training on Ascend Chips - An Industrial System for Profiling, Analysis and Optimization

USENIX via YouTube

Overview

This 13-minute conference presentation from USENIX ATC '25 introduces Hermes, an industrial-scale system for accelerating deep learning model training on Huawei Ascend chips. Hermes was developed by researchers from Nanjing University, Peng Cheng Laboratory, Huawei, Shandong University, and Peking University to address the difficulty of optimizing training efficiency for large-scale models. The system has three core components: a lightweight profiling approach that captures sporadic performance fluctuations during long training runs, a hierarchical bottleneck analysis framework that identifies performance issues comprehensively and accurately among the many factors that influence them, and an optimization advisor that guides the selection of effective optimization strategies. The work draws on three years of practical experience with 135 typical optimization cases on the Ascend hardware architecture. Reported real-world results include a 3.05× speedup for PanGu-α, 1.91× for MobileNetV1, and 1.19× for Mixture of Experts (MoE) models.

Syllabus

USENIX ATC '25 - Accelerating Model Training on Ascend Chips: An Industrial System for Profiling...

Taught by

USENIX
