SimAI - Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision

USENIX via YouTube

Overview

Learn about SimAI, a unified simulator designed to simulate large language model (LLM) training precisely and efficiently at scale, in this 11-minute conference presentation from NSDI '25. Discover how it addresses the challenge of validating new designs, tunings, and optimizations for LLM training runs that require thousands of GPUs by providing high-fidelity simulation. Explore SimAI's architecture, which integrates training frameworks, kernel computation, and collective communication libraries into the simulation and achieves 98.1% alignment with real-world results across various test scenarios. Understand the multi-threaded acceleration techniques and the lock-free global context-sharing mechanism that speed up execution while maintaining precision. Examine how SimAI bridges the gap between small-scale laboratory environments and large-scale industrial deployments, yielding guidelines for host designs and parameter settings that directly benefit production LLM training. Gain insights from the development team's experiences and lessons learned during SimAI's evolution, and learn about the tool's open-source availability to the research and industry community.

Syllabus

NSDI '25 - SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large...

Taught by

USENIX
