Overview
Learn about SimAI, a unified simulator designed to simulate large language model (LLM) training at scale precisely and efficiently, in this 11-minute conference presentation from NSDI '25. Discover how the tool addresses a core challenge: validating new designs, tunings, and optimizations for LLM training, which requires thousands of GPUs, by providing high-fidelity simulation instead. Explore SimAI's architecture, which integrates training frameworks, kernel computation, and collective communication libraries to achieve 98.1% alignment with real-world results across various test scenarios. Understand the multi-threaded acceleration techniques and lock-free global context-sharing mechanisms that increase execution speed while maintaining precision. Examine how SimAI bridges the gap between small-scale laboratory environments and large-scale industrial deployments, providing guidelines for host designs and parameter settings that directly benefit production LLM training. Gain insights from the development team's experiences and lessons learned during SimAI's evolution, and learn about the tool's open-source availability for the research and industry community.
Syllabus
NSDI '25 - SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large...
Taught by
USENIX