Overview
Learn about CrossPipe, a framework for optimizing large language model training across geographically distributed datacenters, in this 19-minute conference presentation from USENIX ATC '25. Discover how researchers from ETH Zurich address the challenge of training LLMs whose resource requirements exceed a single datacenter by explicitly modeling and mitigating network latency and bandwidth limitations. Explore the unified analysis and optimization approach, which accounts for both pipeline parallelism and opportunities to overlap data-parallel communication. Understand how CrossPipe generates optimized pipeline schedules using either a solver-based optimal algorithm or a fast, near-optimal greedy algorithm, both built on a flexible execution engine that separates scheduling logic from communication details. Examine evaluation results showing up to a 33.6% reduction in training time compared to traditional pipeline schedules under identical memory constraints. Learn how the framework maintains strong performance despite communication delays and approaches the efficiency of idealized delay-free schedules, improving scalability and resource utilization in high-latency or limited-bandwidth environments.
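To get a feel for why cross-datacenter latency matters for pipeline schedules, the following minimal sketch estimates the finish time of a simple "all forwards, then all backwards" (GPipe-style) schedule when every hop between adjacent stages adds a fixed delay. This is not CrossPipe's scheduler or cost model; the function name `gpipe_makespan`, its parameters, and the uniform per-stage timings are assumptions made purely for illustration.

```python
def gpipe_makespan(stages, microbatches, fwd, bwd, delay):
    """Finish time of a GPipe-style schedule with a fixed inter-stage delay.

    Hypothetical toy model: every stage takes `fwd` time units per forward
    pass and `bwd` per backward pass, and each transfer between adjacent
    stages takes `delay` time units.
    """
    # fwd_done[s][m]: time at which stage s finishes the forward pass of microbatch m
    fwd_done = [[0.0] * microbatches for _ in range(stages)]
    for s in range(stages):
        for m in range(microbatches):
            # Activations arrive from the previous stage after the network delay.
            input_ready = fwd_done[s - 1][m] + delay if s > 0 else 0.0
            # A stage processes one microbatch at a time, in order.
            stage_free = fwd_done[s][m - 1] if m > 0 else 0.0
            fwd_done[s][m] = max(input_ready, stage_free) + fwd

    # bwd_done[s][m]: time at which stage s finishes the backward pass of microbatch m
    bwd_done = [[0.0] * microbatches for _ in range(stages)]
    for s in reversed(range(stages)):
        for m in range(microbatches):
            if s < stages - 1:
                # Gradients arrive from the next stage after the network delay.
                input_ready = bwd_done[s + 1][m] + delay
            else:
                # The last stage can start backward once its own forward is done.
                input_ready = fwd_done[s][m]
            # Backward passes start only after the stage has finished all forwards.
            stage_free = bwd_done[s][m - 1] if m > 0 else fwd_done[s][microbatches - 1]
            bwd_done[s][m] = max(input_ready, stage_free) + bwd

    # The iteration ends when the first stage finishes its last backward pass.
    return bwd_done[0][microbatches - 1]


if __name__ == "__main__":
    for d in (0, 1, 5):
        print(d, gpipe_makespan(stages=4, microbatches=8, fwd=1, bwd=2, delay=d))
```

In this toy model the delay sits on the critical path once per hop in each direction, so the finish time grows with the number of stage boundaries that cross a slow link. Schedules that reorder or overlap work to hide that delay are the kind of optimization the talk describes.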
Syllabus
USENIX ATC '25 - CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training
Taught by
USENIX