LLAMP - Assessing Network Latency Tolerance of HPC Applications with Linear Programming
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
Overview
Learn about LLAMP, a novel analytical toolchain that uses linear programming to assess the network latency tolerance of HPC applications without requiring specialized hardware or time-consuming network simulators. Discover how this approach leverages the LogGPS model to give software developers and network architects crucial insights for optimizing HPC infrastructures and strategically deploying applications to minimize latency impacts. Explore the methodology for evaluating how well communication-intensive MPI applications tolerate network latency degradation, with validation results showing high accuracy and relative prediction errors generally below 2% across applications such as MILC, LULESH, and LAMMPS. Examine a comprehensive case study of the ICON weather and climate model that demonstrates LLAMP's broad applicability in evaluating collective algorithms and network topologies. Understand why network latency assessment is critical: the pursuit of high-bandwidth networks, driven by AI workloads in data centers and HPC clusters, has inadvertently increased network latency, and large-scale MPI applications exhibit significant differences in how much of that latency they can tolerate.
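The core idea, modeling an application's execution under the LogGPS model and using a linear program to relate runtime to network latency, can be illustrated with a toy sketch. Everything below is hypothetical: the four-node task graph, the edge costs, and the `makespan` helper are illustrative stand-ins, not LLAMP's actual formulation, which operates on traces of full MPI applications.

```python
from scipy.optimize import linprog

def makespan(L, o=1.0):
    """Minimum completion time of a toy task graph, solved as an LP.

    Variables: finish times t0..t3 of four tasks, plus the makespan T.
    Communication edges cost o + L (LogGPS-style overhead + latency);
    computation edges have fixed costs. All numbers are illustrative.
    """
    # Each edge (u, v, cost) yields the constraint t_v >= t_u + cost,
    # written in linprog's A_ub @ x <= b_ub form as t_u - t_v <= -cost.
    edges = [
        (0, 1, 5.0),     # computation on rank A
        (0, 2, o + L),   # message from rank A to rank B
        (1, 3, o + L),   # message from rank A to rank B
        (2, 3, 3.0),     # computation on rank B
    ]
    n_vars = 5  # t0, t1, t2, t3, T
    A_ub, b_ub = [], []
    for u, v, cost in edges:
        row = [0.0] * n_vars
        row[u], row[v] = 1.0, -1.0
        A_ub.append(row)
        b_ub.append(-cost)
    # The makespan must cover the sink task: t3 - T <= 0.
    A_ub.append([0.0, 0.0, 0.0, 1.0, -1.0])
    b_ub.append(0.0)
    # Minimize T (the last variable); times are nonnegative by default.
    res = linprog(c=[0, 0, 0, 0, 1], A_ub=A_ub, b_ub=b_ub)
    return res.fun

# Runtime is linear in L; the slope (here 1.0, one latency term on the
# critical path) serves as a simple latency-sensitivity measure.
baseline = makespan(L=0.0)
degraded = makespan(L=10.0)
sensitivity = (degraded - baseline) / 10.0
```

Sweeping `L` in this way shows the appeal of the LP view: how steeply runtime grows with latency falls out of the solution directly, without simulating the network.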
Syllabus
00:00 Introduction
05:33 LLAMP Toolchain
10:20 Network Latency Sensitivity
14:47 Linear Programming
21:15 Evaluation
24:24 Conclusion
25:23 Q&A
Taught by
Scalable Parallel Computing Lab, SPCL @ ETH Zurich