LLAMP - Assessing Network Latency Tolerance of HPC Applications with Linear Programming
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
Overview
Learn about LLAMP, a novel analytical toolchain that uses linear programming to assess the network latency tolerance of HPC applications without requiring specialized hardware or time-consuming network simulators. Discover how this approach leverages the LogGPS model to give software developers and network architects crucial insights for optimizing HPC infrastructures and strategically deploying applications to minimize latency impacts. Explore the methodology for evaluating how well communication-intensive MPI applications tolerate network latency degradation, with validation results showing high accuracy and relative prediction errors generally below 2% across applications such as MILC, LULESH, and LAMMPS. Examine a comprehensive case study of the ICON weather and climate model that demonstrates LLAMP's broad applicability in evaluating collective algorithms and network topologies. Understand why network latency assessment is critical: the pursuit of high-bandwidth networks, driven by AI workloads in data centers and HPC clusters, has inadvertently increased network latency, and large-scale MPI applications exhibit significant differences in how much of that latency they can tolerate.
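The core idea, modeling an application's execution under the LogGPS model and using a linear program to relate runtime to network latency, can be illustrated with a toy sketch. Everything below is hypothetical: the four-node task graph, the edge costs, and the `makespan` helper are illustrative stand-ins, not LLAMP's actual formulation, which operates on traces of full MPI applications.

```python
from scipy.optimize import linprog

def makespan(L, o=1.0):
    """Minimum completion time of a toy task graph, solved as an LP.

    Variables: finish times t0..t3 of four tasks, plus the makespan T.
    Communication edges cost o + L (LogGPS-style overhead + latency);
    computation edges have fixed costs. All numbers are illustrative.
    """
    # Each edge (u, v, cost) yields the constraint t_v >= t_u + cost,
    # written in linprog's A_ub @ x <= b_ub form as t_u - t_v <= -cost.
    edges = [
        (0, 1, 5.0),     # computation on rank A
        (0, 2, o + L),   # message from rank A to rank B
        (1, 3, o + L),   # message from rank A to rank B
        (2, 3, 3.0),     # computation on rank B
    ]
    n_vars = 5  # t0, t1, t2, t3, T
    A_ub, b_ub = [], []
    for u, v, cost in edges:
        row = [0.0] * n_vars
        row[u], row[v] = 1.0, -1.0
        A_ub.append(row)
        b_ub.append(-cost)
    # The makespan must cover the sink task: t3 - T <= 0.
    A_ub.append([0.0, 0.0, 0.0, 1.0, -1.0])
    b_ub.append(0.0)
    # Minimize T (the last variable); times are nonnegative by default.
    res = linprog(c=[0, 0, 0, 0, 1], A_ub=A_ub, b_ub=b_ub)
    return res.fun

# Runtime is linear in L; the slope (here 1.0, one latency term on the
# critical path) serves as a simple latency-sensitivity measure.
baseline = makespan(L=0.0)
degraded = makespan(L=10.0)
sensitivity = (degraded - baseline) / 10.0
```

Sweeping `L` in this way shows the appeal of the LP view: how steeply runtime grows with latency falls out of the solution directly, without simulating the network.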
Syllabus
00:00 Introduction
05:33 LLAMP Toolchain
10:20 Network Latency Sensitivity
14:47 Linear Programming
21:15 Evaluation
24:24 Conclusion
25:23 Q&A
Taught by
Scalable Parallel Computing Lab, SPCL @ ETH Zurich