Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
Overview
This conference talk presents a comprehensive analysis of GPU-to-GPU communication in modern supercomputer interconnects, based on research presented at the International Conference for High Performance Computing, Networking, Storage, and Analysis 2024 (SC'24). Discover how multi-GPU nodes in exascale supercomputers are connected through dedicated networks with bandwidths reaching several terabits per second. Explore the performance characteristics of three major supercomputers—Alps, Leonardo, and LUMI—each featuring unique architectures and designs. Learn about the challenges in maximizing system efficiency due to varying technologies, design options, and software layers. Examine detailed performance evaluations of both intra-node and inter-node interconnects on systems scaling up to 4,096 GPUs. Gain practical insights for researchers, system architects, and software developers working with multi-GPU supercomputing, including untapped bandwidth potential and optimization opportunities from network to software levels. The presentation, delivered by Daniele De Sensi from the Scalable Parallel Computing Lab at ETH Zurich, condenses findings from research conducted by a collaborative team of experts in high-performance computing.
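To get a feel for why the bandwidth gap between intra-node and inter-node links matters, a simple alpha-beta (latency + size/bandwidth) cost model can be sketched. The link speeds and latencies below are illustrative assumptions for a generic system, not measurements from the talk or from Alps, Leonardo, or LUMI:

```python
# Back-of-the-envelope model of GPU-to-GPU transfer time:
#   time = latency + message_size / bandwidth  (the alpha-beta cost model).
# All numbers are illustrative assumptions, not figures from the talk.

def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate one-way transfer time under the alpha-beta model."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

GBIT = 1e9 / 8  # bytes per gigabit

# Assumed links: a 200 Gbit/s inter-node NIC vs. a 2 Tbit/s intra-node fabric.
inter_node_bw = 200 * GBIT    # 25 GB/s
intra_node_bw = 2000 * GBIT   # 250 GB/s

msg = 64 * 1024 * 1024  # 64 MiB message

t_inter = transfer_time(msg, 2e-6, inter_node_bw)  # assumed 2 us latency
t_intra = transfer_time(msg, 1e-6, intra_node_bw)  # assumed 1 us latency
print(f"inter-node: {t_inter * 1e3:.2f} ms, intra-node: {t_intra * 1e3:.2f} ms")
```

Even this crude model shows an order-of-magnitude difference for large messages, which is why communication-aware placement of GPU workloads (and making full use of the intra-node fabric) is a recurring optimization theme in multi-GPU systems.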
Syllabus
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
Taught by
Scalable Parallel Computing Lab, SPCL @ ETH Zurich