ARMing GPUs - On the Memory Subsystem of Grace Hopper GH200

Explore the memory subsystem architecture and performance characteristics of NVIDIA's Grace Hopper GH200 superchip in this 28-minute research talk from ETH Zurich's Scalable Parallel Computing Lab. The talk analyzes tightly coupled heterogeneous systems, in which CPUs and GPUs share a unified address space and can transparently access all system memory at fine granularity. Learn how both intra- and inter-node memory operations were characterized on the Quad GH200 nodes of the Swiss National Supercomputing Centre's Alps supercomputer. See why memory placement matters in heterogeneous systems, and which performance tradeoffs and optimization opportunities follow from it. Understand how lower communication latency and higher bandwidth within tightly coupled systems help with the memory-bound workloads of modern HPC, particularly in AI and climate modeling.