Accelerating Scientific Applications with Automatic BLAS GPU Offload on NVIDIA Grace-Hopper

Learn about an automatic GPU offloading technique for scientific applications in this technical presentation from Dr. Junjie Li of the Texas Advanced Computing Center. Explore SCILIB-Accel, a tool that enables automatic BLAS offload on NVIDIA Grace-Hopper platforms without requiring code modifications. Discover how the tool leverages the unified memory architecture and the cache-coherent NVLink-C2C interconnect to avoid the explicit CPU-GPU data management that traditional GPU programming requires. Understand the GPU First-Use data movement policy, inspired by the OpenMP First-Touch policy, and see how it minimizes CPU-GPU data transfers in scientific computing applications. Examine real-world performance results, including a 3x speedup for quantum physics codes such as the LSMS method in the MuST suite on Grace-Hopper (1 CPU + 1 GPU) relative to Grace-Grace (dual CPU). Gain insights into how this tool makes high-performance automatic BLAS offload practical for scientific applications.
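
To make the idea behind the GPU First-Use policy concrete, the toy model below counts bytes moved across a sequence of offloaded BLAS calls. It is a minimal sketch and not SCILIB-Accel's implementation: it assumes that "first use" means an operand is migrated to GPU memory the first time an offloaded call touches it and then stays resident, and it compares that against a naive scheme that copies every operand on every call. The names `Buffer`, `FirstUseTracker`, and `copy_every_call_bytes` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """A matrix operand; `location` records where its pages currently live."""
    name: str
    nbytes: int
    location: str = "cpu"


@dataclass
class FirstUseTracker:
    """Toy model of a first-use policy (assumed behavior, not the real tool)."""
    bytes_moved: int = 0
    migrations: list = field(default_factory=list)

    def on_gpu_blas_call(self, *operands: Buffer) -> None:
        # Migrate each operand only if it is not already resident on the GPU.
        for buf in operands:
            if buf.location != "gpu":
                buf.location = "gpu"
                self.bytes_moved += buf.nbytes
                self.migrations.append(buf.name)


def copy_every_call_bytes(calls) -> int:
    """Baseline: every operand is transferred on every offloaded call."""
    return sum(buf.nbytes for call in calls for buf in call)


if __name__ == "__main__":
    mib = 1024 ** 2
    a, b, c, d = (Buffer(n, 8 * mib) for n in "ABCD")

    # Three GEMM-like calls reuse A, B, C; a fourth swaps B for a new operand D.
    calls = [(a, b, c), (a, b, c), (a, b, c), (a, d, c)]

    tracker = FirstUseTracker()
    for call in calls:
        tracker.on_gpu_blas_call(*call)

    print("first-use migrations:", tracker.migrations)
    print("first-use bytes moved:", tracker.bytes_moved)
    print("copy-every-call bytes moved:", copy_every_call_bytes(calls))
```

In this model, operands that are reused across calls are moved once, so the transfer volume grows with the number of distinct operands and not with the number of BLAS calls. That is the property the summary attributes to the policy.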