GPU Performance or Ease of Use - Why Not Both? Accelerating Scientific Computing with nvmath-python

Explore how to get both GPU performance and ease of use in Python in this 25-minute conference talk from EuroPython 2025. Learn how the open-source nvmath-python library bridges the gap between high-performance GPU computing and Python's accessibility, removing the traditional trade-off between performance and usability. Discover how it gives Python developers the flexibility and control previously available only to C++ developers, including integration with Numba and NVIDIA's runtime compilation stack (NVRTC and nvJitLink). Understand the design principles behind nvmath-python and see practical demonstrations of how it accelerates scientific computing workloads while integrating with popular Python libraries such as NumPy, CuPy, SciPy, and scikit-learn. Gain insight into advanced GPU optimization techniques, including math libraries for scientific computing, device kernel fusion, just-in-time (JIT) compilation, and link-time optimization (LTO), all accessible through Python's familiar syntax. Examine the library's support for both CPU and GPU execution spaces, GEMMs and FFTs with prolog/epilog fusion, narrow-precision types, random number generation, and comprehensive CUDA math library capabilities. Pay particular attention to the newly released sparsity support and distributed computing capabilities, which make the library valuable for Python developers, scientific computing practitioners, AI-oriented users, and anyone seeking peak GPU kernel performance without sacrificing ease of development.