Unlocking GPU Performance with CUDA Tile

Explore tile-based GPU programming in this 32-minute live discussion with Stephen Jones, one of CUDA's inventors and leading experts. Discover why CUDA Tile is one of the most significant additions to CUDA since its creation: it abstracts special-purpose hardware such as tensor cores, so the code you write stays compatible with future NVIDIA GPU architectures. Learn how tile programming simplifies kernel development compared with the traditional SIMT (single instruction, multiple threads) model. Get introduced to cuTile Python, which lets you write tile kernels in Python: you divide arrays into tiles and express parallel operations on them, while the compiler and runtime handle low-level tasks such as block-level parallelism, memory movement, and use of hardware features. Understand how to apply cuTile Python to data-parallel workloads, particularly in AI and ML, and take part in a live Q&A on the tile-programming paradigm and advanced CUDA capabilities.
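
To make the idea of dividing arrays into tiles concrete, here is a minimal conceptual sketch in plain Python. It does not use the cuTile Python API and does not run on a GPU. The names TILE_SIZE, tile_add_kernel, and launch are hypothetical, chosen only for illustration. Each kernel instance handles one tile as a whole, and a simple loop stands in for the block-level parallelism that a tile compiler and runtime would manage.

```python
# Conceptual sketch only: plain Python, NOT the cuTile Python API.
# TILE_SIZE, tile_add_kernel and launch are hypothetical names.

TILE_SIZE = 4  # assumed tile width, chosen for illustration


def tile_add_kernel(block_id, a, b, out):
    """Process one tile: load a slice of each input, add, store the result."""
    start = block_id * TILE_SIZE
    stop = min(start + TILE_SIZE, len(a))  # the last tile may be partial
    a_tile = a[start:stop]                 # "load" one tile of a
    b_tile = b[start:stop]                 # "load" one tile of b
    # Operate on the whole tile at once, then "store" it back.
    out[start:stop] = [x + y for x, y in zip(a_tile, b_tile)]


def launch(a, b):
    """Run one kernel instance per tile; on a GPU these would run in parallel."""
    out = [0] * len(a)
    num_tiles = (len(a) + TILE_SIZE - 1) // TILE_SIZE  # ceiling division
    for block_id in range(num_tiles):
        tile_add_kernel(block_id, a, b, out)
    return out


result = launch(list(range(10)), [10] * 10)
```

The kernel body never mentions individual threads. It describes work per tile, which is the contrast with the SIMT model that the session discusses.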