Overview
Learn to use NVIDIA's cuTile Python library for GPU programming and optimization in this technical tutorial. Explore the CUDA Tile virtual instruction set introduced in CUDA 13.1, which NVIDIA describes as the most significant advancement in the CUDA platform since its inception in 2006. Learn tile-based parallel programming techniques that let you write algorithms at a higher level of abstraction while the complexities of specialized hardware such as tensor cores are handled automatically. Discover how cuTile Python simplifies GPU kernel development by providing APIs for tile operations, memory management, and parallel computation patterns. Work through practical examples that demonstrate kernel tiling strategies, data layout optimizations, and performance tuning techniques for high-performance computing in data science and machine learning workloads.
Syllabus
Deep Dive: How to Use cuTile Python
Taught by
NVIDIA Developer