PyTorch Data Loader Tuning and GPU Cross-Architecture Optimizations: CUDA and AMD

Generative AI on AWS via YouTube

Overview

This webinar features two technical talks on AI performance optimization. After an introduction by Chris Fregly, Chaim Rand presents "Solving Bottlenecks with Data Input Pipeline with PyTorch Profiler and TensorBoard," which shows how to identify and resolve performance bottlenecks in PyTorch data input pipelines using profiling tools. Quentin Anthony follows with "How to Write Cross-Architecture Kernels: NVIDIA CUDA and AMD ROCm," which explains techniques for developing GPU kernels that run efficiently on both NVIDIA and AMD hardware, a topic particularly relevant to deploying modern AI models such as DeepSeek-R1 and Llama-4. The talk covers kernel sizing and cross-architecture optimization strategies for different SIMD hardware implementations. Additional resources include a GitHub repository, a related O'Reilly book, and free Generative AI course materials on AI performance engineering.

Syllabus

PyTorch Data Loader Tuning + GPU Cross-Architecture Optimizations: CUDA and AMD

Taught by

Generative AI on AWS
