Overview
This 21-minute conference talk from SREcon25 EMEA shows how to use eBPF for automatic GPU performance monitoring. It covers capturing CUDA calls made to GPUs, including kernel launches and memory allocations, without intrusive instrumentation or significant overhead on running applications, and exporting Prometheus metrics from eBPF probes to analyze kernel launch patterns and the associated memory usage. The key advantage of this approach is that instrumentation can be enabled or disabled dynamically while GPU applications are running. That makes it particularly valuable for monitoring and profiling AI/ML training, where monitoring can start after training has already begun, and for production environments that need minimal-overhead monitoring that can be toggled on demand.
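To make the metrics-export step concrete, here is a minimal sketch of the user-space side only. It is not the speakers' implementation. It assumes that eBPF probes have already delivered one record per intercepted CUDA call; the `CudaEvent` record, the `render_metrics` helper, and the metric names `cuda_kernel_launches_total` and `cuda_malloc_bytes_total` are all hypothetical. Only `cudaLaunchKernel`, `cudaMalloc`, and the Prometheus text exposition format are taken as given.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CudaEvent:
    """Hypothetical record for one CUDA call intercepted by a probe."""
    pid: int
    call: str          # CUDA runtime function name, e.g. "cudaLaunchKernel"
    kernel: str = ""   # kernel symbol; only meaningful for launches
    nbytes: int = 0    # requested size; only meaningful for allocations


def render_metrics(events):
    """Aggregate events and return them in Prometheus text exposition format."""
    launches = Counter()            # (pid, kernel) -> number of launches
    alloc_bytes = defaultdict(int)  # pid -> total bytes requested

    for ev in events:
        if ev.call == "cudaLaunchKernel":
            launches[(ev.pid, ev.kernel)] += 1
        elif ev.call == "cudaMalloc":
            alloc_bytes[ev.pid] += ev.nbytes

    # Assumption: label values contain no quotes, backslashes, or newlines,
    # so no escaping is applied here.
    lines = [
        "# HELP cuda_kernel_launches_total Kernel launches observed.",
        "# TYPE cuda_kernel_launches_total counter",
    ]
    for (pid, kernel), count in sorted(launches.items()):
        lines.append(
            f'cuda_kernel_launches_total{{pid="{pid}",kernel="{kernel}"}} {count}'
        )

    lines += [
        "# HELP cuda_malloc_bytes_total Bytes requested via cudaMalloc.",
        "# TYPE cuda_malloc_bytes_total counter",
    ]
    for pid, total in sorted(alloc_bytes.items()):
        lines.append(f'cuda_malloc_bytes_total{{pid="{pid}"}} {total}')

    return "\n".join(lines) + "\n"
```

Serving the returned string over HTTP would let a Prometheus server scrape it. Because the aggregation lives outside the monitored process, starting or stopping it does not require restarting the GPU application, which matches the on-demand toggling the talk describes.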
Syllabus
SREcon25 Europe/Middle East/Africa - Auto-Instrumentation for GPU Performance using eBPF
Taught by
USENIX