Overview
Learn how to automatically instrument GPU workloads for performance monitoring using eBPF in this 29-minute conference talk from DevConf.CZ 2025. Discover the challenges of gathering telemetry from modern AI workloads running on expensive GPU fleets, where manual instrumentation adds performance overhead and lacks a standardized output format for monitoring tools like Prometheus. Explore how eBPF can capture CUDA calls to GPUs, including kernel launches and memory allocations, without requiring intrusive code changes. Understand how eBPF probes export Prometheus metrics for detailed analysis of kernel launch patterns and memory usage statistics. Examine how this approach keeps overhead minimal compared to traditional monitoring methods, and access the open-source implementation available on GitHub for use in your own GPU monitoring infrastructure.
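To make the export step concrete, here is a minimal Python sketch of how captured CUDA call events could be aggregated into the Prometheus text exposition format. It is not the implementation presented in the talk. The event list stands in for what an eBPF probe might report, and the metric names (`gpu_kernel_launches_total`, `gpu_memory_allocated_bytes_total`), the `process` label, and the process names are hypothetical choices made for illustration. Only `cudaLaunchKernel` and `cudaMalloc` are real CUDA runtime calls.

```python
from collections import Counter

# Hypothetical events standing in for what an eBPF probe attached to the
# CUDA runtime might report: (CUDA call, process name, bytes requested).
EVENTS = [
    ("cudaLaunchKernel", "trainer", 0),
    ("cudaMalloc", "trainer", 4096),
    ("cudaLaunchKernel", "trainer", 0),
    ("cudaMalloc", "inference", 1024),
    ("cudaLaunchKernel", "inference", 0),
]


def render_metrics(events):
    """Aggregate events per process and render Prometheus text format."""
    launches = Counter()      # kernel launches per process
    alloc_bytes = Counter()   # bytes requested via cudaMalloc per process
    for call, process, size in events:
        if call == "cudaLaunchKernel":
            launches[process] += 1
        elif call == "cudaMalloc":
            alloc_bytes[process] += size

    # Metric names below are assumptions, not taken from the talk.
    lines = [
        "# HELP gpu_kernel_launches_total CUDA kernel launches observed.",
        "# TYPE gpu_kernel_launches_total counter",
    ]
    for process in sorted(launches):
        lines.append(
            f'gpu_kernel_launches_total{{process="{process}"}} {launches[process]}'
        )
    lines += [
        "# HELP gpu_memory_allocated_bytes_total Bytes requested via cudaMalloc.",
        "# TYPE gpu_memory_allocated_bytes_total counter",
    ]
    for process in sorted(alloc_bytes):
        lines.append(
            f'gpu_memory_allocated_bytes_total{{process="{process}"}} {alloc_bytes[process]}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(EVENTS), end="")
```

In a real deployment the events would come from the kernel side rather than a hard-coded list, and the rendered text would be served over HTTP for Prometheus to scrape.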
Syllabus
Auto-instrumentation for GPU performance using eBPF - DevConf.CZ 2025
Taught by
DevConf