Overview
Learn how to automatically instrument GPU performance monitoring with eBPF in this 29-minute conference talk from DevConf.CZ 2025. Discover why gathering telemetry from modern AI workloads on expensive GPU fleets is hard: manual instrumentation adds performance overhead and lacks a standardized output format for monitoring tools like Prometheus. Explore how eBPF can capture CUDA calls to GPUs, including kernel launches and memory allocations, without intrusive code changes. Understand how eBPF probes export Prometheus metrics for detailed analysis of kernel launch patterns and memory usage. See why this approach adds minimal overhead compared with traditional monitoring methods, and find the open-source implementation on GitHub for use in your own GPU monitoring infrastructure.
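To make the export step concrete, here is a minimal sketch of how captured CUDA call events could be aggregated into Prometheus text exposition format. It is not the implementation from the talk. The event tuples, the `render_metrics` helper, and the metric names `cuda_calls_total` and `cuda_malloc_bytes_total` are all assumptions for illustration. The sample data stands in for what eBPF probes on the CUDA runtime functions `cudaLaunchKernel` and `cudaMalloc` might report.

```python
from collections import Counter

# Hypothetical (function name, requested bytes) records. In a real setup these
# would come from eBPF probes on the CUDA runtime; here they are hard-coded.
EVENTS = [
    ("cudaLaunchKernel", 0),
    ("cudaLaunchKernel", 0),
    ("cudaMalloc", 4096),
    ("cudaMalloc", 1024),
]


def render_metrics(events):
    """Aggregate events and render them in Prometheus text exposition format."""
    # Count how many times each CUDA function was called.
    calls = Counter(name for name, _ in events)
    # Sum the bytes requested across all cudaMalloc calls.
    alloc_bytes = sum(size for name, size in events if name == "cudaMalloc")

    lines = [
        "# HELP cuda_calls_total CUDA runtime calls observed.",
        "# TYPE cuda_calls_total counter",
    ]
    for name in sorted(calls):
        lines.append(f'cuda_calls_total{{function="{name}"}} {calls[name]}')
    lines += [
        "# HELP cuda_malloc_bytes_total Bytes requested via cudaMalloc.",
        "# TYPE cuda_malloc_bytes_total counter",
        f"cuda_malloc_bytes_total {alloc_bytes}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(EVENTS), end="")
```

Serving this text over HTTP at a `/metrics` path would let a Prometheus server scrape it. The counters here are labeled by function name so that kernel launches and memory allocations can be graphed separately.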
Syllabus
Auto-instrumentation for GPU performance using eBPF - DevConf.CZ 2025
Taught by
DevConf