Auto-instrumentation for GPU Performance Using eBPF
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to automatically instrument GPU performance monitoring using eBPF in this conference talk from KubeCon + CloudNativeCon. Discover the challenges of gathering telemetry from modern AI workloads that run on expensive GPU fleets, where manual instrumentation adds performance overhead and produces no standardized output for monitoring tools like Prometheus. Explore how eBPF can capture the CUDA calls an application makes to the GPU, including kernel launches and memory allocations, without intrusive code changes. Understand how eBPF probes export Prometheus metrics for detailed analysis of kernel launch and memory usage patterns. Examine the benefits of this approach, including minimal performance overhead and open-source implementations available on GitHub, which make GPU performance optimization more accessible in cloud native environments.
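As a rough illustration of the last step in that pipeline, the sketch below shows how events captured by probes might be aggregated into Prometheus text exposition output. It is not taken from the talk or its GitHub project: the `CudaEvent` record, the `aggregate` and `render` helpers, and the metric names `cuda_kernel_launches_total` and `cuda_malloc_bytes_total` are all hypothetical, and the events are hard-coded rather than read from a real eBPF probe. Only `cudaLaunchKernel` and `cudaMalloc` are actual CUDA runtime calls.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CudaEvent:
    """Hypothetical record for one CUDA call observed by a probe."""
    pid: int
    call: str            # CUDA runtime function name, e.g. "cudaLaunchKernel"
    kernel: str = ""     # kernel symbol (launch events only)
    size_bytes: int = 0  # requested size (allocation events only)


def aggregate(events):
    """Count kernel launches per (pid, kernel) and sum allocated bytes per pid."""
    launches = Counter()
    alloc_bytes = Counter()
    for ev in events:
        if ev.call == "cudaLaunchKernel":
            launches[(ev.pid, ev.kernel)] += 1
        elif ev.call == "cudaMalloc":
            alloc_bytes[ev.pid] += ev.size_bytes
    return launches, alloc_bytes


def render(launches, alloc_bytes):
    """Render the counters in Prometheus text exposition format."""
    lines = ["# TYPE cuda_kernel_launches_total counter"]
    for (pid, kernel), count in sorted(launches.items()):
        lines.append(
            f'cuda_kernel_launches_total{{pid="{pid}",kernel="{kernel}"}} {count}'
        )
    lines.append("# TYPE cuda_malloc_bytes_total counter")
    for pid, total in sorted(alloc_bytes.items()):
        lines.append(f'cuda_malloc_bytes_total{{pid="{pid}"}} {total}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Stand-in events; a real collector would read these from the probes.
    sample = [
        CudaEvent(pid=4242, call="cudaLaunchKernel", kernel="vector_add"),
        CudaEvent(pid=4242, call="cudaLaunchKernel", kernel="vector_add"),
        CudaEvent(pid=4242, call="cudaMalloc", size_bytes=1 << 20),
    ]
    print(render(*aggregate(sample)), end="")
```

In a real deployment the aggregation would more likely happen in eBPF maps or a long-running user-space agent serving a `/metrics` endpoint; the in-memory counters here only stand in for that.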
Syllabus
Auto-instrumentation for GPU Performance Using eBPF - Annanay Agarwal, Grafana Labs
Taught by
CNCF [Cloud Native Computing Foundation]