AI, CERN, and the Quest for GPU Custody: How CERN Leverages DRA for Efficient GPU Sharing

This conference talk explores how CERN uses Dynamic Resource Allocation (DRA) to share GPUs efficiently in Kubernetes clusters. Presenters Diana Gaponcic (CERN) and Jan-Philip Gehrcke (NVIDIA) cover the current state of DRA, recent implementation updates, and newly added features, then show how to get started with DRA and why it matters for engineers who want to improve the GPU offering on their clusters. Learn how to configure time-slicing, Multi-Process Service (MPS), and Multi-Instance GPU (MIG), and how to build custom layouts. The presentation demonstrates how CERN applies DRA in practice to colocate machine learning workloads on the same GPU, including how to choose a sharing mechanism based on performance requirements. Review the training and inference benchmark results, see how DRA makes the system flexible and user-friendly, and weigh the tradeoffs of GPU sharing against the resources it can save.