Incremental GPU Slicing in Kubernetes Clusters - Dynamic Resource Management
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Watch a technical conference talk exploring how to implement incremental GPU slicing for large language model inference services. Learn about replacing Multi-Instance GPU managers with an open-source incremental-slicing controller to enable dynamic GPU resource allocation without requiring new APIs or device plugin modifications. Discover how GPU vendors are developing dynamic slicing capabilities that allow workloads to request fractional compute and memory units on demand, and understand the current work being done by the Kubernetes Device Management Working Group to expose these features. Gain practical insights into achieving incremental slicing in GPU clusters to optimize costs through dynamic model selection and resource utilization.
Syllabus
Incremental GPU Slicing in Action - Abhishek Malvankar & Olivier Tardieu, IBM Research
Taught by
CNCF [Cloud Native Computing Foundation]