
Linux Foundation

Gotta Cache 'em All - Scaling AI Workloads With Model Caching in a Hybrid Cloud

Linux Foundation via YouTube

Overview

Explore how to scale AI workloads efficiently through model caching in hybrid cloud environments in this 31-minute conference talk from the Linux Foundation. Learn about the challenges of rapidly scaling inference services during peak hours and of optimizing GPU utilization for fine-tuning workloads as AI models grow in size and complexity.

Discover how Bloomberg's Data Science Platform team implemented a "Model Cache" feature in the open source KServe project, designed to cache large models on GPUs across multi-cloud, multi-cluster cloud-native environments. Understand how model caching reduces load times during service auto-scaling, improves resource utilization, and boosts data scientists' productivity.

Gain insight into how Bloomberg integrated KServe's Model Cache into its AI workloads and built an API on top of Karmada to manage cache federation. Learn about the impact of enabling model caching and discover practical strategies for adopting this feature in your own AI infrastructure, making this essential viewing for AI infrastructure engineers working with large-scale machine learning deployments.

Syllabus

Gotta Cache 'em All: Scaling AI Workloads With Model Caching in a Hybrid Cloud - Rituraj Singh & Jin Dong

Taught by

Linux Foundation

