YouTube

KVCache Cache in the Wild - Characterizing and Optimizing KVCache Cache at a Large Cloud Provider

USENIX via YouTube

Overview

Explore the first systematic characterization of KV cache workload patterns from a leading large language model service provider in this 16-minute conference talk from USENIX ATC '25. Researchers from Shanghai Jiao Tong University and Alibaba Group analyzed real-world LLM serving workloads and found that KV cache reuse is skewed across requests, with single-turn and multi-turn requests proving equally important for reuse. Learn about the diverse reuse-time and reuse-probability patterns across request categories, and how the cache size needed for a near-optimal hit ratio remains moderate in practice. Examine the workload-aware cache eviction policy the team proposes, which improves serving performance under real-world traces, especially when cache capacity is limited. Gain insights into system design decisions for LLM serving infrastructure and how workload-dependent eviction policies can optimize throughput and latency for cloud-scale language model deployments.
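To make the idea of workload-aware eviction concrete, here is a minimal sketch contrasting plain LRU with a hypothetical variant that weighs each cached KV block by the estimated reuse probability of its request category (e.g., single-turn vs. multi-turn). The class names, the `reuse_prob` table, and the scoring rule are illustrative assumptions, not the policy actually proposed in the talk.

```python
from collections import OrderedDict


class LRUKVCache:
    """Baseline: plain LRU eviction over KV-cache blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> KV payload, oldest first

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as recently used
            return self.blocks[block_id]
        return None

    def put(self, block_id, payload):
        self.blocks[block_id] = payload
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used


class WorkloadAwareKVCache(LRUKVCache):
    """Hypothetical workload-aware variant: evict the block whose request
    category has the lowest estimated reuse probability, breaking ties by
    recency. Purely a sketch of the concept, not the paper's policy."""

    def __init__(self, capacity, reuse_prob):
        super().__init__(capacity)
        self.reuse_prob = reuse_prob  # category -> estimated P(reuse)
        self.category = {}            # block_id -> request category

    def put(self, block_id, payload, category="single_turn"):
        self.category[block_id] = category
        self.blocks[block_id] = payload
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            # min() is stable, and the OrderedDict iterates oldest-first,
            # so ties in reuse probability fall back to LRU order.
            victim = min(
                self.blocks,
                key=lambda b: self.reuse_prob.get(self.category[b], 0.0),
            )
            del self.blocks[victim]
            del self.category[victim]


# Usage: with skewed reuse, the low-probability single-turn block is
# evicted even though it is not the least recently used.
probs = {"multi_turn": 0.9, "single_turn": 0.2}
cache = WorkloadAwareKVCache(capacity=2, reuse_prob=probs)
cache.put("a", "KV-A", "multi_turn")
cache.put("b", "KV-B", "single_turn")
cache.put("c", "KV-C", "multi_turn")  # triggers eviction of "b", not "a"
```

Under plain LRU, inserting `"c"` would have evicted `"a"` (the oldest block); the workload-aware sketch instead drops the block least likely to be reused, which is the intuition behind the improved hit ratios the talk reports under limited cache capacity.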

Syllabus

USENIX ATC '25 - KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a...

Taught by

USENIX
