Oneiros - KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving

Centre for Networked Intelligence, IISc via YouTube

Overview

Attend this technical seminar to explore Oneiros, an innovative approach to optimizing KV cache memory management for multi-tenant Large Language Model (LLM) serving systems. Learn how this novel solution addresses the memory bottleneck in LLM inference by introducing parameter remapping techniques that repurpose model parameter memory for KV cache storage, eliminating the need for costly CPU-GPU memory swapping. Discover the key insight that model parameters remain constant during runtime while KV caches update dynamically, and understand how Oneiros leverages this observation to achieve significant performance improvements. Examine the technical implementation details of parameter remapping in multi-tenant environments where inactive model memory can be aggressively reclaimed for active KV cache needs. Analyze comprehensive performance benchmarks demonstrating 44.8%-82.5% reduction in tail time-between-token latency, 20.7%-99.3% improvement in tail time-to-first-token latency, and 6.6%-86.7% higher throughput compared to existing vLLM solutions. Explore how modern hardware architectures like the NVIDIA Grace Hopper Superchip enable high CPU-GPU bandwidth utilization for optimal parameter remapping efficiency. Gain insights into the broader implications for datacenter power management, efficient memory allocation strategies, and workload characterization in AI systems from Dr. Ruihao Li, Research Scientist at Meta's AI and Systems Co-Design group.
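The core idea in the overview, lending the weight memory of inactive models to the KV cache of active ones, can be illustrated with a toy block accountant. This is a minimal sketch under stated assumptions, not the Oneiros implementation: the names `Model`, `ToyGpu`, `alloc_kv`, and the block-based accounting are all hypothetical, and the sketch only counts blocks rather than touching real GPU memory. It relies on the observation from the overview that parameters stay constant at runtime, so a lent weight block loses nothing that cannot be restored later (restoring from a host-side copy is an assumption here).

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    param_blocks: int          # GPU blocks holding this model's (read-only) weights
    active: bool = False       # whether the model is currently serving requests
    remapped_blocks: int = 0   # weight blocks currently lent out as KV cache


@dataclass
class ToyGpu:
    free_blocks: int                            # blocks holding neither weights nor KV cache
    models: dict = field(default_factory=dict)  # model name -> Model

    def add_model(self, model):
        self.models[model.name] = model

    def alloc_kv(self, requester, blocks):
        """Reserve `blocks` KV cache blocks for `requester`; return True on success."""
        # Step 1: use ordinary free memory first.
        taken = min(blocks, self.free_blocks)
        self.free_blocks -= taken
        needed = blocks - taken

        # Step 2: borrow weight blocks from inactive models (hypothetical policy).
        borrowed = []
        for m in self.models.values():
            if needed == 0:
                break
            if m.active or m.name == requester:
                continue
            lend = min(needed, m.param_blocks - m.remapped_blocks)
            m.remapped_blocks += lend
            needed -= lend
            borrowed.append((m, lend))

        # Step 3: if the request still cannot be met, undo everything.
        if needed:
            self.free_blocks += taken
            for m, lend in borrowed:
                m.remapped_blocks -= lend
            return False
        return True


gpu = ToyGpu(free_blocks=4)
gpu.add_model(Model("A", param_blocks=10, active=True))
gpu.add_model(Model("B", param_blocks=8))   # inactive tenant
ok = gpu.alloc_kv("A", 6)                   # 4 free blocks + 2 borrowed from B
```

In this sketch a request that cannot be satisfied is rolled back in full, so the accounting never ends up in a partially allocated state. A real system would also need a policy for restoring lent weight blocks before an inactive model is reactivated, which the sketch leaves out.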

Syllabus

Time: 7:00 PM - PM IST

Taught by

Centre for Networked Intelligence, IISc
