Faster Containerized LLM Serving via Knowledge Sharing
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn about a knowledge-sharing system for Large Language Models (LLMs) in this conference talk from KubeCon. Explore how LLMs can share digested knowledge through their KV caches, so a document processed once need not be re-processed for every request, significantly reducing serving delays. Discover a practical Kubernetes implementation that stores KV caches cost-effectively while maintaining performance, with particular focus on the time-to-first-token metric. Examine real-world demonstrations showing that keeping KV caches on inexpensive storage devices improves both cost and serving performance in containerized LLM deployments.
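
To make the idea concrete, here is a minimal sketch of KV-cache reuse using the Hugging Face transformers API. This is not the speakers' system (the talk covers a Kubernetes-scale implementation); the model name, in-memory store, and cache key below are illustrative placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM with KV caching works
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

# Stand-in for cheap shared storage (local disk, object store, etc.).
kv_store = {}

document = "A reference document that many prompts will reuse. " * 40

# 1) Prefill the shared document once and keep its KV cache.
doc_ids = tok(document, return_tensors="pt").input_ids
with torch.no_grad():
    prefill = model(doc_ids, use_cache=True)
kv_store["doc-v1"] = prefill.past_key_values

# 2) A later request over the same document loads the cached KV state and
#    only prefills its own short suffix, cutting time-to-first-token.
#    Note: recent transformers versions return a mutable Cache object that
#    grows in place; a real system would snapshot or copy it per request.
suffix_ids = tok(" Question: what is this about?", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(suffix_ids,
                past_key_values=kv_store["doc-v1"],
                use_cache=True)
next_token_id = out.logits[:, -1].argmax(dim=-1)
print(tok.decode(next_token_id))

Because the expensive prefill over the document happens only once, each later request pays only for its short suffix; the talk demonstrates the same effect at cluster scale, with the caches held on inexpensive storage rather than GPU memory.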
Syllabus
Faster Containerized LLM Serving via Knowledge Sharing - Junchen Jiang & Zhou Sun
Taught by
CNCF [Cloud Native Computing Foundation]