Faster Containerized LLM Serving via Knowledge Sharing
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This KubeCon conference talk presents a knowledge-sharing system for Large Language Models (LLMs). It explains how LLMs can share documents they have already processed in the form of KV caches, so the same document is not processed again and serving delay drops. The talk covers practical implementations on Kubernetes that keep KV caches on inexpensive storage devices while maintaining performance, with particular focus on time-to-first-token. Real-world demonstrations show how this approach improves both cost and performance in containerized LLM deployments.
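As a rough illustration of the KV cache sharing idea described above, here is a minimal Python sketch. It is not the speakers' system: `SharedKVStore`, `fake_prefill`, and the list-of-tuples `KVCache` type are hypothetical stand-ins, and a real deployment would store per-layer key/value tensors on external storage instead of an in-memory dictionary. The sketch only shows the lookup-before-prefill pattern that avoids processing the same document twice.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a real KV cache (per-layer key/value tensors).
KVCache = List[Tuple[int, int]]


class SharedKVStore:
    """Toy shared store mapping a document's content hash to its KV cache."""

    def __init__(self) -> None:
        self._store: Dict[str, KVCache] = {}

    @staticmethod
    def key_for(document: str) -> str:
        # Identical documents hash to the same key, whichever instance asks.
        return hashlib.sha256(document.encode("utf-8")).hexdigest()

    def get_or_compute(
        self, document: str, prefill: Callable[[str], KVCache]
    ) -> Tuple[KVCache, bool]:
        """Return (kv_cache, was_hit); run prefill only on a miss."""
        key = self.key_for(document)
        if key in self._store:
            return self._store[key], True
        kv = prefill(document)
        self._store[key] = kv
        return kv, False


prefill_calls = 0


def fake_prefill(document: str) -> KVCache:
    """Pretend prefill: counts calls and returns one entry per token."""
    global prefill_calls
    prefill_calls += 1
    return [(i, len(tok)) for i, tok in enumerate(document.split())]


store = SharedKVStore()
doc = "Kubernetes schedules containers across a cluster of nodes."

kv_a, hit_a = store.get_or_compute(doc, fake_prefill)  # miss: runs prefill
kv_b, hit_b = store.get_or_compute(doc, fake_prefill)  # hit: reuses the cache
```

In this toy setup the second request skips prefill entirely, which is the step that dominates time-to-first-token for long documents.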
Syllabus
Faster Containerized LLM Serving via Knowledge Sharing - Junchen Jiang & Zhou Sun
Taught by
CNCF [Cloud Native Computing Foundation]