Faster Containerized LLM Serving via Knowledge Sharing
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn about an innovative knowledge-sharing system for Large Language Models (LLMs) in this conference talk from KubeCon. Explore how LLMs can share their digested knowledge through KV caches, eliminating redundant document prefilling and significantly reducing serving delays. Discover practical implementations on Kubernetes that enable cost-effective storage while maintaining performance, with a particular focus on optimizing time-to-first-token (TTFT). Examine real-world demonstrations showing how storing KV caches on inexpensive storage devices can improve both cost efficiency and performance in containerized LLM deployments.
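The core idea the talk describes — reusing a stored KV cache instead of re-running the expensive prefill pass over the same document — can be sketched in miniature. This is an illustrative toy, not the speakers' actual system: the class name `KVCacheStore`, the simulated prefill, and the in-memory dictionary backing store are all assumptions standing in for a real transformer prefill and a real (possibly disk-backed) cache.

```python
import hashlib
import time


class KVCacheStore:
    """Toy stand-in for a shared KV-cache store.

    In a real deployment, the "value" would be the KV tensors a
    transformer produces when prefilling a document, and the store
    could live on cheap storage shared across serving pods.
    """

    def __init__(self):
        self._store = {}  # document hash -> cached "KV" payload

    def _expensive_prefill(self, text):
        # Stand-in for the transformer prefill pass that digests a
        # document into KV tensors; here we just simulate the latency
        # and return a placeholder payload.
        time.sleep(0.01)
        return [hashlib.sha256((text + str(i)).encode()).hexdigest()
                for i in range(4)]

    def get_kv(self, text):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in self._store:
            # Cache miss: pay the prefill cost once.
            self._store[key] = self._expensive_prefill(text)
        # Cache hit: reuse the digested knowledge, so a request over
        # the same document starts decoding sooner (better TTFT).
        return self._store[key]
```

In this sketch, the first request over a document pays the prefill latency; every later request (from any model instance sharing the store) gets the cached payload back immediately, which is the mechanism behind the talk's time-to-first-token improvements.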
Syllabus
Faster Containerized LLM Serving via Knowledge Sharing - Junchen Jiang & Zhou Sun
Taught by
CNCF [Cloud Native Computing Foundation]