Faster Containerized LLM Serving via Knowledge Sharing

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn about a knowledge-sharing system for Large Language Models (LLMs) in this conference talk from KubeCon. Explore how LLM serving engines can share their digested knowledge of a document through KV caches, eliminating redundant prefill computation over the same text and significantly reducing serving delays. Discover practical implementations on Kubernetes that enable cost-effective storage while maintaining performance, with particular focus on improving time-to-first-token. Examine real-world demonstrations showing how storing KV caches on economical storage devices can improve both cost efficiency and performance in containerized LLM deployments.
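The core idea the talk describes can be sketched in a few lines: keep each document's KV cache keyed by the document's content, pay the expensive prefill only on the first request, and let every later request over the same document reuse the stored cache. The sketch below is purely illustrative; the names (`KVCacheStore`, `_prefill`) and the toy "prefill" computation are assumptions, not the speakers' actual system or API.

```python
import hashlib

class KVCacheStore:
    """Illustrative store mapping a document's content hash to a (simulated)
    KV cache, so repeated queries over the same document skip prefill."""

    def __init__(self):
        self._store = {}
        self.prefill_calls = 0  # counts expensive prefill computations

    def _prefill(self, document: str):
        # Stand-in for the expensive attention prefill over the document.
        self.prefill_calls += 1
        return [(token, hash(token)) for token in document.split()]

    def get_kv(self, document: str):
        key = hashlib.sha256(document.encode()).hexdigest()
        if key not in self._store:       # miss: compute once and persist
            self._store[key] = self._prefill(document)
        return self._store[key]          # hit: reuse the digested knowledge

store = KVCacheStore()
doc = "cloud native LLM serving with shared KV caches"
store.get_kv(doc)            # first query pays the prefill cost
store.get_kv(doc)            # later queries reuse the stored cache
print(store.prefill_calls)   # -> 1
```

In a real deployment the stored value would be GPU KV tensors persisted to cheaper tiers (CPU memory, local disk, or networked storage), which is what makes the time-to-first-token and cost improvements discussed in the talk possible.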

Syllabus

Faster Containerized LLM Serving via Knowledge Sharing - Junchen Jiang & Zhou Sun

Taught by

CNCF [Cloud Native Computing Foundation]
