
Making Long-context LLM Inference 10x Faster and 10x Cheaper Through Knowledge Sharing

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn about an innovative knowledge-sharing system for Large Language Models (LLMs) in this technical conference talk from CNCF. Explore how LLMs can efficiently share digested knowledge through KV caches, eliminating the need to reprocess the same documents for every query. Discover implementation techniques on Kubernetes that enable storing KV caches on cost-effective devices while significantly reducing LLM serving delays. Examine practical demonstrations showing how this approach not only improves economic efficiency but also enhances performance, particularly in first-token response times. Gain insights into solving the challenge of storing and quickly serving KV caches without relying solely on GPU/CPU memory.
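The core idea described above, paying the expensive prefill cost for a document once and then reusing its KV cache across queries, can be illustrated with a minimal sketch. All names below (`kv_store`, `prefill`, `get_kv_cache`) are hypothetical stand-ins for illustration, not the actual API presented in the talk:

```python
import hashlib

# Hypothetical sketch of KV-cache sharing: the "digested" form of a
# document (its KV cache) is computed once and stored, so later queries
# over the same document skip the costly prefill pass. In a real system
# the store could live on cheap storage tiers rather than GPU/CPU memory.

kv_store = {}  # maps document hash -> precomputed KV cache


def prefill(document: str) -> list:
    """Stand-in for the expensive prefill pass over every document token."""
    # A real transformer would produce per-layer key/value tensors here;
    # we return a toy list so the sketch stays self-contained.
    return [ord(c) % 7 for c in document]


def get_kv_cache(document: str) -> list:
    """Return the document's KV cache, computing it only on first use."""
    key = hashlib.sha256(document.encode()).hexdigest()
    if key not in kv_store:
        kv_store[key] = prefill(document)  # pay the prefill cost once
    return kv_store[key]  # subsequent queries are served from the cache


doc = "a long reference document shared across many user queries"
first = get_kv_cache(doc)   # computes and stores the KV cache
second = get_kv_cache(doc)  # cache hit: no second prefill pass
```

Because the second lookup avoids prefill entirely, time-to-first-token for repeat queries over shared documents drops, which is the performance effect the talk demonstrates.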

Syllabus

Making Long-context LLM Inference 10x Faster and 10x Cheaper - Junchen Jiang, Yihua Cheng, & Zhou Sun

Taught by

CNCF [Cloud Native Computing Foundation]
