
LMCache - Lower LLM Performance Costs in the Enterprise

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn how to reduce GPU costs and improve LLM performance in enterprise environments through this 26-minute conference talk from CNCF. Discover LMCache, an open source LLM serving engine extension that significantly reduces Time to First Token (TTFT) and increases throughput for large language model deployments. Explore the key enterprise concerns of cost optimization and return on investment (ROI) when implementing AI applications such as copilots, search engines, document understanding, and chatbots that rely on GPU clusters for high-throughput inference. Examine LMCache's high-performance KV cache management layer and see demonstrations of its integration with production inference engines, including vLLM and KServe, deployed on Kubernetes clusters. Understand real-world applications through examples of document analysis and high-speed RAG (Retrieval-Augmented Generation) support. Gain insights into the growing open source community developing KV caching solutions that are already improving ROI for major companies including Red Hat, IBM, Google, Nvidia, and CoreWeave.
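To build intuition for why KV cache reuse cuts TTFT, here is a toy sketch (not the real LMCache API; all names are hypothetical) of a chunked prefix cache: when two requests share a long document prefix, the second request can skip recomputing the KV tensors for the shared chunks and only pay for the new suffix.

```python
# Illustrative sketch only -- a toy prefix KV cache, NOT LMCache's actual API.
from hashlib import sha256


class ToyKVCache:
    """Caches stand-in KV entries keyed by hashes of token-prefix chunks."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.store = {}  # prefix-chunk hash -> cached "KV" blob

    def _chunk_keys(self, tokens):
        # One key per prefix boundary: tokens[:4], tokens[:8], ...
        for i in range(0, len(tokens), self.chunk_size):
            prefix = tuple(tokens[: i + self.chunk_size])
            yield sha256(repr(prefix).encode()).hexdigest()

    def prefill(self, tokens):
        """Return (cache hits, recomputed chunks) for one prefill pass."""
        hits = misses = 0
        for key in self._chunk_keys(tokens):
            if key in self.store:
                hits += 1          # shared prefix: no recomputation needed
            else:
                self.store[key] = "kv"  # stand-in for real KV tensors
                misses += 1
        return hits, misses


cache = ToyKVCache()
doc = list(range(16))             # shared document tokens (e.g. RAG context)
q1 = doc + [100, 101, 102, 103]   # first question about the document
q2 = doc + [200, 201, 202, 203]   # second question, same document prefix

print(cache.prefill(q1))  # → (0, 5): cold cache, all chunks computed
print(cache.prefill(q2))  # → (4, 1): document chunks reused, only the suffix is new
```

In a real deployment the cached entries are GPU KV tensors that may be spilled to CPU memory or remote storage, which is where the document-analysis and RAG wins described above come from: repeated questions over the same context skip most of the prefill work.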

Syllabus

LMCache: Lower LLM Performance Costs in the Enterprise - Martin Hickey & Junchen Jiang

Taught by

CNCF [Cloud Native Computing Foundation]

