Smarter RAG, Smaller Bill - Optimize for Performance and Price

DevConf via YouTube

Overview

This 14-minute conference talk from DevConf.US 2025 covers cost optimization for Retrieval-Augmented Generation (RAG) applications. It presents RAGCache as a way to cut costs further, on top of the roughly 60% reduction the talk attributes to RAG apps compared with standard LLMs. Three techniques are covered: dynamic knowledge caching, which stores intermediate states in a structured knowledge tree and balances GPU and host memory usage; replacement policies tailored to LLM inference and RAG retrieval patterns; and overlapping retrieval with inference to minimize latency. According to the talk, integrating RAGCache with tools like vLLM and Faiss yields up to 4x faster Time to First Token (TTFT) and up to 2.1x higher throughput. The talk also examines current RAG challenges, practical ways to reduce cost while improving user experience, performance metrics and key benefits, and real-world applications of these strategies for building more efficient LLM applications.
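To make the knowledge-tree idea more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not RAGCache's actual implementation: the names `Node`, `KnowledgeTree`, `lookup`, `insert`, and `tier_of` are hypothetical, the cached intermediate state is represented only by a tier label, and the replacement policy is a plain least-recently-used rule standing in for whatever policy the talk describes. The tree is keyed by the ordered list of retrieved document IDs, so a request can reuse cached state only for a matching prefix of documents.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """One retrieved document in a cached prefix (hypothetical structure)."""
    doc_id: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    tier: str = "gpu"      # "gpu" or "host": where the cached state would live
    last_used: int = 0     # logical clock value of the most recent access


class KnowledgeTree:
    """Prefix tree over ordered document IDs with a two-tier memory budget."""

    def __init__(self, gpu_capacity: int) -> None:
        self.root = Node(doc_id="<root>")
        self.gpu_capacity = gpu_capacity  # max number of nodes kept on the GPU tier
        self.clock = 0

    def lookup(self, doc_ids: List[str]) -> int:
        """Return how many leading documents already have cached state."""
        self.clock += 1
        node, hits = self.root, 0
        for doc_id in doc_ids:
            child = node.children.get(doc_id)
            if child is None:
                break
            child.last_used = self.clock
            node, hits = child, hits + 1
        return hits

    def insert(self, doc_ids: List[str]) -> None:
        """Record state for a document sequence, then enforce the GPU budget."""
        self.clock += 1
        node = self.root
        for doc_id in doc_ids:
            child = node.children.setdefault(doc_id, Node(doc_id=doc_id))
            child.tier = "gpu"            # a prefix in use is promoted to GPU
            child.last_used = self.clock
            node = child
        self._demote_until_fits()

    def tier_of(self, doc_ids: List[str]) -> Optional[str]:
        """Return the tier of the node for this exact sequence, or None."""
        node = self.root
        for doc_id in doc_ids:
            node = node.children.get(doc_id)
            if node is None:
                return None
        return node.tier

    def _nodes(self) -> Iterator[Node]:
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            yield node
            stack.extend(node.children.values())

    def _demote_until_fits(self) -> None:
        """Move least-recently-used GPU nodes to host memory.

        Only GPU nodes with no GPU children are candidates, so a node on
        the GPU tier always has its whole prefix on the GPU tier as well.
        """
        while True:
            gpu_nodes = [n for n in self._nodes() if n.tier == "gpu"]
            if len(gpu_nodes) <= self.gpu_capacity:
                return
            frontier = [
                n for n in gpu_nodes
                if all(c.tier != "gpu" for c in n.children.values())
            ]
            victim = min(frontier, key=lambda n: n.last_used)
            victim.tier = "host"
```

With a capacity of two nodes, inserting the sequence `["a", "b"]` and later `["a", "c"]` keeps the shared prefix `a` on the GPU tier and demotes the older branch `b` to host memory. Demoting rather than deleting reflects the GPU and host memory balance mentioned in the description; a real system would also have to copy the underlying tensors between devices.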

Syllabus

Smarter RAG, Smaller Bill: Optimize for Performance and Price - DevConf.US 2025

Taught by

DevConf
