

Optimized RAG - Strategies for Cost and Scale

MLOps World: Machine Learning in Production via YouTube

Overview

Learn practical strategies for optimizing Retrieval-Augmented Generation (RAG) systems for production deployment in this conference talk and hands-on coding lab. The talk covers the transition from prototype to production, focusing on reducing latency and cost while maintaining performance at scale. It walks through high-impact optimization techniques at each stage of the RAG pipeline: data preparation, retrieval and ranking, and generation with observability. Topics include embedding quantization to reduce memory footprint and compute costs, context highlighting for improved relevance and reduced latency, Reciprocal Rank Fusion (RRF) for ranking in low-latency use cases, and context compression. The hands-on lab uses Python, Google Colab, Elasticsearch, and Hugging Face models to implement filtered search, embedding quantization, and context highlighting in real-world workflows. The speaker is a senior data scientist at Elastic who specializes in GenAI-powered search solutions and has developed AI solutions across internet-scale platforms, metals and mining, oil and gas, and e-commerce.
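
For readers unfamiliar with two of the techniques named above, here is a minimal plain-Python sketch of Reciprocal Rank Fusion and of symmetric int8 scalar quantization. It is not the code from the talk or lab, which use Elasticsearch and Hugging Face models. The function names are hypothetical, and the constant k=60 is the commonly used RRF default, assumed here rather than taken from the talk.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def quantize_int8(vector):
    """Symmetric scalar quantization of a float vector to the range [-127, 127].

    Returns the integer codes and the scale needed to approximately
    reconstruct the original values.
    """
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero
    scale = max_abs / 127.0
    return [round(x / scale) for x in vector], scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


# Hypothetical result lists from a keyword search and a vector search.
keyword_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])

# Hypothetical 3-dimensional embedding.
codes, scale = quantize_int8([0.5, -1.0, 0.25])
restored = dequantize(codes, scale)
```

RRF needs only rank positions, not comparable scores, which is why it is cheap to apply across heterogeneous retrievers. Storing int8 codes instead of 32-bit floats cuts embedding memory by roughly a factor of four, at the cost of a small reconstruction error bounded by the scale.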

Syllabus

Optimized RAG: Strategies for Cost and Scale

Taught by

MLOps World: Machine Learning in Production
