Optimized RAG: Strategies for Cost and Scale
MLOps World: Machine Learning in Production via YouTube
Overview
Learn practical strategies for optimizing Retrieval-Augmented Generation (RAG) systems for production in this conference talk and hands-on coding lab. The talk covers the transition from prototype to production, focusing on reducing latency and cost while maintaining performance at scale. It walks through high-impact optimizations at each stage of the RAG pipeline: data preparation, retrieval and ranking, and generation with observability. Topics include embedding quantization to reduce memory footprint and compute cost, context highlighting to improve relevance and reduce latency, Reciprocal Rank Fusion (RRF) for ranking in low-latency use cases, and context compression.
In the hands-on lab, you use Python, Google Colab, Elasticsearch, and Hugging Face models to implement filtered search, embedding quantization, and context highlighting in real-world workflows.
The session is led by a senior data scientist at Elastic who specializes in GenAI-powered search solutions and has extensive experience developing AI solutions for internet-scale platforms, metals and mining, oil and gas, and e-commerce.
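To make the Reciprocal Rank Fusion idea concrete, here is a minimal, generic sketch in plain Python. It is not code from the talk or the Elasticsearch API. It assumes each retriever (for example, a keyword search and a vector search) returns an ordered list of document IDs. Each document scores the sum of 1 / (k + rank) over every list it appears in. The constant k = 60 is the value commonly used in the RRF literature and is an assumption here. The function name and the sample document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Hypothetical helper for illustration. Each document receives
    sum(1 / (k + rank)) over every list it appears in, where rank is 1-based.
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative inputs: rankings from a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF uses only rank positions, not raw scores, so lists from retrievers with incompatible score scales can be merged without a normalization step. The fusion itself is also cheap. Both properties fit the low-latency use cases the talk mentions.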
Syllabus
Optimized RAG: Strategies for Cost and Scale
Taught by
MLOps World: Machine Learning in Production