Building RAG Applications with Elasticsearch - From Vector Search to Generative Caching

Learn how to effectively build and optimize Retrieval Augmented Generation (RAG) chat applications in this 44-minute conference talk from GOTO Chicago 2024. Explore essential concepts from lexical and semantic search fundamentals to advanced topics like ELSER implementation and generative caching. Master the challenges of file processing, semantic search, and prompt iteration while discovering how Elasticsearch streamlines these processes. Follow along with practical demonstrations using Elastic Playground to fine-tune settings, test RAG prompts, and optimize performance. Gain hands-on experience transforming files into sparse vector embeddings for improved search results and learn to export code for seamless integration into functional chat applications. Through detailed examples and live demonstrations, understand the benefits of LLM caching, explore inference APIs, and discover practical solutions for common query problems in RAG implementations.

Syllabus

Intro
Lexical 101
Semantical 101
ELSER
Query problem
Inference API
Semantic text
RAG: Retrieval Augmented Generation
Generative caching
LLM caching benefits
LLM helpy helper
Demo
Outro