Syllabus
0:00 - Introduction & Multimodal RAG with Pixtral & vLLM
1:45 - What Vector Search Actually Is and Why It Matters
4:10 - Indexing Deep Dive: FLAT, IVF, and HNSW Explained
7:50 - The Index Tradeoff Matrix: Speed vs. Accuracy vs. Cost
9:45 - Embedding Models: Stop Using the Wrong Ones!
13:30 - The RAG Pipeline: From Unstructured Data to Retrieval
14:20 - Why Vibe Coding Your RAG Is a Disaster (Proper Evals)
16:45 - RAG vs. the Long-Context LLM Myth (Llama 4, Gemini 2.5 Benchmarks)
19:10 - Hybrid Search (BM25 + Similarity) and Metadata Filtering
22:30 - Building the Self-Hosted Multimodal RAG Stack (Pixtral, Milvus, vLLM)
25:05 - The Inference Challenge: Latency, Throughput, and Batching
27:15 - Model Parallelism: Why You Need to Split the Model (Tensor Parallelism)
29:30 - Optimization Secrets: Quantization & Paged Attention (KV Cache)
31:50 - Live Demo Architecture & Setup
33:05 - Q&A: Chunking Strategy, CAG, and More
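As a taste of the indexing material covered at 4:10, here is a minimal sketch of brute-force (FLAT) vector search in plain Python. FLAT search is exact but scans every vector, which is the O(n) baseline that approximate indexes like IVF and HNSW trade accuracy against. All names and data below are illustrative, not from the course.

```python
from math import sqrt

def flat_search(query, index, k=3):
    """Brute-force (FLAT) nearest-neighbor search by cosine similarity.

    Exact but O(n) per query -- the baseline that IVF and HNSW
    approximate in exchange for speed.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sqrt(sum(x * x for x in a))
        nb = sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    # Rank every vector in the index against the query, best first.
    ranked = sorted(range(len(index)),
                    key=lambda i: cosine(query, index[i]),
                    reverse=True)
    return ranked[:k]

# Toy "embedding" index of four 2-D vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(flat_search([0.9, 0.1], vectors, k=2))  # [0, 2]
```

In production this scan is what a library like Milvus replaces with an ANN index once the collection grows beyond a few hundred thousand vectors.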
Taught by
InfoQ