YouTube videos curated by Class Central.
Classroom Contents
The Truth About RAG and vLLM - Why Your Multimodal System Fails at Scale
- 1 0:00 - Introduction & Multimodal RAG with Pixtral & vLLM
- 2 1:45 - What Vector Search Actually Is and Why It Matters
- 3 4:10 - Indexing Deep Dive: FLAT, IVF, and HNSW Explained
- 4 7:50 - The Index Tradeoff Matrix: Speed vs. Accuracy vs. Cost
- 5 9:45 - Embedding Models: Stop Using the Wrong Ones!
- 6 13:30 - The RAG Pipeline: From Unstructured Data to Retrieval
- 7 14:20 - Why Vibe Coding Your RAG is a Disaster (Proper Evals)
- 8 16:45 - RAG vs. The Long Context LLM Myth (Llama 4, Gemini 2.5 Benchmarks)
- 9 19:10 - Hybrid Search (BM25 + Similarity) and Metadata Filtering
- 10 22:30 - Building the Self-Hosted Multimodal RAG Stack (Pixtral, Milvus, vLLM)
- 11 25:05 - The Inference Challenge: Latency, Throughput, and Batching
- 12 27:15 - Model Parallelism: Why You Need to Split the Model (Tensor Parallelism)
- 13 29:30 - Optimization Secrets: Quantization & Paged Attention (KV Cache)
- 14 31:50 - Live Demo Architecture & Setup
- 15 33:05 - Q&A: Chunking Strategy, CAG, and More