The Truth About RAG and vLLM - Why Your Multimodal System Fails at Scale

InfoQ via YouTube

YouTube videos curated by Class Central.

Classroom Contents

  1. 0:00 - Introduction & Multimodal RAG with Pixtral & vLLM
  2. 1:45 - What Vector Search Actually Is and Why It Matters
  3. 4:10 - Indexing Deep Dive: FLAT, IVF, and HNSW Explained
  4. 7:50 - The Index Tradeoff Matrix: Speed vs. Accuracy vs. Cost
  5. 9:45 - Embedding Models: Stop Using the Wrong Ones!
  6. 13:30 - The RAG Pipeline: From Unstructured Data to Retrieval
  7. 14:20 - Why Vibe Coding Your RAG Is a Disaster (Proper Evals)
  8. 16:45 - RAG vs. the Long-Context LLM Myth (Llama 4, Gemini 2.5 Benchmarks)
  9. 19:10 - Hybrid Search (BM25 + Similarity) and Metadata Filtering
  10. 22:30 - Building the Self-Hosted Multimodal RAG Stack (Pixtral, Milvus, vLLM)
  11. 25:05 - The Inference Challenge: Latency, Throughput, and Batching
  12. 27:15 - Model Parallelism: Why You Need to Split the Model (Tensor Parallelism)
  13. 29:30 - Optimization Secrets: Quantization & PagedAttention (KV Cache)
  14. 31:50 - Live Demo Architecture & Setup
  15. 33:05 - Q&A: Chunking Strategy, CAG, and More