Explore how to scale Retrieval-Augmented Generation (RAG) systems beyond DRAM limitations using NVMe storage in this 49-minute conference presentation. Examine why relying solely on DRAM for vector index storage breaks down as Large Language Models drive petabyte-scale growth, and how traditional in-memory indexing strategies quickly exhaust host memory as vector collections expand.

Learn about DiskANN, Microsoft's hybrid vector search algorithm, which offloads portions of the search index to NVMe SSDs while maintaining performance. Understand how DiskANN enables scalable approximate nearest neighbor (ANN) search through intelligent management of multi-level indexes: latency-critical portions stay in memory while the remainder resides on SSD, without significant performance degradation.

Analyze current vector indexing approaches and sizing challenges before diving into DiskANN's hybrid memory-and-SSD architecture. Compare latency, throughput, and resource utilization between pure in-memory indexes and DiskANN-backed indexes through real-world use cases. Gain insight into when and how NVMe-augmented indexing makes sense, plus practical guidance on tuning SSD parameters to sustain high service levels as vector databases continue to scale.

Presented by Alessandro Goncalves from Solidigm at SNIA SDC 2025.
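To make the hybrid memory-and-SSD idea concrete, here is a minimal sketch (not code from the talk, and not Microsoft's actual DiskANN implementation): compressed vector codes stay in RAM for cheap distance estimates during graph traversal, while full-precision vectors live in a file (a NumPy memmap standing in for an NVMe SSD) and are read only to re-rank the few candidates the search actually visits. The simple int8 quantization and brute-force k-NN graph below are placeholder assumptions; real DiskANN uses product quantization and a Vamana graph.

```python
# Illustrative sketch of a hybrid RAM/SSD vector index in the DiskANN style.
import heapq
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
N, D, R, BEAM = 2000, 32, 16, 32
vectors = rng.standard_normal((N, D)).astype(np.float32)

# "SSD-resident" full-precision vectors, accessed through a memmap.
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
vectors.tofile(path)
disk = np.memmap(path, dtype=np.float32, mode="r", shape=(N, D))

# In-memory compressed codes: crude int8 scalar quantization (real DiskANN
# keeps product-quantized codes in RAM; any lossy code that fits works here).
scale = float(np.abs(vectors).max())
codes = np.round(vectors / scale * 127).astype(np.int8)

# Graph index: link each node to its R nearest neighbors (brute force for
# brevity; DiskANN builds a Vamana graph with extra long-range edges).
norms = (vectors ** 2).sum(axis=1)
d2 = norms[:, None] + norms[None, :] - 2.0 * (vectors @ vectors.T)
graph = np.argsort(d2, axis=1)[:, 1:R + 1]

def search(query, entry=0, k=5):
    """Beam search over the graph using RAM-only distance estimates,
    then exact re-ranking with the few vectors fetched from 'disk'."""
    def approx(i):  # distance estimate from compressed codes (no disk I/O)
        return float(((codes[i].astype(np.float32) * scale / 127 - query) ** 2).sum())

    visited = {entry}
    candidates = [(approx(entry), entry)]   # min-heap: nodes to expand
    results = [(-approx(entry), entry)]     # max-heap: best BEAM found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= BEAM and d > -results[0][0]:
            break                           # no candidate can improve the beam
        for nb in graph[node]:
            nb = int(nb)
            if nb not in visited:
                visited.add(nb)
                dn = approx(nb)
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > BEAM:
                    heapq.heappop(results)  # drop the worst survivor
    # Exact re-rank: the only step that reads full vectors from "disk".
    exact = sorted((float(((disk[i] - query) ** 2).sum()), i) for _, i in results)
    return [i for _, i in exact[:k]]

print(search(vectors[0], entry=0))  # id 0 itself ranks first for its own vector
```

The SSD-tuning questions the talk raises map onto this sketch directly: the beam width bounds how many exact-distance reads hit the drive per query, so queue depth and read latency on the NVMe device set the floor on search latency.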