
Data-Intensive AI Inference Done Better - Offloading Model Weights and RAG Data to Storage

SNIAVideo via YouTube

Overview

Explore advanced techniques for optimizing AI inference with Retrieval-Augmented Generation (RAG) in this 35-minute conference talk from SNIA SDC 2025. Learn how enterprises can overcome infrastructure limitations and cost barriers when deploying complex AI models and large RAG datasets by combining open-source software components with high-performance NVMe SSDs. Discover two complementary approaches for achieving unprecedented scale: offloading model weights to storage with DeepSpeed, and offloading RAG data to storage with DiskANN. Examine how combining these methods enables complex models to run on GPUs that previously could not hold them, while handling large amounts of RAG data more cost-efficiently. Analyze benchmarking results demonstrating the impact of SSD offload on DRAM usage, queries per second (QPS), index build time, and recall. Review a practical demonstration of the solution in a real-world traffic-video use case, and understand the broader opportunities and challenges of AI inference with RAG technology.
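To make the first approach concrete: DeepSpeed's ZeRO stage-3 inference can stage model parameters on NVMe storage through its configuration file. The sketch below uses DeepSpeed's documented `offload_param` and `aio` options; the NVMe path and buffer sizes are placeholders you would tune for your hardware, not values from the talk.

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": {
      "device": "nvme",
      "nvme_path": "/mnt/nvme/deepspeed",
      "pin_memory": true,
      "buffer_count": 5,
      "buffer_size": 1e8
    }
  },
  "aio": {
    "block_size": 1048576,
    "queue_depth": 8,
    "single_submit": false,
    "overlap_events": true
  }
}
```

With `offload_param.device` set to `"nvme"`, DeepSpeed pages parameters in from storage on demand during the forward pass, so the model's full weights never need to fit in GPU or host DRAM at once; the `aio` section tunes the asynchronous I/O engine that hides SSD latency.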
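The second approach, DiskANN, keeps the bulk of the vector index on SSD and walks a proximity graph, reading only the vectors each search step touches. The toy sketch below illustrates that idea with a memory-mapped vector file and a simple k-NN graph searched greedily; real DiskANN builds a Vamana graph and stores it on disk too, so treat this as an illustration of the access pattern, not the library's API.

```python
import os
import tempfile

import numpy as np

# Toy sketch of DiskANN-style search: vectors live on SSD (memory-mapped),
# so DRAM holds only the graph and the handful of vectors a query visits.
rng = np.random.default_rng(0)
dim, n = 16, 200
data = rng.standard_normal((n, dim)).astype(np.float32)

# Persist vectors to disk and memory-map them (stand-in for NVMe-resident data).
path = os.path.join(tempfile.mkdtemp(), "vectors.npy")
np.save(path, data)
vecs = np.load(path, mmap_mode="r")

# Crude k-NN neighbour graph built in memory (DiskANN uses a Vamana graph).
k = 8
dists = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)
graph = np.argsort(dists, axis=1)[:, 1 : k + 1]

def greedy_search(query, start=0, beam=4, iters=30):
    """Beam search over the graph, reading vectors lazily from the mmap."""
    def d(i):
        return float(((vecs[i] - query) ** 2).sum())

    frontier = {start: d(start)}
    visited = set()
    for _ in range(iters):
        cand = min((i for i in frontier if i not in visited),
                   key=frontier.get, default=None)
        if cand is None:
            break
        visited.add(cand)
        for nb in graph[cand]:
            if int(nb) not in frontier:
                frontier[int(nb)] = d(int(nb))
        # Keep only the `beam` closest candidates (bounds DRAM per query).
        frontier = dict(sorted(frontier.items(), key=lambda kv: kv[1])[:beam])
    return min(frontier, key=frontier.get)

# Query near vector 42: the walk moves toward it through the graph.
q = data[42] + 0.01
best = greedy_search(q)
print(best)
```

The trade-off the talk benchmarks falls out of this structure: each search step costs an SSD read instead of a DRAM lookup, lowering QPS and memory footprint together, while graph quality and beam width govern recall.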

Syllabus

SNIA SDC 2025 - Data-Intensive Inference Done Better

Taught by

SNIAVideo
