Towards Memory Efficient RAG Pipelines with CXL Technology

Open Compute Project via YouTube

Overview

Explore memory optimization strategies for Retrieval-Augmented Generation (RAG) pipelines using Compute Express Link (CXL) technology in this 15-minute conference talk. Learn how the stages of a RAG AI-inference pipeline consume large volumes of data, particularly the data preparation phase, in which creating embeddings and inserting them into vector databases requires significant transient memory. Discover how the search phase also increases memory consumption, depending on index tree sizes and the number of parallel queries, so that peak memory usage varies with the pipeline's load, including insertions and other transient behavior. Understand why statically provisioning local memory to meet peak usage is inefficient, and examine two proposed CXL approaches that address high memory demand while reducing the cost of locally attached memory: CXL Memory Pooling, which provisions memory based on transient needs, and CXL Memory Tiering, which uses cheaper, larger-capacity memory.
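
As a rough illustration of why provisioning every node for its own peak can be wasteful, the toy calculation below compares static provisioning with a shared pool that absorbs transient spikes. It is a minimal sketch and not taken from the talk: the node names, the demand figures, and both sizing functions are hypothetical, and it ignores latency, bandwidth, and allocation granularity.

```python
# Hypothetical per-node memory demand in GiB over five time steps.
# These numbers are illustrative assumptions, not measurements from the talk.
demand_gib = {
    "node_a": [64, 200, 64, 64, 64],  # spike while inserting embeddings
    "node_b": [64, 64, 64, 180, 64],  # spike during a burst of parallel queries
    "node_c": [64, 64, 150, 64, 64],  # spike at a different time step
}


def static_provisioning(demand):
    """Give every node enough local memory for its own peak demand."""
    return sum(max(series) for series in demand.values())


def pooled_provisioning(demand):
    """Keep local memory for each node's baseline (minimum) demand and size
    a shared pool for the largest combined overflow at any one time step."""
    baselines = {node: min(series) for node, series in demand.items()}
    local = sum(baselines.values())
    steps = len(next(iter(demand.values())))
    pool = max(
        sum(series[t] - baselines[node] for node, series in demand.items())
        for t in range(steps)
    )
    return local, pool


static_total = static_provisioning(demand_gib)
local_total, pool_size = pooled_provisioning(demand_gib)

print(f"static provisioning: {static_total} GiB of local memory")
print(f"pooled provisioning: {local_total} GiB local + {pool_size} GiB shared pool")
```

Because the hypothetical spikes do not overlap in time, the shared pool only needs to cover the largest single overflow, which is the intuition behind sizing memory for transient needs rather than for per-node peaks.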

Syllabus

Towards memory efficient RAG pipelines with CXL technology

Taught by

Open Compute Project
