Towards High-throughput and Low-latency Billion-scale Vector Search via CPU/GPU Collaborative Filtering and Re-ranking

USENIX via YouTube

Overview

This FAST '25 conference talk presents FusionANNS, a system for high-throughput, low-latency billion-scale vector search built on CPU/GPU collaborative filtering and re-ranking. Researchers from Huazhong University of Science and Technology and Huawei Technologies explain how they address the performance, cost, and accuracy challenges of Approximate Nearest Neighbor Search (ANNS) services using SSDs and a single entry-level GPU. The design combines three techniques: multi-tiered indexing to prevent data swapping, heuristic re-ranking to eliminate unnecessary I/O operations, and redundancy-aware I/O deduplication to improve I/O efficiency. The 16-minute presentation reports that FusionANNS achieves 9.4-13.1× higher queries per second (QPS) and 5.7-8.8× higher cost efficiency than SPANN, and 2-4.9× higher QPS and 2.3-6.8× higher cost efficiency than RUMMY, while maintaining low latency and high accuracy.
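For readers new to the filter-then-re-rank pattern, the toy sketch below may help build intuition. It is not the FusionANNS implementation: the rounding-based "compression", the `PAGE_SIZE` constant, and the `search` function are all hypothetical stand-ins chosen for illustration. It shows a cheap filtering pass over compressed vectors, followed by exact re-ranking in which candidates that share a simulated SSD page are fetched with a single read.

```python
import math

PAGE_SIZE = 4  # vectors per simulated SSD page (hypothetical value)

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def coarse(v):
    """Crude stand-in for vector compression: round each coordinate."""
    return [round(x) for x in v]

def search(query, vectors, k, n_candidates):
    # Stage 1 (filtering): rank ids by distance computed on compressed vectors.
    compressed = [coarse(v) for v in vectors]
    candidates = sorted(
        range(len(vectors)), key=lambda i: l2(query, compressed[i])
    )[:n_candidates]

    # I/O deduplication: group candidates by page so each page is read once.
    pages = {}
    for i in candidates:
        pages.setdefault(i // PAGE_SIZE, []).append(i)

    # Stage 2 (re-ranking): exact distances on the full-precision vectors.
    page_reads = 0
    exact = []
    for page_id, ids in pages.items():
        page_reads += 1  # one simulated read per distinct page
        for i in ids:
            exact.append((l2(query, vectors[i]), i))
    exact.sort()
    return [i for _, i in exact[:k]], page_reads

vectors = [[i + 0.3, 0.0] for i in range(16)]
top_ids, page_reads = search([5.2, 0.0], vectors, k=2, n_candidates=4)
print(top_ids, page_reads)
```

In this toy run the four candidates all live on the same simulated page, so re-ranking costs one read instead of four, and the exact distances reorder the candidates relative to the compressed-vector ranking.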

Syllabus

FAST '25 - Towards High-throughput and Low-latency Billion-scale Vector Search via CPU/GPU...

Taught by

USENIX
