Scaling Vector Database Usage Without Breaking the Bank - Quantization and Adaptive Retrieval
Toronto Machine Learning Series (TMLS) via YouTube
Overview
Learn how to optimize vector search deployment costs and performance in this technical talk from the Toronto Machine Learning Series. Explore practical techniques for scaling vector databases efficiently, focusing on quantization methods and adaptive retrieval strategies. Discover how to perform real-time billion-scale vector searches on modest hardware through various quantization approaches including product, binary, scalar, and matryoshka quantization. Master the implementation of adaptive retrieval, which combines fast low-accuracy searches using compressed vectors with targeted high-accuracy rescoring. Understand how to achieve significant memory cost reductions (up to 32x) while maintaining strong retrieval performance with only minimal accuracy trade-offs in RAG applications. Gain valuable insights from Senior ML Developer Advocate Zain Hassan on balancing memory costs, latency performance, and retrieval accuracy for production-level vector search deployments.
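The adaptive-retrieval pattern described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the talk's implementation: corpus size, dimensionality, and the shortlist size of 50 are arbitrary assumptions, and a real deployment would use an ANN index over the compressed vectors rather than brute-force scans. Binary quantization keeps one bit per float32 dimension, which is where the up-to-32x memory reduction comes from; the full-precision rescoring pass recovers most of the lost accuracy on the small candidate set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 1,000 vectors of 256 float32 dimensions (illustrative sizes).
docs = rng.standard_normal((1000, 256)).astype(np.float32)
query = rng.standard_normal(256).astype(np.float32)

# Binary quantization: keep only the sign bit of each dimension.
# Packed, each 256-dim vector occupies 32 bytes instead of 1,024 (32x smaller).
docs_bin = np.packbits(docs > 0, axis=1)
query_bin = np.packbits(query > 0)

# Stage 1: fast, low-accuracy search -- Hamming distance on the packed bits.
hamming = np.unpackbits(docs_bin ^ query_bin, axis=1).sum(axis=1)
candidates = np.argsort(hamming)[:50]  # shortlist (size is an assumption)

# Stage 2: targeted high-accuracy rescoring -- full-precision dot products,
# computed only for the 50 shortlisted vectors, not the whole corpus.
scores = docs[candidates] @ query
top10 = candidates[np.argsort(scores)[::-1][:10]]
print(top10)
```

The key cost trade-off is visible in the two stages: stage 1 touches every vector but only in 1-bit form, while stage 2 touches full-precision vectors but only for a tiny shortlist.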
Syllabus
Scaling Vector Database Usage Without Breaking the Bank Quantization and Adaptive Retrieval
Taught by
Toronto Machine Learning Series (TMLS)