Scaling Vector Database Usage Without Breaking the Bank - Quantization and Adaptive Retrieval
Toronto Machine Learning Series (TMLS) via YouTube
Overview
Learn how to optimize vector search deployment costs and performance in this technical talk from the Toronto Machine Learning Series. Explore practical techniques for scaling vector databases efficiently, focusing on quantization methods and adaptive retrieval strategies. Discover how to perform real-time billion-scale vector searches on modest hardware through various quantization approaches including product, binary, scalar, and matryoshka quantization. Master the implementation of adaptive retrieval, which combines fast low-accuracy searches using compressed vectors with targeted high-accuracy rescoring. Understand how to achieve significant memory cost reductions (up to 32x) while maintaining strong retrieval performance with only minimal accuracy trade-offs in RAG applications. Gain valuable insights from Senior ML Developer Advocate Zain Hassan on balancing memory costs, latency performance, and retrieval accuracy for production-level vector search deployments.
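As a rough illustration of the adaptive retrieval idea described above, the sketch below pairs a fast first pass over binary-quantized vectors with full-precision rescoring of a small candidate set. It is not the speaker's implementation or any vector database's API: the function names, the `oversample` parameter, and the random test data are all assumptions made for this example, and it uses plain NumPy brute force rather than a real index. Keeping one bit per float32 dimension is one way to arrive at the 32x memory reduction mentioned in the description.

```python
import numpy as np

def binary_quantize(vectors):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    return np.packbits(vectors > 0, axis=1)

def hamming_distances(query_code, codes):
    """Count differing bits between one packed query and every packed vector."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)

def adaptive_search(query, vectors, codes, k=5, oversample=10):
    """Hypothetical two-stage search: coarse binary pass, then exact rescoring."""
    # Stage 1: cheap, low-accuracy pass over the compressed codes.
    query_code = binary_quantize(query[None, :])
    n_candidates = min(k * oversample, len(vectors))
    dists = hamming_distances(query_code, codes)
    candidates = np.argpartition(dists, n_candidates - 1)[:n_candidates]

    # Stage 2: rescore only the candidates with full-precision dot products.
    scores = vectors[candidates] @ query
    order = np.argsort(-scores)[:k]
    return candidates[order]

# Synthetic unit vectors stand in for real embeddings (assumption).
rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 256)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

codes = binary_quantize(vectors)
compression_ratio = vectors.nbytes // codes.nbytes  # float32 vs. 1 bit per dim

# Query with a lightly perturbed copy of a stored vector.
query = vectors[42] + 0.01 * rng.standard_normal(256).astype(np.float32)
top_ids = adaptive_search(query, vectors, codes)
```

In a production setting the full-precision vectors would typically live on disk or in slower storage and only the compressed codes in memory; here both are kept in RAM purely to keep the sketch short. The `oversample` factor trades extra rescoring work for a better chance that the true neighbours survive the coarse pass.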
Syllabus
Scaling Vector Database Usage Without Breaking the Bank - Quantization and Adaptive Retrieval
Taught by
Toronto Machine Learning Series (TMLS)