BLAS-on-Flash - An Efficient Alternative for Large Scale ML Training and Inference

Explore a 24-minute conference talk from USENIX NSDI '19 that presents BLAS-on-flash, an approach to large-scale machine learning training and inference on datasets that do not fit in memory. Discover how it works around memory limits by keeping matrices on SSD and operating on them through BLAS interfaces. Learn how this lets multi-threaded code handle industrial-scale datasets on a single workstation at near in-memory performance. Examine case studies in which complex algorithms such as eigensolvers, built on BLAS-on-flash, are reported to outperform both in-memory and distributed solutions. Gain insights into practical applications in ranking and relevance pipelines, including large-scale topic modeling and extreme multi-label learning. Consider how this approach could serve as a cost-effective alternative to expensive big-data compute systems for scaling up structurally complex machine learning tasks.
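
To make the idea of SSD-resident matrices concrete, the sketch below multiplies two matrices stored in files, reading one tile of each operand at a time so that only three small tiles are ever held in memory. It is an illustration under stated assumptions and not the BLAS-on-flash API: the function names (write_matrix, read_tile, tiled_matmul) and the file layout (row-major float64) are hypothetical, the pure-Python inner loop stands in for an in-memory BLAS gemm call, and the prefetching and multi-threaded scheduling a real library would need are left out.

```python
import os
import tempfile
from array import array

ITEM = 8  # bytes per float64 (array typecode "d")


def write_matrix(path, rows):
    """Store a list-of-lists matrix on disk as row-major float64."""
    with open(path, "wb") as f:
        for row in rows:
            array("d", row).tofile(f)


def read_tile(f, n_cols, r0, r1, c0, c1):
    """Read rows r0..r1-1, columns c0..c1-1 of a row-major matrix from file f."""
    tile = []
    for r in range(r0, r1):
        f.seek((r * n_cols + c0) * ITEM)
        row = array("d")
        row.fromfile(f, c1 - c0)
        tile.append(row)
    return tile


def read_matrix(path, n_rows, n_cols):
    """Load a whole on-disk matrix (only sensible for small demo matrices)."""
    with open(path, "rb") as f:
        return [list(r) for r in read_tile(f, n_cols, 0, n_rows, 0, n_cols)]


def tiled_matmul(path_a, path_b, path_c, m, k, n, tile=2):
    """Compute C = A @ B for file-resident A (m x k), B (k x n), C (m x n).

    At any moment only one tile each of A, B, and C is held in memory.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb, \
            open(path_c, "wb") as fc:
        for i0 in range(0, m, tile):
            i1 = min(i0 + tile, m)
            for j0 in range(0, n, tile):
                j1 = min(j0 + tile, n)
                acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
                for p0 in range(0, k, tile):
                    p1 = min(p0 + tile, k)
                    a = read_tile(fa, k, i0, i1, p0, p1)
                    b = read_tile(fb, n, p0, p1, j0, j1)
                    # In-memory kernel: a stand-in for a BLAS gemm call.
                    for i, a_row in enumerate(a):
                        acc_row = acc[i]
                        for p, a_ip in enumerate(a_row):
                            b_row = b[p]
                            for j in range(j1 - j0):
                                acc_row[j] += a_ip * b_row[j]
                # Write the finished output tile back to its place in C.
                for i, acc_row in enumerate(acc):
                    fc.seek(((i0 + i) * n + j0) * ITEM)
                    array("d", acc_row).tofile(fc)


def demo():
    """Multiply a 2 x 3 matrix by a 3 x 2 matrix entirely through files."""
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    with tempfile.TemporaryDirectory() as d:
        pa, pb, pc = (os.path.join(d, name)
                      for name in ("a.bin", "b.bin", "c.bin"))
        write_matrix(pa, a)
        write_matrix(pb, b)
        tiled_matmul(pa, pb, pc, m=2, k=3, n=2, tile=2)
        return read_matrix(pc, 2, 2)


if __name__ == "__main__":
    print(demo())
```

The tile size is the knob that trades memory for I/O: larger tiles mean fewer, larger reads but a bigger in-memory footprint. The sketch reads each tile synchronously, so it shows only the data layout and tiling, not how a real system would overlap SSD reads with computation.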