PluS - Highly Efficient and Expandable ML Compiler with Pluggable Graph Schedules

Learn about PluS, a novel end-to-end machine learning compiler designed to bridge the gap between compiler flexibility and expert-optimized performance, in this 15-minute conference presentation from USENIX ATC '25. Discover why traditional ML compilers struggle to support emerging optimization techniques such as recent attention optimizations and cannot accommodate expert-driven subgraph optimizations in a timely way, while template-based compilers cannot express subgraphs abstractly and therefore adapt poorly to changes in model architecture. Explore how PluS decouples the burdensome embedded graph transformation process and provides a lightweight loop-centric subgraph abstraction, letting experts manage a flexible pattern warehouse and using pattern identification for subgraph generation. Understand how this architecture allows PluS to deploy efficient subgraph implementations with minimal manual effort, achieving up to 4.04× speedup over state-of-the-art rule-based embedded compilers on popular ML models. Gain insights into the technical implementation presented by researchers from Renmin University of China, Microsoft, and Tsinghua University, who show how PluS preserves compiler flexibility while incorporating expert-optimized subgraph implementations when deploying diverse Deep Neural Network workloads across various hardware platforms.