Pareto-efficient AI Systems: Expanding the Quality and Efficiency Frontier of AI

In this Allen School Colloquia Series talk, Simran Arora of Stanford University discusses how to build language model architectures that optimize the tradeoff between quality and throughput efficiency. The presentation has three parts: first, identifying fundamental quality-efficiency tradeoffs between architecture classes; second, evaluating existing architecture candidates with the ThunderKittens programming library; and third, expanding the Pareto frontier with the BASED architecture, which led to state-of-the-art Transformer-free language models developed on an academic budget.

Arora is a PhD student advised by Chris Ré. Her research blends AI and systems to maximize capabilities while minimizing compute requirements. Her award-winning work has been published at major conferences including NeurIPS, ICML, ICLR, VLDB, and SIGMOD, and her systems artifacts are widely adopted in research and industry.