Static Inference of Regular Grammars for Ad Hoc Parsers

Learn about a novel approach for automatically inferring regular grammars from ad hoc parser source code in this 16-minute conference presentation from OOPSLA 2025. Discover how researchers Michael Schröder and Jürgen Cito from TU Wien address the challenge of understanding and formalizing the implicit grammars of ad hoc parsers, that is, parsers written with common string operations and no explicitly defined input grammar. Explore their method, which applies refinement type inference to synthesize logical and string constraints representing regular parsing operations and then interprets those constraints as regular expressions using abstract semantics. Understand the core calculus λΣ for representing ad hoc parsers, the formulation of grammar inference as refinement inference, and the abstract interpretation framework for solving string refinement variables. Examine the set of abstract domains designed to efficiently represent the constraints encountered during regular ad hoc parsing. Review the evaluation of their PANINI implementation on a benchmark of 204 Python ad hoc parsers, where it achieves 100% precision and 93% average recall in 0.82 ± 2.85 seconds without requiring prior knowledge of the input space. Gain insights into how this approach can enhance program comprehension, facilitate testing and debugging, and provide formal guarantees for parsing code, while sparing developers the tedious and error-prone task of writing grammars by hand.
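
To make the idea of an implicit grammar concrete, here is a minimal Python sketch. It is not taken from the talk and is not PANINI output: the parser `parse_switch`, the hand-written pattern `IMPLIED_GRAMMAR`, and the helpers `accepts`, `universe`, and `precision_recall` are all hypothetical names chosen for illustration. The precision and recall computed here follow one common reading of those terms for grammar inference (checked over a small finite set of strings), which is an assumption about how the reported numbers are defined.

```python
import re
from itertools import product


def parse_switch(s: str) -> bool:
    """Hypothetical ad hoc parser: plain string operations, no declared grammar."""
    if len(s) != 2 or s[0] != "-":
        raise ValueError("expected a two-character switch")
    if s[1] == "v":
        return True   # verbose
    if s[1] == "q":
        return False  # quiet
    raise ValueError("unknown switch")


# Regular grammar a reader could recover by hand from the code above.
# Written manually for illustration; a tool would have to infer it.
IMPLIED_GRAMMAR = re.compile(r"-[vq]")


def accepts(s: str) -> bool:
    """True if the parser consumes s without raising."""
    try:
        parse_switch(s)
        return True
    except ValueError:
        return False


def universe(alphabet: str = "-vqx", max_len: int = 3):
    """All strings over a small alphabet up to max_len (illustrative only)."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def precision_recall(grammar: re.Pattern, strings):
    """Precision: share of grammar strings the parser accepts.
    Recall: share of parser-accepted strings the grammar covers."""
    strings = list(strings)
    in_grammar = {s for s in strings if grammar.fullmatch(s)}
    in_parser = {s for s in strings if accepts(s)}
    both = in_grammar & in_parser
    precision = len(both) / len(in_grammar) if in_grammar else 1.0
    recall = len(both) / len(in_parser) if in_parser else 1.0
    return precision, recall
```

Under these assumptions, a candidate grammar that is too narrow (for example `-v`) keeps full precision but loses recall, while one that is too broad (for example `-[vqx]`) keeps full recall but loses precision.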