Self-play LLM Theorem Provers with Iterative Conjecturing and Proving

Explore reinforcement learning techniques for automated theorem proving in this 56-minute conference talk from Harvard CMSA's Workshop on Mathematical Foundations of AI. Learn about the Self-play Theorem Prover (STP), an approach to improving AI models when high-quality training data becomes scarce. Discover how the system mimics the way mathematicians work by acting as both conjecturer and prover, with each role providing training signals that improve the other. Examine how the conjecturer creates novel conjectures and exercises as variants of known mathematical results, and understand how iterating this process enables continued model improvement. Review the state-of-the-art results reported on the benchmark datasets miniF2F-test (65.0% success rate), ProofNet-test (23.9% success rate), and PutnamBench (8 of 644 problems solved), all obtained with whole-proof generation and evaluated at pass@3200. Gain insights into the future of AI-assisted mathematical reasoning and the potential of self-improving theorem provers when training data is limited.
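For readers unfamiliar with the pass@k numbers quoted above, here is a minimal sketch of how such a rate can be tallied, assuming a problem counts as solved when at least one of k sampled whole proofs is accepted by the proof checker. The function names and the toy data are hypothetical and chosen for illustration only; they are not taken from the talk or from the STP codebase.

```python
from typing import Dict, List


def solved_at_k(attempts: List[bool], k: int) -> bool:
    """Return True if any of the first k sampled proofs was verified."""
    return any(attempts[:k])


def pass_at_k_rate(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems with at least one verified proof among k samples."""
    if not results:
        return 0.0
    solved = sum(solved_at_k(attempts, k) for attempts in results.values())
    return solved / len(results)


# Hypothetical toy data: each list records whether the i-th sampled proof
# for that problem passed verification. These are not real benchmark results.
toy_results = {
    "problem_a": [False, False, True, False],
    "problem_b": [False, False, False, False],
    "problem_c": [True, False, False, False],
    "problem_d": [False, False, False, True],
}

rate_at_2 = pass_at_k_rate(toy_results, k=2)
rate_at_4 = pass_at_k_rate(toy_results, k=4)
print(f"pass@2 = {rate_at_2:.2f}, pass@4 = {rate_at_4:.2f}")
```

In this toy setting the rate can only stay the same or rise as k grows, which is why a figure such as pass@3200 should be read together with its sampling budget.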