Interactive Proofs, Debate, and AI Safety - Theoretical Approaches to Human Oversight

Watch a 54-minute lecture by Google DeepMind researcher Jonah Brown-Cohen at the Simons Institute on the intersection of interactive proofs, AI debate, and AI safety. Discover how concepts from classical computational complexity theory can help address the challenges of overseeing AI systems as they become more sophisticated. Learn about methods for amplifying human supervision through the lens of interactive proof systems, in which a computationally limited verifier can effectively judge the outputs of a more powerful prover. Examine the foundational theory of AI debate, including its connections to interactive proofs and the importance of relativizing protocols. Explore theoretical challenges specific to the debate framework and newly developed approaches for overcoming them. Gain insight into how formal theoretical frameworks can improve human oversight of increasingly complex AI systems.