AI Breaks Science - Evaluating Research Agents with Progressive Code Masking

Explore a 20-minute video examining research from Carnegie Mellon University on the capabilities of AI agents in scientific discovery. Delve into the findings on how AI systems perform when tasked with complex scientific research, from initial promise to emerging limitations. Discover the performance cliff that appears as scientific tasks grow more complex, revealing both the potential and the current limits of AI in scientific domains. Learn about the research methodology, which uses progressive code masking to evaluate how well AI agents move from reproduction to replication of scientific work. Understand the implications of this research for the future of AI-assisted scientific discovery across multiple disciplines. The video is based on the paper "From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking" by Gyeongwon James Kim, Alex Wilf, Louis-Philippe Morency, and Daniel Fried.