Overview
This 48-minute talk examines data leakage and reproducibility in machine learning-based science. It surveys reproducibility failures across 17 scientific fields, affecting 329 papers and leading to overly optimistic conclusions. It presents a taxonomy of 8 types of leakage, ranging from basic errors to complex research challenges, and proposes methodological changes, including model info sheets, to prevent leakage before publication. It also reports a reproducibility study in civil war prediction, which finds that complex ML models do not outperform older statistical methods once data leakage is corrected. The speaker is Sayash Kapoor, a Ph.D. candidate at Princeton University whose research on ML methods in science has garnered recognition and been featured in prominent media outlets.
Syllabus
DSI | Leakage and the Reproducibility Crisis in ML-based Science
Taught by
Inside Livermore Lab