Overview
Watch a 37-minute lecture from UC Berkeley researcher Irene Y. Chen at the Simons Institute exploring why combining data from different sources for machine learning training isn't always beneficial. Learn about the "Data Addition Dilemma," in which mixing dissimilar data sources can reduce accuracy, create fairness issues, and harm performance for underrepresented groups. Examine the fundamental trade-off between the benefits of increased data scale and the drawbacks of distribution shift when combining datasets. Discover practical strategies and heuristics for deciding which data sources to combine to improve model performance. Gain insights into key considerations for data collection and composition as AI models continue to grow in size and complexity.
Syllabus
The Data Addition Dilemma
Taught by
Simons Institute