Overview
Explore a mathematical framework for cost-effective dataset labeling that combines expert annotations with AI predictions, presented in this conference talk from Harvard's Center of Mathematical Sciences and Applications. Learn how to construct high-quality labeled datasets by supplementing expensive human annotations or experimental data with predictions from pre-trained AI models, while maintaining rigorous statistical guarantees. Discover the theoretical foundations behind "probably approximately correct labels" - a method that ensures, with high probability, that the overall labeling error stays small. Examine practical applications across three domains: text annotation with large language models, image classification with pre-trained vision models, and protein structure analysis with AlphaFold. Understand how this approach enables efficient dataset curation while preserving the reliability needed for machine learning applications. The talk was presented as part of the Workshop on Mathematical Foundations of AI by Stanford researcher Tijana Zrnic, in collaboration with Emmanuel Candès and Andrew Ilyas.
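The core idea, certifying when model predictions are trustworthy enough to stand in for human labels, can be illustrated with a simple confidence-threshold rule calibrated on a small human-labeled sample. This is a minimal sketch under assumptions of our own, not the algorithm from the talk: `pac_threshold` and the synthetic well-calibrated model are hypothetical, and a one-sided Hoeffding bound is applied per threshold without the multiple-testing correction a rigorous method would need.

```python
import numpy as np

def pac_threshold(cal_conf, cal_correct, eps=0.1, delta=0.1):
    """Smallest confidence threshold t such that, by a one-sided Hoeffding
    bound on the calibration sample, predictions with confidence >= t have
    error at most eps with probability at least 1 - delta."""
    for t in np.sort(np.unique(cal_conf)):
        mask = cal_conf >= t
        n = int(mask.sum())
        emp_err = 1.0 - cal_correct[mask].mean()
        margin = np.sqrt(np.log(1.0 / delta) / (2 * n))
        if emp_err + margin <= eps:
            return t
    return None  # nothing certifiable: route everything to human annotators

# Synthetic demo: confidences in [0.7, 1.0]; each prediction is correct
# with probability equal to its reported confidence (a well-calibrated model).
rng = np.random.default_rng(0)
conf = rng.uniform(0.7, 1.0, size=10_000)
correct = rng.uniform(size=10_000) < conf

# Pretend the first 2,000 items were labeled by human experts.
cal_conf, cal_correct = conf[:2_000], correct[:2_000]
t = pac_threshold(cal_conf, cal_correct, eps=0.1, delta=0.1)

auto = conf[2_000:] >= t            # auto-label these with the model
auto_error = 1.0 - correct[2_000:][auto].mean()
```

On this synthetic data the rule auto-labels the high-confidence portion of the remaining items while keeping their realized error well under the target `eps`, which is the flavor of guarantee the talk develops far more carefully.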
Syllabus
Tijana Zrnic | Probably Approximately Correct Labels
Taught by
Harvard CMSA