Overview
This 21-minute data mining lecture covers how to choose k, the number of hash functions, in Min Hashing. It introduces the statistical tools behind that choice: the Probably Approximately Correct (PAC) framework, the Central Limit Theorem, and the Chernoff-Hoeffding inequality. It then applies them to determine how large k must be for Min Hashing to return an accurate Jaccard Similarity estimate with a stated probability guarantee.
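As a rough illustration of the idea described above, the sketch below picks k from a two-sided Hoeffding bound, Pr[|estimate - JS| >= eps] <= 2 exp(-2 k eps^2) <= delta, which gives k >= ln(2/delta) / (2 eps^2). The lecture may state the bound with different constants. The function names, the example sets, and the use of explicit random permutations in place of hash functions are all assumptions made for this example, not taken from the lecture.

```python
import math
import random


def k_for(eps, delta):
    """Smallest k with 2*exp(-2*k*eps^2) <= delta (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def minhash_estimate(a, b, universe_size, k, seed=0):
    """Estimate Jaccard similarity from k independent random permutations.

    Each permutation stands in for one min-hash function (an assumption for
    this sketch). The two sets agree on their minimum permuted value with
    probability equal to their Jaccard similarity.
    """
    rng = random.Random(seed)
    matches = 0
    for _ in range(k):
        perm = list(range(universe_size))
        rng.shuffle(perm)
        if min(perm[x] for x in a) == min(perm[x] for x in b):
            matches += 1
    return matches / k


# Hypothetical example: two overlapping sets drawn from a universe of 100 items.
A = set(range(0, 60))
B = set(range(30, 90))

k = k_for(eps=0.05, delta=0.05)  # 738
estimate = minhash_estimate(A, B, universe_size=100, k=k)
print(k, jaccard(A, B), estimate)
```

With eps = 0.05 and delta = 0.05, the bound asks for k = 738. The estimate should then fall within 0.05 of the true Jaccard Similarity with probability at least 0.95 over the random choice of permutations.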
Syllabus
Recording Start
Lecture starts
Course Materials Copyright
Announcements
Choosing k for minhashing motivation
PAC
Central Limit Theorem
Chernoff-Hoeffding Inequality
Choosing k for a good estimate of JS
Recording ends
Taught by
UofU Data Science