Continuous Data Pipeline for Real-Time Benchmarking and Data Set Augmentation
Data Council via YouTube
Overview
Explore a 15-minute conference talk from Data Council on building continuous data pipelines for real-time benchmarking and dataset augmentation. Learn how to generate datasets and implement real-time precision/recall splits to detect data shifts, prioritize data collection, and retrain models. Discover the importance of curating representative datasets for accurate ML systems and monitoring post-deployment metrics. Gain insights into addressing data shifts in unstructured language models and leveraging open-source APIs and annotation tools to streamline processes. Presented by Ivan Aguilar, a data scientist at Teleskope, this talk covers topics such as the problem statement, usual approaches, open-source data APIs, task overview, annotations overview, and final thoughts on improving ML model performance through effective data management strategies.
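As a rough illustration of the real-time precision/recall monitoring mentioned above, the following minimal sketch tracks both metrics over a sliding window of labeled predictions and flags a possible data shift when either falls well below a baseline. It is not the speaker's implementation: the names `RollingBenchmark`, `WindowMetrics`, and `shifted`, the binary 0/1 labels, and the fixed tolerance threshold are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    precision: float
    recall: float


def precision_recall(pairs):
    """Compute precision and recall from (label, prediction) pairs of 0/1 values."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    # Undefined ratios are reported as 0.0 here; this is a simplification.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return WindowMetrics(precision, recall)


class RollingBenchmark:
    """Hypothetical helper: metrics over the most recent `window_size` examples."""

    def __init__(self, window_size, baseline, tolerance=0.1):
        self.window = deque(maxlen=window_size)  # old examples fall off automatically
        self.baseline = baseline                 # metrics measured at deployment time
        self.tolerance = tolerance               # allowed drop before flagging a shift

    def update(self, label, prediction):
        """Add one newly annotated example and return the current window metrics."""
        self.window.append((label, prediction))
        return precision_recall(self.window)

    def shifted(self):
        """Return True once a full window falls below baseline by more than tolerance."""
        if len(self.window) < self.window.maxlen:
            return False  # not enough data yet to judge
        current = precision_recall(self.window)
        return (
            self.baseline.precision - current.precision > self.tolerance
            or self.baseline.recall - current.recall > self.tolerance
        )
```

In a pipeline of this kind, each freshly annotated production example would be passed to `update`, and a `True` result from `shifted` could be used to prioritize further data collection or to trigger retraining.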
Syllabus
Intro
Why is this a problem?
Usual Approaches
Open Source Data APIs
Task Overview
Annotations Overview
Final Thoughts
Taught by
Data Council