Overview
Explore transfer learning and pre-trained contextualized representations in this 20-minute conference talk from KDD 2020. Dive into BERT and its improvements, including span-based efficient pre-training and RoBERTa. Learn how these models are evaluated on extractive QA and GLUE, and which challenges remain in the field. Discover potential future directions such as few-shot learning and non-parametric memories. Mandar Joshi explains how new pre-training approaches and model architectures advance natural language processing.
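To make "span-based pre-training" concrete, here is a minimal sketch of span masking, assuming a SpanBERT-style scheme: span lengths are drawn from a geometric distribution (p = 0.2, clipped at 10 tokens) until about 15% of the tokens are masked. These parameter values, and the `span_mask` and `sample_span_length` helpers, are illustrative assumptions rather than the talk's implementation. The sketch also omits whole-word alignment and the 80/10/10 replacement rule.

```python
import random
from typing import List, Tuple


def sample_span_length(rng: random.Random, p: float = 0.2, max_len: int = 10) -> int:
    """Draw a span length from Geometric(p), clipped to max_len (assumed values)."""
    length = 1
    # Each additional token is added with probability (1 - p).
    while rng.random() > p and length < max_len:
        length += 1
    return length


def span_mask(
    tokens: List[str],
    mask_ratio: float = 0.15,
    mask_token: str = "[MASK]",
    seed: int = 0,
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Hypothetical helper: mask contiguous, non-overlapping spans.

    Returns the masked sequence and the sorted (start, end) spans, end inclusive.
    """
    rng = random.Random(seed)
    n = len(tokens)
    budget = max(1, round(n * mask_ratio))  # target number of masked tokens
    masked = set()
    spans = []
    attempts = 0
    while len(masked) < budget and attempts < 100 * n:
        attempts += 1
        # Clip the span so the total never exceeds the masking budget.
        length = min(sample_span_length(rng), budget - len(masked))
        start = rng.randrange(0, n - length + 1)
        positions = range(start, start + length)
        if any(i in masked for i in positions):
            continue  # skip spans that overlap an earlier span
        masked.update(positions)
        spans.append((start, start + length - 1))
    out = [mask_token if i in masked else tok for i, tok in enumerate(tokens)]
    return out, sorted(spans)
```

Masking whole contiguous spans prevents the model from recovering a masked subword from its unmasked neighbors within the same word. For a span boundary objective, the tokens just outside each span, at positions `start - 1` and `end + 1`, would serve as the boundary representations used to predict the span's contents.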
Syllabus
Transfer Learning via Pre-training
Pre-trained Contextualized Representations
BERT [Devlin et al. (2018)]
How can we do better?
Span-based Efficient Pre-training
Pre-training Span Representations
Why is this more efficient?
Random subword masks can be too easy
Which spans to mask?
Why SBO (Span Boundary Objective)?
Single-sequence Inputs
Evaluation
Baselines
Extractive QA: SQuAD
GLUE
RoBERTa: Scaling BERT
The RoBERTa Recipe
What is still hard?
Next Big Thing: Few Shot Learning?
Next Big Thing: Non-parametric Memories?
Taught by
Association for Computing Machinery (ACM)