KDD2020 - Transfer Learning (Mandar Joshi)

Explore transfer learning and pre-trained contextualized representations in this 20-minute conference talk from KDD2020. Mandar Joshi reviews BERT and two ways to improve on it: span-based efficient pre-training, and RoBERTa, which scales up BERT. The talk covers evaluation on extractive question answering (SQuAD) and GLUE, discusses what remains hard in the field, and considers possible future directions such as few-shot learning and non-parametric memories.

Syllabus

Transfer Learning via Pre-training
Pre-trained Contextualized Representations
BERT [Devlin et al. (2018)]
How can we do better?
Span-based Efficient Pre-training
Pre-training Span Representations
Why is this more efficient?
Random subword masks can be too easy
Which spans to mask?
Why SBO?
Single-sequence Inputs
Evaluation
Baselines
Extractive QA: SQuAD
GLUE
RoBERTa: Scaling BERT
The RoBERTa Recipe
What is still hard?
Next Big Thing: Few-Shot Learning?
Next Big Thing: Non-parametric Memories?