Multi-Task Learning in Transformer-Based Architectures for Natural Language Processing
Data Science Conference via YouTube
Overview
Learn about multi-task learning in transformer-based NLP architectures in this 31-minute conference talk, which explores cost-effective alternatives to training a separate model per task. Discover how sharing information across multiple tasks and datasets can improve performance through shared models, representation bias, increased data efficiency, and eavesdropping. Explore solutions to challenges such as catastrophic forgetting and task interference, and dive into general approaches to multi-task learning, adapter-based techniques, hypernetwork methods, and strategies for task sampling and balancing. The presentation covers key topics including the BERT paper, architecture considerations, modularity, function composition, input composition, parameter composition, fusion techniques, and shared hypernetworks, concluding with insights into ChatGPT.
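To make the adapter idea concrete, here is a minimal sketch of an adapter block (bottleneck projection plus residual connection), the building block behind the adapter-based multi-task techniques mentioned above: the shared transformer weights stay frozen while only a small per-task adapter is trained. The dimensions, weight values, and function names are illustrative assumptions, not taken from the talk.

```python
def matvec(W, x):
    # Plain-Python matrix-vector product (rows of W dotted with x).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adapter(x, W_down, W_up):
    # Down-project the hidden state to a small bottleneck, apply ReLU,
    # project back up, then add the residual connection so that a
    # zero-initialized adapter starts out as the identity function.
    h = [max(0.0, v) for v in matvec(W_down, x)]
    u = matvec(W_up, h)
    return [xi + ui for xi, ui in zip(x, u)]

# Example: hidden size 4, bottleneck size 2; the up-projection is
# zero-initialized, a common choice so training starts from the
# frozen model's original behavior.
W_down = [[0.5, 0.0, 0.0, 0.0],
          [0.0, 0.5, 0.0, 0.0]]
W_up = [[0.0, 0.0] for _ in range(4)]

x = [1.0, 2.0, 3.0, 4.0]
print(adapter(x, W_down, W_up))  # identity at init: [1.0, 2.0, 3.0, 4.0]
```

Because each task gets its own tiny `W_down`/`W_up` pair while the large transformer is shared, adding a new task costs only the adapter parameters rather than a full model copy.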
Syllabus
Intro
Outline
Agenda
BERT Paper
Architecture
Problems
Adapters
Modularity
Compositions
Overview
Function Composition
Input Composition
Parameter Composition
Fusion
Hypernetworks
Shared Hypernetworks
ChatGPT
Questions
Taught by
Data Science Conference