Overview
This 25-minute lecture from CMU's Multilingual Natural Language Processing course covers data augmentation techniques for machine translation. It focuses on methods that draw on monolingual data and on high-resource languages (HRLs): back translation, multilingual training approaches, and pivoting. Specific topics include iterative back-translation, English-HRL augmentation, dictionary-based techniques, word alignment, and word-by-word data augmentation with reordering. The lecture opens with the data challenges of low-resource machine translation and presents these techniques as practical ways to improve translation quality when parallel training data is scarce.
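For readers unfamiliar with back translation, the core idea can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: monolingual sentences in the target language are translated back into the source language, and each synthetic source sentence is paired with its real target sentence. The names `back_translate`, `toy_reverse_model`, and `TOY_DICT` are hypothetical, and the dictionary lookup only stands in for a trained target-to-source translation model.

```python
from typing import Callable, Iterable, List, Tuple


def back_translate(
    target_monolingual: Iterable[str],
    target_to_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each real target sentence with a synthetic source sentence."""
    pairs = []
    for tgt in target_monolingual:
        synthetic_src = target_to_source(tgt)  # output of the reverse model
        pairs.append((synthetic_src, tgt))     # (synthetic source, real target)
    return pairs


# Assumption: a toy English-to-Spanish word lookup stands in for a trained
# target-to-source model. A real system would use an actual MT model here.
TOY_DICT = {"the": "el", "cat": "gato", "sleeps": "duerme"}


def toy_reverse_model(sentence: str) -> str:
    # Translate word by word, leaving unknown words unchanged.
    return " ".join(TOY_DICT.get(w, w) for w in sentence.lower().split())


# Made-up example data: one real parallel pair plus one monolingual sentence.
real_pairs = [("el perro corre", "the dog runs")]
synthetic = back_translate(["The cat sleeps"], toy_reverse_model)

# The augmented training set mixes real and synthetic pairs.
augmented = real_pairs + synthetic
```

In the iterative variant named in the syllabus, the forward and reverse models are retrained in turn on each other's synthetic data. That loop is omitted here because it requires actual model training.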
Syllabus
Intro
Data Challenges in Low-resource MT
Multilingual Training Approaches
Data Augmentation 101: Back Translation
Back Translation Idea
How to Generate Translations
Iterative Back-translation
Back Translation Issues
English - HRL Augmentation
Augmentation via Pivoting
Data w/ Various Types of Pivoting
Monolingual Data Copying
Dictionary-based Augmentation
An Aside: Word Alignment
Word-by-word Data Augmentation
Word-by-word Augmentation w/ Reordering
Taught by
Graham Neubig