How Multilingual Is Your NLP Model?
Overview
Explore the challenges of multilingual Natural Language Processing (NLP), and some approaches to them, in this 45-minute PyCon US talk by Shreya Khurana. The talk covers language identification, transliterated and code-switched text, and multilingual BERT models. It reviews existing Python frameworks for language identification and their limitations, then discusses how to work around the lack of annotated datasets for transliterated and code-switched text using web crawlers and self-generated datasets. Practical examples examine the performance of Google's multilingual BERT model, which was trained on 104 languages, and the talk closes with how to evaluate NLP models on various tasks in a multilingual context. Additional resources and code examples are available on GitHub.
Syllabus
Introduction
About me
Outline
Why multilingual data
Tasks associated with language systems
Syntax mixing
Transliterated text
Language identification
Language identification in practice
Other examples
langid
Blanked
Python
Limitations
Data augmentation
Simple example
The Transformer
Multi-head attention
State of the art (SOTA)
Why is it special
WordPiece Processing
Statistics of Languages
BERT Masked Language Model
Prediction Function
Code Switched Example
Lyrics Example
Task Evaluation
Generation Evaluation
Summary
Taught by
PyCon US