Multilingual and Code-Switching Speech Recognition

Explore the challenges and advances in multilingual and code-switching speech recognition in this 3-hour, 25-minute lecture from the Center for Language & Speech Processing at Johns Hopkins University. Delve into the growing importance of multilingual Automatic Speech Recognition (ASR) systems in the era of personal assistant devices and smartphones. Examine the complexities of handling language and dialectal variation in spoken content, and discover recent studies demonstrating that multilingual systems can be more effective than monolingual ones. Investigate the challenges of designing code-switching ASR systems, including data scarcity, complex grammatical structure, and the imbalanced distribution of language usage.

Learn about proposed studies focusing on English and Arabic language sets, as well as low-resource and indigenous languages from various regions. Explore novel techniques for building practical Large Vocabulary Continuous Speech Recognition (LVCSR) systems that handle both monolingual and code-switching utterances. Gain insights into data augmentation, transfer learning, and self-supervised learning as ways to address the lack of balanced transcribed data. Discover the four work packages covered in the lecture: designing multilingual ASR with code-switching capabilities, handling low-resource languages through synthetic data generation, developing robust evaluation measures for mixed-script output, and analyzing the social and linguistic factors that influence code-switching in speech.