Speaker Diarization - From Modular to End-to-End Systems - Day 3 Morning
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore speaker diarization, the task of automatically determining "who spoke when" in an audio recording, through the slides from Federico Landini's lecture on the field's evolution from modular to end-to-end systems. The lecture covers fundamental concepts, methodologies, and recent advances. It first presents traditional modular approaches, which split diarization into separate components such as voice activity detection, speaker segmentation, and clustering. It then shows how end-to-end systems integrate these steps into a single model with the aim of improving performance. Topics include neural network architectures, clustering algorithms, and the evaluation metrics used in diarization research. The lecture also covers applications such as meeting transcription, broadcast news processing, and multi-speaker conversation analysis. Finally, it reviews open challenges, including overlapping speech detection, speaker change detection, and handling a varying number of speakers. It also discusses current solutions and future research directions in this fast-moving area of speech processing.
Syllabus
[slides] Day 3 morning - JSALT 2025 - Landini: Speaker Diarization
Taught by
Center for Language & Speech Processing (CLSP), JHU