Speaker Diarization - From Modular to End-to-End Systems - Day 3 Morning
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore speaker diarization through the slides from Federico Landini's lecture on the evolution from modular to end-to-end systems. Examine the fundamental concepts, methodologies, and recent advances in automatically determining "who spoke when" in an audio recording. Learn how traditional modular approaches split diarization into distinct components such as voice activity detection, speaker segmentation, and clustering, then see how modern end-to-end systems integrate these steps with the aim of improving performance. Study the neural network architectures, clustering algorithms, and evaluation metrics used in speaker diarization research. Analyze real-world applications, including meeting transcription, broadcast news processing, and multi-speaker conversation analysis. Review open challenges such as overlapping speech detection, speaker change detection, and handling a varying number of speakers, along with proposed solutions and future research directions.
Syllabus
[slides] Day 3 morning - JSALT 2025 - Landini: Speaker Diarization
Taught by
Center for Language & Speech Processing (CLSP), JHU