Unlocking Audio AI - Building a Massive Open Dataset for Instruction-Tuned Audio-Text Foundation Models
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore the future of audio AI in this seminar featuring Christoph Schuhmann, co-founder of LAION, who presents his vision for building massive open datasets to enable instruction-tuned audio-text foundation models. Learn how large-scale instruction-tuning techniques from language models can be applied to the audio domain, transforming millions of permissively licensed audio snippets—including speech, music, and environmental sounds—into richly annotated datasets. Understand the practical roadmap for creating billions of carefully annotated soundscapes using advanced generative AI such as Gemini, complete with accurate timestamped event labels. Discover how such datasets could enable transformative progress in audio-to-text transcription, sound-event detection, audio generation, and multimodal voice assistants. Gain insight into LAION's commitment to open science and accessible AI development from the leader behind landmark datasets like LAION-400M and LAION-5B, which powered models such as Stable Diffusion. Examine the intersection of educational reform, open-source AI development, and the democratization of foundational AI technologies through the perspective of a physics and computer science educator who champions transparent, community-driven research.
Syllabus
[camera] JSALT 2025 - Seminar with Christoph Schuhmann (LAION)
Taught by
Center for Language & Speech Processing (CLSP), JHU