Unlocking Audio AI - Building a Massive Open Dataset for Instruction-Tuned Audio-Text Foundation Models
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Explore the future of audio AI through this seminar featuring Christoph Schuhmann, co-founder of LAION, who presents his vision for building massive open datasets to enable instruction-tuned audio-text foundation models. Learn about the potential of applying large-scale instruction-tuning techniques from language models to the audio domain, and discover how millions of permissively licensed audio snippets—including speech, music, and environmental sounds—can be transformed into richly annotated datasets.

Understand the practical roadmap for creating billions of carefully annotated soundscapes using advanced generative AI such as Gemini, complete with accurate timestamped event labels. Discover how such datasets could enable transformative progress across audio-to-text transcription, sound-event detection, audio generation, and multimodal voice assistants.

Gain insights into LAION's commitment to open science and accessible AI development from the leader behind landmark datasets like LAION-400M and LAION-5B, which powered models like Stable Diffusion. Examine the intersection of educational reform, open-source AI development, and the democratization of foundational AI technologies through the perspective of a physics and computer science educator who champions transparent, community-driven research initiatives.
Syllabus
JSALT 2025 - Seminar with Christoph Schuhmann (LAION)
Taught by
Center for Language & Speech Processing (CLSP), JHU