Large Language-Audio Models and Applications

SAIConference via YouTube

Overview

This 41-minute keynote covers Large Language-Audio Models (LLAMs) and their applications, at the intersection of natural language processing and audio intelligence. It explains how large language models such as GPT are being extended beyond text to work with audio signal processing, covering speech, music, environmental sounds, and other acoustic data.

The talk presents the motivation and architecture behind combining LLMs with acoustic models, including techniques for aligning text and audio in a joint embedding space and strategies for tokenizing audio so that a language model can process it. Tools and frameworks discussed include AudioLDM and AudioLDM2 for text-to-audio generation, WavJourney for audio storytelling, AudioSep for language-guided source separation, ACTUAL for automatic audio captioning, WavCraft for controllable audio editing, APT-LLMs for audio reasoning, and SemantiCodec for neural audio coding.

It also surveys applications in media, gaming, film, assistive technology, and education, along with open challenges in multimodal data fusion, low-resource environments, and model scalability. The datasets introduced for training and benchmarking these models are WavCaps, Sound-VECaps, and AudioSetCaps. The presenter is Professor Wenwu Wang of the University of Surrey, whose work spans signal processing and machine learning.

Syllabus

Large Language-Audio Models and Applications | Wenwu Wang | Computing2025

Taught by

SAIConference
