Milliseconds to Magic - Real-Time Workflows using the Gemini Live API and Pipecat
AI Engineer via YouTube
Overview
Explore the capabilities of Google's Gemini Live API, powered by Gemini 2.5 Flash, in this 22-minute conference talk from the AI Engineer World's Fair. The talk covers how combining the Gemini Live API with Pipecat gives developers powerful real-time multimodal capabilities. It focuses on session management, turn detection, tool use (including asynchronous function calls), proactivity, multilinguality, and integration with telephony and other infrastructure. The talk includes demos of these capabilities and customer use cases, including how Pipecat extends real-time multimodal features to client-side applications such as customer support, gaming, and tutoring agents. It also introduces Google's experimental native audio offering, which enables seamless, emotive, steerable, multilingual dialogue for use cases where natural-sounding voices are a significant differentiator.

The speakers are Kwindla Kramer, creator of the open-source Pipecat voice agent framework and a WebRTC infrastructure expert at Daily, and Shrestha Basu Mallick, Group Product Manager and product lead for the Gemini API at Google DeepMind, who brings extensive experience in AI assistance and product development across Google's coding surfaces.
Syllabus
Milliseconds to Magic: Real-Time Workflows using the Gemini Live API and Pipecat
Taught by
AI Engineer