Overview
Learn to optimize speech-to-text and text-to-speech models for real-time AI agent applications in this 17-minute conference talk from Agents in Production 2025. Discover key optimization strategies for open-source models like Whisper and Orpheus to achieve consistently low latency in both transcription and speech synthesis workloads. Explore the unique challenges that voice modalities introduce in runtime performance and network overhead, and understand how to avoid common mistakes that reduce system efficiency. Gain practical insights for implementing voice capabilities in production AI agents, where speed and reliability are critical to the user experience.
Syllabus
Voice model performance optimization // Madison Kanna // Agents in Production 2025
Taught by
MLOps.community