How to Build the World's Fastest Voice Bot

Learn to build ultra-fast voice AI bots with voice-to-voice response times of around 500 ms in this conference talk from the AI Engineer World's Fair. Discover how to self-host the speech-to-text (STT), LLM inference, and text-to-speech (TTS) components in the same container or cluster, so that data does not take extra network hops between pipeline stages. Explore techniques for routing audio over the internet with WebRTC and edge networking to minimize latency. Learn how to configure voice activity detection (VAD), phrase endpointing, and other pipeline components, and understand the trade-offs involved in optimizing for latency. See a practical demonstration of a Llama 3 voice bot that achieves ~500 ms voice-to-voice response times using Deepgram's STT and TTS services hosted on Cerebrium's serverless GPU infrastructure. The talk is presented by Daily's CEO and co-founder, who shares insights on building infrastructure for video and audio applications in an increasingly video-first digital landscape.
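To make the endpointing trade-off concrete, here is a minimal Python sketch of a toy energy-based VAD with silence-timeout endpointing. It is an illustration under stated assumptions, not the method used in the talk or by Deepgram, Daily, or Cerebrium. The function names `frame_rms` and `find_endpoint`, the 20 ms frame size, the RMS energy threshold, and the 300 ms default silence timeout are all hypothetical choices. Production VADs are usually model-based rather than simple energy thresholds.

```python
import math
from typing import Iterable, List, Optional


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one audio frame (samples assumed in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_endpoint(
    frames: Iterable[List[float]],
    frame_ms: int = 20,               # hypothetical frame duration
    energy_threshold: float = 0.02,   # hypothetical speech/silence cutoff
    silence_timeout_ms: int = 300,    # hypothetical endpointing delay
) -> Optional[int]:
    """Return the index of the frame at which the user's turn is declared over,
    or None if no endpoint is found.

    A frame counts as speech when its RMS energy exceeds energy_threshold.
    Once speech has been heard, the turn ends after silence_timeout_ms of
    continuous silence. Silence before any speech is ignored.
    """
    heard_speech = False
    silent_ms = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) > energy_threshold:
            heard_speech = True
            silent_ms = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= silence_timeout_ms:
                return i
    return None


if __name__ == "__main__":
    speech = [[0.5] * 160 for _ in range(10)]   # 10 loud frames (200 ms)
    silence = [[0.0] * 160 for _ in range(30)]  # 30 silent frames (600 ms)
    print(find_endpoint(speech + silence))                          # 300 ms timeout
    print(find_endpoint(speech + silence, silence_timeout_ms=600))  # 600 ms timeout
```

In this simple design the bot cannot start generating a reply until the endpoint fires, so the silence timeout adds directly to voice-to-voice latency. A shorter timeout makes the bot feel faster but is more likely to cut users off when they pause mid-sentence; a longer one is safer but slower.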