Overview
This conference talk examines the server-side architecture of a high-performance, real-time multimodal AI agent system built in Rust. Its case study is a Rust-based server component that orchestrates communication between edge devices and AI service clusters, enabling natural, real-time conversations with open-source AI models. The design is modular: Voice Activity Detection (VAD), Automatic Speech Recognition (ASR), Large Language Models (LLM), and Text-to-Speech (TTS) are distinct, swappable services. The talk covers the core orchestration patterns for managing real-time audio streams and API calls to services such as Whisper and various open-source LLMs, and explains why Rust was selected: for its safety and its high-throughput performance when handling many concurrent WebSocket and HTTP/S connections.

The architecture allows locally hosted models chosen for privacy (such as LlamaEdge) to be mixed with powerful cloud APIs (like Google Gemini Live). Agents are extended through tool calls using the Model Context Protocol (MCP), which gives them access to live internet search, online APIs, and other devices. The talk is aimed at engineers and developers building practical AI applications that require real-time voice interaction, flexibility, modularity, custom tools, private knowledge, and agentic capabilities.
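To make the idea of swappable VAD, ASR, LLM, and TTS services concrete, here is a minimal Rust sketch of one way such a pipeline could be wired together. It is not the code presented in the talk: every trait, struct, and function name below is hypothetical, the stage implementations are stand-in stubs, and the loop is synchronous, whereas a real server would most likely handle WebSocket audio with an async runtime.

```rust
// Hypothetical sketch: all names are illustrative and not taken from the talk.

/// Voice Activity Detection: does this audio frame contain speech?
trait Vad { fn is_speech(&self, frame: &[i16]) -> bool; }
/// Automatic Speech Recognition: buffered audio to text.
trait Asr { fn transcribe(&self, audio: &[i16]) -> String; }
/// Large Language Model: user text to reply text.
trait Llm { fn reply(&self, prompt: &str) -> String; }
/// Text-to-Speech: reply text to audio samples.
trait Tts { fn synthesize(&self, text: &str) -> Vec<i16>; }

/// Owns one implementation of each stage; any stage can be replaced
/// (for example a local model or a cloud API) without touching the others.
struct Orchestrator {
    vad: Box<dyn Vad>,
    asr: Box<dyn Asr>,
    llm: Box<dyn Llm>,
    tts: Box<dyn Tts>,
    buffer: Vec<i16>, // speech samples collected for the current utterance
}

impl Orchestrator {
    /// Feed one audio frame. Speech frames are buffered; the first silent
    /// frame after speech ends the utterance and runs ASR -> LLM -> TTS.
    fn push_frame(&mut self, frame: &[i16]) -> Option<Vec<i16>> {
        if self.vad.is_speech(frame) {
            self.buffer.extend_from_slice(frame);
            return None;
        }
        if self.buffer.is_empty() {
            return None; // silence with nothing buffered: nothing to do
        }
        let audio = std::mem::take(&mut self.buffer);
        let text = self.asr.transcribe(&audio);
        let answer = self.llm.reply(&text);
        Some(self.tts.synthesize(&answer))
    }
}

// Stand-in stages so the sketch runs without any external service.
struct EnergyVad { threshold: u16 }
impl Vad for EnergyVad {
    fn is_speech(&self, frame: &[i16]) -> bool {
        frame.iter().any(|s| s.unsigned_abs() > self.threshold)
    }
}
struct StubAsr;
impl Asr for StubAsr {
    fn transcribe(&self, audio: &[i16]) -> String { format!("{} samples", audio.len()) }
}
struct EchoLlm;
impl Llm for EchoLlm {
    fn reply(&self, prompt: &str) -> String { format!("You said: {}", prompt) }
}
struct StubTts;
impl Tts for StubTts {
    // One fake audio sample per byte of text.
    fn synthesize(&self, text: &str) -> Vec<i16> { text.bytes().map(|b| b as i16).collect() }
}

/// Two speech frames followed by one silent frame; returns the length of
/// the synthesized reply (if any) produced after each frame.
fn demo() -> Vec<Option<usize>> {
    let mut orchestrator = Orchestrator {
        vad: Box::new(EnergyVad { threshold: 500 }),
        asr: Box::new(StubAsr),
        llm: Box::new(EchoLlm),
        tts: Box::new(StubTts),
        buffer: Vec::new(),
    };
    let frames: [&[i16]; 3] = [&[900, -1200], &[1500, 700], &[0, 3]];
    frames.iter().map(|f| orchestrator.push_frame(f).map(|a| a.len())).collect()
}
```

Calling `demo()` yields no output for the two speech frames and a synthesized reply once the silent frame closes the utterance. Putting each stage behind a trait object is one plausible way to get the swappability the description mentions; a production design might instead use generics, async traits, or separate network services per stage.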
Syllabus
Orchestrating Real-Time Multimodal AI Agents with Rust - Miley Fu, Second State Inc.
Taught by
Linux Foundation