Overview
Explore building high-performance, real-time multimodal AI agent systems through a comprehensive conference talk examining server-side architecture using Rust. Discover how to create systems capable of natural, real-time conversations using open-source AI models through a detailed case study of a Rust-based server component that orchestrates communication between edge devices and AI service clusters. Learn about modular approaches utilizing distinct, swappable services for Voice Activity Detection (VAD), Automatic Speech Recognition (ASR), Large Language Models (LLM), and Text-to-Speech (TTS). Understand core orchestration patterns for managing real-time audio streams and API calls to services like Whisper and various open-source LLMs. Examine why Rust was selected for its safety and high-throughput performance, particularly when handling numerous concurrent WebSocket and HTTP/S connections. Investigate the architectural flexibility that enables mixing locally hosted models for privacy (such as LlamaEdge) with powerful cloud APIs (like Google Gemini Live). Discover agentic extensibility through tool call integration using Model Context Protocol (MCP) to provide agents with access to live internet search, online APIs, and other devices. Gain insights valuable for engineers and developers building practical AI applications requiring real-time voice interaction, flexibility, modularity, custom tools, private knowledge, and agentic capabilities.
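The modular VAD, ASR, LLM, and TTS design described above can be pictured as a set of swappable stages behind trait objects. The following is only a minimal sketch under stated assumptions, not the architecture presented in the talk: every trait, struct, and function name here (`Vad`, `Asr`, `Llm`, `Tts`, `Pipeline`, `run_turn`, `demo_pipeline`) is hypothetical, and the stub stages stand in for real services such as Whisper or an open-source LLM, which a production server would reach over the network.

```rust
// Hypothetical stage interfaces; none of these names come from the talk.
trait Vad { fn is_speech(&self, frame: &[i16]) -> bool; }
trait Asr { fn transcribe(&self, audio: &[i16]) -> String; }
trait Llm { fn reply(&self, prompt: &str) -> String; }
trait Tts { fn synthesize(&self, text: &str) -> Vec<i16>; }

// Each stage is a boxed trait object, so a local model and a cloud API
// can be swapped without touching the orchestration code.
struct Pipeline {
    vad: Box<dyn Vad>,
    asr: Box<dyn Asr>,
    llm: Box<dyn Llm>,
    tts: Box<dyn Tts>,
}

impl Pipeline {
    // Runs one conversational turn: keep frames the VAD marks as speech,
    // transcribe them, ask the LLM, and synthesize the answer.
    // Returns None when no speech was detected.
    fn run_turn(&self, frames: &[Vec<i16>]) -> Option<Vec<i16>> {
        let speech: Vec<i16> = frames
            .iter()
            .filter(|f| self.vad.is_speech(f.as_slice()))
            .flat_map(|f| f.iter().copied())
            .collect();
        if speech.is_empty() {
            return None;
        }
        let text = self.asr.transcribe(&speech);
        let answer = self.llm.reply(&text);
        Some(self.tts.synthesize(&answer))
    }
}

// Stub stages used purely for illustration.
struct EnergyVad { threshold: u16 }
impl Vad for EnergyVad {
    // A frame counts as speech if any sample exceeds the amplitude threshold.
    fn is_speech(&self, frame: &[i16]) -> bool {
        frame.iter().any(|s| s.unsigned_abs() > self.threshold)
    }
}

struct CountingAsr;
impl Asr for CountingAsr {
    // Placeholder "transcript": reports how many samples were received.
    fn transcribe(&self, audio: &[i16]) -> String {
        format!("{} samples", audio.len())
    }
}

struct UppercaseLlm;
impl Llm for UppercaseLlm {
    // Placeholder "model": upper-cases the prompt.
    fn reply(&self, prompt: &str) -> String {
        prompt.to_uppercase()
    }
}

struct ByteTts;
impl Tts for ByteTts {
    // Placeholder "audio": one sample per byte of text.
    fn synthesize(&self, text: &str) -> Vec<i16> {
        text.bytes().map(i16::from).collect()
    }
}

// Wires the stub stages together.
fn demo_pipeline() -> Pipeline {
    Pipeline {
        vad: Box::new(EnergyVad { threshold: 500 }),
        asr: Box::new(CountingAsr),
        llm: Box::new(UppercaseLlm),
        tts: Box::new(ByteTts),
    }
}
```

Calling `demo_pipeline().run_turn(&frames)` with a mix of quiet and loud frames passes only the loud ones downstream. Replacing one stage, for example a locally hosted model with a cloud API, would mean supplying a different `Box<dyn Llm>` while `run_turn` stays unchanged. A real server would additionally need asynchronous I/O and streaming, which this sketch leaves out.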
Syllabus
Orchestrating Real-Time Multimodal AI Agents with Rust - Miley Fu, Second State Inc.
Taught by
Linux Foundation