Video Intelligence Is Going Agentic

Explore how agentic video intelligence is changing the way AI understands and processes dynamic visual content in this conference talk from MLOps World. Discover why traditional AI systems struggle with video, which the talk describes as making up over 90% of the world's information, and learn how combining multimodal foundation models with agent architectures produces autonomous systems that can reason about video, plan complex workflows, and execute sophisticated visual tasks. Examine real-world implementations, including MLSE's highlight creation workflow, which dropped from 16 hours to 9 minutes, a result the talk reports as a 98% efficiency boost. Learn the design principles of planner-worker-reflector agent systems, how to manage temporal context across long video workflows, and how to build transparent reasoning pipelines that bridge language and visual media. Come away with practical implementation strategies and a way to identify high-impact use cases that can drive productivity and ROI gains in media, entertainment, and enterprise video processing, moving video AI beyond simple analysis toward intelligent, autonomous action.
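To make the planner-worker-reflector idea concrete, here is a minimal Python sketch. It is not the speaker's implementation: every name in it (`Segment`, `Context`, `plan`, `work`, `reflect`, `run_agent`, `fake_analyze`) is hypothetical, and `fake_analyze` is a stand-in for a real multimodal model call. The planner splits a video into time windows, the worker scores each window, and the reflector decides whether to accept a result, refine it by splitting the window, or discard it. A shared context keeps the time-stamped findings and a readable trace of each decision.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# All names here are hypothetical and for illustration only.
Window = tuple[float, float]                    # (start_s, end_s)
Analyzer = Callable[[float, float], tuple[str, float]]


@dataclass
class Segment:
    start_s: float
    end_s: float
    label: str
    score: float


@dataclass
class Context:
    """Temporal context shared across steps: findings plus a readable trace."""
    findings: list[Segment] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)


def plan(duration_s: float, window_s: float) -> list[Window]:
    """Planner: cover the video with consecutive fixed-size windows."""
    windows, start = [], 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


def work(window: Window, analyze: Analyzer) -> Segment:
    """Worker: run the (assumed) video model on one window."""
    label, score = analyze(*window)
    return Segment(window[0], window[1], label, score)


def reflect(seg: Segment, min_score: float, min_window_s: float) -> str:
    """Reflector: accept confident results, refine weak ones, discard the rest."""
    if seg.score >= min_score:
        return "accept"
    if seg.score > 0 and (seg.end_s - seg.start_s) > min_window_s:
        return "refine"
    return "discard"


def run_agent(goal: str, duration_s: float, analyze: Analyzer,
              window_s: float = 60.0, min_score: float = 0.8,
              min_window_s: float = 15.0) -> Context:
    ctx = Context()
    queue = plan(duration_s, window_s)
    ctx.trace.append(f"plan: {len(queue)} windows for goal '{goal}'")
    while queue:
        start, end = queue.pop(0)
        seg = work((start, end), analyze)
        verdict = reflect(seg, min_score, min_window_s)
        ctx.trace.append(
            f"[{start:.0f}-{end:.0f}s] {seg.label} score={seg.score:.2f} -> {verdict}"
        )
        if verdict == "accept":
            ctx.findings.append(seg)
        elif verdict == "refine":
            # Split the window in half and look again at finer granularity.
            mid = (start + end) / 2
            queue[:0] = [(start, mid), (mid, end)]
    ctx.findings.sort(key=lambda s: s.start_s)
    return ctx


def fake_analyze(start_s: float, end_s: float) -> tuple[str, float]:
    """Stub model (assumption): one event at t=95s, sharper in narrower windows."""
    if start_s <= 95.0 < end_s:
        return "goal", min(1.0, 30.0 / (end_s - start_s))
    return "none", 0.0


if __name__ == "__main__":
    result = run_agent("find goals", 180.0, fake_analyze)
    print("\n".join(result.trace))
```

The trace is what makes the reasoning transparent in this sketch: each line records which window was examined, what the worker returned, and what the reflector decided. In a real system the stub would be replaced by a call to a video foundation model, and the reflector's rules would likely be model-driven rather than fixed thresholds.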