Generative Video LLMs - Planning Agents and Multimodal Composition
University of Central Florida via YouTube
Overview
Explore the intersection of generative artificial intelligence and video understanding in this keynote presentation on how large language models are being adapted for video generation, planning agents, and multimodal composition. Learn about recent research in generative video LLMs from Mohit Bansal, whose work spans academic research at the University of North Carolina and industry applications at Amazon. Discover how these models understand, generate, and manipulate video content, and how planning mechanisms enable autonomous agent behavior. Examine the technical challenges and advances in multimodal composition, where text, visual, and temporal elements are integrated to create video content. Gain insight into the current state of the field, emerging applications, and future directions for generative video technologies that combine natural language processing, computer vision, and temporal reasoning.
Syllabus
Keynote Talk 5: Mohit Bansal, UNC & Amazon
Taught by
UCF CRCV