Generative Video LLMs - Planning Agents and Multimodal Composition
University of Central Florida via YouTube
Overview
This keynote presentation explores how large language models are being adapted for video generation, planning-agent capabilities, and multimodal composition. The speaker, who combines academic research at the University of North Carolina with industry work at Amazon, covers recent research developments in generative video LLMs. Topics include how these models understand, generate, and manipulate video content, and how planning mechanisms enable autonomous agent behavior. The talk also examines technical challenges and advances in multimodal composition, where text, visual, and temporal elements are integrated to produce video content, and surveys the current state of the field, emerging applications, and future directions for generative video technologies that combine natural language processing with computer vision and temporal reasoning.
Syllabus
Keynote Talk 5: Mohit Bansal, UNC & Amazon
Taught by
UCF CRCV