Generative Video LLMs - Planning Agents and Multimodal Composition
University of Central Florida via YouTube
Overview
Explore the intersection of generative artificial intelligence and video understanding in this keynote presentation on how large language models are being adapted for video generation, planning-agent capabilities, and multimodal composition tasks. Learn about recent research developments in generative video LLMs from a leading expert who bridges academic research at the University of North Carolina with industry applications at Amazon. Discover how these models can understand, generate, and manipulate video content while incorporating planning mechanisms that enable autonomous agent behavior. Examine the technical challenges and breakthroughs in multimodal composition, where textual, visual, and temporal elements are integrated to create sophisticated video content. Gain insights into the current state of the field, emerging applications, and future directions for generative video technologies that combine natural language processing with computer vision and temporal reasoning.
Syllabus
Keynote Talk 5: Mohit Bansal, UNC & Amazon
Taught by
UCF CRCV