Generative Video LLMs - Planning Agents and Multimodal Composition
University of Central Florida via YouTube
Overview
Explore the intersection of generative artificial intelligence and video understanding in this keynote presentation on how large language models are being adapted for video generation, planning-agent capabilities, and multimodal composition tasks. Learn about recent research developments in generative video LLMs from a leading expert who bridges academic research at the University of North Carolina with industry applications at Amazon. Discover how these models can understand, generate, and manipulate video content while incorporating planning mechanisms that enable autonomous agent behavior. Examine the technical challenges and breakthroughs in multimodal composition, where textual, visual, and temporal elements are integrated to create sophisticated video content. Gain insights into the current state of the field, emerging applications, and future directions for generative video technologies that combine natural language processing with computer vision and temporal reasoning.
Syllabus
Keynote Talk 5: Mohit Bansal, UNC & Amazon
Taught by
UCF CRCV