How a Moonshot Led to Google DeepMind's Veo 3

Google via YouTube

Overview

Explore the technical journey behind Google DeepMind's groundbreaking Veo video generation models in this 48-minute conference talk featuring Dumi Erhan, co-lead of the Veo project, in conversation with host Logan Kilpatrick. Trace the evolution of generative video technology from its early research foundations in 2018 through the development of the state-of-the-art Veo 3 model with native audio generation capabilities. Discover the technical challenges involved in evaluating and scaling video models, including the complexities of maintaining coherence across long-duration videos and the limitations of physics-based evaluation methods. Learn about the significant leap from Veo 1 to Veo 2, the viral impact of Veo 3's audio capabilities, and how user feedback is actively shaping the future roadmap of AI-powered video creation. Examine the comparative complexity between image-to-video and text-to-video generation, explore new prompting methods that enhance user control, and understand the steerability challenges in video model development. Gain insights into the role of image data in capability transfer, the connection between video prediction and robotics applications, and the broader implications of world models like Genie 3 for the future of AI-generated content.

Syllabus

0:00 - Intro
0:47 - Veo project's beginnings
3:02 - Veo's origins in Google Brain
5:07 - Video prediction and robotics applications
7:45 - Early progress and evaluation challenges
10:30 - Physics-based evaluations and their limitations
12:18 - The launch of the original Veo model
14:06 - Scaling challenges for video models
16:02 - The leap from Veo 1 to Veo 2
19:40 - Veo 3’s viral audio moment
21:17 - User trends shaping Veo's roadmap
23:49 - Image-to-video vs. text-to-video complexity
26:00 - New prompting methods and user control
27:55 - Coherence in long video generation
31:03 - Genie 3 and world models
35:54 - The steerability challenge
41:59 - Capability transfer and image data's role
47:25 - Closing

Taught by

Google Developers
