Gemini's Multimodal Capabilities - Deep Dive into Native Multimodality and AI Vision

Google Developers via YouTube



Classroom Contents


  1. 0:00 - Intro
  2. 1:12 - Why Gemini is natively multimodal
  3. 2:23 - The technology behind multimodal models
  4. 5:15 - Video understanding with Gemini 2.5
  5. 9:25 - Deciding what to build next
  6. 13:23 - Building new product experiences with multimodal AI
  7. 17:15 - The vision for proactive assistants
  8. 24:13 - Improving video usability with variable FPS and frame tokenization
  9. 27:35 - What’s next for Gemini’s multimodal development
  10. 31:47 - Deep dive on Gemini’s document understanding capabilities
  11. 37:56 - The teamwork and collaboration behind Gemini
  12. 40:56 - What’s next with model behavior
