PaliGemma - Making Gemma 2 See by Adding a Vision Encoder

Google via YouTube

Overview

Explore how PaliGemma adds visual capabilities to Gemma 2 in this 11-minute Google talk. Learn how integrating a SigLIP vision encoder enables pre-training on multiple vision tasks, including captioning, question answering, object detection, and segmentation. Discover how the choice of image resolution and model size lets you match the model to your compute budget, with compute requirements varying by a factor of 155 across configurations. Presented by Andreas Steiner, the talk highlights how fine-tuning PaliGemma on your own data can yield excellent performance, particularly on text-related tasks, making it a valuable multimodal extension to the Gemma model family.

Syllabus

PaliGemma – Making Gemma 2 see by adding a vision encoder

Taught by

Google Developers
