PaliGemma - Making Gemma 2 See by Adding a Vision Encoder

Google via YouTube

Overview

Explore how PaliGemma adds visual capabilities to Gemma 2 in this 11-minute Google talk. Learn how a SigLIP vision encoder is integrated with the language model, enabling pre-training on multiple vision tasks including captioning, question answering, object detection, and segmentation. Discover how the choice of image resolution and model size lets you trade off performance against computational cost, with compute spanning a factor of 155 across configurations. The talk, presented by Andreas Steiner, highlights how fine-tuning PaliGemma on your own data can yield excellent performance, particularly for text-related tasks, making it a valuable multimodal extension to the Gemma model family.
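
To give a rough sense of why image resolution affects compute, the sketch below counts the patch tokens a ViT-style encoder would hand to the language model for a square image. It assumes a patch size of 14 pixels and input resolutions of 224, 448, and 896 pixels; these values are not stated in the overview above, and the helper name `image_token_count` is made up for illustration. The sketch does not attempt to reproduce the factor of 155, which also depends on model size.

```python
def image_token_count(resolution: int, patch_size: int = 14) -> int:
    """Number of patch tokens for a square image of the given side length.

    patch_size=14 is an assumption for illustration, not taken from the talk summary.
    """
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    patches_per_side = resolution // patch_size
    return patches_per_side ** 2


# Assumed example resolutions (square inputs).
RESOLUTIONS = (224, 448, 896)
token_counts = {r: image_token_count(r) for r in RESOLUTIONS}

for r, n in token_counts.items():
    print(f"{r}x{r} px -> {n} image tokens")
```

Doubling the side length quadruples the number of image tokens, so resolution alone already changes the sequence length the language model has to process by a large factor.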

Syllabus

PaliGemma – Making Gemma 2 see by adding a vision encoder

Taught by

Google Developers
