Overview
Explore how PaliGemma adds visual capabilities to Gemma 2 in this 11-minute Google talk. Learn how integrating a SigLIP vision encoder enables pre-training on multiple vision tasks, including captioning, question answering, object detection, and segmentation. Discover how the choice of image resolution and model size gives flexibility in computational requirements, with compute spanning a factor of 155 across configurations. The talk, presented by Andreas Steiner, highlights how fine-tuning PaliGemma on your own data can yield excellent performance, particularly on text-related tasks, making it a valuable multimodal extension to the Gemma model family.
Syllabus
PaliGemma – Making Gemma 2 see by adding a vision encoder
Taught by
Google Developers