Overview
Explore how PaliGemma adds visual capabilities to Gemma 2 in this 11-minute Google talk. Learn how integrating a SigLIP vision encoder enables pre-training on multiple vision tasks, including captioning, question answering, object detection, and segmentation. Discover how adjusting image resolution and model size gives you flexibility in computational requirements, with compute varying by a factor of 155 across configurations. The talk, presented by Andreas Steiner, highlights how fine-tuning PaliGemma on your own data can yield excellent performance, particularly for text-related tasks, making it a valuable multimodal extension to the Gemma model family.
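As a rough illustration of why image resolution affects compute, the sketch below counts the image tokens a patch-based vision encoder would hand to the language model. It assumes square inputs split into non-overlapping 14x14-pixel patches and uses 224, 448, and 896 pixels as example resolutions; these values and the function name image_tokens are assumptions for illustration and are not stated in this overview. It does not attempt to reproduce the 155x figure, which also depends on model size.

```python
def image_tokens(resolution: int, patch_size: int = 14) -> int:
    """Number of patch tokens for a square image of the given side length.

    Assumes non-overlapping square patches (patch_size is an assumed value).
    """
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    patches_per_side = resolution // patch_size
    return patches_per_side * patches_per_side


if __name__ == "__main__":
    # Example resolutions (assumed); token count grows with the square of the side.
    for res in (224, 448, 896):
        print(f"{res}px -> {image_tokens(res)} image tokens")
```

Doubling the resolution quadruples the token count, which is one reason a higher-resolution configuration costs noticeably more to run than a lower-resolution one at the same model size.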
Syllabus
PaliGemma – Making Gemma 2 see by adding a vision encoder
Taught by
Google Developers