GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

Massachusetts Institute of Technology via YouTube

Overview

In this 31-minute talk from the Massachusetts Institute of Technology, explore GLOV, a framework that reduces manual effort in creating effective prompts for vision-language models (VLMs). Learn how large language models (LLMs) can function as implicit optimizers that iteratively refine VLM prompts based on task performance, eliminating the need for human intervention. Discover how embedding space steering vectors guide LLM generation during the optimization process, creating a bias toward more effective prompts. See evaluations across multiple downstream tasks and VLM architectures that demonstrate GLOV's strong generalization capabilities. Presented by Jehanzeb Mirza, a postdoc in MIT CSAIL's Spoken Language Systems group whose research focuses on multi-modal learning and fine-grained understanding.
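The loop described above — an LLM proposing prompt candidates, a VLM scoring them on the downstream task, and the best-scoring prompts being fed back as context for the next round — can be sketched as follows. This is a toy illustration of that iterative structure, not the authors' implementation: the `propose_prompts` and `evaluate_prompt` functions are hypothetical stand-ins for the LLM generation step and the VLM task-accuracy measurement, and the embedding-space steering-vector component is omitted entirely.

```python
import random

def propose_prompts(scored_history, n=4, rng=None):
    """Stand-in for the LLM proposal step: vary the best prompt seen so far.

    In GLOV this would be an LLM prompted with previously scored candidates,
    so generation is biased toward patterns that performed well."""
    rng = rng or random
    best_prompt, _ = max(scored_history, key=lambda p: p[1])
    templates = ["a photo of", "an image showing", "a close-up of", "a picture of"]
    subject = best_prompt.split()[-1]
    return [f"{rng.choice(templates)} {subject}" for _ in range(n)]

def evaluate_prompt(prompt):
    """Stand-in for VLM downstream performance (e.g., zero-shot accuracy).

    Toy scoring rule so the loop is runnable: longer prompts score higher."""
    return len(prompt) / 40.0

def glov_style_loop(seed_prompt, iterations=5, keep=4, rng=None):
    """Iteratively refine prompts, keeping the top scorers as feedback."""
    history = [(seed_prompt, evaluate_prompt(seed_prompt))]
    for _ in range(iterations):
        candidates = propose_prompts(history, rng=rng)
        history += [(c, evaluate_prompt(c)) for c in candidates]
        # Keep only the top prompts as in-context feedback for the next round.
        history = sorted(history, key=lambda p: p[1], reverse=True)[:keep]
    return history[0]  # (best prompt, its score)

best_prompt, best_score = glov_style_loop("dog", rng=random.Random(0))
print(best_prompt, best_score)
```

The key design point the sketch preserves is that no human intervenes between rounds: the only feedback signal is the task score, and the proposal step conditions on the highest-scoring prompts from earlier iterations.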

Syllabus

Jehanzeb Mirza, GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

Taught by

MIT Embodied Intelligence
