Overview
Explore a 37-minute technical presentation on the inner workings of LLaVA Chain of Thought (CoT), covering data preparation, training methodology, and inference-time scaling for Vision Language Models (VLMs). Learn why VLMs need reasoning capabilities, how the synthetic training data was generated, and how the resulting datasets are built and used. The talk also explains inference-time scaling and the model training approach, with demonstrations, worked examples of dataset generation, and pointers to the datasets in the Image-CoT-1m repository.
Syllabus
Intro
Overview of VLLMs
Why VLLMs Need Reasoning
LLaVA Chain of Thought
Synthetic Data Generation
Generating Datasets
Where to find the Datasets
How we Generated the Synthetic Data
Questions
What is Inference-Time Scaling?
Model Training
Taught by
Oxen