YouTube videos curated by Class Central.
Classroom Contents
Top Vision Language Models 2025 - Comparing Qwen 2.5 VL, Moondream, and SmolVLM
- 1 00:00 Introduction to Vision Language Models
- 2 00:55 Model Recommendations: Small vs Large
- 3 02:02 Exploring Moondream's Latest Features
- 4 03:00 Inference with Moondream
- 5 12:20 Fine-Tuning SmolVLM
- 6 12:55 Understanding SmolVLM Architecture
- 7 17:22 Fine-Tuning SmolVLM: Step-by-Step
- 8 32:54 Introducing Qwen 2.5 VL
- 9 37:48 Troubleshooting FlashAttention Installation
- 10 38:42 Updating Transformers and Restarting Kernel
- 11 39:50 Handling Token Limits and VRAM Issues
- 12 40:44 Evaluating Model Performance on Chess Pieces
- 13 42:48 Comparing Performance with Florence 2
- 14 44:46 Training Loop and Data Collator Setup
- 15 50:34 Addressing Memory Issues and Image Resolution
- 16 55:39 Final Training and Evaluation
- 17 01:04:22 Inference and Model Comparison
- 18 01:08:27 Conclusion and WebGPU Demo