Top Vision Language Models 2025 - Comparing Qwen 2.5 VL, Moondream, and SmolVLM
Trelis Research via YouTube
Classroom Contents

  1. 00:00 Introduction to Vision Language Models
  2. 00:55 Model Recommendations: Small vs Large
  3. 02:02 Exploring Moondream's Latest Features
  4. 03:00 Inference with Moondream
  5. 12:20 Fine-Tuning SmolVLM
  6. 12:55 Understanding SmolVLM Architecture
  7. 17:22 Fine-Tuning SmolVLM: Step-by-Step
  8. 32:54 Introducing Qwen 2.5 VL
  9. 37:48 Troubleshooting FlashAttention Installation
  10. 38:42 Updating Transformers and Restarting Kernel
  11. 39:50 Handling Token Limits and VRAM Issues
  12. 40:44 Evaluating Model Performance on Chess Pieces
  13. 42:48 Comparing Performance with Florence 2
  14. 44:46 Training Loop and Data Collator Setup
  15. 50:34 Addressing Memory Issues and Image Resolution
  16. 55:39 Final Training and Evaluation
  17. 01:04:22 Inference and Model Comparison
  18. 01:08:27 Conclusion and WebGPU Demo
