Building Local LLMs for OCR, Object Detection and Image Parsing Using Mono-InternVL
Machine Learning With Hamza via YouTube
Overview
Learn to run the Mono-InternVL model locally for OCR, object detection, code generation, and document parsing in this 16-minute tutorial video. Discover how this recently introduced small Vision Language Model (VLM) aims for high accuracy while remaining efficient. Follow a detailed walkthrough covering the model architecture, key features, and step-by-step instructions for local deployment. Gain hands-on experience through practical demonstrations and code examples, with references to the official repository, research paper, and Hugging Face model implementation.
Syllabus
Intro
Model presentation
Run the model locally
Taught by
Machine Learning With Hamza