Overview
Learn to implement document processing and data extraction using Nanonets-OCR-s, a fine-tuned version of Qwen2.5-VL 3B designed specifically for converting images to Markdown format. Explore how this specialized optical character recognition model can extract complex document elements including tables, equations, signatures, and watermarks from various document types. Set up the development environment using Google Colab and the docext library, then work through practical demonstrations processing financial statements, receipts, and personal documents with watermarks. Discover how to access the model weights on Hugging Face and integrate the OCR capabilities into your own AI projects for automated document processing and structured data extraction workflows.
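As a rough illustration of the structured-extraction step, the sketch below post-processes Markdown that an OCR model has already produced. It assumes the output wraps watermarks, signatures, and page numbers in `<watermark>`, `<signature>`, and `<page_number>` tags, as the Nanonets-OCR-s model card describes; the helper name `split_tagged_markdown` and the sample text are hypothetical and are not taken from the course or from docext.

```python
import re

# Assumption: these tag names follow the Nanonets-OCR-s model card.
TAGS = ("watermark", "signature", "page_number")


def split_tagged_markdown(markdown: str) -> dict:
    """Separate tagged elements from the body of OCR Markdown output.

    Hypothetical helper for illustration; not part of docext.
    """
    found = {}
    body = markdown
    for tag in TAGS:
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        # Collect the text inside each tag, then drop the tag from the body.
        found[tag] = [match.strip() for match in pattern.findall(body)]
        body = pattern.sub("", body)
    # Collapse the blank lines left behind by the removed tags.
    found["body"] = re.sub(r"\n{3,}", "\n\n", body).strip()
    return found


# Made-up sample standing in for real model output.
sample = (
    "<watermark>CONFIDENTIAL</watermark>\n"
    "# Invoice\n\n"
    "Total: $42.00\n\n"
    "<signature>J. Doe</signature>\n"
    "<page_number>1</page_number>"
)

result = split_tagged_markdown(sample)
print(result["watermark"])
print(result["body"])
```

Keeping the tagged elements separate from the body makes it easier to store a watermark or signature as its own field in a downstream record while passing the clean Markdown on to other tools.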
Syllabus
00:00 - Welcome
01:46 - Model weights on Hugging Face
02:15 - docext library by Nanonets
03:08 - Google Colab setup
08:04 - Financial statement OCR
13:17 - Structured data extraction from receipt
14:58 - Watermark and text extraction from personal document
16:44 - Conclusion
Taught by
Venelin Valkov