Overview
Learn to implement document processing and data extraction with Nanonets-OCR-s, a fine-tuned version of Qwen2.5-VL 3B built to convert document images to Markdown. Explore how this specialized OCR model extracts complex document elements, including tables, equations, signatures, and watermarks, from a range of document types. Set up the development environment in Google Colab with the docext library, then work through practical demonstrations on financial statements, receipts, and watermarked personal documents. Discover how to access the model weights on Hugging Face and integrate the OCR capabilities into your own AI projects for automated document processing and structured data extraction.
Syllabus
00:00 - Welcome
01:46 - Model weights on Hugging Face
02:15 - docext library by Nanonets
03:08 - Google Colab setup
08:04 - Financial statement OCR
13:17 - Structured data extraction from receipt
14:58 - Watermark and text extraction from personal document
16:44 - Conclusion
Taught by
Venelin Valkov