Overview
Explore MinerU 2.5, a 1.2B-parameter vision-language model for two-stage optical character recognition (OCR) that handles text, tables, and formulas. See how this local OCR solution compares with classical OCR approaches through practical testing and evaluation. Discover how well the model extracts structured information from documents across a range of text formats and table structures. Examine the technical implementation details and understand how vision-language models apply to document processing tasks. Access the technical report, model weights, and utility tools to use MinerU 2.5 in your own projects, and learn the advantages and limitations of modern OCR approaches relative to traditional methods.
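One common way to put numbers on an OCR comparison like the one described above is character error rate (CER): the edit distance between an engine's output and a hand-checked reference, divided by the reference length. The sketch below is an illustration only, not the method used in the course. The function names and the sample strings are hypothetical, and the strings are made up rather than real output from MinerU 2.5 or any other engine.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length (lower is better)."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings for illustration; not real OCR output.
    reference = "Total: 42.00"
    candidate_outputs = {
        "engine_a": "Total: 42.00",
        "engine_b": "Total: 42.O0",  # letter O misread for digit 0
    }
    for name, text in candidate_outputs.items():
        print(name, round(character_error_rate(reference, text), 4))
```

CER only measures plain-text fidelity. Comparing table or formula extraction would need a structure-aware metric, which this sketch does not attempt.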
Syllabus
MinerU 2.5 - Local OCR VLM | Text and Table Extraction Test
Taught by
Venelin Valkov