Overview
Explore MinerU 2.5, a 1.2B-parameter vision-language model (VLM) for two-stage optical character recognition (OCR) that recognizes text, tables, and formulas. See how this locally run OCR model compares with classical OCR approaches through hands-on testing and evaluation. Examine how well it extracts structured information from documents across a range of text formats and table structures. Review the technical implementation details and learn how vision-language models can be applied to document processing tasks. Access the technical report, model weights, and utility tools to use MinerU 2.5 in your own projects, and understand the advantages and limitations of VLM-based OCR relative to traditional methods.
Syllabus
MinerU 2.5 - Local OCR VLM | Text and Table Extraction Test
Taught by
Venelin Valkov