Stars
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
pix2tex: Using a ViT to convert images of equations into LaTeX code.
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Implementation of the paper "MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation".
A Unified Toolkit for Deep Learning Based Document Image Analysis
Convert a PDF via OCR to a TXT file in UTF-8 encoding
[python3.6] Natural scene text detection implemented with tf; variable-length scene text OCR implemented with keras/pytorch using ctpn+crnn+ctc
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipe…
Python code to read text from a PDF file (OCR).
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
Math formula recognition (Images to LaTeX strings)
Call the Mathpix API to make a Mathpix snipping tool.
Extract tables from scanned image PDFs using Optical Character Recognition.