data extraction
Extract structured data from PDF invoices
A Unified Toolkit for Deep Learning Based Document Image Analysis
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
A curated list of resources on Document Understanding (DU).
Extract information from unstructured data with this library using AI techniques. Compose AI components into customizable pipelines over diverse sources for your projects.
ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…
Document Layout Analysis resources and repos for development with PdfPig.
Research papers and code on information extraction from images and PDFs
A curated list of awesome information retrieval resources
📖 A curated list of awesome resources dedicated to Relation Extraction, one of the most important tasks in Natural Language Processing (NLP).
A curated list of awesome data labeling tools
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing and batching to deliver high-quality text extraction from comp…
Python tool for converting files and office documents to Markdown.