Stars

data extraction

19 repositories

Extract structured data from PDF invoices

Python 1,878 484 Updated Jan 6, 2025

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python 5,008 477 Updated Aug 15, 2024

A Repo For Document AI

Python 2,653 145 Updated Jan 9, 2025

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 4,129 462 Updated Jan 6, 2025

A curated list of resources for Document Understanding (DU) topic

1,342 153 Updated Jun 2, 2023

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

Python 78 11 Updated Sep 5, 2024

Detectron2 for Document Layout Analysis

Python 185 63 Updated Aug 2, 2024

Jupyter Notebook 129 33 Updated Mar 24, 2023

ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation

Jupyter Notebook 130 11 Updated Nov 13, 2024

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…

Python 2,399 264 Updated Jun 24, 2024

Document Layout Analysis resources repos for development with PdfPig.

C# 597 67 Updated Oct 1, 2023

Research papers and code on information extraction from image/pdf

96 11 Updated Nov 25, 2022

A curated list of awesome information retrieval resources

1,081 138 Updated Apr 20, 2023

📖 A curated list of awesome resources dedicated to Relation Extraction, one of the most important tasks in Natural Language Processing (NLP).

1,189 136 Updated Jan 27, 2022

A curated list of awesome data labeling tools

3,860 442 Updated Jun 17, 2024

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python 6,188 556 Updated Jan 10, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 6,531 573 Updated Dec 31, 2024

An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing and batching to deliver high-quality text extraction from comp…

Python 809 63 Updated Sep 25, 2024

Python tool for converting files and office documents to Markdown.

Python 33,415 1,426 Updated Jan 6, 2025