pdf_preprocess_and_ocr

Preprocesses and runs OCR on a batch of PDFs that contain no text streams (e.g. PDFs created from scanned documents).

Requirements:

Windows NT or later
Tesseract OCR
ImageMagick binaries

Place ocrpreprocess.bat in a folder full of PDFs and run it. It creates a TIFF file from each PDF using preprocessing suited to OCR, then runs each TIFF through Tesseract.
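The two-stage flow described above (PDF to TIFF, then TIFF to text) can be sketched in Python. This is an illustration only, not the contents of ocrpreprocess.bat: the helper names `build_commands` and `process_folder`, the 300 DPI density, the grayscale conversion, and the `eng` language are all assumptions chosen for the example. It also assumes the ImageMagick 7 `magick` executable and `tesseract` are on the PATH, and that Ghostscript is installed, since ImageMagick relies on it to read PDFs.

```python
from pathlib import Path
import subprocess


def build_commands(pdf: Path, dpi: int = 300, lang: str = "eng") -> list[list[str]]:
    """Return the two commands for one PDF (hypothetical helper, assumed settings)."""
    tiff = pdf.with_suffix(".tif")
    out_base = pdf.with_suffix("")  # Tesseract appends ".txt" to this base name
    convert = [
        "magick",
        "-density", str(dpi),                      # rasterise the PDF at the chosen DPI
        str(pdf),
        "-background", "white", "-alpha", "remove",  # flatten transparency onto white
        "-colorspace", "Gray",                     # assumed preprocessing: grayscale
        str(tiff),
    ]
    ocr = ["tesseract", str(tiff), str(out_base), "-l", lang]
    return [convert, ocr]


def process_folder(folder: Path, dry_run: bool = True) -> list[list[str]]:
    """Collect, and optionally run, the commands for every *.pdf in a folder."""
    commands = []
    for pdf in sorted(folder.glob("*.pdf")):
        for cmd in build_commands(pdf):
            commands.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)  # stop on the first failing command
    return commands


if __name__ == "__main__":
    # Dry run: print what would be executed for the current folder.
    for cmd in process_folder(Path("."), dry_run=True):
        print(" ".join(cmd))
```

With `dry_run=True` the sketch only prints the commands, which makes it easy to check the settings before processing a large batch. Passing `dry_run=False` runs them in order and stops at the first failure.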

qadan/pdf_preprocess_and_ocr
