Preprocesses and performs OCR on a batch of PDFs that contain no text streams

qadan/pdf_preprocess_and_ocr

pdf_preprocess_and_ocr

Preprocesses and performs OCR on a batch of PDFs that contain no text streams (e.g. PDFs created from scanned documents)

Requirements:

  • Windows NT or later
  • Tesseract OCR
  • ImageMagick binaries
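Before running a batch, it can help to confirm that both tools are reachable from the command line. The following is a minimal sketch, not part of the original script. It assumes the executables are named `tesseract` and `magick` (the ImageMagick 7 entry point; older installs may expose `convert` instead), and `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("tesseract", "magick")):
    """Return the names from `tools` that cannot be found on PATH.

    The default names are assumptions: ImageMagick 7 installs `magick`,
    while older versions may only provide `convert`.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

ImageMagick typically relies on Ghostscript to read PDFs, so a conversion can still fail even when both executables are found.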

Place the script in a folder of PDFs and run it. It converts each PDF to TIFF using preprocessing suited to OCR, then runs each TIFF through Tesseract.
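The two-step flow described above (PDF to TIFF, then TIFF to text) can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual script: the README does not say which preprocessing options are used, so the ImageMagick settings here (300 DPI, grayscale, alpha removed) are placeholders, and `build_commands` and `process_folder` are hypothetical names.

```python
from pathlib import Path
import subprocess


def build_commands(pdf: Path, density: int = 300) -> list[list[str]]:
    """Return the two command lines for one PDF: rasterize, then OCR.

    The preprocessing flags are assumptions chosen for illustration;
    the original script's settings are not documented in the README.
    """
    tiff = pdf.with_suffix(".tif")
    convert_cmd = [
        "magick",
        "-density", str(density),   # rasterization resolution in DPI (assumed value)
        str(pdf),
        "-colorspace", "Gray",      # drop color information
        "-alpha", "off",            # remove any transparency channel
        str(tiff),
    ]
    # Tesseract takes an input image and an output base name,
    # and writes <base>.txt by default.
    ocr_cmd = ["tesseract", str(tiff), str(pdf.with_suffix(""))]
    return [convert_cmd, ocr_cmd]


def process_folder(folder: Path, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) the commands for every PDF in `folder`."""
    commands = []
    for pdf in sorted(folder.glob("*.pdf")):
        for cmd in build_commands(pdf):
            commands.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for the current folder without executing them.
    for cmd in process_folder(Path("."), dry_run=True):
        print(" ".join(cmd))
```

Running with `dry_run=True` first makes it easy to inspect the generated command lines before any files are written.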
