Tools to consider for the document service
3 repositories
Tesseract Open Source OCR Engine (main repository)
Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.
A Python module that wraps the pdftoppm utility to convert PDF pages to PIL Image objects.
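
One way the three tools might fit together in a document service is sketched below: ask Tika for the text first, and only rasterize and OCR the file when Tika returns little or nothing (as it typically would for a scanned PDF). This is a rough sketch under assumptions, not a recommended design. It assumes the pdftoppm wrapper is the pdf2image package, that the tesseract binary is on the PATH, and that a Tika server is reachable; the function names, the 20-character threshold, and the 200 dpi setting are all hypothetical choices made for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def needs_ocr(extracted_text, min_chars=20):
    """Treat missing or near-empty text as a sign the file is a scan.

    The min_chars threshold is an arbitrary illustrative value.
    """
    return extracted_text is None or len(extracted_text.strip()) < min_chars


def ocr_pdf(pdf_path, lang="eng", dpi=200):
    """Rasterize each page with pdf2image, then OCR it with the tesseract CLI."""
    # Third-party import kept local; pdf2image needs poppler's pdftoppm installed.
    from pdf2image import convert_from_path

    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        for i, image in enumerate(convert_from_path(str(pdf_path), dpi=dpi)):
            png = Path(tmp) / f"page-{i}.png"
            image.save(png)
            # "stdout" as the output base makes tesseract print the recognized text.
            result = subprocess.run(
                ["tesseract", str(png), "stdout", "-l", lang],
                capture_output=True, text=True, check=True,
            )
            pages.append(result.stdout)
    return "\n".join(pages)


def extract_text(pdf_path):
    """Try Tika first; fall back to OCR when Tika finds little or no text."""
    # Third-party import kept local; tika-python talks to a Tika server.
    from tika import parser

    text = parser.from_file(str(pdf_path)).get("content")
    return ocr_pdf(pdf_path) if needs_ocr(text) else text
```

Calling extract_text("scan.pdf") (a hypothetical file name) would return Tika's text for born-digital PDFs and OCR output for scans. Keeping the decision in needs_ocr makes the fallback rule easy to tune or replace without touching either tool's wrapper.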