⚗️ Experimental Frappe OCR application with tesseract.
This project is a fork of ERPNext-OCR by John Vincent Fiel. It aims to fix and clean up the original source code and add some new features.
https://discuss.erpnext.com/t/erpnext-ocr-app/33834/7
See CHANGELOG
See Taiga.io
Pre-requisites: tesseract-python and imagemagick
Install tesseract-ocr, plus imagemagick and ghostscript (to work with PDF files), using this command on Debian:
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
Install Frappe application
bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr
When installing the Frappe app, the following Python requirements will be installed:
- python binding for tesseract, pytesseract
- image processing library in python, pillow
- HTTP library in python, requests
- python binding for imagemagick, wand
Sample Screenshot:
File Being Read:
In order to use OCR with different languages, you need to install the appropriate trained data files. Check tesseract Wiki for details: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
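Before requesting OCR in a given language, it can help to confirm that its trained data file is present. The following is a minimal sketch using only the standard library; the helper names `installed_languages` and `missing_languages` are hypothetical and not part of this app, and the location of the tessdata directory depends on your tesseract installation.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # Unknown or missing directory: report no languages rather than fail.
        return []
    # "fra.traineddata" -> "fra"
    return sorted(p.stem for p in directory.glob("*.traineddata"))


def missing_languages(wanted, tessdata_dir):
    """Return the codes from `wanted` that have no trained data file installed."""
    available = set(installed_languages(tessdata_dir))
    return [code for code in wanted if code not in available]
```

For example, `missing_languages(["eng", "fra"], tessdata_dir)` returns the codes that still need a data file, where `tessdata_dir` is whatever directory your tesseract installation reads its data from. On Debian, language data is typically packaged as `tesseract-ocr-<code>`, for example `tesseract-ocr-fra`.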
wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412
- This can happen due to the security configuration in imagemagick, which prevents it from reading PDF files.
- Reference:
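The restriction usually comes from a `<policy domain="coder" ... pattern="PDF" />` entry in ImageMagick's `policy.xml`. The sketch below inspects the text of such a file and reports whether any PDF coder policy withholds the read right. The function names are hypothetical, and the check is deliberately conservative: it flags the file if any matching entry lacks `read`, without modelling how ImageMagick resolves several entries for the same pattern.

```python
import xml.etree.ElementTree as ET


def pdf_policy_rights(policy_xml):
    """Return the rights strings of every coder policy whose pattern mentions PDF."""
    root = ET.fromstring(policy_xml)
    rights = []
    for policy in root.iter("policy"):
        if policy.get("domain") != "coder":
            continue
        # Patterns may be a single name ("PDF") or a group ("{PS,EPS,PDF}").
        patterns = policy.get("pattern", "").strip("{}").split(",")
        if "PDF" in (p.strip().upper() for p in patterns):
            rights.append(policy.get("rights", ""))
    return rights


def pdf_reading_blocked(policy_xml):
    """Return True if any PDF coder policy does not grant the read right."""
    return any(
        "read" not in r.lower().split("|") for r in pdf_policy_rights(policy_xml)
    )
```

To use it, read your policy file into a string and pass it to `pdf_reading_blocked`. The file is often found at a path such as `/etc/ImageMagick-6/policy.xml`, but that path is an assumption and varies by version and distribution. If the function returns `True`, changing the matching entry's `rights` from `none` to `read` (or `read|write`) is the usual remedy, bearing in mind that the restriction was put there for security reasons.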
wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.
- This might happen if you're missing a dependency to convert PDF, most of the time ghostscript.
- References:
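A quick way to rule out a missing delegate is to check that the relevant executables are on the `PATH` of the user running bench. This is a minimal sketch; `find_missing_tools` is a hypothetical helper, and the default names assume that ghostscript is installed as `gs` and tesseract as `tesseract`.

```python
import shutil


def find_missing_tools(tools=("gs", "tesseract")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

If `find_missing_tools()` returns a non-empty list, installing the corresponding packages (see the installation command above) is a reasonable first step before investigating further.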
OSError: encoder error -2 when writing image file
- This might happen when trying to open a TIFF image, but the real error is "hidden" and only displayed in the console.
- If the original error in the console is
Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.
then the TIFF image compression is usually invalid or not recognized.
bench run-tests --profile --app erpnext_ocr
Monogramm
- Website: https://www.monogramm.io
- Github: @Monogramm
John Vincent Fiel
- Github: @jvfiel
Contributions, issues and feature requests are welcome!
Feel free to check the issues page.
Check the contributing guide.
Give a ⭐ if this project helped you!
Copyright © 2019 Monogramm.
This project is MIT licensed.
This README was generated with ❤️ by readme-md-generator