forked from openpaperwork/pyocr
-
Notifications
You must be signed in to change notification settings - Fork 0
A Python wrapper for Tesseract and Cuneiform
License
jbest/pyocr
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Pyocr is an optical character recognition (OCR) tool wrapper for python. That is, it helps using OCR tools from a Python program. It has been tested only on GNU/Linux systems. It should also work on similar systems (*BSD, etc). It doesn't work on Windows, MacOSX, etc. Pyocr can be used as a wrapper for google's Tesseract-OCR ( http://code.google.com/p/tesseract-ocr/ ) or Cuneiform. It can read all image types supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff, and others. It also support bounding box data. USAGE: import Image import sys from pyocr import pyocr tools = pyocr.get_available_tools()[:] if len(tools) == 0: print "No OCR tool found" sys.exit(1) print "Using '%s'" % (tools[0].get_name()) tools[0].image_to_string(Image.open('test.png'), lang='fra', builder=TextBuilder()) DEPENDENCIES: * Pyocr requires python 2.5 or later. * You will need the Python Imaging Library (PIL). Under Debian/Ubuntu, this is the package "python-imaging". * Install an OCR: * tesseract-ocr from http://code.google.com/p/tesseract-ocr/ . You must be able to invoke the tesseract command as "tesseract". Python-tesseract is tested with Tesseract >= 3.01 only. * or cuneiform INSTALLATION: $ sudo python ./setup.py install TESTS: Tests are made to be run with the latest versions of Tesseract and Cuneiform. the first test verifies that you're using the expected version. COPYRIGHT: Pyocr is released under the GPL v3+. tesseract.py: Copyright (c) Samuel Hoffstaetter, 2009 Copyright (c) Jerome Flesch, 2011-2012 other files: Copyright (c) Jerome Flesch, 2011-2012 http://wiki.github.com/jflesch/python-tesseract
About
A Python wrapper for Tesseract and Cuneiform
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published