update tesseract manpage and INSTALL.SVN
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@1131 d0cd1f9f-072b-0410-8dd7-cf729c803f20
zdenop committed Aug 2, 2014
1 parent 1ea3872 commit 19ddc89
Showing 5 changed files with 628 additions and 183 deletions.
10 changes: 4 additions & 6 deletions INSTALL.SVN
@@ -29,21 +29,19 @@ they are present in tessdata directory)
will install tesseract library, tesseract programs and English, Arabic
and German language datafiles.

To compile ScrollView.jar you need to download piccolo.JAVA[1] from
http://www.piccolo2d.org/download.html and extract
piccolo-1.2/build/piccolo.jar to tesseract/java as piccolo-1.2.jar and
piccolo-1.2/build/piccolox.jar to tesseract/java as piccolox-1.2.jar.
To compile ScrollView.jar you need to download piccolo2d-core-3.0.jar
and piccolo2d-extras-3.0.jar[1] and place them in tesseract/java.

Then run:
$ make ScrollView.jar

and follow instruction on Viewer Debugging wiki[2].

[1] http://www.cs.umd.edu/hcil/jazz/download/piccolo/piccolo-1.2-compiled.zip
[1] http://search.maven.org/#search|ga|1|g%3A%22org.piccolo2d%22
[2] http://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging


WINDOWS
=======

Please read vs2008/doc/index.html
Please read vs2008/doc/index.html and http://vorba.ch/2014/tesseract-3.03-vs2013.html
84 changes: 66 additions & 18 deletions doc/tesseract.1
@@ -1,13 +1,13 @@
'\" t
.\" Title: tesseract
.\" Author: [see the "AUTHOR" section]
.\" Generator: DocBook XSL Stylesheets v1.75.2 <http://docbook.sf.net/>
.\" Date: 04/09/2012
.\" Generator: DocBook XSL Stylesheets v1.78.1 <http://docbook.sf.net/>
.\" Date: 08/02/2014
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "TESSERACT" "1" "04/09/2012" "\ \&" "\ \&"
.TH "TESSERACT" "1" "08/02/2014" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
@@ -31,21 +31,52 @@
tesseract \- command\-line OCR engine
.SH "SYNOPSIS"
.sp
\fBtesseract\fR \fIimagename\fR \fIoutbase\fR|\fIstdout\fR [\fI\-l lang\fR] [\fI\-psm N\fR] [\gI\-c configvar=value\fR] [\fIconfigfile\fR \&...]
\fBtesseract\fR \fIimagename\fR|\fIstdin\fR \fIoutputbase\fR|\fIstdout\fR [options\&...] [configfile\&...]
.SH "DESCRIPTION"
.sp
tesseract(1) is a commercial quality OCR engine originally developed at HP between 1985 and 1995\&. In 1995, this engine was among the top 3 evaluated by UNLV\&. It was open\-sourced by HP and UNLV in 2005, and has been developed at Google since then\&.
.SH "OPTIONS"
.SH "IN/OUT ARGUMENTS"
.PP
\fIimagename\fR
.RS 4
The name of the input image\&. Most image file formats (anything readable by Leptonica) are supported\&.
.RE
.PP
\fIoutbase\fR
\fIstdin\fR
.RS 4
Instruction to read data from standard input
.RE
.PP
\fIoutputbase\fR
.RS 4
The basename of the output file (to which the appropriate extension will be appended)\&. By default the output will be named
\fIoutbase\&.txt\fR\&. When stdout is used as outbase, output will be sent to stdout\&.
\fIoutputbase\&.txt\fR\&.
.RE
.PP
\fIstdout\fR
.RS 4
Instruction to send output data to standard output
.RE
.SH "OPTIONS"
.PP
\fI\-\-tessdata\-dir /path\fR
.RS 4
Specify the location of tessdata path
.RE
.PP
\fI\-\-user\-words /path/to/file\fR
.RS 4
Specify the location of user words file
.RE
.PP
\fI\-\-user\-patterns /path/to/file\fR
.RS 4
Specify the location of user patterns file
.RE
.PP
\fI\-c configvar=value\fR
.RS 4
Set value for control parameter\&. Multiple \-c arguments are allowed\&.
.RE
.PP
\fI\-l lang\fR
@@ -80,16 +111,6 @@ are:
.\}
.RE
.PP
\fI\-c configvar=value\fR
.RS 4
Sets a configuration variable\&. Multiple options can be set by using -c multiple times, once for each option\&.
.RE
.PP
\fI\-v\fR
.RS 4
Returns the current version of the tesseract(1) executable\&.
.RE
.PP
\fIconfigfile\fR
.RS 4
The name of a config to use\&. A config is a plaintext file which contains a list of variables and their values, one per line, with a space separating variable from value\&. Interesting config files include:
@@ -105,9 +126,36 @@ The name of a config to use\&. A config is a plaintext file which contains a lis
.\}
hocr \- Output in hOCR format instead of as a text file\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
pdf \- Output in pdf instead of a text file\&.
.RE
.RE
.sp
\fBNota Bene:\fR The options \fI\-l lang\fR, \fI\-psm N\fR and \fI\-c configvar=value\fR must occur before any \fIconfigfile\fR\&.
\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\-psm N\fR must occur before any \fIconfigfile\fR\&.
.SH "SINGLE OPTIONS"
.PP
\fI\-v\fR
.RS 4
Returns the current version of the tesseract(1) executable\&.
.RE
.PP
\fI\-\-list\-langs\fR
.RS 4
List available languages for the tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
.RE
.PP
\fI\-\-print\-parameters\fR
.RS 4
Print tesseract parameters to stdout\&.
.RE
.SH "LANGUAGES"
.sp
There are currently language packs available for the following languages:
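As an illustration of the updated synopsis and the Nota Bene about ordering, the following Python sketch assembles a command line in which the options come before any configfile. The helper name `build_tesseract_argv` and the file name `page.png` are hypothetical and not part of tesseract; only flags documented in this man page are used. Placing `-c` before the configfiles as well is an assumption made for simplicity.

```python
def build_tesseract_argv(image, outputbase, lang=None, psm=None,
                         config_vars=None, configfiles=()):
    """Assemble an argv list following the documented synopsis:

    tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

    This is a hypothetical helper for illustration only.
    """
    argv = ["tesseract", image, outputbase]
    # Options documented as having to precede any configfile.
    if lang is not None:
        argv += ["-l", lang]
    if psm is not None:
        argv += ["-psm", str(psm)]
    # Multiple -c arguments are allowed, one per variable.
    for name, value in (config_vars or {}).items():
        argv += ["-c", f"{name}={value}"]
    # Config files (for example "hocr" or "pdf") go last.
    argv += list(configfiles)
    return argv


# Example: read page.png, write hOCR to standard output,
# using English plus German and page segmentation mode 1.
print(build_tesseract_argv("page.png", "stdout",
                           lang="eng+deu", psm=1,
                           configfiles=["hocr"]))
```

On a system where tesseract is installed, the resulting list could be handed to `subprocess.run`.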
52 changes: 43 additions & 9 deletions doc/tesseract.1.asc
@@ -8,7 +8,7 @@ tesseract - command-line OCR engine

SYNOPSIS
--------
*tesseract* 'imagename' 'outbase' ['-l lang'] ['-psm N'] ['configfile' ...]
*tesseract* 'imagename'|'stdin' 'outputbase'|'stdout' [options...] [configfile...]

DESCRIPTION
-----------
@@ -18,24 +18,46 @@ UNLV. It was open-sourced by HP and UNLV in 2005, and has been developed
at Google since then.


OPTIONS
-------
IN/OUT ARGUMENTS
----------------
'imagename'::
The name of the input image. Most image file formats (anything
readable by Leptonica) are supported.

'outbase'::
'stdin'::
Instruction to read data from standard input

'outputbase'::
The basename of the output file (to which the appropriate extension
will be appended). By default the output will be named 'outputbase.txt'.

'stdout'::
Instruction to send output data to standard output


OPTIONS
-------
'--tessdata-dir /path'::
Specify the location of tessdata path

'--user-words /path/to/file'::
Specify the location of user words file

'--user-patterns /path/to/file'::
Specify the location of user patterns file

'-c configvar=value'::
Set value for control parameter. Multiple -c arguments are allowed.

'-l lang'::
The language to use. If none is specified, English is assumed.
Multiple languages may be specified, separated by plus characters.
Tesseract uses 3-character ISO 639-2 language codes. (See LANGUAGES)

'-psm N'::
Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for *N* are:

Set Tesseract to only run a subset of layout analysis and assume
a certain form of image. The options for *N* are:

0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
@@ -48,19 +70,31 @@ OPTIONS
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.

'-v'::
Returns the current version of the tesseract(1) executable.

'configfile'::
The name of a config to use. A config is a plaintext file which
contains a list of variables and their values, one per line, with a
space separating variable from value. Interesting config files
include: +
* hocr - Output in hOCR format instead of as a text file.
* pdf - Output in pdf instead of a text file.

*Nota Bene:* The options '-l lang' and '-psm N' must occur
before any 'configfile'.


SINGLE OPTIONS
--------------
'-v'::
Returns the current version of the tesseract(1) executable.
'--list-langs'::
List available languages for the tesseract engine. Can be used with --tessdata-dir.
'--print-parameters'::
Print tesseract parameters to stdout.
LANGUAGES
---------

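The 'configfile' entry describes a plaintext format with one variable per line and a space separating the variable from its value. A minimal Python sketch of reading that format could look as follows. The function name `parse_config` and the variable names in the sample are hypothetical. Splitting on the first run of whitespace and skipping blank lines are assumptions, not behaviour confirmed by the man page.

```python
def parse_config(text):
    """Parse 'variable value' lines into a dict of strings.

    Hypothetical helper: blank lines are skipped, and a line holding only
    a variable name maps to an empty string (both are assumptions).
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split once so that values may themselves contain spaces.
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        params[name] = value
    return params


sample = "variable_a 1\n\nvariable_b some value\n"
print(parse_config(sample))
```

Each parsed pair corresponds to what a single `-c configvar=value` argument would set on the command line.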