TextRecognitionDataGenerator

A synthetic data generator for text recognition

What is it for?

Generating text image samples to train OCR software.

What do I need to make it work?

I use Arch Linux, so I cannot yet tell whether it works on Windows.

Python 3.X
OpenCV 3.2 (It probably works with 2.4)
Pillow
Numpy
Requests
BeautifulSoup

Alternatively, install everything at once with pip install -r requirements.txt.

How does it work?

python run.py -w 5 -f 64

You get 1000 randomly generated images with random text on them.

What if you want random skewing? Add -k and -rk (python run.py -w 5 -f 64 -k 5 -rk)
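For intuition, skewing of this kind can be approximated by rotating the rendered image with Pillow. This is a minimal sketch, not the code in run.py: the helper name `skew_image` is hypothetical, and reading -k as an angle in degrees and -rk as "pick a random angle between the negative and positive value" is an assumption.

```python
import random

from PIL import Image


def skew_image(image, max_angle, randomize=True, rng=random):
    """Rotate `image` by `max_angle` degrees, or by a random angle
    drawn from [-max_angle, max_angle] when `randomize` is True."""
    angle = rng.uniform(-max_angle, max_angle) if randomize else max_angle
    # expand=True grows the canvas so the rotated corners are not cropped;
    # the uncovered area is filled with white.
    return image.rotate(angle, expand=True, fillcolor="white")


# Stand-in for a generated text image
sample = Image.new("RGB", (200, 64), "white")
skewed = skew_image(sample, max_angle=5)
```

Because the canvas is expanded, the skewed image is at least as large as the input, which matters if a later step expects a fixed height.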

But scanned documents usually aren't that clear, are they? Add -bl and -rbl to get Gaussian blur on the generated image with a user-defined radius (for example 0, 1, 2 or 4).
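The effect of the blur radius can be reproduced with Pillow's built-in Gaussian blur filter. The sketch below is only an illustration of the idea; `blur_image` is a hypothetical helper, and the test image is a stand-in for a generated sample.

```python
from PIL import Image, ImageFilter


def blur_image(image, radius):
    """Return a Gaussian-blurred copy of `image`."""
    return image.filter(ImageFilter.GaussianBlur(radius=radius))


# White grayscale canvas with a black block, so the blur has edges to soften
sample = Image.new("L", (100, 32), 255)
sample.paste(0, (40, 8, 60, 24))

# One blurred copy per radius
blurred = {radius: blur_image(sample, radius) for radius in (0, 1, 2, 4)}
```

Larger radii spread each edge over more pixels, so the text looks progressively softer.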

Maybe you want another background? Add -b to select one of the three available backgrounds: Gaussian noise (0), plain white (1) or quasicrystal (2).
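To make the first two background types concrete, here is a minimal NumPy sketch. It is not taken from background_generator.py: the function names, the mean of 235 and the standard deviation of 10 are all assumptions chosen for illustration, and the quasicrystal background is left out.

```python
import numpy as np


def gaussian_noise_background(height, width, mean=235.0, std=10.0, seed=None):
    """Grayscale background whose pixels are drawn from a normal distribution.

    `mean` and `std` are illustrative values, not the project's settings.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(mean, std, size=(height, width))
    # Clip to the valid 8-bit range before converting
    return np.clip(noise, 0, 255).astype(np.uint8)


def plain_white_background(height, width):
    """Uniform white grayscale background."""
    return np.full((height, width), 255, dtype=np.uint8)


noisy = gaussian_noise_background(32, 100, seed=0)
white = plain_white_background(32, 100)
```

Either array can be turned into a Pillow image with `Image.fromarray` before text is drawn on it.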

Or maybe you are working on an OCR for handwritten text? Add -hw! (Experimental)

It uses a TensorFlow model trained using this excellent project by Grzego.

The project does not require TensorFlow to run if you aren't using this feature.

The text is chosen at random from a dictionary file (found in the dicts folder) and drawn on a white background made with Gaussian noise. The resulting image is saved as [text]_[index].jpg.
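The selection and naming scheme described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code in run.py: `make_sample_names` is a hypothetical helper, the word list stands in for the lines of a dictionary file, and joining several words with spaces is an assumption.

```python
import random


def make_sample_names(words, count, words_per_image, seed=None):
    """Build `count` file names of the form [text]_[index].jpg,
    where [text] is made of words picked at random from `words`."""
    rng = random.Random(seed)
    names = []
    for index in range(count):
        text = " ".join(rng.choice(words) for _ in range(words_per_image))
        names.append(f"{text}_{index}.jpg")
    return names


# Stand-in for the contents of a dictionary file
words = ["alpha", "beta", "gamma"]
names = make_sample_names(words, count=3, words_per_image=2, seed=0)
```

Keeping the text in the file name means the label for each image can be recovered from the name alone, without a separate annotation file.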

New

You can "fake" handwriting using -hw
You can add Gaussian blur to the resulting image
Sentences from Wikipedia can be used instead of random words with python run.py -wk 1 (requires an Internet connection)
Sentences can be picked from a file passed as a parameter with python run.py -i ./texts/random_1.txt
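Reading sentences from a text file could look like the sketch below. It assumes one sentence per line and skips blank lines; both the helper names and that file layout are assumptions, since the format expected by -i is not described here.

```python
from pathlib import Path


def split_sentences(text):
    """Return the non-empty lines of `text`, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_sentences(path):
    """Read a UTF-8 text file and return one sentence per non-empty line."""
    return split_sentences(Path(path).read_text(encoding="utf-8"))
```

For example, `load_sentences("./texts/random_1.txt")` would return the list of lines that the generator can then draw from.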

There are a lot of parameters that you can tune to get the results you want, so I recommend checking out python run.py -h for more information.

Can I add my own font?

Yes, the script picks a font at random from the fonts directory. Simply add / remove fonts until you get the desired output.

It only supports .ttf for now.
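Random font selection restricted to .ttf files can be sketched as follows. This is not the project's own code: `ttf_fonts` and `pick_font` are hypothetical helpers, and matching the extension case-insensitively is an assumption.

```python
import os
import random


def ttf_fonts(filenames):
    """Keep only the names ending in .ttf (case-insensitive), sorted."""
    return sorted(name for name in filenames if name.lower().endswith(".ttf"))


def pick_font(font_dir, rng=random):
    """Return the path of a randomly chosen .ttf file in `font_dir`."""
    fonts = ttf_fonts(os.listdir(font_dir))
    if not fonts:
        raise FileNotFoundError(f"no .ttf fonts found in {font_dir}")
    return os.path.join(font_dir, rng.choice(fonts))
```

The returned path can be passed to Pillow's `ImageFont.truetype` to load the font at the desired size.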

Feature requests & issues

If anything is missing, unclear, or simply not working, open an issue on the repository.

What is left to do?

Better background generation
Better handwritten text generation
More customization parameters (mostly regarding background)
Implement --include_symbols
Implement --include_numbers

Crwing/TextRecognitionDataGenerator
