
Commit

Update README.md
ankush-me authored Aug 9, 2023
1 parent e694abf commit 2a9a92b
Showing 1 changed file with 2 additions and 22 deletions.
24 changes: 2 additions & 22 deletions README.md
@@ -36,7 +36,7 @@ This script will generate random scene-text image samples and store them in an h
python visualize_results.py
```
### Pre-generated Dataset
A dataset with approximately 800000 synthetic scene-text images generated with this code can be found [here](https://www.robots.ox.ac.uk/~vgg/data/scenetext/).
A dataset with approximately 800000 synthetic scene-text images generated with this code can be found in the `SynthText.zip` file in the torrent [here](https://academictorrents.com/details/2dba9518166cbd141534cbf381aa3e99a087e83c).

### Adding New Images
Segmentation and depth-maps are required to use new images as background. Sample scripts for obtaining these are available [here](https://github.com/ankush-me/SynthText/tree/master/prep_scripts).
@@ -60,29 +60,9 @@ as the pre-generated dataset under the `bg_data` directory. The files are:
| `depth.h5` | depth maps |
| `seg.h5` | segmentation maps |

#### Downloading without BitTorrent

Downloading with BitTorrent is strongly recommended. If that is not
possible, the files are also available to download over http from
`https://thor.robots.ox.ac.uk/~vgg/data/scenetext/preproc/<filename>`,
where, `<filename>` can be:

| filenames | size | md5 hash |
|:--------------- | ----:|:-------------------------------- |
| `imnames.cp` | 180K | |
| `bg_img.tar.gz` | 8.9G | 3eac26af5f731792c9d95838a23b5047 |
| `depth.h5` | 15G | af97f6e6c9651af4efb7b1ff12a5dc1b |
| `seg.h5` | 6.9G | 1605f6e629b2524a3902a5ea729e86b2 |

Note: due to large size, `depth.h5` is also available for download as 3-part split-files of 5G each.
These part files are named: `depth.h5-00, depth.h5-01, depth.h5-02`. Download using the path above, and put them together using `cat depth.h5-0* > depth.h5`.
To download, use something like the following:
```
wget --continue https://thor.robots.ox.ac.uk/~vgg/data/scenetext/preproc/<filename>
```
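
The md5 hashes listed in the table above can be used to check a finished download. The following is a minimal sketch, not part of the repository: the helper names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the files were saved under their original names in a single directory. `imnames.cp` is skipped because the table gives no hash for it.

```python
import hashlib
from pathlib import Path

# md5 hashes copied from the table above; imnames.cp has no published hash.
EXPECTED_MD5 = {
    "bg_img.tar.gz": "3eac26af5f731792c9d95838a23b5047",
    "depth.h5": "af97f6e6c9651af4efb7b1ff12a5dc1b",
    "seg.h5": "1605f6e629b2524a3902a5ea729e86b2",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for multi-gigabyte files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each filename to True/False (hash match) or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        status = {True: "OK", False: "MISMATCH", None: "missing"}[ok]
        print(f"{name}: {status}")
```

If `depth.h5` was downloaded as split files, run the check only after joining the parts with `cat`, since the hash in the table is for the whole file.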
[`use_preproc_bg.py`](https://github.com/ankush-me/SynthText/blob/master/use_preproc_bg.py) provides sample code for reading this data.

Note: I do not own the copyright to these images.
Note: We do not own the copyright to these images.

### Generating Samples with Text in non-Latin (English) Scripts
- @JarveeLee has modified the pipeline for generating samples with Chinese text [here](https://github.com/JarveeLee/SynthText_Chinese_version).
