Add perl command to edit spa.stopwords.txt
Signed-off-by: Stefan Weil <[email protected]>
stweil committed Oct 5, 2023
1 parent 85b14a5 commit ec5df67
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions unlvtests/README.md
@@ -41,6 +41,9 @@ wget -O spa.stopwords.txt https://raw.githubusercontent.com/stopwords-iso/stopwo
Edit ~/ISRI-OCRtk/stopwords/spa.stopwords.txt
wordacc uses a space delimited stopwords file, not line delimited.
s/\n/ /g
```
perl -pi -e 's/\n/ /' ~/ISRI-OCRtk/stopwords/spa.stopwords.txt
```

Edit ~/ISRI-OCRtk/spn.3B/pages
Delete the line containing the following imagename as it [crashes tesseract](https://github.com/tesseract-ocr/tesseract/issues/1647#issuecomment-395954717).