Post-OCR processing with BERT and NMT models

To run inference with the tagger and translator models, run "python3 run.py" from the src/pipeline folder. The tagger and translator models can be swapped for any of those available at: https://drive.google.com/drive/folders/16RwyO8deQD9UDbHj_ccElEhHq5gnqKQQ?usp=sharing. After downloading the folder, copy its contents into src/models. The evaluation folder contains the code to run and save evaluations of the models; evaluations are produced after inference. io/process_data.py downloads and pre-processes data, and creates datasets from mC4 or from other files provided as .csv files.
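The run step can be wrapped in a small launcher that first checks that the downloaded models are in place. This is a minimal sketch, not part of the repository: the helper names models_present and run_pipeline are hypothetical, and it assumes the repository root is passed in and that src/models and src/pipeline sit where the text above places them.

```python
import subprocess
import sys
from pathlib import Path


def models_present(models_dir: Path) -> bool:
    """Return True if models_dir exists and contains at least one entry."""
    return models_dir.is_dir() and any(models_dir.iterdir())


def run_pipeline(repo_root: Path) -> int:
    """Run run.py from src/pipeline and return its exit code.

    Hypothetical helper: raises FileNotFoundError if src/models is
    missing or empty, since the models must be downloaded first.
    """
    models_dir = repo_root / "src" / "models"
    pipeline_dir = repo_root / "src" / "pipeline"
    if not models_present(models_dir):
        raise FileNotFoundError(
            f"No models found in {models_dir}; download them and copy them there first."
        )
    # Same command as in the instructions, using the current interpreter
    # and launched with src/pipeline as the working directory.
    result = subprocess.run([sys.executable, "run.py"], cwd=pipeline_dir)
    return result.returncode
```

From the repository root, calling run_pipeline(Path.cwd()) would then behave like running "python3 run.py" inside src/pipeline, with an early error if src/models has not been populated.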


federico-stacchietti/Post-ocr
