This paper has been accepted for presentation at ICDAR 2024 and appears in the Proceedings of ICDAR, 2024.
The data used in the project can be downloaded online.
Dockerfiles are included, so please ensure that Docker is installed.
- To compile the report, navigate to the `master_thesis` folder and build the Docker image:
  ```shell
  docker build -t master_thesis/report:v1.0 . -f report/Dockerfile
  ```
- Run the Docker image:
  ```shell
  docker run -it -v $PWD/report:/work/report master_thesis/report:v1.0
  ```
- Inside the Docker container, navigate to the `report` folder:
  ```shell
  cd report
  ```
- Compile the `.tex` file to produce the report PDF:
  ```shell
  pdflatex master_thesis_YH.tex
  ```
- Compile the bibliography:
  ```shell
  bibtex master_thesis_YH
  ```
- Run `pdflatex` again to resolve the citations and produce the final PDF:
  ```shell
  pdflatex master_thesis_YH.tex
  ```
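The compilation steps above can be wrapped in a small helper, to be run inside the container's `report` folder. This is a sketch, not part of the repository; `compile_report` is a hypothetical function name, and the commands assume a standard TeX toolchain:

```shell
#!/bin/sh
# compile_report: run the pdflatex -> bibtex -> pdflatex cycle for a
# given .tex basename (hypothetical helper, not included in the repo).
compile_report() {
    name="$1"
    pdflatex "$name.tex" &&
    bibtex "$name" &&
    pdflatex "$name.tex"
}

# Usage (inside the report/ folder):
# compile_report master_thesis_YH
```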
- Navigate to the `master_thesis` folder, put the data in `data/raw/data`, and build the Docker image:
  ```shell
  docker build -t master_thesis/ocr_correction:v.1.0 . -f Dockerfile
  ```
- Run the container in interactive mode. Start training or evaluation by modifying `main.py` and running:
  ```shell
  python ./src/main.py
  ```
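Since the README does not spell out the `docker run` command for this image, a wrapper along the following lines could be used. This is a sketch only: the mount point mirrors the one used for the report image and is an assumption, not taken from the repository's Dockerfile.

```shell
#!/bin/sh
# run_in_container: hypothetical wrapper that launches a command inside the
# ocr_correction image. The /work/data mount point is an assumption; adjust
# it to match the WORKDIR defined in the actual Dockerfile.
run_in_container() {
    docker run -it -v "$PWD/data:/work/data" \
        master_thesis/ocr_correction:v.1.0 "$@"
}

# Usage: start training or evaluation (as configured in main.py):
# run_in_container python ./src/main.py
```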
```
master_thesis
│--- README.md        <- Contains an overview of the project, setup
│                        instructions, and any additional information
│                        relevant to the project.
│--- Dockerfile
│--- run_script.sh    <- A shell script for executing common tasks,
│                        such as setting up the environment, starting
│                        a training run, or evaluating models.
│--- setup.py
│--- requirements.txt
│--- config           <- Directory containing configuration files for
│                        models, training processes, or application
│                        settings.
│--- data             <- Datasets used in the thesis.
│--- models           <- Contains saved models.
│--- notebook         <- Jupyter notebooks for exploratory data analysis.
│--- report           <- Stores the final report.
│--- results          <- Contains output from model evaluations,
│                        including metrics.
│--- src              <- Source code for the project.
│    │--- processor   <- Code related to data preprocessing and
│    │                   preparing raw data for training or evaluation.
│    │--- models      <- Definitions of the machine learning models
│    │                   used in the thesis.
│    │--- train       <- Scripts and modules for training models.
│    │--- eval        <- Scripts and modules for evaluating models.
│    └--- utils       <- Utility functions and classes that support
│                        various tasks across the project, such as data
│                        loading, metric calculation, and visualization
│                        tools.
└─── tests            <- Automated tests for the codebase.
```