This paper has been accepted for presentation at ICDAR 2024 and appears in the Proceedings of ICDAR, 2024.
The data used in the project can be downloaded online.
Dockerfiles are included, so please ensure that Docker is installed.
- To compile the report, navigate to the `master_thesis` folder and build the Docker image:
  ```shell
  docker build -t master_thesis/report:v1.0 . -f report/Dockerfile
  ```
- Run the Docker image:
  ```shell
  docker run -it -v $PWD/report:/work/report master_thesis/report:v1.0
  ```
- Inside the Docker container, navigate to the `report` folder:
  ```shell
  cd report
  ```
- Compile the `.tex` file to produce the report PDF:
  ```shell
  pdflatex master_thesis_YH.tex
  ```
- Compile the bibliography:
  ```shell
  bibtex master_thesis_YH
  ```
- Run `pdflatex` again to resolve the citations and produce the final PDF:
  ```shell
  pdflatex master_thesis_YH.tex
  ```
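The compilation steps above can be wrapped in a small helper, to be run inside the container's `report` folder. This is a sketch, not part of the repository; `compile_report` is a hypothetical function name, and the commands assume a standard TeX toolchain:

```shell
#!/bin/sh
# compile_report: run the pdflatex -> bibtex -> pdflatex cycle for a
# given .tex basename (hypothetical helper, not included in the repo).
compile_report() {
    name="$1"
    pdflatex "$name.tex" &&
    bibtex "$name" &&
    pdflatex "$name.tex"
}

# Usage (inside the report/ folder):
# compile_report master_thesis_YH
```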
- Navigate to the `master_thesis` folder, put the data in `data/raw/data`, and build the Docker image:
  ```shell
  docker build -t master_thesis/ocr_correction:v.1.0 . -f Dockerfile
  ```
- Run the container in interactive mode. Start training or evaluation by modifying `main.py` and running:
  ```shell
  python ./src/main.py
  ```
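Since the README does not spell out the `docker run` command for this image, a wrapper along the following lines could be used. This is a sketch only: the mount point mirrors the one used for the report image and is an assumption, not taken from the repository's Dockerfile.

```shell
#!/bin/sh
# run_in_container: hypothetical wrapper that launches a command inside the
# ocr_correction image. The /work/data mount point is an assumption; adjust
# it to match the WORKDIR defined in the actual Dockerfile.
run_in_container() {
    docker run -it -v "$PWD/data:/work/data" \
        master_thesis/ocr_correction:v.1.0 "$@"
}

# Usage: start training or evaluation (as configured in main.py):
# run_in_container python ./src/main.py
```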
```
master_thesis
│--- README.md        <- Contains an overview of the project, setup
│                        instructions, and any additional information
│                        relevant to the project.
│--- Dockerfile
│--- run_script.sh    <- A shell script for executing common tasks,
│                        such as setting up the environment, starting
│                        a training run, or evaluating models.
│--- setup.py
│--- requirements.txt
│--- config           <- Directory containing configuration files for
│                        models, training processes, or application
│                        settings.
│--- data             <- Datasets used in the thesis.
│--- models           <- Contains saved models.
│--- notebook         <- Jupyter notebooks for exploratory data analysis.
│--- report           <- Stores the final report.
│--- results          <- Contains output from model evaluations,
│                        including metrics.
│--- src              <- Source code for the project.
│    │--- processor   <- Code related to data preprocessing and
│    │                   preparing raw data for training or evaluation.
│    │--- models      <- Definitions of the machine learning models
│    │                   used in the thesis.
│    │--- train       <- Scripts and modules for training models.
│    │--- eval        <- Scripts and modules for evaluating models.
│    └--- utils       <- Utility functions and classes that support
│                        various tasks across the project, such as data
│                        loading, metric calculation, and visualization
│                        tools.
└─── tests            <- Automated tests for the codebase.
```