Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
lukas-blecher committed Apr 13, 2022
1 parent bb0269a commit e8ccfd6
Showing 1 changed file with 26 additions and 24 deletions.
50 changes: 26 additions & 24 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,36 +5,28 @@ The goal of this project is to create a learning based system that takes an imag

![header](https://user-images.githubusercontent.com/55287601/109183599-69431f00-778e-11eb-9809-d42b9451e018.png)

## Requirements
### Model
* PyTorch (tested on v1.7.1)
* Python 3.7+ & dependencies (specified in `setup.py`)
```
pip install -e .
```
### Dataset
In order to render the math in many different fonts we use XeLaTeX, generate a PDF and finally convert it to a PNG. For the last step we need to use some third party tools:
* [XeLaTeX](https://www.ctan.org/pkg/xetex)
* [ImageMagick](https://imagemagick.org/) with [Ghostscript](https://www.ghostscript.com/index.html). (for converting pdf to png)
* [Node.js](https://nodejs.org/) to run [KaTeX](https://github.com/KaTeX/KaTeX) (for normalizing Latex code)
* [`de-macro`](https://www.ctan.org/pkg/de-macro) >= 1.4 (only for parsing arxiv papers)
* Python 3.7+ & dependencies (specified in `setup.py`)

## Using the model
1. Download/Clone this repository
2. Run `pip install -e .`
To run the model you need Python 3.7+

Install the package `pix2tex`:

```pip install git+https://github.com/lukas-blecher/LaTeX-OCR.git```

Thanks to [@katie-lim](https://github.com/katie-lim), you can use a nice user interface as a quick way to get the model prediction. Just call the GUI with `pix2tex_gui`. From here you can take a screenshot and the predicted latex code is rendered using [MathJax](https://www.mathjax.org/) and copied to your clipboard.
Model checkpoints will be automatically downloaded.

Under linux, it is possible to use the GUI with `gnome-screenshot` which comes with multiple monitor support. You just need to run `pix2tex_gui --gnome` (**Note:** you should install `gnome-screenshot` beforehand).
There are two ways to get a prediction from an image.
1. You can use the command line tool by calling `pix2tex_cli`. Here you can parse already existing images from the disk and images in your clipboard.

2. Thanks to [@katie-lim](https://github.com/katie-lim), you can use a nice user interface as a quick way to get the model prediction. Just call the GUI with `pix2tex_gui`. From here you can take a screenshot and the predicted latex code is rendered using [MathJax](https://www.mathjax.org/) and copied to your clipboard.

Under linux, it is possible to use the GUI with `gnome-screenshot` which comes with multiple monitor support. You just need to run `pix2tex_gui --gnome` (**Note:** you should install `gnome-screenshot` beforehand).

![demo](https://user-images.githubusercontent.com/55287601/117812740-77b7b780-b262-11eb-81f6-fc19766ae2ae.gif)

If the model is unsure about the what's in the image it might output a different prediction every time you click "Retry". With the `temperature` parameter you can control this behavior (low temperature will produce the same result).

Alternatively you can use `pix2tex_cli` with similar functionality as `pix2tex_gui`, only as command line tool. In this case you don't need to install PyQt5. Using this script you can also parse already existing images from the disk.

The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase performance of images found in the wild. Still it's not perfect might not be able to handle huge images optimally, so don't zoom in all the way before taking a picture.
The model works best with images of smaller resolution. That's why I added a preprocessing step where another neural network predicts the optimal resolution of the input image. This model will automatically resize the custom image to best resemble the training data and thus increase performance of images found in the wild. Still it's not perfect and might not be able to handle huge images optimally, so don't zoom in all the way before taking a picture.

Always double check the result carefully. You can try to redo the prediction with an other resolution if the answer was wrong.

Expand Down Expand Up @@ -69,9 +61,17 @@ The model consist of a ViT [[1](#References)] encoder with a ResNet backbone and
| 0.88 | 0.10 |

## Data
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. [wikipedia](https://www.wikipedia.org), [arXiv](https://www.arxiv.org). We also use the formulae from the [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) dataset.
We need paired data for the network to learn. Luckily there is a lot of LaTeX code on the internet, e.g. [wikipedia](https://www.wikipedia.org), [arXiv](https://www.arxiv.org). We also use the formulae from the [im2latex-100k](https://zenodo.org/record/56198#.V2px0jXT6eA) [[3](#References)] dataset.
All of it can be found [here](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO)

### Dataset Requirements
In order to render the math in many different fonts we use XeLaTeX, generate a PDF and finally convert it to a PNG. For the last step we need to use some third party tools:
* [XeLaTeX](https://www.ctan.org/pkg/xetex)
* [ImageMagick](https://imagemagick.org/) with [Ghostscript](https://www.ghostscript.com/index.html). (for converting pdf to png)
* [Node.js](https://nodejs.org/) to run [KaTeX](https://github.com/KaTeX/KaTeX) (for normalizing Latex code)
* [`de-macro`](https://www.ctan.org/pkg/de-macro) >= 1.4 (only for parsing arxiv papers)
* Python 3.7+ & dependencies (specified in `setup.py`)

### Fonts
Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math

Expand All @@ -80,12 +80,12 @@ Latin Modern Math, GFSNeohellenicMath.otf, Asana Math, XITS Math, Cambria Math
- [x] add more evaluation metrics
- [x] create a GUI
- [ ] add beam search
- [ ] support handwritten formulae
- [ ] support handwritten formulae (kinda done, see training colab notebook)
- [ ] reduce model size (distillation)
- [ ] find optimal hyperparameters
- [ ] tweak model structure
- [ ] fix data scraping and scrape more data
- [ ] trace the model
- [ ] trace the model (#2)


## Contribution
Expand All @@ -98,3 +98,5 @@ Code taken and modified from [lucidrains](https://github.com/lucidrains), [rwigh
[1] [An Image is Worth 16x16 Words](https://arxiv.org/abs/2010.11929)

[2] [Attention Is All You Need](https://arxiv.org/abs/1706.03762)

[3] [Image-to-Markup Generation with Coarse-to-Fine Attention](https://arxiv.org/abs/1609.04938v2)

0 comments on commit e8ccfd6

Please sign in to comment.