TensorFlow code to perform end-to-end Optical Music Recognition on monophonic scores through Convolutional Recurrent Neural Networks and CTC-based training.
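At a high level, the model stacks a convolutional feature extractor on top of recurrent layers whose per-column outputs are trained with CTC loss. The following is an illustrative tf.keras sketch of that idea, not the repository's actual graph (which is built with TensorFlow 1.x in ctc_training.py); layer counts and sizes are assumptions for illustration only:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_crnn(img_height, vocabulary_size):
    # Grayscale staff image of fixed height and variable width.
    # img_height must be divisible by 16 in this sketch (four 2x2 poolings).
    inputs = layers.Input(shape=(img_height, None, 1))
    x = inputs
    for filters in (32, 64, 128, 128):   # convolutional feature extractor
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)    # halves height and width
    # Treat each remaining image column as one time step for the RNN.
    x = layers.Permute((2, 1, 3))(x)     # -> (batch, width, height, channels)
    x = layers.Reshape((-1, (img_height // 16) * 128))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    # One extra class for the CTC blank; CTC loss (e.g. tf.nn.ctc_loss)
    # aligns these per-column logits with the unsegmented symbol sequence.
    return tf.keras.Model(inputs, layers.Dense(vocabulary_size + 1)(x))
```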
This repository was used for the experiments reported in the paper:
End-to-End Neural Optical Music Recognition of Monophonic Scores
@Article{Calvo-Zaragoza2018,
  AUTHOR = {Calvo-Zaragoza, Jorge and Rizo, David},
  TITLE = {End-to-End Neural Optical Music Recognition of Monophonic Scores},
  JOURNAL = {Applied Sciences},
  VOLUME = {8},
  YEAR = {2018},
  NUMBER = {4},
  ARTICLE-NUMBER = {606},
  URL = {http://www.mdpi.com/2076-3417/8/4/606},
  ISSN = {2076-3417},
  DOI = {10.3390/app8040606}
}
This repository is intended for the Printed Images of Music Staves (PrIMuS) dataset.
PrIMuS can be downloaded from https://grfia.dlsi.ua.es/primus/
Assuming that the PrIMuS dataset has been downloaded and all of its samples have been placed in the same folder, the models can be trained with ctc_training.py. It is necessary to build a list of training samples and the set of symbols (vocabulary); examples of these files are given in the Data folder.
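A hypothetical helper for generating those two files from the corpus folder is sketched below. It assumes the layout of the extracted PrIMuS package (one folder per sample, containing <id>.png and <id>.semantic with whitespace-separated symbols); these format assumptions should be checked against the examples in Data before relying on it:

```python
import os

def build_set_and_vocabulary(corpus_dir, set_path, vocabulary_path):
    symbols = set()
    with open(set_path, "w") as set_file:
        for sample_id in sorted(os.listdir(corpus_dir)):
            gt_path = os.path.join(corpus_dir, sample_id, sample_id + ".semantic")
            if not os.path.isfile(gt_path):
                continue                      # skip anything that is not a sample folder
            set_file.write(sample_id + "\n")  # one sample identifier per line
            with open(gt_path) as gt_file:
                symbols.update(gt_file.read().split())
    with open(vocabulary_path, "w") as vocab_file:
        vocab_file.write("\n".join(sorted(symbols)) + "\n")  # one symbol per line
```

With those files in place, training is launched as follows: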
python ctc_training.py -semantic -corpus <path_to_PrIMuS> -set Data/train.txt -vocabulary Data/vocabulary_semantic.txt -save_model ./trained_semantic_model
python ctc_training.py -corpus <path_to_PrIMuS> -set Data/train.txt -vocabulary Data/vocabulary_agnostic.txt -save_model ./trained_agnostic_model
For running inference over an input image, ctc_predict.py can be used, along with a trained model and the corresponding vocabulary file. The repository provides trained models for both the agnostic and the semantic representations in the Models folder. These models are the result of the training process for one of the folds of the 10-fold cross-validation considered in the paper.
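Under the hood, prediction amounts to running the network over the image and collapsing its per-frame output with CTC greedy decoding. A minimal, self-contained sketch of that post-processing step (these helper names are illustrative, not functions exported by this repository):

```python
def load_vocabulary(path):
    """Map integer class indices to symbol strings, one symbol per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def ctc_greedy_collapse(frame_labels, blank):
    """Merge repeated frame labels and drop blanks, per the CTC decoding rule."""
    symbols, previous = [], blank
    for label in frame_labels:
        if label != blank and label != previous:
            symbols.append(label)
        previous = label
    return symbols

# Example: per-frame argmax indices -> symbol strings, with the blank
# taken to be the extra class beyond the vocabulary.
# vocabulary = load_vocabulary("Data/vocabulary_semantic.txt")
# prediction = [vocabulary[i]
#               for i in ctc_greedy_collapse(frames, blank=len(vocabulary))]
```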
Let's see an example for the sample from PrIMuS provided in Data/Example:
This sample belongs to the test set of the aforementioned fold, so it was not seen by the networks during their training stage.
Running
python ctc_predict.py -image Data/Example/000051652-1_2_1.png -model Models/semantic_model.meta -vocabulary Data/vocabulary_semantic.txt
should yield the following prediction:
clef-C1 keySignature-EbM timeSignature-2/4 multirest-23 barline rest-quarter rest-eighth note-Bb4_eighth barline note-Bb4_quarter. note-G4_eighth barline note-Eb5_quarter. note-D5_eighth barline note-C5_eighth note-C5_eighth rest-quarter barline
The ground truth of this example is given in Data/Example/000051652-1_2_1.semantic:
clef-C1 keySignature-EbM timeSignature-2/4 multirest-23 barline rest-quarter rest-eighth note-Bb4_eighth barline note-Bb4_quarter. note-G4_eighth barline note-Eb5_quarter. note-D5_eighth barline note-C5_eighth note-C5_eighth rest-quarter barline
It can be observed that the staff section is perfectly recognized by the model.
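Agreement between a prediction and its ground truth can be quantified with the symbol error rate, i.e. the edit distance between the two symbol sequences normalized by the ground-truth length, which is the kind of metric used for evaluation in the paper. A small self-contained sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (single-row DP)."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            # deletion, insertion, substitution (or match when x == y)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def symbol_error_rate(predicted, ground_truth):
    return edit_distance(predicted, ground_truth) / len(ground_truth)

# For the semantic example above, both sequences are identical:
# symbol_error_rate(prediction.split(), ground_truth.split()) == 0.0
```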
Running
python ctc_predict.py -image Data/Example/000051652-1_2_1.png -model Models/agnostic_model.meta -vocabulary Data/vocabulary_agnostic.txt
should yield the following prediction:
clef.C-L1 accidental.flat-L4 accidental.flat-L2 accidental.flat-S3 digit.2-L4 digit.4-L2 digit.2-S5 digit.3-S5 multirest-L3 barline-L1 rest.quarter-L3 rest.eighth-L3 note.eighth-L4 barline-L1 note.quarter-L4 dot-S4 note.eighth-L3 barline-L1 note.quarter-S5 dot-S5 note.eighth-L5 barline-L1 note.eighth-S4 note.eighth-S4 rest.quarter-L3
The ground truth of this example is given in Data/Example/000051652-1_2_1.agnostic:
clef.C-L1 accidental.flat-L4 accidental.flat-L2 accidental.flat-S3 digit.2-L4 digit.4-L2 digit.2-S5 digit.3-S5 multirest-L3 barline-L1 rest.quarter-L3 rest.eighth-L3 note.eighth-L4 barline-L1 note.quarter-L4 dot-S4 note.eighth-L3 barline-L1 note.quarter-S5 dot-S5 note.eighth-L5 barline-L1 note.eighth-S4 note.eighth-S4 rest.quarter-L3 barline-L1
Comparing the two sequences, the prediction misses the final barline-L1; as discussed in the paper, this representation often misses the last barline.
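With the symbol_error_rate sketch above, that single missing symbol is an edit distance of 1 over 26 ground-truth symbols, i.e. a symbol error rate of about 0.04 for this staff.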
- Jorge Calvo Zaragoza ([email protected])