Computes GOP (Goodness of Pronunciation) scores and performs forced alignment, based on Kaldi with nnet3 support. The acoustic model is trained on the LibriSpeech corpus (960 hours of data) using the scripts under kaldi/egs/librispeech.
- Download Kaldi, but do not compile it yet.
- Copy the folders under src into kaldi/src, replacing the existing Makefile.
- Compile the code the same way you would compile Kaldi (see kaldi/src/INSTALL).
- Change KALDI_ROOT in egs/gop-compute/path.sh to point to your own Kaldi root directory.
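The copy and KALDI_ROOT steps above can be sketched as two small shell helpers. This is only an illustration: the function names `install_gop_sources` and `set_kaldi_root` are hypothetical, and the sketch assumes that path.sh sets the variable on a line beginning with `export KALDI_ROOT=`, which may not match your copy of the file.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of this repository.

# Copy everything under this repository's src/ into kaldi/src,
# overwriting files with the same name (including the Makefile).
install_gop_sources() {
  local repo_src="$1" kaldi_src="$2"
  [ -d "$repo_src" ] && [ -d "$kaldi_src" ] || return 1
  cp -R "$repo_src"/. "$kaldi_src"/
}

# Rewrite the KALDI_ROOT line in path.sh.
# Assumption: the line starts with "export KALDI_ROOT=".
# The new path must not contain '|', '&' or '\' (they are special to sed).
set_kaldi_root() {
  local path_sh="$1" kaldi_root="$2" tmp rc
  [ -f "$path_sh" ] || return 1
  tmp="$(mktemp)" || return 1
  sed "s|^export KALDI_ROOT=.*|export KALDI_ROOT=$kaldi_root|" "$path_sh" > "$tmp" \
    && cat "$tmp" > "$path_sh"
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

A possible call sequence would be `install_gop_sources src /path/to/kaldi/src`, then building by following kaldi/src/INSTALL, then `set_kaldi_root egs/gop-compute/path.sh /path/to/kaldi`. The paths here are placeholders.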
cd egs/gop-compute
./run.sh --dnn true/false audio_dir data_dir result_dir
See run.sh for the meaning of each argument.
Before using this tool, prepare the audio files (.wav) and their corresponding transcripts (.lab) and store them in the following layout:
.
├── ...
├── data_dir
│   ├── speaker1          # speaker ID
│   ├── speaker2
│   └── speaker3
│       ├── utt1.wav      # utterance ID
│       └── utt1.lab
└── ...
Do not use spaces in speaker folder names or utterance file names; use underscores instead. Make sure each speaker has a distinct folder name (speaker ID) and each audio file has a distinct file name (utterance ID).
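The layout and naming rules above can be checked before running the tool. The following is a minimal sketch, not part of this repository: the function name `check_data_dir` is hypothetical, and it assumes every .wav file sits directly inside a speaker folder next to a .lab file with the same base name.

```shell
#!/usr/bin/env bash
# Sketch only: check a data directory against the naming rules.
# Prints one line per problem and returns non-zero if any were found.
check_data_dir() {
  local data_dir="$1" status=0 seen="" spk wav name utt
  for spk in "$data_dir"/*/; do
    [ -d "$spk" ] || continue
    name="$(basename "$spk")"
    case "$name" in
      *" "*) echo "space in speaker folder: $name"; status=1 ;;
    esac
    for wav in "$spk"*.wav; do
      [ -e "$wav" ] || continue
      utt="$(basename "$wav" .wav)"
      case "$utt" in
        *" "*) echo "space in utterance name: $name/$utt"; status=1 ;;
      esac
      # Every audio file needs a transcript with the same base name.
      [ -f "$spk$utt.lab" ] || { echo "missing transcript: $name/$utt.lab"; status=1; }
      # Utterance IDs must be unique across all speakers.
      case "$seen" in
        *"|$utt|"*) echo "duplicate utterance ID: $utt"; status=1 ;;
        *) seen="$seen|$utt|" ;;
      esac
    done
  done
  return $status
}
```

Calling `check_data_dir data_dir` prints nothing and returns 0 when the layout follows the rules. Speaker folder names are not checked for duplicates because the file system already keeps them unique within one data_dir.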
- Add GPU support
- Convert alignment results to a readable format (TextGrid)
- Add comparison between GMM and DNN (nnet3)
- Add feature extraction script