Forced alignment and Goodness of Pronunciation (GOP) with DNN support. Based on Kaldi.

chenchy/kaldi-dnn-ali-gop

 
 


kaldi-gop

Computes GOP (Goodness of Pronunciation) and performs forced alignment, based on Kaldi with nnet3 support. The acoustic model is trained on the LibriSpeech corpus (960 hours of data) using the scripts under kaldi/egs/librispeech.
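
For orientation, the classic GOP definition from Witt and Young can be written as below. This is the textbook form under the assumption of equal phone priors. The README does not state which variant this repository's code implements, so the symbols used here are illustrative notation, not names taken from the code.

```latex
\mathrm{GOP}(p) = \frac{1}{NF(p)}
  \left| \log \frac{p\left(O^{(p)} \mid p\right)}
                   {\max_{q \in Q} \, p\left(O^{(p)} \mid q\right)} \right|
```

Here $O^{(p)}$ is the acoustic segment that forced alignment assigns to phone $p$, $Q$ is the phone set, and $NF(p)$ is the number of frames in the segment. The segment boundaries come from the forced alignment, which is why the two tasks are handled together.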

How to build

  1. Download Kaldi.
  2. Put the folders under src into kaldi/src.
  3. Compile the code the same way you compile Kaldi (see kaldi/src/INSTALL).

Run the example

cd egs/gop-compute
./run.sh --dnn true/false audio_dir data_dir result_dir

See run.sh for the meaning of each argument.

Notes on data preparation

To use this tool, prepare the audio files (.wav) and their corresponding transcripts (.lab) and store them in the following layout:

.
├── ...
├── data_dir
│   ├── speaker1 # folder name is the speaker ID
│   ├── speaker2
│   └── speaker3
│       ├── utt1.wav # file name is the utterance ID
│       ├── utt1.lab
└── ...

Do not use spaces in speaker folder names or utterance file names; use underscores instead. Make sure each speaker has a distinct folder name (speaker ID) and each audio file has a distinct file name (utterance ID).
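
The rules above can be checked before running the example. The following is a minimal sketch, not part of this repository: the function name `check_data_dir` is hypothetical, and it assumes that utterance IDs must be unique across all speakers and that every .wav needs a .lab with the same base name.

```python
import os
from collections import Counter


def check_data_dir(data_dir):
    """Return a list of human-readable problems found under data_dir.

    Hypothetical helper, not part of this repository. Assumes the layout
    data_dir/<speaker ID>/<utt ID>.wav plus a matching <utt ID>.lab.
    """
    problems = []
    utt_counts = Counter()  # how many speaker folders use each utterance ID

    for speaker in sorted(os.listdir(data_dir)):
        spk_path = os.path.join(data_dir, speaker)
        if not os.path.isdir(spk_path):
            continue  # ignore stray files at the top level

        if " " in speaker:
            problems.append(f"space in speaker folder name: {speaker!r}")

        files = sorted(os.listdir(spk_path))
        wavs = {os.path.splitext(f)[0] for f in files if f.endswith(".wav")}
        labs = {os.path.splitext(f)[0] for f in files if f.endswith(".lab")}

        for f in files:
            if " " in f:
                problems.append(f"space in file name: {speaker}/{f}")

        # Every audio file needs a transcript, and vice versa.
        for utt in sorted(wavs - labs):
            problems.append(f"{speaker}/{utt}.wav has no matching .lab")
        for utt in sorted(labs - wavs):
            problems.append(f"{speaker}/{utt}.lab has no matching .wav")

        utt_counts.update(wavs)

    # Assumption: utterance IDs must be unique across all speakers.
    for utt, n in sorted(utt_counts.items()):
        if n > 1:
            problems.append(f"utterance ID {utt!r} is used by {n} speakers")

    return problems
```

Calling `check_data_dir("data_dir")` returns an empty list when the layout follows the rules; otherwise each entry describes one problem to fix before invoking run.sh.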

To-do

  • Add GPU support
  • Convert alignment results to a readable format (TextGrid)
  • Add comparison between GMM and DNN (nnet3)
  • Add feature extraction script

