Speaker-conditioned voice activity detection, replicated from https://arxiv.org/abs/1908.04284
Classifier output classes: {non-speech, target speaker, non-target speaker}
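
As a rough illustration of the three-class target, the sketch below turns a list of speaker segments into per-frame labels. The function name `frame_labels`, the class indices, the segment format `(speaker, start_sec, end_sec)`, and the 10 ms frame shift are all assumptions made for this example; they are not taken from the scripts in this repository.

```python
# Hypothetical class indices; the repository may order them differently.
NON_SPEECH, NON_TARGET, TARGET = 0, 1, 2


def frame_labels(segments, target_speaker, num_frames, frame_shift=0.01):
    """Return one class index per frame.

    segments: iterable of (speaker_id, start_sec, end_sec), assumed non-overlapping.
    Frames not covered by any segment stay labelled as non-speech.
    """
    labels = [NON_SPEECH] * num_frames
    for speaker, start, end in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        cls = TARGET if speaker == target_speaker else NON_TARGET
        # If segments did overlap, a later segment would overwrite an earlier one.
        for i in range(first, last):
            labels[i] = cls
    return labels


if __name__ == "__main__":
    segs = [("spk1", 0.02, 0.05), ("spk2", 0.07, 0.09)]
    print(frame_labels(segs, target_speaker="spk1", num_frames=10))
```

The same utterance yields different labels depending on which speaker is chosen as the target, which is why the target speaker must be fixed before labels are extracted.
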
- Synthetic dataset generation
  - prep4kaldi.sh
  - flac_to_wav.sh
  - concat.sh, concat.py
  - augment.py
- Prepare target speaker embeddings
  - extract_embeddings.py
- Extract features and labels
  - correct_target_labels.py
  - fbank.py
  - feature_labels.py
- Data loader
  - dataloader.py
  - dataloader_test.py
- Model definition and training
  - pvad_training.py
- Saved model
  - checkpoint_oct22_coswarm.t7
- Test
  - test.py
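
To show how the outputs of the embedding and feature steps listed above might be combined, here is a minimal sketch of one conditioning scheme described in the referenced paper: the target speaker embedding is appended to the acoustic features of every frame before they are fed to the classifier. Whether the scripts in this repository do exactly this is an assumption, and the helper name `condition_on_speaker` and the toy dimensions are hypothetical.

```python
def condition_on_speaker(frames, embedding):
    """Append the same speaker embedding to every frame's feature vector.

    frames: list of per-frame feature vectors (for example filterbank energies).
    embedding: one fixed-length vector for the target speaker.
    Returns a new list; the inputs are not modified.
    """
    return [list(frame) + list(embedding) for frame in frames]


if __name__ == "__main__":
    # Toy sizes for readability; real features and embeddings are much larger.
    fbank_frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    target_embedding = [1.0, -1.0, 0.5]
    conditioned = condition_on_speaker(fbank_frames, target_embedding)
    print(len(conditioned), len(conditioned[0]))
```

Each conditioned frame has length (feature dimension + embedding dimension), so the model's input size has to match that sum.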