Below are the steps to set up the code and perform training.
After setting up the code as below, update the paths appropriately.
git clone https://github.com/ksasi/sapa.git
cd sapa
pip install -r requirements.txt
- Create a dataset directory under Speaker_Verification and change into it
- Download [VoxCeleb1-H (small subset)](https://iitjacin-my.sharepoint.com/:u:/g/personal/d22cs051_iitj_ac_in/EVhTqG7PeDFBlkgHrG7WSJoB63ievtSFmE-PLdSxHtSNqA?e=Nlf8fX)
- Download [Kathbath dataset](https://github.com/AI4Bharat/IndicSUPERB)
Kathbath dataset structure after extraction:
Audio Data
data
├── telugu
│ ├── <split_name>
│ │ ├── 844483828886543-594-f.m4a
│ │ ├── 765429982765376-973-f.m4a
│ │ ├── ...
├── tamil
├── ...
Transcripts
data
├── telugu
│ ├── <split_name>
│ │ ├── transcription_n3w.txt
├── tamil
├── ...
Convert the m4a files to wav format as below:
python utilities/structure.py \
<dataset_root_path>/kb_data_clean_m4a \
<dataset_root_path>/kb_data_clean_wav \
<lang>
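The repo's utilities/structure.py is the authoritative conversion script; as a rough illustration of what this step does, here is a minimal sketch assuming ffmpeg is available on PATH and that a 16 kHz mono wav tree mirroring the m4a tree is wanted (the paths and sample rate are illustrative):

```python
# Minimal m4a -> wav conversion sketch (utilities/structure.py is the repo's actual script).
# Assumes ffmpeg is installed and on PATH; 16 kHz mono is an illustrative choice.
import subprocess
from pathlib import Path

def convert_lang(src_root: str, dst_root: str, lang: str) -> None:
    src_lang = Path(src_root) / lang
    for m4a in src_lang.rglob("*.m4a"):
        wav = (Path(dst_root) / lang / m4a.relative_to(src_lang)).with_suffix(".wav")
        wav.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(m4a), "-ar", "16000", "-ac", "1", str(wav)],
            check=True,
        )

# Example:
# convert_lang("<dataset_root_path>/kb_data_clean_m4a",
#              "<dataset_root_path>/kb_data_clean_wav", "telugu")
```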
Execute the below script to evaluate the models (XLSR-Wav2Vec2, UniSpeech-SAT and WavLM-Base) with EER (%) on VoxCeleb1-H (small subset):
cd Speaker_Verification
nohup python eval_voxceleb.py > <root_path>/log/eval_log_voxceleb.out &
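For reference, EER is the operating point where the false-acceptance and false-rejection rates are equal. A common way to compute it from trial scores and labels is sketched below; this is a generic illustration, not necessarily how eval_voxceleb.py implements it.

```python
# Generic EER computation from verification trial scores (illustrative, not the repo's exact code).
import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import brentq
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    # labels: 1 for same-speaker trials, 0 for different-speaker trials
    # scores: higher score = more likely the same speaker
    fpr, tpr, _ = roc_curve(labels, scores)
    eer = brentq(lambda x: 1.0 - x - interp1d(fpr, tpr)(x), 0.0, 1.0)
    return eer * 100.0  # EER in percent

# Usage: compute_eer(trial_labels, cosine_scores)
```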
Execute the below script to evaluate the models (XLSR-Wav2Vec2, UniSpeech-SAT and WavLM-Base) with EER (%) on the test partition of the Kathbath - Telugu dataset:
cd Speaker_Verification
nohup python eval_kathbath.py > <root_path>/log/eval_log_kathbath.out &
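As an illustration of the scoring step behind these numbers (not necessarily identical to eval_kathbath.py), a pretrained speaker-verification head such as microsoft/wavlm-base-plus-sv on Hugging Face yields speaker embeddings that can be compared with cosine similarity:

```python
# Scoring a single verification trial with a pretrained WavLM x-vector head (illustrative sketch).
import torch
import torchaudio
from transformers import AutoFeatureExtractor, WavLMForXVector

extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv").eval()

def embed(wav_path):
    wav, sr = torchaudio.load(wav_path)
    if sr != 16000:  # the model expects 16 kHz input
        wav = torchaudio.functional.resample(wav, sr, 16000)
    inputs = extractor(wav.squeeze(0).numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).embeddings

# "enroll.wav" / "test.wav" are placeholder trial files.
score = torch.nn.functional.cosine_similarity(embed("enroll.wav"), embed("test.wav"))
print(score.item())  # higher = more likely the same speaker
```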
Execute the below script to fine-tune the WavLM model on the valid partition of the Kathbath - Telugu dataset:
cd Speaker_Verification
nohup python train_WavLM.py > <root_path>/log/WavLM_log_finetune.out &
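train_WavLM.py defines the actual fine-tuning procedure; the sketch below only illustrates one possible setup, assuming the Hugging Face WavLMForXVector head (which returns a speaker-classification loss when integer speaker labels are passed) and a plain Adam optimizer. The speaker count and learning rate are placeholders.

```python
# Illustrative fine-tuning step (the repo's train_WavLM.py is the authoritative version).
import torch
from transformers import AutoFeatureExtractor, WavLMForXVector

NUM_SPEAKERS = 100  # hypothetical number of speakers in the fine-tuning partition
extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model = WavLMForXVector.from_pretrained(
    "microsoft/wavlm-base-plus-sv", num_labels=NUM_SPEAKERS, ignore_mismatched_sizes=True
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def train_step(waveforms, speaker_ids):
    # waveforms: list of 1-D numpy arrays at 16 kHz; speaker_ids: list of integer labels
    inputs = extractor(waveforms, sampling_rate=16000, return_tensors="pt", padding=True)
    out = model(**inputs, labels=torch.tensor(speaker_ids))
    optimizer.zero_grad()
    out.loss.backward()
    optimizer.step()
    return out.loss.item()
```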
Execute the below script to evaluate the fine-tuned WavLM model on the test partition of the Kathbath - Telugu dataset:
cd Speaker_Verification
nohup python eval_WavLM.py > <root_path>/log/WavLM_log_ft_eval.out &
- Set up the LibriMix repo as below:
git clone https://github.com/JorisCos/LibriMix.git
cd <root_folder>/LibriMix/metadata/Libri2Mix
Delete all folders and files except libri2mix_test-clean_info.csv and libri2mix_test-clean.csv
- Execute the below script to generate LibriMix dataset
cd <root_folder>/LibriMix
./generate_librimix.sh storage_dir
where storage_dir = <root_folder>/dataset
Execute the below script to evaluate SepFormer on the test split (a 70-30 split of LibriMix built from the LibriSpeech test-clean partition):
cd Source_Separation
nohup python eval_separator.py > <root_path>/log/eval_sepformer_librimix_batch_size8.out &
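For context (not necessarily the exact logic of eval_separator.py), a pretrained SepFormer can be loaded from Hugging Face via speechbrain and scored with scale-invariant SNR from torchmetrics; note that the speechbrain/sepformer-wsj02mix checkpoint expects 8 kHz audio, and a real evaluation should also search over source permutations:

```python
# Illustrative separation + SI-SNR scoring for one mixture (eval_separator.py is the repo's version).
import torchaudio
from speechbrain.pretrained import SepformerSeparation
from torchmetrics.audio import ScaleInvariantSignalNoiseRatio

model = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix", savedir="pretrained_sepformer"
)
est = model.separate_file(path="mix.wav")        # (batch, time, n_sources); placeholder file names
s1, _ = torchaudio.load("s1.wav")                # reference source 1
s2, _ = torchaudio.load("s2.wav")                # reference source 2

si_snr = ScaleInvariantSignalNoiseRatio()
n = min(est.shape[1], s1.shape[1], s2.shape[1])  # align lengths before scoring
print(si_snr(est[0, :n, 0], s1[0, :n]).item(), si_snr(est[0, :n, 1], s2[0, :n]).item())
```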
Execute the below steps to fine-tune and evaluate SepFormer:
- Adopt the speechbrain recipe and fine-tune as below:
- Generate train and test csv files by executing csv_generator.py as below:
cd Source_Separation
python csv_generator.py
- Clone the speechbrain repo and update train.py and sepformer.yaml as below:
git clone https://github.com/speechbrain/speechbrain.git
cp <root_path>/Source_Separation/train.py <root_path>/Source_Separation/speechbrain/recipes/WSJ0Mix/separation/train.py
cp <root_path>/Source_Separation/sepformer.yaml <root_path>/Source_Separation/speechbrain/recipes/WSJ0Mix/separation/hparams/sepformer.yaml
- Fine-tune sepformer with LibriMix dataset by running
train.py
as below:
cd <root_path>/Source_Separation
nohup python <root_path>/Source_Separation/speechbrain/recipes/WSJ0Mix/separation/train.py <root_path>/Source_Separation/speechbrain/recipes/WSJ0Mix/separation/hparams/sepformer.yaml > <root_path>/log/sepformer_ft.out &
A demo of speaker verification from audio inputs can be run via the Speaker_Verification_Demo.ipynb notebook in the Demo folder.
- LibriMix - Github Link
- Speechbrain - Github Link
- EER Metric - blog
- VoxCeleb dataset - Link
- Kathbath dataset - Link
- UniSpeech - Github Link
- SepFormer Huggingface - Link
- Torchmetrics - Link