This repository contains the code and pretrained models for the following INTERSPEECH 2024 paper:
- Title: Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
- Authors: Duc-Tuan Truong, Ruijie Tao, Tuan Nguyen, Hieu-Thi Luong, Kong Aik Lee, Eng Siong Chng
The pretrained XLSR model can be found at link.
We have uploaded the pretrained models from our experiments; you can download them from OneDrive.
Python version: 3.7.16
Install PyTorch:
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
Install the other libraries:
pip install -r requirements.txt
Install fairseq:
git clone https://github.com/facebookresearch/fairseq.git fairseq_dir
cd fairseq_dir
git checkout a54021305d6b3c
pip install --editable ./
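Because the pins above are strict (CUDA 11.1 wheels of PyTorch), it can help to confirm the environment before training. The sketch below is a hypothetical helper, not part of the repository: it compares the installed versions of the three PyTorch packages against the pins listed above, using `importlib.metadata` on Python 3.8+ and falling back to `pkg_resources` (shipped with setuptools) on Python 3.7. It does not check the packages in requirements.txt or the fairseq checkout.

```python
import sys

# Pins copied from the installation commands above.
EXPECTED_PYTHON = (3, 7)
EXPECTED_PACKAGES = {
    "torch": "1.8.1+cu111",
    "torchvision": "0.9.1+cu111",
    "torchaudio": "0.8.1",
}


def installed_version(name):
    """Return the installed version string of `name`, or None if missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # Python 3.7 fallback, provided by setuptools
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(installed):
    """Compare a {package: version} dict against the pins; list the problems."""
    problems = []
    for name, want in EXPECTED_PACKAGES.items():
        have = installed.get(name)
        if have != want:
            problems.append(f"{name}: expected {want}, found {have}")
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != EXPECTED_PYTHON:
        print("warning: expected Python 3.7, running", sys.version.split()[0])
    installed = {name: installed_version(name) for name in EXPECTED_PACKAGES}
    for problem in find_mismatches(installed):
        print("mismatch:", problem)
```

An empty output (apart from a possible Python-version warning) means the three PyTorch pins match.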
To train the model and produce scores for the LA evaluation set, run:
python main.py --algo 5
To train the model and produce scores for the DF evaluation set, run:
python main.py --algo 3
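Since the two training runs differ only in the `--algo` value, a small wrapper can launch either track by name. `ALGO_BY_TRACK` and `train_command` are hypothetical names; the sketch only encodes the two `--algo` values listed above and assumes it is run from the repository root, where `main.py` lives.

```python
import sys

# --algo values listed in this README for each evaluation track.
ALGO_BY_TRACK = {"LA": 5, "DF": 3}


def train_command(track):
    """Build the training command for the 'LA' or 'DF' track."""
    if track not in ALGO_BY_TRACK:
        raise ValueError(f"unknown track {track!r}; expected 'LA' or 'DF'")
    # Use the current interpreter so the command runs in the same environment.
    return [sys.executable, "main.py", "--algo", str(ALGO_BY_TRACK[track])]
```

For example, `subprocess.run(train_command("DF"), check=True)` would start DF training with `--algo 3`.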
To get evaluation results of minimum t-DCF and EER (Equal Error Rate), follow these steps:
cd 2021/eval-package
python main.py --cm-score-file your_LA_score.txt --track LA --subset eval # For LA track evaluation
python main.py --cm-score-file your_DF_score.txt --track DF --subset eval # For DF track evaluation
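The eval-package above is the reference scorer. As a rough illustration of what the EER (and the threshold reused for inference) means, the pure-Python sketch below sweeps a threshold over the sorted scores and returns the point where the miss rate on bona fide trials and the false-alarm rate on spoofed trials are closest. It assumes higher scores indicate bona fide speech (the usual ASVspoof convention) and that both score lists are non-empty; `compute_eer` is a hypothetical helper, and its result may differ slightly from the official package.

```python
import math


def compute_eer(bonafide, spoof):
    """Return (eer, threshold) under the rule: bona fide iff score > threshold."""
    # Label each score (True = bona fide) and sort by score.
    pairs = sorted([(s, True) for s in bonafide] + [(s, False) for s in spoof])
    n_bona, n_spoof = len(bonafide), len(spoof)
    rej_bona = rej_spoof = 0
    # Threshold below every score: nothing rejected, every spoof accepted.
    best = (1.0, 0.5, -math.inf)  # (|FRR - FAR|, EER estimate, threshold)
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Raise the threshold to t: every trial scoring <= t is now rejected.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                rej_bona += 1
            else:
                rej_spoof += 1
            i += 1
        frr = rej_bona / n_bona                # bona fide wrongly rejected
        far = (n_spoof - rej_spoof) / n_spoof  # spoofs wrongly accepted
        gap = abs(frr - far)
        if gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]
```

For example, `compute_eer([0.9, 0.8, 0.7], [0.1, 0.2, 0.75])` returns an EER of 1/3 at threshold 0.7. Sorting once keeps the sweep at O(n log n), which matters for evaluation sets with hundreds of thousands of trials.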
To run inference on a single wav file with the pretrained model, run:
python inference.py --ckpt_path=path_to/model.pth --threshold=-3.73 --wav_path=path_to/audio.flac
The threshold is obtained when computing the EER on the LA or DF evaluation set. In this example, the threshold (-3.73) comes from the DF evaluation.
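The decision rule behind the threshold can be sketched as follows. This assumes higher scores mean bona fide speech and that each line of a score file holds an utterance ID followed by a score; both the file layout and the helper names (`parse_score_line`, `classify`) are hypothetical and may not match what `inference.py` or `main.py` actually produce.

```python
# Example threshold from the DF evaluation; replace it with your own EER threshold.
DF_THRESHOLD = -3.73


def parse_score_line(line):
    """Split a hypothetical 'utt_id score' line into (utt_id, score)."""
    fields = line.split()
    return fields[0], float(fields[1])


def classify(score, threshold=DF_THRESHOLD):
    """Label a score, assuming higher scores indicate bona fide speech."""
    return "bonafide" if score > threshold else "spoof"


if __name__ == "__main__":
    for line in ["utt_0001 -1.20", "utt_0002 -7.85"]:
        utt_id, score = parse_score_line(line)
        print(utt_id, classify(score))  # prints "utt_0001 bonafide", then "utt_0002 spoof"
```

A score exactly equal to the threshold is labelled spoof here; whichever tie rule you pick, use the same one when computing the EER threshold.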
If you find our repository valuable for your work, please consider giving this repo a star and citing our paper:
@inproceedings{truong24b_interspeech,
title = {Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection},
author = {Duc-Tuan Truong and Ruijie Tao and Tuan Nguyen and Hieu-Thi Luong and Kong Aik Lee and Eng Siong Chng},
year = {2024},
booktitle = {Interspeech 2024},
pages = {537--541},
doi = {10.21437/Interspeech.2024-659},
issn = {2958-1796},
}
Our work is built upon conformer-based-classifier-for-anti-spoofing. We also follow parts of the following codebases:
SSL_Anti-spoofing (for the training pipeline).
conformer (for the Conformer model architecture).
DHVT (for the Head Token design).
Thanks to these authors for sharing their work!