This is a PyTorch implementation of the pipeline presented in the paper A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement, published at INTERSPEECH 2022.
Project page
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement
Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi
23rd INTERSPEECH conference
Speech enhancement has seen great improvement in recent years using end-to-end neural networks. However, most models are agnostic to the spoken phonetic content. Recently, several studies suggested phonetic-aware speech enhancement, mostly using perceptual supervision. Yet, injecting phonetic features during model optimization can take additional forms (e.g., model conditioning). In this paper, we conduct a systematic comparison between different methods of incorporating phonetic information into a speech enhancement model. Through a series of controlled experiments, we observe the influence of different phonetic content models as well as various feature-injection techniques on enhancement performance, considering both causal and non-causal models. Specifically, we evaluate three settings for injecting phonetic information, namely: i) feature conditioning; ii) perceptual supervision; and iii) regularization. Phonetic features are obtained from an intermediate layer of either a supervised pre-trained Automatic Speech Recognition (ASR) model or a pre-trained Self-Supervised Learning (SSL) model. We further observe the effect of choosing different embedding layers on performance, considering both manual and learned configurations. Results suggest that using an SSL model to extract phonetic features outperforms the ASR one in most cases. Interestingly, the conditioning setting performs best among the evaluated configurations.
Paper.
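To make the three settings concrete, below is a minimal PyTorch sketch of how each one could inject phonetic features into a generic enhancement model. All names (phonetic_features, enhance_net, proj, etc.) are hypothetical illustrations, not this repository's actual interfaces; the regularization form in particular is only one plausible reading, so consult the paper and train.py for the exact formulations.

import torch
import torch.nn.functional as F

def phonetic_features(phonetic_model, wav):
    # Phonetic features come from an intermediate layer of a frozen
    # pre-trained ASR or SSL model (hypothetical interface).
    with torch.no_grad():
        return phonetic_model(wav)  # (batch, time, feature_dim)

def conditioning_forward(enhance_net, phonetic_model, noisy):
    # i) Feature conditioning: the phonetic features are fed to the
    # enhancement network as an additional input stream.
    feats = phonetic_features(phonetic_model, noisy)
    return enhance_net(noisy, condition=feats)

def perceptual_supervision_loss(phonetic_model, enhanced, clean):
    # ii) Perceptual supervision: match the phonetic features of the
    # enhanced output to those of the clean reference.
    return F.l1_loss(phonetic_features(phonetic_model, enhanced),
                     phonetic_features(phonetic_model, clean))

def regularization_loss(latent, phonetic_model, noisy, proj):
    # iii) Regularization (one plausible form): pull an internal latent of
    # the enhancement model toward the phonetic features of its input.
    return F.l1_loss(proj(latent), phonetic_features(phonetic_model, noisy))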
If you find this implementation useful, please consider citing our work:
@article{tal2022systematic,
  title={A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement},
  author={Tal, Or and Mandel, Moshe and Kreuk, Felix and Adi, Yossi},
  journal={arXiv preprint arXiv:2206.11000},
  year={2022}
}
git clone https://github.com/slp-rl/SC-PhASE.git
cd SC-PhASE
pip install -r requirements.txt
Note: torch installation may depend on your CUDA version; see Install torch.
- Download all files of the Valentini dataset and unzip them
- Down-sample each directory (a rough torchaudio equivalent is sketched after the directory layout below) using:
bash data_preprocessing_scripts/general/audio_resample_using_sox.sh <path to data dir> <path to target dir>
- Generate json files:
python data_preprocessing_scripts/speech_enhancement/valentini_egs_script.py --project_dir <full path to current project root> --dataset_base_dir <full path to the downsampled audio root, containing all downsampled dirs> --spk <num speakers in {28,56}, default=28>
Note: valentini_egs_script.py assumes the following dataset structure:
root dir
│
└─── noisy_trainset_{28/56}spk_wav
│ └─ Downsampled audio files
│
└─── clean_trainset_{28/56}spk_wav
│ └─ Downsampled audio files
│
└─── noisy_testset_wav
│ └─ Downsampled audio files
│
└─── clean_testset_wav
└─ Downsampled audio files
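The repository's sox script handles the down-sampling; as a rough illustration, the sketch below does the equivalent with torchaudio. The 16 kHz target rate is an assumption (Valentini recordings ship at 48 kHz), so confirm the rate actually used in audio_resample_using_sox.sh.

import sys
from pathlib import Path

import torchaudio

# Hypothetical Python equivalent of audio_resample_using_sox.sh.
# TARGET_SR = 16000 is an assumption; confirm against the sox script.
TARGET_SR = 16000

def resample_dir(src_dir: str, dst_dir: str) -> None:
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for wav_path in sorted(Path(src_dir).glob("*.wav")):
        wav, sr = torchaudio.load(str(wav_path))
        if sr != TARGET_SR:
            wav = torchaudio.functional.resample(wav, sr, TARGET_SR)
        torchaudio.save(str(dst / wav_path.name), wav, TARGET_SR)

if __name__ == "__main__":
    resample_dir(sys.argv[1], sys.argv[2])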
- Download the pretrained weights: link
- Copy the full path of the pretrained .pt file into the features_config.state_dict_path field in configurations/main_config.yaml
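For illustration, the relevant fragment of configurations/main_config.yaml would then look roughly as follows; the checkpoint path is a hypothetical placeholder:

features_config:
  # Full path to the downloaded pretrained checkpoint (placeholder path).
  state_dict_path: /full/path/to/pretrained_weights.pt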
Examples of run commands can be found in run_commands_examples/
Example: train a Demucs (hidden=48, stride=4, resample=4) baseline:
python train.py \
dset=noisy_clean \
experiment_name=h48u4s4_baseline \
hidden=48 \
stride=4 \
resample=4 \
features_dim=768 \
features_dim_for_conditioning=768 \
include_ft=False \
get_ft_after_lstm=False \
use_as_conditioning=False \
use_as_supervision=False \
learnable=False \
ddp=True \
batch_size=16
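As a further, unverified illustration, enabling the conditioning setting (the best-performing one in the paper) should amount to flipping the corresponding flags in the same command; the exact flag combination below is an assumption, so cross-check it against run_commands_examples/:

python train.py \
dset=noisy_clean \
experiment_name=h48u4s4_conditioning \
hidden=48 \
stride=4 \
resample=4 \
features_dim=768 \
features_dim_for_conditioning=768 \
include_ft=True \
get_ft_after_lstm=False \
use_as_conditioning=True \
use_as_supervision=False \
learnable=False \
ddp=True \
batch_size=16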
Test the whole pipeline for a single epoch:
python train.py dset=debug eval_every=1 epochs=1 experiment_name=test_pipeline