MelHuBERT: A simplified HuBERT on Mel spectrogram

This is the official implementation of the paper accepted at ASRU 2023.

Paper link: https://arxiv.org/abs/2211.09944

Paper introduction video: https://www.youtube.com/watch?v=S_t2TROKu6o

MelHuBERT achieves favorable performance against HuBERT on phone recognition, speaker identification, and automatic speech recognition, while saving 31.2% of the pre-training time, or equivalently 33.5% of the MACs per second of speech.

Environment

python=3.9

pip install -r requirement.txt

Data Preparing

First, execute the following command to prepare the LibriSpeech 360-hour data and the paired cluster labels (K-means on log Mel features):

bash preprocess.sh [DATA_DIR]

Then, set datarc.sets in ./config/config_runner_20ms.yaml and ./config/config_runner_10ms.yaml to [ DATA_DIR/libri-360-data-cluster-pair.csv ].

The mean and std of the LibriSpeech 360-hour data are saved at DATA_DIR/mean-std.npy. You won't need them during pre-training, but you might need them when fine-tuning on downstream tasks.
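As a rough illustration of how saved statistics of this kind might be applied downstream, the sketch below standardizes log Mel features per dimension. The internal layout of mean-std.npy is not described here, so the function takes the mean and std as separate arrays. The helper name `normalize_features`, the 40-dimensional feature size, and the synthetic data are assumptions made for illustration only, not part of this repository.

```python
import numpy as np

def normalize_features(feats, mean, std, eps=1e-8):
    """Standardize features of shape (T, D) with per-dimension mean/std of shape (D,)."""
    return (feats - mean) / (std + eps)

# In practice the statistics would be read with np.load("DATA_DIR/mean-std.npy").
# Inspect the loaded array's shape first, since its layout is an unknown here.
rng = np.random.default_rng(0)
# Synthetic stand-in for log Mel features (hypothetical size: 1000 frames x 40 dims).
feats = rng.normal(loc=3.0, scale=2.0, size=(1000, 40))
mean, std = feats.mean(axis=0), feats.std(axis=0)

normed = normalize_features(feats, mean, std)
print(normed.mean(axis=0)[:3], normed.std(axis=0)[:3])
```

After this step each feature dimension has approximately zero mean and unit variance over the data the statistics were computed from.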

Pre-training MelHuBERT from scratch

Execute the following command to pre-train MelHuBERT from scratch with the default configuration:

20 ms frame period:

python3 train.py -f 20 -g ./config/config_model_20ms.yaml -c ./config/config_runner_20ms.yaml -n EXP_DIR_PATH

10 ms frame period:

python3 train.py -f 10 -g ./config/config_model_10ms.yaml -c ./config/config_runner_10ms.yaml -n EXP_DIR_PATH

-f: frame period
-g: Model config
-c: Runner config
-n: The model checkpoints, log file, and the pre-training config you used will be saved at this directory
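The two commands above differ only in the frame period and the matching config file names. A minimal sketch of assembling them programmatically is shown here, for example to launch both runs from a script. The helper `build_train_command` is hypothetical and not part of this repository; it only reuses the flags and config paths listed above.

```python
import shlex

def build_train_command(frame_period, exp_dir):
    """Return the train.py argument list for a 20 ms or 10 ms frame period."""
    if frame_period not in (10, 20):
        raise ValueError("frame_period must be 10 or 20 (ms)")
    return [
        "python3", "train.py",
        "-f", str(frame_period),
        "-g", f"./config/config_model_{frame_period}ms.yaml",
        "-c", f"./config/config_runner_{frame_period}ms.yaml",
        "-n", exp_dir,
    ]

cmd = build_train_command(20, "EXP_DIR_PATH")
print(shlex.join(cmd))  # shell-ready form of the argument list
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.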

Pretrained Models

Extracting feature

Execute the following command to extract features from two example waveforms:

python3 extract_feature.py -c [CHECKPOINT] -f [FRAME_PERIOD]

-c: Model checkpoint path
-f: Frame period, either 20 or 10 (ms)
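The frame period determines roughly how many feature vectors are produced per second of audio: about 50 at 20 ms and about 100 at 10 ms. The sketch below makes that arithmetic explicit. The function `approx_num_frames` is a hypothetical helper, and the exact count from extract_feature.py may differ slightly depending on windowing and padding, which are not described here.

```python
def approx_num_frames(duration_s, frame_period_ms):
    """Approximate frame count, assuming one feature vector per frame period."""
    if frame_period_ms not in (10, 20):
        raise ValueError("frame_period_ms must be 10 or 20")
    duration_ms = round(duration_s * 1000)
    return duration_ms // frame_period_ms

for period in (20, 10):
    print(period, "ms:", approx_num_frames(1.0, period), "frames per second")
```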

Acknowledgement

Our implementation of the pre-training interface is based on the S3PRL toolkit.
