Skip to content

cogmhear/lightweight-AV-SE

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A Lightweight Real-time Audio-Visual Speech Enhancement Framework

Requirements

Usage

Update DATA_ROOT in config.py

# Expected folder structure
|-- train
|   `-- scenes
|-- dev
|   `-- scenes
|-- eval
|   `-- scenes

Data Preparation

The data was preprocessed using the preprocessing scripts provided in the AVSEC Preprocessing.

Training on AVSEC 3 dataset

To train the model in the paper, run this command:

python train.py --log_dir ./logs --batch_size 2 --lr 0.001 --gpu 1 --max_epochs 20

optional arguments:
  -h, --help            show this help message and exit
  --batch_size 4        Batch size for training
  --lr 0.001            Learning rate for training
  --log_dir LOG_DIR     Path to save tensorboard logs

Test

To evaluate the model on AVSEC3 data, run:

usage: test.py [-h] --ckpt_path ./model.pth --save_root ./enhanced --model_uid avse [--dev_set False] [--eval_set True] [--cpu True]

optional arguments:
  -h, --help             show this help message and exit
  --ckpt_path CKPT_PATH  Path to model checkpoint
  --save_root SAVE_ROOT  Path to save enhanced audio
  --model_uid MODEL_UID  Folder name to save enhanced audio
  --dev_set True         Evaluate model on dev set
  --eval_set False       Evaluate model on eval set
  --cpu True             Evaluate on CPU (default is GPU)

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages