SelM [Paper]
This repository contains code for "SelM: Selective Mechanism based Audio-Visual Segmentation" (ACM MM 2024 Oral, 3.97%).
IIAU Lab @ Dalian University of Technology
†equal contribution
Our code was tested in a conda environment.
You can install conda by following the Conda link, then create and activate an environment as follows:
conda create -n selm python=3.9
conda activate selm
We use PyTorch 2.0.1 with CUDA 11.7 as our default setting. Install PyTorch with pip as follows:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
Note: mamba-ssm (Link) requires CUDA 11.6+, so you might have to update your CUDA installation.
Install the other required packages:
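As a quick sanity check of the CUDA requirement above, the following is a minimal sketch that compares a CUDA version string against the 11.6 minimum. The helper names `parse_cuda_version` and `cuda_is_new_enough` are hypothetical and not part of this repository. The script reads `torch.version.cuda`, which reports the CUDA version PyTorch was built with; this may differ from the toolkit installed on your system, so treat the result as a hint only.

```python
import re

# Minimum CUDA version stated in this README for mamba-ssm.
MIN_CUDA = (11, 6)


def parse_cuda_version(text):
    """Extract (major, minor) from a string such as '11.7' or 'release 12.1, V12.1.105'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no CUDA version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def cuda_is_new_enough(text, minimum=MIN_CUDA):
    """Return True if the version found in `text` is at least `minimum`."""
    return parse_cuda_version(text) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed yet; run the pip command above first.")
    else:
        built_with = torch.version.cuda  # None for CPU-only builds
        if built_with is None:
            print("This PyTorch build has no CUDA support.")
        else:
            status = "OK" if cuda_is_new_enough(built_with) else "too old for mamba-ssm"
            print(f"PyTorch was built with CUDA {built_with}: {status}")
```

Tuple comparison keeps the check correct for versions such as 12.1, where comparing the strings directly would not be reliable.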
pip install -r requirements.txt
For the AVSBench dataset, please refer to the AVSBench link to download the datasets.
For the pretrained backbones (ResNet50, PVT-v2, VGGish), please refer to this link to download them.
You can place the dataset in the directory data and the pretrained backbones in the directory pretrained backbone.
Note: Don't forget to change the data and model paths in config.py.
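Before launching a run, it can help to confirm that the paths you set actually exist. The following is a minimal sketch under stated assumptions: the helper `missing_paths` and the dictionary keys are hypothetical, and the example values are placeholders to be replaced with whatever you set in config.py.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the names whose path does not exist on disk.

    `paths` maps a descriptive name to a file or directory path.
    """
    return [name for name, location in paths.items() if not Path(location).exists()]


if __name__ == "__main__":
    # Placeholder entries (assumptions): copy the real values from your config.py.
    configured = {
        "dataset": "data",
        "pretrained backbones": "pretrained backbone",
    }
    problems = missing_paths(configured)
    if problems:
        for name in problems:
            print(f"missing: {name} -> {configured[name]}")
    else:
        print("all configured paths exist")
```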
You can download our pretrained SelM models from Google Drive and place them in the directory pretrained model.
Method | Subset | mIoU | F-score | Download |
---|---|---|---|---|
SelM-R50 | S4 | 76.6 | 86.2 | pth |
SelM-PVTv2 | S4 | 83.5 | 91.2 | pth |
SelM-R50 | MS3 | 54.5 | 65.6 | pth |
SelM-PVTv2 | MS3 | 60.3 | 71.3 | pth |
SelM-R50 | AVSS | 31.9 | 37.2 | pth |
SelM-PVTv2 | AVSS | 41.3 | 46.9 | pth |
For the S4 and MS3 settings, we provide single-GPU training. Run the commands below:
#S4
cd avs_s4
bash train.sh
#MS3
cd avs_ms3
bash train.sh
For the AVSS setting, we provide multi-GPU training. To train SelM on 8 GPUs, run:
cd avss
bash train.sh
For testing, remember to change the path to the weights, then run:
#S4
cd avs_s4
bash test.sh
#MS3
cd avs_ms3
bash test.sh
#AVSS
cd avss
bash test.sh
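If you want to evaluate all three settings in one go, the commands above can be wrapped in a small driver. This is only a sketch, not part of the repository: the function names `build_test_jobs` and `run_all` are hypothetical, and it assumes the directory names avs_s4, avs_ms3, and avss shown above, each containing a test.sh whose weight path you have already updated.

```python
import subprocess
from pathlib import Path

# Setting directories as they appear in the commands above.
SETTINGS = ("avs_s4", "avs_ms3", "avss")


def build_test_jobs(repo_root="."):
    """Return (working_directory, command) pairs, one per setting."""
    root = Path(repo_root)
    return [(root / name, ["bash", "test.sh"]) for name in SETTINGS]


def run_all(repo_root="."):
    """Run test.sh in each setting directory, skipping any that are missing."""
    for workdir, command in build_test_jobs(repo_root):
        if not (workdir / "test.sh").is_file():
            print(f"skipping {workdir}: test.sh not found")
            continue
        print(f"== {workdir.name} ==")
        # check=True stops at the first failing script instead of continuing silently.
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_all()` from the repository root runs the three test scripts in sequence, each from inside its own directory, matching the `cd` steps above.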
This repo is based on AVSBench, RIS-DMMI, and CGFormer. Many thanks to these wonderful works.
If you are interested in our work, you can cite it with the BibTeX entry below. Thank you!
@inproceedings{li2024selm,
title={SelM: Selective Mechanism based Audio-Visual Segmentation},
author={Li, Jiaxu and Yu, Songsong and Wang, Yifan and Wang, Lijun and Lu, Huchuan},
booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
pages={3926--3935},
year={2024}
}