
Audio-Visual Event Localization in Unconstrained Videos (ECCV 2018)

AVE Dataset & Features

The AVE dataset can be downloaded from https://drive.google.com/open?id=1FjKwe79e0u96vdjIVwfRQ1V6SoDHe7kK.

Audio features and visual features (7.7 GB) are also released. Please put the videos of the AVE dataset into the /data/AVE folder and the features into the /data folder before running the code.
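Before running anything, it can help to sanity-check the data layout. Below is a minimal Python sketch, assuming the released features are HDF5 files named audio_feature.h5 and visual_feature.h5 (the exact filenames are an assumption; check the downloaded archive for the actual names):

```python
import os
import h5py  # pip install h5py

# Expected layout (the .h5 filenames are assumptions; check the
# downloaded feature archive for the actual names):
#   data/
#     AVE/                <- raw AVE videos
#     audio_feature.h5    <- pre-extracted audio features
#     visual_feature.h5   <- pre-extracted visual features
DATA_ROOT = "data"
expected = ["AVE", "audio_feature.h5", "visual_feature.h5"]

for name in expected:
    path = os.path.join(DATA_ROOT, name)
    print(path, "found" if os.path.exists(path) else "MISSING")

# Peek at the feature tensors to confirm their shapes before training.
audio_h5 = os.path.join(DATA_ROOT, "audio_feature.h5")
if os.path.exists(audio_h5):
    with h5py.File(audio_h5, "r") as f:
        for key, dset in f.items():
            print(key, dset.shape, dset.dtype)
```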

Requirements

PyTorch, Keras, FFmpeg.

Visualize attention maps

Run `python attention_visualization.py` to generate audio-guided visual attention maps.

[Figure: example audio-guided visual attention maps]
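For readers curious about the mechanism behind these maps, here is a minimal PyTorch sketch of audio-guided visual attention: an audio embedding scores each spatial location of a CNN feature map, and the softmax-normalized scores pool the visual features. This is an illustrative approximation, not the authors' exact implementation; all layer dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioGuidedAttention(nn.Module):
    """Score each spatial location of a visual feature map with an
    audio embedding, then attention-pool the visual features.
    Dimensions are assumptions for illustration only."""

    def __init__(self, audio_dim=128, visual_dim=512, hidden_dim=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, audio, visual):
        # audio:  (B, audio_dim)      e.g. one VGGish vector per second
        # visual: (B, N, visual_dim)  N = H*W spatial locations
        a = self.audio_proj(audio).unsqueeze(1)        # (B, 1, hidden)
        v = self.visual_proj(visual)                   # (B, N, hidden)
        e = self.score(torch.tanh(a + v)).squeeze(-1)  # (B, N) scores
        w = F.softmax(e, dim=-1)                       # attention map
        pooled = torch.bmm(w.unsqueeze(1), visual).squeeze(1)  # (B, visual_dim)
        return pooled, w

# Example: a 7x7 CNN feature map (N=49) guided by a 128-d audio feature.
att = AudioGuidedAttention()
pooled, w = att(torch.randn(2, 128), torch.randn(2, 49, 512))
print(pooled.shape, w.shape)  # torch.Size([2, 512]) torch.Size([2, 49])
```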

Supervised audio-visual event localization

Testing:

- A+V-att model in the paper: `python supervised_main.py --model_name AV_att`
- DMRN model in the paper: `python supervised_main.py --model_name DMRN`

Training:

`python supervised_main.py --model_name AV_att --train`
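For reference, a hypothetical sketch of how `supervised_main.py` might wire these flags with `argparse` (the flag names come from the commands above; everything else is an assumption, not the repository's actual code):

```python
import argparse

# Hypothetical CLI wiring for supervised_main.py; flag names match the
# commands above, defaults and help strings are assumptions.
parser = argparse.ArgumentParser(description="Supervised AVE localization")
parser.add_argument("--model_name", choices=["AV_att", "DMRN"],
                    default="AV_att", help="which model to run")
parser.add_argument("--train", action="store_true",
                    help="train the model; omit this flag to run testing")
args = parser.parse_args()

mode = "training" if args.train else "testing"
print(f"{mode} {args.model_name} model")
```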

Weakly-supervised audio-visual event localization

Cross-modality localization

Citation

If you find this work useful, please consider citing it.

@inproceedings{AVE2018,
  title     = {Audio-Visual Event Localization in Unconstrained Videos},
  author    = {Tian, Yapeng and Shi, Jing and Li, Bochen and Duan, Zhiyao and Xu, Chenliang},
  booktitle = {ECCV},
  year      = {2018}
}

Acknowledgements

Audio features are extracted using VGGish, and the audio-guided visual attention model was implemented largely based on adaptive attention. We thank the authors for sharing their code.
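For reference, one way to reproduce VGGish-style audio features is through the community PyTorch port available on torch.hub (an assumption for illustration; the original features were extracted with the TensorFlow VGGish release):

```python
import torch

# Minimal sketch: extract 128-d VGGish embeddings for one audio file
# using the community torchvggish port (harritaylor/torchvggish).
# The original repo used the TensorFlow VGGish release instead.
model = torch.hub.load("harritaylor/torchvggish", "vggish")
model.eval()

# Returns one 128-d embedding per ~0.96 s frame of audio.
# "example.wav" is a placeholder path, not a file from this repo.
embeddings = model.forward("example.wav")
print(embeddings.shape)  # (num_frames, 128)
```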
