Audio-Visual Event Localization in Unconstrained Videos (to appear in ECCV 2018)
The AVE dataset can be downloaded from https://drive.google.com/open?id=1FjKwe79e0u96vdjIVwfRQ1V6SoDHe7kK.
Audio and visual features are also released. Please put the videos of the AVE dataset into the /data/AVE folder and the features into the /data folder before running the code.
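As a quick sanity check of the layout, a short Python snippet like the one below can be run from the repository root. The feature file names it looks for are assumptions for illustration and may differ from the released files.

```python
import os

# Verify the expected data layout before running the code.
# The feature file names here are assumed, not guaranteed to
# match the released feature files.
assert os.path.isdir('data/AVE'), 'expected AVE videos under data/AVE'
for name in ('audio_feature.h5', 'visual_feature.h5'):  # assumed names
    path = os.path.join('data', name)
    assert os.path.isfile(path), 'missing feature file: ' + path
print('data layout looks OK')
```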
Requirements: PyTorch, Keras, FFmpeg.
Run: python attention_visualization.py to generate audio-guided visual attention maps.
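For readers curious how such attention maps arise, below is a minimal, self-contained sketch of an audio-guided attention layer in PyTorch. It is an illustration only, not the repository's implementation; the feature dimensions (a 128-d audio embedding attending over a flattened 7x7x512 visual feature map) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioGuidedAttention(nn.Module):
    """Sketch of audio-guided visual attention: the audio feature scores
    each spatial location of the visual feature map, and a softmax over
    locations yields the attention weights (the attention map)."""
    def __init__(self, audio_dim=128, visual_dim=512, hidden_dim=512):
        super().__init__()
        self.fc_a = nn.Linear(audio_dim, hidden_dim)
        self.fc_v = nn.Linear(visual_dim, hidden_dim)
        self.fc_s = nn.Linear(hidden_dim, 1)

    def forward(self, audio, visual):
        # audio: (B, audio_dim); visual: (B, N, visual_dim) with N locations
        a = self.fc_a(audio).unsqueeze(1)         # (B, 1, hidden)
        v = self.fc_v(visual)                     # (B, N, hidden)
        scores = self.fc_s(torch.tanh(a + v))     # (B, N, 1)
        weights = F.softmax(scores, dim=1)        # attention over locations
        pooled = (weights * visual).sum(dim=1)    # (B, visual_dim)
        return pooled, weights.squeeze(-1)

att = AudioGuidedAttention()
audio = torch.randn(2, 128)       # e.g. one audio embedding per second
visual = torch.randn(2, 49, 512)  # e.g. a 7x7 CNN feature map, flattened
pooled, w = att(audio, visual)    # reshaping w to 7x7 gives the attention map
```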
If you find this work useful, please consider citing:
@inproceedings{AVE2018,
title={Audio-Visual Event Localization in Unconstrained Videos},
author={Tian, Yapeng and Shi, Jing and Li, Bochen and Duan, Zhiyao and Xu, Chenliang},
booktitle={ECCV},
year={2018}
}
Audio features are extracted using VGGish, and the audio-guided visual attention model is implemented largely based on adaptive attention. We thank the authors for sharing their code.
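For reference, here is a hedged sketch of the feature-extraction step: the audio track is pulled from a video with FFmpeg (VGGish expects 16 kHz mono) and then embedded with a VGGish model. The torch.hub port used below (harritaylor/torchvggish) is a community reimplementation assumed here for convenience; the official VGGish release is TensorFlow-based, and the input path is hypothetical.

```python
import subprocess
import torch

# Extract a 16 kHz mono WAV from a video with FFmpeg; the input
# path is hypothetical.
subprocess.run(['ffmpeg', '-y', '-i', 'data/AVE/example.mp4',
                '-ac', '1', '-ar', '16000', 'example.wav'], check=True)

# Embed the audio with VGGish. harritaylor/torchvggish is a community
# PyTorch port, assumed here for illustration; the released features
# may have been produced with the official (TensorFlow) VGGish.
model = torch.hub.load('harritaylor/torchvggish', 'vggish')
model.eval()
embeddings = model.forward('example.wav')  # roughly one 128-d vector per second
print(embeddings.shape)
```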