Audio-Visual Speaker Tracking
This repository is the official implementation of "Audio-Visual Cross-Attention Network for Robotic Speaker Tracking", IEEE/ACM TASLP, 2023.
The constructed features can be downloaded from https://drive.google.com/drive/folders/1mLgvflJ2MKYz2WIZAx5H_XStYwSMc88I?usp=share_link and placed under data/
To run the source code, simply run:

python hritrain.py -model [model name] -datapath [data path]
The raw data will be released soon.
(Due to privacy concerns, the raw face images will not be released.)
If you find this work useful, please cite:

@ARTICLE{qian2023avri,
  author={Qian, Xinyuan and Wang, Zhengdong and Wang, Jiadong and Guan, Guohui and Li, Haizhou},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={Audio-Visual Cross-Attention Network for Robotic Speaker Tracking},
  year={2023},
  volume={31},
  number={},
  pages={550-562},
  doi={10.1109/TASLP.2022.3226330}
}