Audio-Visual Speaker Tracking
This repository is the official implementation of "Audio-Visual Cross-Attention Network for Robotic Speaker Tracking", IEEE/ACM TASLP, 2023.
The constructed features can be downloaded from https://drive.google.com/drive/folders/1mLgvflJ2MKYz2WIZAx5H_XStYwSMc88I?usp=share_link and placed under data/
To run the source code, simply run:

python hritrain.py -model [model name] -datapath [data path]
The raw data will be released soon.
(Due to privacy concerns, the raw face images will not be released.)
If you find this work useful, please cite:

@ARTICLE{qian2023avri,
  author={Qian, Xinyuan and Wang, Zhengdong and Wang, Jiadong and Guan, Guohui and Li, Haizhou},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={Audio-Visual Cross-Attention Network for Robotic Speaker Tracking},
  year={2023},
  volume={31},
  number={},
  pages={550-562},
  doi={10.1109/TASLP.2022.3226330}
}