In this paper, we propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-text Injection. To effectively incorporate an external KG, we transfer its triples into text and propose a late injection mechanism. Finally, we address VQA as a text generation task with an effective encoder-decoder paradigm.
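As a rough illustration of the knowledge-to-text step described above, the sketch below verbalizes KG triples into plain sentences that a text encoder can consume. The function name, relation-splitting heuristic, and output format are hypothetical, not the paper's exact implementation.

```python
# Hypothetical sketch of knowledge-to-text: verbalizing (head, relation, tail)
# triples into natural-language sentences before encoding.

def triple_to_text(head: str, relation: str, tail: str) -> str:
    """Turn one KG triple into a plain sentence."""
    # Split camelCase / snake_case relation names into words,
    # e.g. "hasColor" -> "has color", "is_a" -> "is a".
    words = []
    for token in relation.replace("_", " ").split():
        out = ""
        for ch in token:
            if ch.isupper() and out:
                out += " "
            out += ch.lower()
        words.append(out)
    return f"{head} {' '.join(words)} {tail}."

triples = [("banana", "hasColor", "yellow"), ("banana", "is_a", "fruit")]
passage = " ".join(triple_to_text(*t) for t in triples)
print(passage)  # banana has color yellow. banana is a fruit.
```

The resulting passages can then be retrieved and concatenated with the question text, in line with the late-injection idea.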
- 2024-02: We preprint our survey Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey [Repo].
- Python 3
- PyTorch (>= 1.6.0)
- Transformers (version 3.0.2)
- NumPy
- faiss-cpu
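One possible way to set up an environment matching the pinned versions above (the exact PyTorch build depends on your CUDA setup, so adjust as needed):

```shell
# Install the dependencies listed above; transformers is pinned to 3.0.2
# since newer 4.x releases may cause unexpected errors (see Tips below).
pip install "torch>=1.6.0" "transformers==3.0.2" numpy faiss-cpu
```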
Training Data and KGs are available here.
- In contrast to data_source.zip, we provide a processing script and some source data for both the VQA2.0 and OKVQA datasets. We provide both a Baidu Cloud link (password: r42d) and a Google link.
bash run_okvqa_train.sh
or run the full training process to obtain the attention signal for iterative training:
bash run_okvqa_full.sh
bash run_okvqa_test.sh
(Optional) You can first pre-train LaKo (large version) on VQA2.0 and then re-train on OKVQA for better performance.
- You can open the .sh files to modify parameters.
- The latest Transformers releases (e.g., 4.XX.XX) differ from the older version, which may lead to unexpected errors.
- Distilling Knowledge from Reader to Retriever: https://arxiv.org/abs/2012.04584
- GitHub link to FiD
Please consider citing this paper if you use the code or data from our work. Thanks a lot :)
@inproceedings{DBLP:conf/jist/0007HCGFP0Z22,
author = {Zhuo Chen and
Yufeng Huang and
Jiaoyan Chen and
Yuxia Geng and
Yin Fang and
Jeff Z. Pan and
Ningyu Zhang and
Wen Zhang},
title = {LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text
Injection},
booktitle = {{IJCKG}},
pages = {20--29},
publisher = {{ACM}},
year = {2022}
}