This is the official PyTorch implementation of LR2PPO. The ECCV 2024 paper is available here, and the introduction video can be found here.
Download the proposed LRMovieNet dataset from here.
[Optional] You can also download the MovieNet dataset from its official website.
Download the roberta_base_en_model and vit_base_patch16_224_model weights from this link or from their official repositories, and save them in the ./pretrained_models/ folder.
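Before training, it can help to confirm that both sets of weights are where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_pretrained` is hypothetical, and because this README does not say whether each model is a single file or a directory, the check only looks for an entry whose name starts with the expected model name.

```python
from pathlib import Path

# Model names taken from this README. Whether each one is a file or a
# directory (and any file extension) is an assumption, so entries are
# matched by name prefix only.
EXPECTED = ("roberta_base_en_model", "vit_base_patch16_224_model")


def missing_pretrained(root="./pretrained_models", expected=EXPECTED):
    """Return the expected model names that have no matching entry under root."""
    root = Path(root)
    if not root.is_dir():
        # The folder itself is absent, so everything is missing.
        return list(expected)
    present = [entry.name for entry in root.iterdir()]
    return [
        name
        for name in expected
        if not any(entry.startswith(name) for entry in present)
    ]


if __name__ == "__main__":
    missing = missing_pretrained()
    if missing:
        print("Missing pretrained weights:", ", ".join(missing))
    else:
        print("All pretrained weights found.")
```

An empty list means both names were found. The check does not validate the contents of the files.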
pip3 install -r requirements.txt
- 4 GPUs
Before running the following commands, make sure the data paths are correct and enough GPUs are available (e.g., 4 GPUs).
sh pointwise.sh <your_stage1>
sh reward_pair_dataloader.sh <your_stage2>
sh ppo.sh <your_stage3>
sh ppo_eval.sh <your_eval>
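The GPU check and the four commands above can also be driven from a single Python script. This is a rough sketch under stated assumptions, not something shipped with the repository: the helper names `visible_gpu_count`, `build_commands`, and `run_all` are hypothetical, the placeholder arguments are passed through as opaque strings because this README does not define them, and the scripts are assumed to be run from the repository root in the order listed.

```python
import subprocess

REQUIRED_GPUS = 4  # taken from the hardware note in this README


def visible_gpu_count():
    """Return the number of CUDA devices PyTorch sees, or None if PyTorch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.device_count()


def build_commands(stage1, stage2, stage3, eval_name):
    """Pair each script named in this README with its single argument, in order."""
    return [
        ["sh", "pointwise.sh", stage1],
        ["sh", "reward_pair_dataloader.sh", stage2],
        ["sh", "ppo.sh", stage3],
        ["sh", "ppo_eval.sh", eval_name],
    ]


def run_all(commands, required_gpus=REQUIRED_GPUS):
    """Check the GPU count, then run each command, stopping at the first failure."""
    count = visible_gpu_count()
    if count is not None and count < required_gpus:
        raise RuntimeError(
            f"Found {count} GPU(s), expected at least {required_gpus}."
        )
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

A call such as `run_all(build_commands("my_stage1", "my_stage2", "my_stage3", "my_eval"))` would then run the scripts in sequence; the argument values here are illustrative only. If PyTorch cannot be imported, the GPU check is skipped rather than treated as a failure.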
We provide logs and checkpoints for the LRMovieNet dataset in the logs/ folder and through this link, respectively.
For more details, please refer to the LICENSE file.
Part of our code is borrowed from the following repositories:
We are grateful for these excellent works and repositories.
If you find our work helpful for your research, please consider citing it.
@inproceedings{guo2024multimodal,
title={Multimodal Label Relevance Ranking via Reinforcement Learning},
author={Guo, Taian and Zhang, Taolin and Wu, Haoqian and Li, Hanjun and Qiao, Ruizhi and Sun, Xing},
booktitle={European Conference on Computer Vision},
pages={391--408},
year={2024},
organization={Springer}
}