Multimodal architecture for video-text matching. Backbone: BLIP, with a cross-attention transformer encoder; GPU parallel processing included.

BLIP: Modification of SOTA Video-Text Similarity with a Cross-Transformer Encoder

BLIP backbone with a cross-transformer encoder.

Our model was developed and evaluated using the following package dependencies:

  • PyTorch 1.8.1
  • Transformers 4.6.1
  • OpenCV 4.5.3
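
These dependencies can be installed with pip; the sketch below is one way to set up a matching environment (the package names are the standard PyPI names, and the OpenCV wheel carries a build suffix, so the exact pin may differ):

  # create and activate an isolated environment, then install the pinned versions
  python -m venv blip-env && source blip-env/bin/activate
  pip install torch==1.8.1 transformers==4.6.1 opencv-python==4.5.3.56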

We trained models on the MSR-VTT, MSVD and LSMDC datasets. To download the datasets, refer to this repository.

For LSMDC, you must obtain permission from MPII to download and use the data, so we do not provide the split and caption files in the data/ directory.

The following commands can be used to reproduce the main results of our paper using the supplied checkpoint files for each dataset. The commands will by default generate results for text-to-video retrieval (t2v). For video-to-text retrieval (v2t) results, add the argument --metric=v2t to the command.

If the outputs/ folder does not exist, first run mkdir outputs to create the directory. For each dataset, create a directory in outputs/ and store the corresponding checkpoint file. For each command below, replace {exp_name} with the name of that directory.

Also, replace {videos_dir} with the path to the dataset's videos.

For evaluation, you can change the batch_size without affecting results.
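
For example, preparing an evaluation directory for an MSR-VTT-9k checkpoint could look like this (the directory name and checkpoint filename are placeholders; use whatever the released checkpoint file is actually called):

  # create the experiment directory under outputs/ and place the checkpoint inside it
  mkdir -p outputs/msrvtt_9k
  cp /path/to/downloaded/checkpoint.pth outputs/msrvtt_9k/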

  • MSR-VTT-9k:
    python test.py --exp_name={exp_name} --videos_dir={videos_dir} --batch_size=32 --huggingface --load_epoch=-1 --dataset_name=MSRVTT --msrvtt_train_file=9k
  • MSR-VTT-7k:
    python test.py --exp_name={exp_name} --videos_dir={videos_dir} --batch_size=32 --huggingface --load_epoch=-1 --dataset_name=MSRVTT --msrvtt_train_file=7k
  • MSVD:
    python test.py --exp_name={exp_name} --videos_dir={videos_dir} --batch_size=32 --huggingface --load_epoch=-1 --dataset_name=MSVD
  • LSMDC:
    python test.py --exp_name={exp_name} --videos_dir={videos_dir} --batch_size=32 --huggingface --load_epoch=-1 --dataset_name=LSMDC
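
As a concrete example, evaluating the MSR-VTT-9k checkpoint for video-to-text retrieval would look like the following (the experiment name and video path are placeholder values):

  # t2v is the default; --metric=v2t switches the report to video-to-text retrieval
  python test.py --exp_name=msrvtt_9k --videos_dir=/data/MSRVTT/videos --batch_size=32 --huggingface --load_epoch=-1 --dataset_name=MSRVTT --msrvtt_train_file=9k --metric=v2t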

The following commands can be used to train the model for each dataset. Again, the evaluation is by default set to generate results for text-to-video retrieval (t2v). For video-to-text retrieval (v2t) results, add the argument --metric=v2t to the command.

For each command below, replace {exp_name} with a name of your choice for the experiment. Also, replace {videos_dir} with the path to the dataset's videos.

  • MSR-VTT-9k:
    python train.py --exp_name={exp_name} --videos_dir={videos_dir} --batch_size=32 --noclip_lr=3e-5 --transformer_dropout=0.3 --huggingface --dataset_name=MSRVTT --msrvtt_train_file=9k
  • MSR-VTT-7k:
    python train.py --exp_name={exp_name} --videos_dir={videos_dir} --batch_size=32 --noclip_lr=1e-5 --transformer_dropout=0.4 --huggingface --dataset_name=MSRVTT --msrvtt_train_file=7k
  • MSVD:
    python train.py --exp_name={exp_name} --videos_dir={videos_dir} --batch_size=32 --noclip_lr=1e-5 --transformer_dropout=0.4 --huggingface --dataset_name=MSVD
  • LSMDC:
    python train.py --exp_name={exp_name} --videos_dir={videos_dir} --batch_size=32 --noclip_lr=1e-5 --transformer_dropout=0.3 --huggingface --dataset_name=LSMDC
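
For instance, training on MSR-VTT-9k and then evaluating the resulting run could look like the sketch below (the experiment name and video path are placeholders):

  # train with the MSR-VTT-9k hyperparameters from the list above
  python train.py --exp_name=msrvtt_9k_run --videos_dir=/data/MSRVTT/videos --batch_size=32 --noclip_lr=3e-5 --transformer_dropout=0.3 --huggingface --dataset_name=MSRVTT --msrvtt_train_file=9k

  # evaluate using the corresponding evaluation command (add --metric=v2t for video-to-text results)
  python test.py --exp_name=msrvtt_9k_run --videos_dir=/data/MSRVTT/videos --batch_size=32 --huggingface --load_epoch=-1 --dataset_name=MSRVTT --msrvtt_train_file=9k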
