SGAE (PyTorch 0.4.0)

Auto-Encoding Scene Graphs for Image Captioning, CVPR 2019

Acknowledgement

This repository is derived from the work of Xu Yang et al. at https://github.com/yangxuntu/SGAE.

This code is implemented based on Ruotian Luo's implementation of self-critical sequence training at https://github.com/ruotianluo/self-critical.pytorch.

We also use the visual features provided by the paper "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" (https://github.com/peteanderson80/bottom-up-attention).

Training the model

1. After downloading the code and meta data, you can train the model with the following command:

python train_mem.py --id id66 --caption_model lstm_mem4 --input_json data/cocobu2.json --input_fc_dir data/cocobu_fc --input_att_dir data/cocobu_att --input_rela_dir data/cocobu_sg_img --input_ssg_dir data/coco_spice_sg2 --input_label_h5 data/cocobu2_label.h5 --sg_dict_path data/spice_sg_dict2.npz --batch_size 50 --accumulate_number 2 --learning_rate 5e-4 --learning_rate_decay_start 0 --learning_rate_decay_every 5 --scheduled_sampling_start 0 --checkpoint_path id66 --save_checkpoint_every 5000 --val_images_use 5000 --max_epochs 150 --rnn_size 1000 --input_encoding_size 1000 --att_feat_size 2048 --att_hid_size 512 --self_critical_after 40 --train_split train --memory_size 10000 --memory_index c --step2_train_after 10 --step3_train_after 20 --use_rela 0 --gpu 5

Important note: I recently reorganized and optimized the code and found that, even without MGCN, it can reach a 127.8 CIDEr-D score. However, you need a GPU with 16 GB of memory (e.g. a DGX). If you do not have enough GPU memory, change --batch_size from 50 to 25 and --accumulate_number from 2 to 4, which keeps the effective batch size at 100; note that these two settings can lead to somewhat different performance.
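For instance, the reduced-memory variant of the command above (only --batch_size and --accumulate_number change):

python train_mem.py --id id66 --caption_model lstm_mem4 --input_json data/cocobu2.json --input_fc_dir data/cocobu_fc --input_att_dir data/cocobu_att --input_rela_dir data/cocobu_sg_img --input_ssg_dir data/coco_spice_sg2 --input_label_h5 data/cocobu2_label.h5 --sg_dict_path data/spice_sg_dict2.npz --batch_size 25 --accumulate_number 4 --learning_rate 5e-4 --learning_rate_decay_start 0 --learning_rate_decay_every 5 --scheduled_sampling_start 0 --checkpoint_path id66 --save_checkpoint_every 5000 --val_images_use 5000 --max_epochs 150 --rnn_size 1000 --input_encoding_size 1000 --att_feat_size 2048 --att_hid_size 512 --self_critical_after 40 --train_split train --memory_size 10000 --memory_index c --step2_train_after 10 --step3_train_after 20 --use_rela 0 --gpu 5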

2. You can also download two well-trained models, modelid740072 and modelid640075, from my Google Drive; they reach about 128.3 CIDEr-D and were trained with the script above.

3. Details of the parameters:

--id: the id of your model, usually set to the same value as --checkpoint_path, which makes it easy to resume training from a checkpoint.

--batch_size, --accumulate_number: these two parameters exist for users who do not have a large GPU. For an effective batch size of 100, you can set batch_size to 50 and accumulate_number to 2, or batch_size to 20 and accumulate_number to 5. Importantly, these settings are not fully equivalent to setting batch_size to 100 and accumulate_number to 1: the larger the actual batch_size, the better the performance tends to be.

--self_critical_after: the epoch at which reinforcement learning begins. If set to 40, the reinforcement (self-critical) loss is used after 40 epochs of training. Generally, to obtain a good CIDEr-D score, you should train with the cross-entropy loss first and then switch to the reinforcement loss.

--step2_train_after: the epoch at which the dictionary starts being learned. For example, if set to 10, then before epoch 10 only the decoder is trained on sentence scene graphs and the dictionary is not learned.

--step3_train_after: the epoch at which the image captioning encoder-decoder starts being trained. For example, if set to 20, then before epoch 20 only sentence scene graphs are used to learn the dictionary; after epoch 20, the sentence scene graphs are no longer used and the image encoder-decoder is trained. (See the sketch after this list for how these thresholds fit together.)

--use_rela: whether to use the image scene graph.
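To make the three epoch thresholds easier to follow, here is a minimal sketch of the training schedule they describe. This is my own illustration based on the flag descriptions above, not the repository's actual control flow, and it assumes the default values from the training command:

def training_phase(epoch, self_critical_after=40, step2_train_after=10, step3_train_after=20):
    # Loss: cross-entropy first, then the self-critical (reinforcement) loss.
    loss = "cross-entropy" if epoch < self_critical_after else "self-critical (reinforcement)"
    # What is being trained at this epoch.
    if epoch < step2_train_after:
        stage = "decoder only, trained on sentence scene graphs (dictionary not learned)"
    elif epoch < step3_train_after:
        stage = "dictionary learned from sentence scene graphs"
    else:
        stage = "image captioning encoder-decoder (sentence scene graphs no longer used)"
    return loss, stage

print(training_phase(5))   # decoder-only warm-up, cross-entropy loss
print(training_phase(15))  # dictionary learning, cross-entropy loss
print(training_phase(45))  # encoder-decoder training, self-critical loss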

4. Training from checkpoints. The code supports resuming training from a checkpoint. For example, to continue training from checkpoint 22, use the following command:

python train_mem.py --id id66 --caption_model lstm_mem4 --input_json data/cocobu2.json --input_fc_dir data/cocobu_fc --input_att_dir data/cocobu_att --input_rela_dir data/cocobu_sg_img --input_ssg_dir data/coco_spice_sg2 --input_label_h5 data/cocobu2_label.h5 --sg_dict_path data/spice_sg_dict2.npz --batch_size 50 --accumulate_number 2 --learning_rate 5e-4 --learning_rate_decay_start 0 --learning_rate_decay_every 5 --scheduled_sampling_start 0 --checkpoint_path id66 --save_checkpoint_every 5000 --val_images_use 5000 --max_epochs 150 --rnn_size 1000 --input_encoding_size 1000 --att_feat_size 2048 --att_hid_size 512 --self_critical_after 40 --train_split train --memory_size 10000 --memory_index c --step2_train_after 10 --step3_train_after 20 --use_rela 0 --gpu 5 --start_from 22 --memory_cell_path id66/memory_cellid660022.npz

The parameters --start_from and --memory_cell_path are used for training from checkpoints: --start_from gives the checkpoint number to resume from, and --memory_cell_path points to the saved memory cell file for that checkpoint (here id66/memory_cellid660022.npz).

Evaluating the model

1. After training the model or downloading a well-trained model, you can evaluate it with the following command:

python eval_mem.py --dump_images 0 --num_images 5000 --model id66/modelid660066.pth --infos_path id66/infos_id660066.pkl --language_eval 1 --beam_size 5 --split test --index_eval 1 --use_rela 0 --training_mode 2 --memory_cell_path id66/memory_cellid660066.npz --sg_dict_path data/spice_sg_dict2.npz --input_ssg_dir data/coco_spice_sg2 --batch_size 50

You only need to replace the model id with your own id (e.g. 66 to 01) and the checkpoint number with that of your trained model (e.g. 0066 to 0001).
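For instance, assuming your run used --id id01 with checkpoint path id01 and the checkpoint files follow the same naming pattern, evaluating checkpoint 0001 would look like:

python eval_mem.py --dump_images 0 --num_images 5000 --model id01/modelid010001.pth --infos_path id01/infos_id010001.pkl --language_eval 1 --beam_size 5 --split test --index_eval 1 --use_rela 0 --training_mode 2 --memory_cell_path id01/memory_cellid010001.npz --sg_dict_path data/spice_sg_dict2.npz --input_ssg_dir data/coco_spice_sg2 --batch_size 50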

Generating scene graphs

1. For sentence scene graphs, download the revised code in spice-1.0.jar and create_coco_sg.py, and put spice-1.0.jar in /coco-caption/pycocoevalcap/spice. Then set coco_use to coco_train or coco_val in create_coco_sg.py and run it; the sentence scene graphs are written to /coco-caption/pycocoevalcap/spice/sg.json.

2. Then use process_spice_sg.py to process sg.json.
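For example (assuming neither script needs additional command-line arguments beyond editing coco_use inside create_coco_sg.py as described above):

python create_coco_sg.py
python process_spice_sg.py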

3. For image scene graphs, you can use the code provided at https://github.com/rowanz/neural-motifs to generate them.
