Grad-CAM: Gradient-based Discriminative Localization & Visualization

Usage

Download the Caffe models and prototxt files for VGG-16, VGG-19, and AlexNet by running sh models/download_models.sh.

Classification

th classification.lua -input_image_path images/cat_dog.jpg -label 243 -gpuid 0
th classification.lua -input_image_path images/cat_dog.jpg -label 283 -gpuid 0

Options

proto_file: Path to the deploy.prototxt file for the CNN Caffe model. Default is models/VGG_ILSVRC_16_layers_deploy.prototxt.
model_file: Path to the .caffemodel file for the CNN Caffe model. Default is models/VGG_ILSVRC_16_layers.caffemodel.
input_image_path: Path to the input image. Default is images/cat_dog.jpg.
input_sz: Input image size. Default is 224 (Change to 227 if using AlexNet).
layer_name: Layer to use for Grad-CAM. Default is relu5_3 (use relu5_4 for VGG-19 and relu5 for AlexNet).
label: Class label to generate Grad-CAM for. Default is 243 (283 = tiger cat, 243 = boxer). These correspond to ILSVRC synsets, indexed from 1.
out_path: Path to save images in. Default is output/.
gpuid: Zero-indexed ID of the GPU to use. Default is -1 (CPU).
backend: Backend to use with loadcaffe. Default is cudnn.
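
Since input_sz and layer_name have to change together with the network, a small wrapper can keep them in sync. The sketch below is a hypothetical helper (gradcam_flags is not part of this repository) that prints the matching flags for each network named in the options above. It covers only the two options whose per-network values are documented here; proto_file and model_file still have to be passed separately for anything other than the default VGG-16.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository: print the
# -layer_name and -input_sz flags that match a given network,
# using the values listed in the options above.
gradcam_flags() {
  case "$1" in
    vgg16)   echo "-layer_name relu5_3 -input_sz 224" ;;
    vgg19)   echo "-layer_name relu5_4 -input_sz 224" ;;
    alexnet) echo "-layer_name relu5 -input_sz 227" ;;
    *)       echo "unknown network: $1" >&2; return 1 ;;
  esac
}
```

Assuming the helper is defined in the current shell, it could be used as th classification.lua $(gradcam_flags vgg16) -label 283 -gpuid 0.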

Visual Question Answering

Fetch the VQA (http://arxiv.org/abs/1505.00468) submodule by running git submodule init && git submodule update, then download and unzip the provided extracted features and pretrained model.

th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'cat' -gpuid 0
th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'dog' -gpuid 0

Options

proto_file: Path to the deploy.prototxt file for the CNN Caffe model. Default is models/VGG_ILSVRC_19_layers_deploy.prototxt.
model_file: Path to the .caffemodel file for the CNN Caffe model. Default is models/VGG_ILSVRC_19_layers.caffemodel.
input_image_path: Path to the input image. Default is images/cat_dog.jpg.
input_sz: Input image size. Default is 224 (Change to 227 if using AlexNet).
layer_name: Layer to use for Grad-CAM. Default is relu5_4 (use relu5_3 for VGG-16 and relu5 for AlexNet).
question: Input question. Default is 'What animal?'.
answer: Answer to generate Grad-CAM for. Default is 'cat'.
out_path: Path to save images in. Default is output/.
model_path: Path to VQA model checkpoint. Default is VQA_LSTM_CNN/lstm.t7.
gpuid: Zero-indexed ID of the GPU to use. Default is -1 (CPU).
backend: Backend to use with loadcaffe. Default is cudnn.
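
To compare Grad-CAM maps for several candidate answers to the same question, the options above can be combined in a loop. The following is a minimal sketch, not part of this repository: vqa_cmd is a hypothetical function that only prints one command per answer (a dry run), and the per-answer output directories are an assumption. Whether out_path must exist before the script runs is not stated here, so the sketch does not execute anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the visual_question_answering.lua
# command for one image, question, answer, and output directory.
# Note: the question and answer must not contain single quotes.
vqa_cmd() {
  printf "th visual_question_answering.lua -input_image_path %s -question '%s' -answer '%s' -out_path %s\n" \
    "$1" "$2" "$3" "$4"
}

# Print one command per candidate answer, each with its own output directory.
for answer in cat dog; do
  vqa_cmd images/cat_dog.jpg 'What animal?' "$answer" "output/$answer/"
done
```

The printed lines can be reviewed and then pasted into a shell, with -gpuid added as needed.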

Image Captioning

License

BSD
