This repo contains the following CNN operations implemented in PyTorch:
- Gradient visualization with vanilla backpropagation
- Gradient visualization with guided backpropagation [1]
- Gradient visualization with saliency maps [4]
- Gradient-weighted [3] class activation mapping [2]
- Guided gradient-weighted class activation mapping [3]
- CNN filter visualization [9]
- Deep Dream [10]
- Class specific image generation (a generated image that maximizes a certain class) [4]
- Fooling images (unrecognizable images predicted as classes with high confidence) [7]
- Fooling images disguised as another image (a picture of an iPod predicted as a horse) [7]
It will also include the following operations in the near future:
- Inverted Image Representations [5]
- Weakly supervised object segmentation [4]
- Semantic Segmentation with Deconvolutions [6]
- Smooth Grad [8]
The code uses pretrained VGG19, VGG16 and AlexNet from the model zoo. Some of the code assumes that the layers in the model are separated into two sections: `features`, which contains the convolutional layers, and `classifier`, which contains the fully connected layers (after flattening the convolutional output). If you want to port this code to a model that does not have such a separation, you just need to edit the parts that call `model.features` and `model.classifier`.
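For torchvision's VGG and AlexNet this split already exists. Below is a minimal sketch of the assumption, plus a hypothetical `Wrapped` adapter (not part of this repo) showing one way to expose the same two attributes on a model that lacks the split:

```python
import torch.nn as nn
from torchvision import models

model = models.vgg16(pretrained=True)
conv_part = model.features    # nn.Sequential of conv/ReLU/pool layers
fc_part = model.classifier    # nn.Sequential of fully connected layers

# Hypothetical adapter: expose .features and .classifier so code that
# calls model.features / model.classifier keeps working unchanged.
class Wrapped(nn.Module):
    def __init__(self, convs, fcs):
        super().__init__()
        self.features = convs
        self.classifier = fcs

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten conv output before the classifier
        return self.classifier(x)
```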
All images are pre-processed with the mean and std of the ImageNet dataset before being fed to the model. None of the code uses the GPU, as these operations are quite fast for a single image; you can make use of the GPU with very little effort. The examples below include a number in brackets after the description, like Mastiff (243); this number is the class ID in the ImageNet dataset.
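As a reference, a minimal preprocessing sketch with torchvision transforms using the standard ImageNet statistics (the file name is hypothetical; the repo does the equivalent with its own helper functions):

```python
import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet mean/std normalization
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open('input_images/snake.jpg').convert('RGB')  # hypothetical path
tensor = preprocess(img).unsqueeze(0)  # add batch dimension -> (1, 3, 224, 224)
```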
I tried to comment the code as much as possible; if you have any issues understanding it or porting it, don't hesitate to reach out.
Below are some sample results for each operation.
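Most of the gradient-based operations above start from the same primitive: one backward pass from a class score to the input pixels. A minimal sketch in modern PyTorch (the repo itself targets the 0.2-era Variable API):

```python
import torch
from torchvision import models

model = models.alexnet(pretrained=True).eval()

# Stand-in for a preprocessed input image; requires_grad so gradients
# flow all the way back to the pixels.
image = torch.randn(1, 3, 224, 224, requires_grad=True)
target_class = 243  # Mastiff, as in the example above

output = model(image)
model.zero_grad()
output[0, target_class].backward()    # d(class score) / d(input pixels)

grads = image.grad.data               # (1, 3, 224, 224) vanilla gradients
saliency = grads.abs().max(dim=1)[0]  # per-pixel saliency map, (1, 224, 224)
```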
CNN filters can be visualized by optimizing the input image with respect to the output of a specific convolution operation. For this example I used a pre-trained VGG16. Visualizations of lower layers start with basic color and direction filters; as we approach the final layers, the complexity of the filters increases. If you employ techniques like blurring or gradient clipping, you will probably produce better images.
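A minimal sketch of the idea, with no extra regularization; the layer and filter indices here are illustrative positions in `model.features` and may not match the repo's layer naming below:

```python
import torch
from torchvision import models

model = models.vgg16(pretrained=True).eval()
layer_idx, filter_idx = 10, 5  # illustrative: a conv layer in model.features

# Start from noise and ascend on the mean activation of one filter.
image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.1, weight_decay=1e-6)

for _ in range(30):
    optimizer.zero_grad()
    x = image
    for i, layer in enumerate(model.features):
        x = layer(x)
        if i == layer_idx:
            break                    # stop at the target conv layer
    loss = -x[0, filter_idx].mean()  # negative loss -> gradient ascent
    loss.backward()
    optimizer.step()
```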
Sample filter visualizations from VGG16: Layer 2 (Conv 1-2), Layer 10 (Conv 2-1), Layer 17 (Conv 3-1), and Layer 24 (Conv 4-1).
Deep dream is technically the same operation as layer visualization; the only difference is that instead of starting with a random image, you use another picture. The samples below were created with VGG19; the produced result is entirely up to the filter, so it is kind of hit or miss. More complex models produce more high-level features, meaning that if you replace VGG19 with an Inception variant you will get more noticeable shapes when you target higher conv layers. As with layer visualization, if you employ additional techniques like gradient clipping or blurring, you might get better visualizations.
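A minimal sketch under the same gradient-ascent scheme, starting from a photo instead of noise (the file name, layer/filter indices, and hyperparameters are illustrative, not the repo's exact settings):

```python
import torch
from PIL import Image
from torchvision import models, transforms

model = models.vgg19(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = preprocess(Image.open('dd_tree.jpg').convert('RGB')).unsqueeze(0)
image.requires_grad_(True)
optimizer = torch.optim.SGD([image], lr=12, weight_decay=1e-4)

for _ in range(100):
    optimizer.zero_grad()
    x = image
    for i, layer in enumerate(model.features):
        x = layer(x)
        if i == 34:                      # a late conv layer in VGG19's features
            break
    (-x[0, 94].mean()).backward()        # ascend on filter 94's activation
    optimizer.step()
```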
Sample Deep Dream results: the original image, and VGG19 layer 34 (the final conv layer) with filter 94 and filter 103.
This operation produces different outputs depending on the model and the applied regularization method. Below are some samples produced with L2 regularization from VGG19. Note that these images are generated with regular CNNs by optimizing the input (rather than the model weights), not with GANs.
Samples: target class Worm Snake (52) and target class Spider (72), both with VGG19.
The samples below show the image produced with no regularization, L1 regularization, and L2 regularization on the target class flamingo (130), to show the differences between regularization methods. These images are generated with a pretrained AlexNet.
Samples: no regularization, L1 regularization, and L2 regularization.
Produced samples can be further optimized to resemble the desired target class; some of the operations you can incorporate to improve quality are blurring, clipping gradients that are below a certain threshold, random color swaps on some parts, randomly cropping the image, and forcing the generated image to follow a path to enforce continuity.
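A minimal sketch of the L2-regularized variant (target class taken from the samples above; the penalty weight and step count are illustrative):

```python
import torch
from torchvision import models

model = models.alexnet(pretrained=True).eval()
target_class = 130  # flamingo, as in the samples above

image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.SGD([image], lr=6)

for _ in range(150):
    optimizer.zero_grad()
    score = model(image)[0, target_class]
    # Maximize the raw class score; the L2 term keeps pixel values small.
    loss = -score + 1e-4 * image.norm()
    loss.backward()
    optimizer.step()
```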
This operation is quite similar to generating class-specific images: we start with a random image, continuously update it with targeted backpropagation (for a certain class), and stop when we achieve the target confidence for that class. All of the images below were generated to fool a pretrained AlexNet.
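A minimal sketch, assuming the stopping criterion is a softmax confidence threshold (class id from the samples below; hyperparameters illustrative):

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.alexnet(pretrained=True).eval()
target_class, target_confidence = 340, 0.94  # Zebra, as in the samples below

image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.SGD([image], lr=0.7)

for _ in range(500):  # cap iterations in case the loop stalls
    optimizer.zero_grad()
    logits = model(image)
    if F.softmax(logits, dim=1)[0, target_class].item() >= target_confidence:
        break  # model is now confident enough in the target class
    # Push the prediction toward the target class.
    F.cross_entropy(logits, torch.tensor([target_class])).backward()
    optimizer.step()
```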
Samples: predicted as Zebra (340) with confidence 0.94, as Bow tie (457) with confidence 0.95, and as Castle (483) with confidence 0.99.
For this operation we start with a real image and perform gradient updates on it for a specific class, but with smaller learning rates so that the original image does not change too much. As can be seen from the samples, on some images it is almost impossible to notice the difference between the two, while on others it can clearly be observed that something is wrong. All of the examples below were created on, and tested against, AlexNet to fool it.
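This is the same loop as the fooling sketch above, with a natural image as the starting point and a much smaller learning rate; a short sketch (the input file is hypothetical, the target class is borrowed from the samples above, and the hyperparameters are illustrative):

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

model = models.alexnet(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = preprocess(Image.open('apple.jpg').convert('RGB')).unsqueeze(0)  # hypothetical input
image.requires_grad_(True)

# Small learning rate: the image should stay visually (almost) unchanged.
optimizer = torch.optim.SGD([image], lr=0.01)
target = torch.tensor([483])  # Castle, as in the samples above

for _ in range(200):
    optimizer.zero_grad()
    logits = model(image)
    if F.softmax(logits, dim=1)[0, 483].item() > 0.99:
        break
    F.cross_entropy(logits, target).backward()
    optimizer.step()
```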
Requirements:
- torch >= 0.2.0.post4
- torchvision >= 0.1.9
- numpy >= 1.13.0
- opencv >= 3.1.0
[1] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller. Striving for Simplicity: The All Convolutional Net, https://arxiv.org/abs/1412.6806
[2] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba. Learning Deep Features for Discriminative Localization, https://arxiv.org/abs/1512.04150
[3] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, D. Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, https://arxiv.org/abs/1610.02391
[4] K. Simonyan, A. Vedaldi, A. Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, https://arxiv.org/abs/1312.6034
[5] A. Mahendran, A. Vedaldi. Understanding Deep Image Representations by Inverting Them, https://arxiv.org/abs/1412.0035
[6] H. Noh, S. Hong, B. Han. Learning Deconvolution Network for Semantic Segmentation, https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Noh_Learning_Deconvolution_Network_ICCV_2015_paper.pdf
[7] A. Nguyen, J. Yosinski, J. Clune. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, https://arxiv.org/abs/1412.1897
[8] D. Smilkov, N. Thorat, N. Kim, F. Viégas, M. Wattenberg. SmoothGrad: removing noise by adding noise, https://arxiv.org/abs/1706.03825
[9] M. D. Zeiler, R. Fergus. Visualizing and Understanding Convolutional Networks, https://arxiv.org/abs/1311.2901
[10] A. Mordvintsev, C. Olah, M. Tyka. Inceptionism: Going Deeper into Neural Networks, https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html