Antialiased CNNs [Project Page] [Paper]
Making Convolutional Networks Shift-Invariant Again
Richard Zhang.
To appear in ICML, 2019.
This repository contains examples of anti-aliased convnets. We build off publicly available PyTorch ImageNet and model repositories, with antialiasing add-ons:
- Antialiased AlexNet, VGG, ResNet, and DenseNet architectures, along with weights. Loading a pretrained antialiased model takes just a few lines:

```python
import torch
import models_lpf.resnet

model = models_lpf.resnet.resnet50(filter_size=3)
model.load_state_dict(torch.load('./weights/resnet50_lpf3.pth.tar')['state_dict'])
```

- An antialiasing layer (called `BlurPool` in the paper), which can be easily plugged into your favorite architecture as a downsampling substitute.
- ImageNet training and evaluation code, including shift-invariance benchmarking code (the `-es` flag). Achieving better consistency, while maintaining or improving accuracy, is an open problem. Help improve the results!
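The blur-then-subsample idea can be sketched in a few lines of PyTorch. This is an illustrative reimplementation, not the repo's own layer: the class name `BlurPool2d` and the reflect-padding choice are ours, and the repo's version handles more filter sizes and padding options.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Blur + subsample ("BlurPool"): low-pass filter each channel with a
    fixed binomial kernel, then subsample by `stride`."""
    def __init__(self, channels, filt_size=3, stride=2):
        super().__init__()
        self.stride = stride
        self.channels = channels
        # Rows of Pascal's triangle give the Rect/Tri/Bin filters.
        a = torch.tensor({1: [1.], 2: [1., 1.], 3: [1., 2., 1.],
                          4: [1., 3., 3., 1.], 5: [1., 4., 6., 4., 1.]}[filt_size])
        filt = a[:, None] * a[None, :]   # outer product -> 2-D kernel
        filt = filt / filt.sum()         # normalize so the blur preserves the mean
        # One copy per channel, applied as a depthwise (grouped) convolution.
        self.register_buffer('filt', filt[None, None].repeat(channels, 1, 1, 1))
        self.pad = (filt_size - 1) // 2

    def forward(self, x):
        x = F.pad(x, [self.pad] * 4, mode='reflect')
        return F.conv2d(x, self.filt, stride=self.stride, groups=self.channels)

x = torch.randn(1, 8, 32, 32)
y = BlurPool2d(channels=8, filt_size=3, stride=2)(x)
print(y.shape)  # halves spatial resolution: (1, 8, 16, 16)
```

Because the filter is a fixed buffer rather than a parameter, it adds no learnable weights; only the extra stride-1 computation costs anything.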
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
All material is made available under Creative Commons BY-NC-SA 4.0 license by Adobe Inc. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicating any changes that you've made.
The repository builds off the PyTorch examples repository and torchvision models repository. It is BSD-style licensed.
- Install PyTorch (pytorch.org)
- `pip install -r requirements.txt`
- Run `bash weights/get_antialiased_models.sh`
If you'd just like to load a pretrained antialiased model, perhaps as a backbone for your application, run `bash weights/get_antialiased_models.sh` to get the model weights. The following gives you an anti-aliased ResNet50 (filter size 3).
```python
import torch
import models_lpf.resnet

model = models_lpf.resnet.resnet50(filter_size=3)
model.load_state_dict(torch.load('weights/resnet50_lpf3.pth.tar')['state_dict'])
```
We also provide weights for antialiased AlexNet, VGG16(bn), ResNet18/34/50/101, and DenseNet121 (see example_usage.py).
The methodology is simple: first evaluate the layer with stride 1, and then use our Downsample layer to do antialiased downsampling.
- Copy `models_lpf` into your codebase. It contains the `Downsample` class, which does blur + subsampling. Put the following into your header to get the `Downsample` class:

```python
from models_lpf import *
```
- Make the following architectural changes to antialias your strided layers.

| | Original | Anti-aliased replacement |
|:---|:---|:---|
| MaxPool --> MaxBlurPool | `[nn.MaxPool2d(kernel_size=2, stride=2),]` | `[nn.MaxPool2d(kernel_size=2, stride=1), Downsample(filt_size=M, stride=2, channels=C)]` |
| StridedConv --> ConvBlurPool | `[nn.Conv2d(Cin, C, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True)]` | `[nn.Conv2d(Cin, C, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True), Downsample(filt_size=M, stride=2, channels=C)]` |
| AvgPool --> BlurPool | `nn.AvgPool2d(kernel_size=2, stride=2)` | `Downsample(filt_size=M, stride=2, channels=C)` |

We assume the tensor has `C` channels. For the blur kernel size `M`, 3 or 5 is typical.
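The MaxPool --> MaxBlurPool substitution above can be sketched end-to-end. `Blur2d` below is a hypothetical stand-in for the repo's `Downsample` layer, hard-coded to a [1, 2, 1] triangle filter (i.e. `filt_size=3`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Blur2d(nn.Module):
    """Hypothetical stand-in for the repo's Downsample layer:
    a fixed [1, 2, 1] (Tri-3) blur followed by subsampling."""
    def __init__(self, channels, stride=2):
        super().__init__()
        self.stride = stride
        self.channels = channels
        a = torch.tensor([1., 2., 1.])
        f = (a[:, None] * a[None, :]) / 16.0     # normalized 3x3 triangle filter
        self.register_buffer('f', f.expand(channels, 1, 3, 3).clone())

    def forward(self, x):
        x = F.pad(x, [1, 1, 1, 1], mode='reflect')
        return F.conv2d(x, self.f, stride=self.stride, groups=self.channels)

C = 16
# Original: stride-2 max pooling (max and subsample in one aliased step).
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
# Anti-aliased: evaluate max densely (stride 1), then blur + subsample.
maxblurpool = nn.Sequential(nn.MaxPool2d(kernel_size=2, stride=1),
                            Blur2d(channels=C, stride=2))

x = torch.randn(1, C, 32, 32)
print(maxpool(x).shape, maxblurpool(x).shape)  # same output size: (1, 16, 16, 16)
```

Both paths halve the spatial resolution, so the swap is drop-in with respect to the rest of the architecture; only the stride-1 max evaluation adds cost.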
Note that this requires computing a layer at stride 1 instead of stride 2, which adds memory and run-time. We typically skip this step at the highest-resolution (early in the network), to prevent large increases.
We show consistency (y-axis) vs accuracy (x-axis) for various networks. Up and to the right is good. Training and testing instructions are here.
We italicize a variant if it is not on the Pareto front (that is, it is strictly dominated in both accuracy and consistency by another variant), and bold a variant if it is on the Pareto front. The highest value in each column is also bolded.
Note that the current arxiv paper is slightly out of date; we will update soon.
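Consistency here measures how often the network assigns the same label to two shifted copies of the same image. A rough sketch of such a benchmark, using circular shifts via `torch.roll` as a simplification of the paper's crop-based protocol (the function name and signature below are ours, not the repo's):

```python
import torch

@torch.no_grad()
def consistency(model, images, max_shift=8, trials=16):
    """Estimate shift consistency: the fraction of (shift1, shift2) draws for
    which the model labels the two shifted copies of each image identically.
    Uses circular shifts (torch.roll) for simplicity; the paper instead crops
    shifted windows from a larger image."""
    model.eval()
    matches = []
    for _ in range(trials):
        s1 = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
        s2 = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
        p1 = model(torch.roll(images, s1, dims=(2, 3))).argmax(dim=1)
        p2 = model(torch.roll(images, s2, dims=(2, 3))).argmax(dim=1)
        matches.append((p1 == p2).float().mean())
    return torch.stack(matches).mean().item()
```

A perfectly shift-invariant classifier scores 1.0; aliased downsampling lowers the score because subsampling grids interact differently with each shift.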
AlexNet (plot)

| | Accuracy | Consistency |
|:---|---:|---:|
| Baseline | 56.55 | 78.18 |
| Rect-2 | 57.24 | 81.33 |
| Tri-3 | 56.90 | 82.15 |
| Bin-5 | 56.58 | 82.51 |

VGG16 (plot)

| | Accuracy | Consistency |
|:---|---:|---:|
| Baseline | 71.59 | 88.52 |
| Rect-2 | 72.15 | 89.24 |
| Tri-3 | 72.20 | 89.60 |
| Bin-5 | 72.33 | 90.19 |

VGG16bn (plot)

| | Accuracy | Consistency |
|:---|---:|---:|
| Baseline | 73.36 | 89.24 |
| Rect-2 | 74.01 | 90.72 |
| Tri-3 | 73.91 | 91.10 |
| Bin-5 | 74.05 | 91.35 |

ResNet18 (plot)

| | Accuracy | Consistency |
|:---|---:|---:|
| Baseline | 69.74 | 85.11 |
| Rect-2 | 71.39 | 86.90 |
| Tri-3 | 71.69 | 87.51 |
| Bin-5 | 71.38 | 88.25 |

ResNet34 (plot)

| | Accuracy | Consistency |
|:---|---:|---:|
| Baseline | 73.30 | 87.56 |
| Rect-2 | 74.46 | 89.14 |
| Tri-3 | 74.33 | 89.32 |
| Bin-5 | 74.20 | 89.49 |

ResNet50 (plot)

| | Accuracy | Consistency |
|:---|---:|---:|
| Baseline | 76.16 | 89.20 |
| Rect-2 | 76.81 | 89.96 |
| Tri-3 | 76.83 | 90.91 |
| Bin-5 | 77.04 | 91.31 |

ResNet101 (plot)

| | Accuracy | Consistency |
|:---|---:|---:|
| Baseline | 77.37 | 89.81 |
| Rect-2 | 77.82 | 91.04 |
| Tri-3 | 78.13 | 91.62 |
| Bin-5 | 77.92 | 91.74 |

DenseNet121 (plot)

| | Accuracy | Consistency |
|:---|---:|---:|
| Baseline | 74.43 | 88.81 |
| Rect-2 | 75.04 | 89.53 |
| Tri-3 | 75.14 | 89.78 |
| Bin-5 | 75.03 | 90.39 |
This repository is built off the PyTorch ImageNet training and torchvision models repositories.
If you find this useful for your research, please consider citing this bibtex. Please contact Richard Zhang <rizhang at adobe dot com> with any comments or feedback.