small input size #4

Closed
dribnet opened this issue Feb 17, 2018 · 1 comment
dribnet commented Feb 17, 2018

I am trying to independently replicate the LPIPS metric in Keras, initially focusing on the uncalibrated VGG variant. Following the README I got test_network.py working, but I am a little confused by the three example images ex_ref.png, ex_p0.png, and ex_p1.png and how they are processed.

Each of these images is 64x64, and in test_network.py they are passed to the VGG network without scaling. But the native input size of VGG is 224x224, and the PyTorch models documentation clearly states that inputs are expected to be at least that size:

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
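For concreteness, this is the preprocessing the quoted torchvision documentation is describing (a minimal sketch only; the repository's test_network.py loads images with its own helper, so this may not match its exact normalization, and the image path below is assumed):

```python
from PIL import Image
from torchvision import transforms

# ImageNet-style preprocessing from the torchvision model docs.
preprocess = transforms.Compose([
    transforms.ToTensor(),                              # HxWxC uint8 -> CxHxW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet channel means
                         std=[0.229, 0.224, 0.225]),    # ImageNet channel stds
])

img = Image.open('ex_ref.png').convert('RGB')   # path assumed; one of the 64x64 example images
x = preprocess(img).unsqueeze(0)                # shape: (1, 3, 64, 64), no resizing applied
```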

Notably, when provided with 224x224 inputs, the layer sizes are:

  • (64, 224, 224)
  • (128, 112, 112)
  • (256, 56, 56)
  • (512, 28, 28)
  • (512, 14, 14)

However, when the images are left at 64x64 without scaling, the layer sizes are smaller at each stage (a short sketch reproducing both sets of shapes follows the list):

  • (64, 64, 64)
  • (128, 32, 32)
  • (256, 16, 16)
  • (512, 8, 8)
  • (512, 4, 4)
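For anyone checking these numbers, here is a minimal PyTorch sketch that prints the shapes for both input sizes. It assumes torchvision's standard vgg16 layout, in which indices 3, 8, 15, 22, and 29 of .features are the ReLUs ending each conv block (relu1_2 through relu5_3); the weights themselves don't affect the shapes:

```python
import torch
from torchvision.models import vgg16

features = vgg16().features.eval()   # untrained is fine: shapes don't depend on weights
block_ends = {3: 'relu1_2', 8: 'relu2_2', 15: 'relu3_3', 22: 'relu4_3', 29: 'relu5_3'}

for size in (224, 64):
    x = torch.randn(1, 3, size, size)
    print(f'input {size}x{size}:')
    with torch.no_grad():
        for i, layer in enumerate(features):
            x = layer(x)
            if i in block_ends:
                print(f'  {block_ends[i]}: {tuple(x.shape[1:])}')
```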

I'm not familiar with PyTorch internals, so it's not clear to me how to interpret this behaviour when porting to Keras. My questions are:

  • Are these smaller inputs in fact valid ways of using these pre-trained VGG weights?
  • Could the LPIPS metric alternatively be implemented by always scaling inputs up to the expected 224x224 size?
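As a rough illustration of the second question, here is a minimal Keras sketch of pulling out the analogous uncalibrated VGG16 activations at 64x64 without any resizing. This is only an illustration, not code from this repository; it assumes the stock keras.applications VGG16, which with include_top=False accepts any spatial size of 32 or larger:

```python
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.models import Model

# Keras VGG16 layers whose outputs correspond to relu1_2 ... relu5_3
# (Keras conv layers include the ReLU activation).
layer_names = ['block1_conv2', 'block2_conv2', 'block3_conv3',
               'block4_conv3', 'block5_conv3']

base = VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))
feat_model = Model(inputs=base.input,
                   outputs=[base.get_layer(n).output for n in layer_names])

feats = feat_model.predict(np.zeros((1, 64, 64, 3), dtype='float32'))
for name, f in zip(layer_names, feats):
    print(name, f.shape)   # block1_conv2 (1, 64, 64, 64) ... block5_conv3 (1, 4, 4, 512)
```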
richzhang (Owner) commented

Convolutional layers can take on any size spatial input. The sizes of the feature maps will simply be scaled appropriately. You can pass any spatial size input to the metric.*

*above 16x16, so that the conv5 layer will at least be 1x1. I wouldn't recommend using anything below 64x64 though
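As a quick sanity check of that footnote (again just a sketch with torchvision's vgg16; weights don't affect the shapes), a 16x16 input already reduces relu5_3 to 1x1:

```python
import torch
from torchvision.models import vgg16

features = vgg16().features.eval()
x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    for i, layer in enumerate(features):
        x = layer(x)
        if i == 29:          # relu5_3 in torchvision's vgg16.features
            break
print(tuple(x.shape))        # -> (1, 512, 1, 1)
```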

@dribnet dribnet closed this as completed Feb 25, 2018