
Reimplementation and Model Architecture issues #3

Open

junkeun-yi opened this issue Jul 6, 2022 · 1 comment

@junkeun-yi

Hello, I am currently reimplementing your project in PyTorch in my repository at https://github.com/junkeun-yi/SAVi-pytorch.

I have been having a few issues matching the performance of SAVi-small on the MOVi-C dataset, and wanted some clarifications on the implementation.

I noticed that in the paper (https://arxiv.org/pdf/2111.12594.pdf) the small variant uses 64 features for the encoded features, but the code suggests that it is only using 32 features, so I implemented it that way. I wanted to confirm whether this is correct.

Also, my reimplementation has repeatedly achieved subpar performance on the SAVi-small, MOVi-C experiment: my code gets 62.6 points whereas the official code achieves 64.7 points. As far as I can tell, my code is exactly the same as the original except for some convolution padding, since PyTorch does not support 'same' padding for strided or transposed convolutions.
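
For reference, here is a minimal sketch of how TF/Flax-style 'SAME' padding could be emulated for strided convolutions in PyTorch (the wrapper below is illustrative, not taken from either codebase; PyTorch's built-in `padding='same'` raises an error when `stride != 1`):

```python
import math

import torch
import torch.nn.functional as F
from torch import nn


class SamePadConv2d(nn.Conv2d):
    """Conv2d emulating TF/Flax 'SAME' padding, including the asymmetric
    (extra pixel on the bottom/right) case that arises with strides > 1."""

    def forward(self, x):
        ih, iw = x.shape[-2:]
        pad_h = self._same_pad(ih, self.kernel_size[0], self.stride[0], self.dilation[0])
        pad_w = self._same_pad(iw, self.kernel_size[1], self.stride[1], self.dilation[1])
        # F.pad takes (left, right, top, bottom) for the last two dims.
        x = F.pad(x, (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2))
        return F.conv2d(x, self.weight, self.bias, self.stride, 0, self.dilation, self.groups)

    @staticmethod
    def _same_pad(size, kernel, stride, dilation):
        # TF 'SAME' rule: output size is ceil(size / stride).
        eff_kernel = (kernel - 1) * dilation + 1
        out = math.ceil(size / stride)
        return max((out - 1) * stride + eff_kernel - size, 0)


# Example: a 5x5, stride-2 conv on a 64x64 input yields 32x32, as in TF/Flax.
conv = SamePadConv2d(3, 32, kernel_size=5, stride=2)
assert conv(torch.randn(1, 3, 64, 64)).shape == (1, 32, 32, 32)
```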

In addition, would it be possible for the authors to provide loss and evaluation curves, as well as the number of GPUs used for the different experiments? That would be very much appreciated and helpful in creating a reimplementation.

Thank you!

@tkipf
Contributor

tkipf commented Jul 19, 2022

Thank you for your comment!

First of all, thank you for sharing your PyTorch re-implementation. I will have a closer look soon; maybe I can spot some differences (apart from the ones you mention).

Regarding the small SAVi model variant: the code here is correct, but there was indeed a typo in the paper, Appendix Table A.5 -- the number of channels in the very last line (Conv 1x1) should be 32 for the small model instead of 64. The second-to-last line is indeed 64 (we implement this using an MLP after the position embedding w/ a hidden layer size of 64).
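
To make the corrected dimensions concrete, here is a minimal illustrative sketch of that encoder tail in PyTorch (the module structure, LayerNorm placement, and the learned position embedding are assumptions; only the 32/64/32 channel counts come from the correction above):

```python
import torch
from torch import nn


class EncoderTail(nn.Module):
    """Illustrative SAVi-S encoder tail: add a position embedding to the
    32-channel CNN features, then apply a per-location MLP with hidden
    size 64 and output size 32 (not 64, per the paper correction)."""

    def __init__(self, feat_dim=32, hidden_dim=64, height=16, width=16):
        super().__init__()
        # Simplified learned position embedding; the actual codebase
        # projects a coordinate grid instead.
        self.pos_embed = nn.Parameter(torch.zeros(1, height * width, feat_dim))
        self.mlp = nn.Sequential(
            nn.LayerNorm(feat_dim),
            nn.Linear(feat_dim, hidden_dim),  # second-to-last line of Table A.5: 64
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),  # very last line (Conv 1x1): 32, not 64
        )

    def forward(self, x):
        # x: (batch, H*W, 32) flattened CNN backbone features.
        return self.mlp(x + self.pos_embed)
```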

I am unable to share training/evaluation curves right now, but I can say that they look very unsurprising: the eval FG-ARI steadily goes up over the course of training, improving quickly at first (e.g. around 85% eval FG-ARI for SAVi-S on MOVi-A after 10k steps) and then more slowly for the remainder of training (e.g. ~91% eval FG-ARI for SAVi-S on MOVi-A after 50k steps).

Maybe the difference in results can be explained by the conv padding -- you should be able to try this by changing the config in our JAX/Flax codebase.

In terms of #GPUs: you should be able to run this model on a single GPU or in a multi-GPU setting with data parallelism (which the code base supports using pmap); ultimately, GPU memory determines what you can fit on a single device. A single V100 GPU should fit both the SAVi-S and SAVi-M models.
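
For the PyTorch re-implementation, a rough sketch of the analogous data-parallel setup using DistributedDataParallel is below (the official codebase uses JAX pmap instead; the model here is a stand-in and nothing below is from either repo):

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
import os

import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = nn.Linear(32, 32).to(device)  # stand-in for the actual SAVi model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)

    # One illustrative step; DDP averages gradients across GPUs on backward().
    x = torch.randn(8, 32, device=device)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```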
