
Unconditional setting does not yield reasonable results on the challenging MOVi-E #12

ruili3 opened this issue on Jan 30, 2023 · 2 comments


ruili3 commented Jan 30, 2023

Thanks for sharing the code for this impressive work!

Since there are no unconditional configs in the codebase, I wrote one myself by commenting out "config.conditioning_key" and setting the "initializer" to "savi.modules.GaussianStateInit", with the "shape" param set to [config.num_slots, 128].

I got reasonable unconditional results (FG-ARI) on MOVi-A for SAVi (conditional: 91.0; unconditional: 70.3). However, when I switched to MOVi-E, I got much worse unconditional results for both SAVi (conditional: 8.7; unconditional: 1.8) and SAVi++ (conditional: 81.5; unconditional: 1.8). The code setting the "initializer" in my unconditional SAVi++ config is shown below:

 "initializer": ml_collections.ConfigDict({
     "module": "savi.modules.GaussianStateInit",
     "shape": [config.num_slots,128] # num_slots==24 -- for movi-e dataset
 }),
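
For context, a minimal sketch of what a Gaussian state initializer along these lines might look like in Flax. This is a hypothetical re-implementation for illustration only; the actual savi.modules.GaussianStateInit may use a different signature and parameterization.

    import jax
    import jax.numpy as jnp
    import flax.linen as nn

    class GaussianStateInit(nn.Module):
        # Hypothetical sketch: sample initial slots from a learned
        # diagonal Gaussian, independently for each video in the batch.
        shape: tuple  # e.g. (num_slots, slot_dim) == (24, 128) for MOVi-E

        @nn.compact
        def __call__(self, batch_size):
            mean = self.param("mean", nn.initializers.zeros, self.shape)
            log_std = self.param("log_std", nn.initializers.zeros, self.shape)
            noise = jax.random.normal(self.make_rng("state_init"),
                                      (batch_size,) + tuple(self.shape))
            return mean + jnp.exp(log_std) * noise  # (batch, num_slots, slot_dim)

At apply time the sampling RNG would be passed explicitly, e.g. model.apply(variables, batch_size, rngs={"state_init": key}).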

Since there is no quantitative unconditional result on MOVi-E in the SAVi++ paper, I wonder whether you observed similar results on MOVi-E in your experiments, or whether this is caused by a misconfiguration in my unconditional setting. Thanks in advance for any further information!


tkipf commented Jan 30, 2023

Thanks for your question!

I think both savi.modules.GaussianStateInit and savi.modules.ParamStateInit are worth trying in this setting. You can also try to reduce the number of slots (e.g. to 11), as otherwise the model might easily oversegment. This will no longer allow the model to capture all objects, but it will likely avoid some other failure modes.
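
For comparison, a ParamStateInit-style initializer would use deterministic learned slots rather than per-video samples. A hypothetical sketch, reusing the imports from the snippet above (the actual savi.modules.ParamStateInit may differ):

    class ParamStateInit(nn.Module):
        # Hypothetical sketch: the same learned initial slots for every video,
        # in contrast to the per-video sampling of a Gaussian initializer.
        shape: tuple  # (num_slots, slot_dim)

        @nn.compact
        def __call__(self, batch_size):
            slots = self.param("slots", nn.initializers.normal(stddev=1.0), self.shape)
            return jnp.broadcast_to(slots, (batch_size,) + tuple(self.shape))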

Further, it can be helpful to increase the number of Slot Attention iterations in the unconditional case, e.g. to 2 instead of 1. I can also recommend replacing the TransformerBlock with a simple MLP, which typically results in more stable training (at the expense of not being able to model physical interactions between objects). You can achieve this by setting the module to savi.modules.MLP. Recommended configuration options: hidden_size = 256, layernorm = "pre", and residual = True. Alternatively, you can also try the savi.modules.Identity module for potentially even more stable training.
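
Putting those suggestions into config form, a hypothetical sketch (the exact config keys for the slot count, corrector iterations, and transition module are assumptions; check them against the config files in the codebase):

    import ml_collections

    # Hypothetical unconditional MOVi-E adjustments based on the advice above;
    # the exact config keys are assumptions and may differ in your setup.
    config.num_slots = 11                  # fewer slots to reduce oversegmentation
    config.model.corrector_iterations = 2  # more Slot Attention iterations (assumed key)
    config.model.predictor = ml_collections.ConfigDict({
        "module": "savi.modules.MLP",      # simpler transition than TransformerBlock
        "hidden_size": 256,
        "layernorm": "pre",
        "residual": True,
    })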


hnyu commented Aug 2, 2023

(Quotes @ruili3's original comment above.)

Hi @ruili3, are you predicting RGB, optical flow, or depth on MOVi-A?
