Hi, thank you for your great work and open-source code.
We are trying to reproduce your performance on MineSweeperMedium from the PopGym suite. First, we ran the 03_popgym_suite.py script without changing any hyperparameters, but this produces the pink run in the attached figure, which barely exceeds -0.32.
Later, we also tried changing the hyperparameters to match the hyperparameter tables (Table 3 and Table 4) in the AMAGO paper. We made the following changes (summarized in the sketch after this list):
Timestep encoder from input dim-256-256-256 to input dim-512-512-200
Gradient norm clipping from 2 to 1
Learning rate from 3e-4 to 1e-4
Buffer size from 20k to 80k
Exploration step annealing from 400k to 1M
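To make the exact deltas unambiguous, here is the same list as a plain-Python summary. This is only a sketch of the settings we tried; the attribute names below are our own labels, not the repo's actual config keys or gin bindings.

```python
# Sketch only: a plain-Python record of the hyperparameter changes we tried.
# The attribute names are our own labels (hypothetical), not AMAGO config keys.
from dataclasses import dataclass

@dataclass
class PaperTableOverrides:
    tstep_encoder_dims: tuple = (512, 512, 200)  # was (256, 256, 256) after the input dim
    grad_norm_clip: float = 1.0                  # was 2.0
    learning_rate: float = 1e-4                  # was 3e-4
    buffer_size: int = 80_000                    # was 20_000 timesteps
    exploration_anneal_steps: int = 1_000_000    # was 400_000

overrides = PaperTableOverrides()
```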
However, this results in the teal run in the attached figure. Could you provide example code that reproduces the result in the AMAGO paper, which reaches 0.06 at 18M timesteps? For instance, what are max_seq_len and train_timesteps_per_epoch? Are they the default values in the repo?
Thanks very much in advance for your help!
Hi, I think the surface-level problem is that the popgym command to use is the one listed under "Example wandb" in the README: python 03_popgym_suite.py --env MineSweeperMedium --parallel_actors 24 --epochs 650 --dset_max_size 80_000 --env_mode sync --memory_layers 3 --memory_size 256 --run_name <your_run_name> --buffer_dir <your_buffer_dir>. So you might be using a higher update:data ratio than intended (see the sketch below).
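To make the update:data point concrete, here is a rough back-of-envelope. The two per-epoch numbers are placeholders for illustration only, not values read from the script's actual defaults.

```python
# Back-of-envelope update:data ratio. The two *_per_epoch values are placeholder
# assumptions for illustration, not the script's actual defaults.
parallel_actors = 24                   # from the readme command above
timesteps_per_actor_per_epoch = 1_000  # placeholder
grad_updates_per_epoch = 1_000         # placeholder

new_timesteps_per_epoch = parallel_actors * timesteps_per_actor_per_epoch
update_to_data = grad_updates_per_epoch / new_timesteps_per_epoch
print(f"update:data ~= {update_to_data:.3f}")
# With fewer parallel actors (e.g. a lower default), new_timesteps_per_epoch shrinks,
# so the same number of gradient updates gives a higher update:data ratio.
```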
However, I did some tests myself and found the performance has dropped to about -0.2 (edit: even when changing from the more recent lr hyperparameters back to the originals). The codebase has gone through large changes, but I do try to maintain POPGym results and test on some of the envs before almost every merge. For example, here are some recent reference runs, including MineSweeperEasy, that I put in the public wandb. These would have used the command above and the current 03_popgym_suite example script.
So it would help me to know: was this the first env you tried, or are you sweeping many and this is one of a few that didn't match?
I dug up one of the old official seeds from the paper (https://wandb.ai/jakegrigsby/amagov3-public-popgym-regression/runs/zgy5ho9w/overview) and have been throwing a bunch of GPUs at ablating every minor change from the original codebase I can think of. This will take a bit more time to finish, and I'll get back to you with more info/results tonight or tomorrow.