Reproducing the Amago Performance on PopGym MineSweeperMedium #79

Open
Leiay opened this issue Jan 27, 2025 · 1 comment

Leiay commented Jan 27, 2025

Hi, thank you for your great work and open-source code.

We are trying to reproduce your performance on MineSweeperMedium from the PopGym suite. First, we ran the 03_popgym_suite.py code without changing any hyperparameters, but this produced the pink run in the attached figure, which barely exceeds -0.32.

Later, we also tried changing the hyperparameters to match the hyperparameter tables (Table 3 and Table 4) in the AMAGO paper. We made the following changes (see the sketch after this list):

  • Timestep encoder from input dim-256-256-256 to input dim-512-512-200
  • Gradient norm clipping from 2 to 1
  • Learning rate from 3e-4 to 1e-4
  • Buffer size from 20k to 80k
  • Exploration step annealing from 400k to 1M
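
For reference, here is a rough sketch of these overrides written as gin-style bindings (amago appears to configure hyperparameters with gin-config). The configurable names below are placeholders we picked for illustration, not the repo's actual paths:

```python
import gin  # gin-config; amago appears to use gin for hyperparameter overrides

# Placeholder bindings summarizing the changes above. The left-hand names are
# hypothetical, NOT amago's actual configurable paths -- check the repo's
# config definitions for the real ones.
overrides = [
    "TstepEncoder.hidden_dims = (512, 512)",       # encoder widths 256-256-256 -> 512-512
    "TstepEncoder.d_output = 200",                 # encoder output dim 256 -> 200
    "Agent.grad_clip = 1.0",                       # gradient norm clip 2 -> 1
    "Agent.learning_rate = 1e-4",                  # learning rate 3e-4 -> 1e-4
    "Experiment.dset_max_size = 80_000",           # replay buffer 20k -> 80k
    "ExplorationWrapper.steps_anneal = 1_000_000", # exploration anneal 400k -> 1M
]

# skip_unknown=True lets this snippet run standalone; in a real run the
# configurables would be registered by importing amago before parsing.
gin.parse_config_files_and_bindings([], overrides, skip_unknown=True)
```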

However, this results in the teal run in the attached figure. Could you provide example code to reproduce the result in the AMAGO paper that reaches 0.06 at 18M timesteps? For instance, what are max_seq_len and train_timesteps_per_epoch? Are they the default values in the repo?

Thanks very much in advance for your help!

[Image: learning curves for the pink and teal runs on MineSweeperMedium]

jakegrigsby (Collaborator) commented Jan 28, 2025

Hi, I think the surface-level problem is that the intended popgym command is the one listed under "Example wandb" in the README: python 03_popgym_suite.py --env MineSweeperMedium --parallel_actors 24 --epochs 650 --dset_max_size 80_000 --env_mode sync --memory_layers 3 --memory_size 256 --run_name <your_run_name> --buffer_dir <your_buffer_dir>. So you might be using a higher update:data ratio than intended?
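
For intuition, here is a rough back-of-envelope of what the update:data ratio means in this setting. The per-epoch numbers are placeholders for illustration, not the actual defaults of 03_popgym_suite.py:

```python
# Back-of-envelope update:data ratio. The per-epoch values are assumed for
# illustration -- NOT the real defaults of 03_popgym_suite.py.
parallel_actors = 24            # from the README command above
env_steps_per_actor = 1_000     # env steps each actor collects per epoch (assumed)
grad_updates_per_epoch = 1_000  # gradient steps performed per epoch (assumed)

env_steps_per_epoch = parallel_actors * env_steps_per_actor
update_to_data = grad_updates_per_epoch / env_steps_per_epoch
print(f"update:data ~= {update_to_data:.3f} gradient updates per env step")

# Fewer parallel actors or a smaller replay buffer (dset_max_size) with the
# same number of gradient steps pushes this ratio up, which can change results
# even when the total number of environment timesteps is identical.
```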

However, I did some tests myself and found that performance has dropped to about -0.2 (edit: even when changing from the more recent LR hparams back to the originals). The codebase has gone through large changes, but I do try to maintain POPGym results and test on some of the envs before almost every merge. For example, here are some recent reference runs, including MineSweeperEasy, that I put in the public wandb. These would have used the command above and the current 03_popgym_suite example script.

[Image: recent reference learning curves, including MineSweeperEasy, from the public wandb]

So it would help me to know whether this was the first env you tried, or whether you are sweeping many envs and this is one of a few that didn't match.

I dug up one of the old official seeds from the paper here: https://wandb.ai/jakegrigsby/amagov3-public-popgym-regression/runs/zgy5ho9w/overview and have been throwing a bunch of GPUs at ablating every minor change from the original codebase that I can think of. This will take me a bit more time to finish, and I'll get back to you with more info/results tonight or tomorrow.
