PyTorch implementation of ExpGen [Paper] (NeurIPS'23).
If you find this work useful, please cite it using the following BibTeX entry:
@article{zisselman2024explore,
title={Explore to Generalize in Zero-Shot RL},
author={Zisselman, Ev and Lavie, Itai and Soudry, Daniel and Tamar, Aviv},
journal={Advances in Neural Information Processing Systems},
volume={36},
year={2024}
}
Recommended setup:
- Ubuntu 18.04 or newer
- Python 3.7 or newer
Clone repo and install dependencies:
git clone https://github.com/EvZissel/expgen.git
cd expgen
conda env create -f environment.yml
conda activate expgen_env
pip install procgen
Note: If you encounter a libffi/cffi error such as
ImportError: libffi.so.7: cannot open shared object file: No such file or directory
try installing cffi directly: pip install cffi==1.13.0
Example of training ExpGen on ProcGen environments:
Train a PPO agent; repeat with a range of seed values to produce an ensemble of agents:
python train_ppo.py --env-name maze --seed 0 --use_backgrounds
Note: Use seed values 0-9 to reproduce the results presented in the paper.
Train the maximum-entropy exploration agent:
python train_maxEnt.py --env-name maze --seed 0 --use_backgrounds
Run ExpGen using the trained agents:
python expgen_ensemble.py --env-name maze --use_backgrounds
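To produce the full ensemble, the PPO training command above is simply repeated over seeds 0-9. A minimal dry-run sketch (it only prints the commands rather than launching the runs; the loop bounds follow the paper's seed range):

```python
# Build the per-seed training commands for the PPO ensemble (seeds 0-9,
# as recommended above). Printing keeps this a dry run; replace print()
# with subprocess.run(cmd.split()) to actually launch training.
ENV_NAME = "maze"
commands = [
    f"python train_ppo.py --env-name {ENV_NAME} --seed {seed} --use_backgrounds"
    for seed in range(10)
]
for cmd in commands:
    print(cmd)
```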
Note: The hyperparameters used in the paper are set as default values in code.
Using a PPO ensemble, ExpGen demonstrates a notable performance gain on games that invariance-based approaches fail to solve.
Alternatively, ExpGen can build its ensemble from an invariance-based method such as IDAAC (instead of PPO), combining invariance with test-time exploration and achieving state-of-the-art results.
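The test-time idea behind combining the ensemble with the exploration agent can be illustrated with a minimal sketch for discrete actions: act with the ensemble when its members agree, and fall back to the exploration agent otherwise. This is an illustrative simplification, not the repo's implementation; the function name and the agreement threshold are hypothetical.

```python
from collections import Counter

def expgen_action(ensemble_actions, explore_action, min_agreement=0.9):
    """Pick an action from an ensemble of discrete-action agents.

    ensemble_actions: one proposed action per ensemble member (hypothetical
    interface). If a fraction >= min_agreement of members agree, exploit
    that action; otherwise defer to the exploration (maxEnt) agent.
    """
    action, count = Counter(ensemble_actions).most_common(1)[0]
    if count / len(ensemble_actions) >= min_agreement:
        return action          # consensus: exploit the ensemble's choice
    return explore_action      # disagreement: explore instead

# 9 of 10 members agree on action 3 -> exploit it
print(expgen_action([3] * 9 + [1], explore_action=7))  # -> 3
# No consensus -> take the exploration agent's action
print(expgen_action([3, 1, 2, 0, 3], explore_action=7))  # -> 7
```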
Hidden Maze Experiment
To reproduce the results of the hidden maze experiment (see Appendix A), use the following command:
python train_ppo.py --env-name maze --seed 0 --num-level 128 --recurrent-policy --mask_all --use_generated_assets --restrict_themes --use_monochrome_assets
This code is based on the open-source PyTorch implementation of PPO.