This code is based on the A3C implementation by Ilya Kostrikov.
TODO
- Give a summary of the project (maybe from report)
- Add report
- Add link to defense
I used a c5.xlarge configuration with 16 vCPUs and ran A3C with 16 processes.
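The setup above uses one A3C process per vCPU. A minimal sketch for picking a matching value for --num-processes on another machine, using only the standard library; the helper name and the `reserve` parameter are hypothetical and not part of this repository:

```python
import os


def suggested_num_processes(reserve=0):
    """Return the logical CPU count minus `reserve`, but never less than 1.

    Hypothetical helper: the repository itself takes --num-processes
    directly and does not compute it.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cpus - reserve)


if __name__ == "__main__":
    print("--num-processes", suggested_num_processes())
```

Passing a non-zero `reserve` leaves some cores free for other work, for example TensorBoard.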
ssh -L localhost:8888:localhost:8888 -i key_pair.pem [email protected]
source activate pytorch_p36
git clone https://github.com/utanashati/curiosity-recast.git
pip install --upgrade pip
pip install opencv-python tensorboard tensorboard_logger
sudo apt-get update
sudo apt-get upgrade
(If you get "Resource temporarily unavailable", wait until the machine has passed 2/2 status checks, or switch to the other pip installation in the meantime.)
pip install gym gym[atari]
sudo apt-get install libav-tools
sudo apt-get install default-jdk pulseaudio
ZDoom dependencies
sudo apt-get install build-essential zlib1g-dev libsdl2-dev libjpeg-dev \
nasm tar libbz2-dev libgtk2.0-dev cmake git libfluidsynth-dev libgme-dev \
libopenal-dev timidity libwildmidi-dev unzip
Cmake Issue
sudo rm -r /usr/local/bin/cmake
sudo /home/ubuntu/anaconda3/envs/pytorch_p36/bin/pip install vizdoom
pytorch-a3c: Pong Deterministic
python main.py --game "atari" --env-name "PongDeterministic-v4" --num-processes 16 --save-model-again-eps 5 --save-video-again-eps 1 --max-episodes 20 --random-seed --no-curiosity --short-description "pong-nocuriosity" --num-stack 1
noreward-rl: VizDoom
python main.py --num-processes 16 --game "doom" --env-name "dense" --time-sleep 60 --save-model-again-eps 5 --save-video-again-eps 1 --max-episodes 250 --short-description "doom-curiosity"
For Picolmaze, we did not train an RL algorithm. We trained only the inverse model and then the forward models, and compared the baseline forward model to the one with a new loss.
To train the inverse model for 9 rooms with a periodic arena in the 'ascending entropies' setting:
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-diff-periodic-same-env" --beta 0 --num-rooms 9 --colors "diff_1_num_rooms" --periodic
Same for the deterministic setting:
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-same-1-periodic-same-env" --beta 0 --num-rooms 9 --colors "same_1" --periodic
Same for 8 options per room:
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-same-8-periodic-same-env" --beta 0 --num-rooms 9 --colors "same_8" --periodic
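The three inverse-model commands above differ only in --short-description and --colors. A minimal sketch that builds them from one template and prints them as a dry run; the function and variable names are hypothetical, while the flags and values are copied from the commands above:

```python
import shlex

# (--short-description, --colors) pairs copied from the three commands above.
SETTINGS = [
    ("uniform-9-diff-periodic-same-env", "diff_1_num_rooms"),
    ("uniform-9-same-1-periodic-same-env", "same_1"),
    ("uniform-9-same-8-periodic-same-env", "same_8"),
]


def inverse_model_command(short_description, colors):
    """Build the argument list for one inverse-model run (beta = 0)."""
    return [
        "python", "main_uniform.py",
        "--num-processes", "1",
        "--time-sleep", "20",
        "--save-model-again-eps", "5",
        "--max-episodes", "100",
        "--short-description", short_description,
        "--beta", "0",
        "--num-rooms", "9",
        "--colors", colors,
        "--periodic",
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for description, colors in SETTINGS:
        argv = inverse_model_command(description, colors)
        print(" ".join(shlex.quote(arg) for arg in argv))
```

To actually launch the runs, each list could be passed to subprocess.run instead of being printed.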
Now, to train the baseline forward model for the same settings in the same order:
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-diff-periodic-same-env-forw" --beta 1 --num-rooms 9 --colors "diff_1_num_rooms" --curiosity-file "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-diff-periodic-same-env)/models/curiosity_XXXX.XX.XX-XX.XX.XX_XXXXXX.pth" --periodic --env-folder "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-diff-periodic-same-env)"
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-same-1-periodic-same-env-forw" --beta 1 --num-rooms 9 --colors "same_1" --curiosity-file "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-same-1-periodic-same-env)/models/curiosity_XXXX.XX.XX-XX.XX.XX_XXXXXX.pth" --periodic --env-folder "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-same-1-periodic-same-env)/"
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-same-8-periodic-same-env-forw" --beta 1 --num-rooms 9 --colors "same_8" --curiosity-file "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-same-8-periodic-same-env)/models/curiosity_XXXX.XX.XX-XX.XX.XX_XXXXXX.pth" --periodic --env-folder "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-same-8-periodic-same-env)/"
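The XXXX placeholders in --curiosity-file and --env-folder stand for the timestamps of the earlier inverse-model runs. A minimal sketch for locating the most recently written checkpoint of such a run, assuming the directory layout shown in the paths above (runs/picolmaze/<timestamp>(<short description>)/models/curiosity_*.pth); the helper name is hypothetical and not part of this repository:

```python
from pathlib import Path
from typing import Optional


def latest_curiosity_checkpoint(short_description, root="runs/picolmaze"):
    # type: (str, str) -> Optional[Path]
    """Return the newest curiosity_*.pth among runs named '<timestamp>(<short_description>)'.

    Returns None if no matching run or checkpoint exists.
    """
    suffix = "({})".format(short_description)
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    # Match on the exact parenthesised description, so that e.g. the
    # '...-forw' runs are not confused with the inverse-model runs.
    run_dirs = [d for d in root_path.iterdir()
                if d.is_dir() and d.name.endswith(suffix)]
    checkpoints = [p for d in run_dirs
                   for p in (d / "models").glob("curiosity_*.pth")]
    if not checkpoints:
        return None
    # "Newest" here means latest modification time on disk.
    return max(checkpoints, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_curiosity_checkpoint("uniform-9-same-1-periodic-same-env"))
```

The parent of the returned file's models directory is the run folder that --env-folder expects.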
Use the inverse model file you got from the corresponding run above as input (--curiosity-file). Note that for the inverse model beta = 0, and now beta = 1, following the equation from Pathak et al. (beta == 0 <=> only the inverse model is being trained, beta == 1 <=> only the forward model is being trained).
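For reference, the equation referred to is presumably the joint objective of Pathak et al. (2017), reproduced here from the paper rather than from this repository. L_I and L_F denote the inverse and forward model losses, lambda weights the policy term, and theta_P, theta_I, theta_F are the policy, inverse model and forward model parameters. The policy term plays no role for Picolmaze, where no RL algorithm is trained:

```latex
\min_{\theta_P,\,\theta_I,\,\theta_F}
\left[
  -\lambda\, \mathbb{E}_{\pi(s_t;\theta_P)}\!\left[\textstyle\sum_t r_t\right]
  + (1-\beta)\, L_I
  + \beta\, L_F
\right]
```

With beta = 0 the weighted sum of the two model losses reduces to L_I, and with beta = 1 it reduces to L_F, which matches the two cases above.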
Finally, to train the modified forward model for the same settings in the same order:
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-diff-periodic-same-env-forw" --beta 1 --num-rooms 9 --colors "diff_1_num_rooms" --curiosity-file "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-diff-periodic-same-env)/models/curiosity_XXXX.XX.XX-XX.XX.XX_XXXXXX.pth" --periodic --env-folder "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-diff-periodic-same-env)" --new-curiosity
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-same-1-periodic-same-env-forw" --beta 1 --num-rooms 9 --colors "same_1" --curiosity-file "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-same-1-periodic-same-env)/models/curiosity_XXXX.XX.XX-XX.XX.XX_XXXXXX.pth" --periodic --env-folder "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-same-1-periodic-same-env)/" --new-curiosity
python main_uniform.py --num-processes 1 --time-sleep 20 --save-model-again-eps 5 --max-episodes 100 --short-description "uniform-9-same-8-periodic-same-env-forw" --beta 1 --num-rooms 9 --colors "same_8" --curiosity-file "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-same-8-periodic-same-env)/models/curiosity_XXXX.XX.XX-XX.XX.XX_XXXXXX.pth" --periodic --env-folder "runs/picolmaze/XXXX.XX.XX-XX.XX.XX(uniform-9-same-8-periodic-same-env)/" --new-curiosity