This repository is a fork of an implementation of the PAAC algorithm presented in *Efficient Parallel Methods for Deep Reinforcement Learning*.
To train an agent to play, for example, Pong, run
python3 train.py -g <game-name> -df logs/<game-name>/ -algo paac_cts
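For this project's target game the corresponding invocation would presumably be (assuming the ALE ROM name `montezuma_revenge`; this exact command is an illustration, not taken from the repo's docs):

`python3 train.py -g montezuma_revenge -df logs/montezuma_revenge/ -algo paac_cts`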
-
July 5th Wed, 6:10-7:10pm:
- play Montezuma's Revenge yourself
- train any algorithm of your choice on Montezuma's Revenge and share results (testing environment shared here)
- (optional) read the NLP-based solution to Montezuma's Revenge
- (optional) read why Montezuma's Revenge is hard
-
By July 15th Sat, 2-4pm:
- Hoyeop Kim: Count-Based Exploration 2017
- Sangjin Park: Hierarchical RL
- DH: FeUdal Networks
- HG: [UNREAL](https://arxiv.org/abs/1611.05397)
-
By July 20th, 9:30pm:
- DH: modularize paac.py
- DH: integrate CTS to paac.py (see the exploration-bonus sketch after this list)
- Sangjin: try MOL (Micro-Objective Learning) bonus
- HG: try feature control bonus
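The count-based exploration bonus behind CTS-style methods (see the Intrinsic Motivation paper linked below) is typically just added to the environment reward before the policy-gradient update. A minimal sketch of that shaping step, assuming a hypothetical `pseudo_count(state)` callable (the real CTS density model lives in the fork's `paac_cts` code) and an illustrative bonus scale:

```python
import numpy as np

BETA = 0.01  # bonus scale; an illustrative value, not one taken from this repo


def augment_reward(env_reward, state, pseudo_count):
    """Add a count-based exploration bonus r_plus = BETA / sqrt(N(s)).

    `pseudo_count` is any callable returning the (pseudo-)count N(s) of the
    given state, e.g. one derived from a CTS density model over frames.
    """
    n = max(pseudo_count(state), 1e-8)   # guard against a zero count
    bonus = BETA / np.sqrt(n)
    return env_reward + bonus
```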
-
By July 27th:
-
By Aug 3rd:
-
By Aug 10th:
- publish results (play video, blog post)
- number of rooms/levels explored (one way to measure this is sketched below):
  - goal: level 2+, 20+ rooms
  - current record #1: 19 rooms & 1 level
- score:
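One common way to track the rooms-explored metric during evaluation is to read the current room id from the emulator RAM. The sketch below assumes an Arcade-Learning-Environment `ale` instance and that RAM byte 3 holds the current room id for Montezuma's Revenge (a widely used convention, but an assumption, not something defined in this repo):

```python
def update_visited_rooms(ale, visited_rooms):
    """Record which room the agent is currently in and return the tally.

    `ale` is an ALEInterface instance; `visited_rooms` is a set of room ids.
    Assumes RAM byte 3 holds the current room id for Montezuma's Revenge.
    """
    ram = ale.getRAM()                 # 128-byte RAM snapshot as a numpy array
    visited_rooms.add(int(ram[3]))
    return len(visited_rooms)          # distinct rooms seen so far
```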
Montezuma's Revenge is an early example of the Metroidvania genre.[1] The player controls a character called Panama Joe (a.k.a. Pedro), moving him from room to room in the labyrinthine underground pyramid of the 16th century Aztec temple of emperor Montezuma II, filled with enemies, obstacles, traps, and dangers. The objective is to score points by gathering jewels and killing enemies along the way. Panama Joe must find keys to open doors, collect and use equipment such as torches, swords, amulets, etc., and avoid or defeat the challenges in his path. Obstacles are laser gates, conveyor belts, disappearing floors and fire pits.[2][3][4]
Movement is achieved by jumping, running, sliding down poles, and climbing chains and ladders. Enemies are skulls, snakes, and spiders. A further complication arises in the bottom-most floors of each pyramid, which must be played in total darkness unless a torch is found.
The pyramid is nine floors deep, not counting the topmost entry room that the player drops into at the start of each level, and has 99 rooms to explore. The goal is to reach the Treasure Chamber, whose entrance is in the center room of the lowest level. After jumping in here, the player has a short time to jump from one chain to another and pick up as many jewels as possible. However, jumping onto a fireman's pole will immediately take the player to the next level; when time runs out, the player is automatically thrown onto the pole.
There are nine difficulty levels in all. Though the basic layout of the pyramid remains the same from one level to the next, small changes in details force the player to rethink strategy. These changes include:
- Blocking or opening up certain paths (by adding/removing walls or ladders)
- Adding enemies and obstacles
- Rearrangement of items
- More dark rooms and fewer torches (in level 9, the entire pyramid is dark and there are no torches)
- Enemies that do not disappear after they kill Panama Joe (starting with level 5)

The player can reach only the left half of the pyramid in level 1, and only the right half in level 2. Starting with level 3, the entire pyramid is open for exploration.
- Hammer: If Joe touches a hammer, the killer creatures become harmless for a short time and are displayed in grey.
- Jewel: Joe gets 1000 points for each jewel collected.
- Key: Each door can only be opened with a key of the matching colour; the colours are red, dark blue, and bright blue.
- Sword: If Joe has a sword, he can eliminate a killer creature by touching it.
- Torch: The torch lights the dark rooms.
- https://www.slideshare.net/sotetsukoyamada/montezumas-revenge-nips2016
- https://www.slideshare.net/ItsukaraIitsuka/drl-challenge-on-montezumas-revenge
- Intrinsic Motivation: https://arxiv.org/pdf/1606.01868.pdf
- Hierarchical RL: https://arxiv.org/pdf/1611.05397.pdf
- Reinforcement learning with unsupervised auxiliary tasks(unreal): https://arxiv.org/pdf/1611.05397.pdf
- FeUdal Networks for Hierarchical Reinforcement Learning: https://arxiv.org/abs/1703.01161v1
- attention
- Evolution Strategies (ES): https://arxiv.org/pdf/1703.03864.pdf
- option critic...
- https://arxiv.org/abs/1703.01732
- (optional) Feature Control as Intrinsic Motivation
- (optional) Evolution Strategies
- (optional) Count-Based Exploration 2016
- (optional) Micro-Objective Learning
- (optional) Human Checkpoint Replay
- (optional) Surprise-based Intrinsic Motivation
- (optional) VIME: Variational Information Maximizing Exploration
- (optional) Incentivizing Exploration With Deep Predictive Models
- (optional) Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
- (optional) The Reactor: A Sample-Efficient Actor-Critic Architecture
- exploration: https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-7-action-selection-strategies-for-exploration-d3a97b7cceaf
- https://www.reddit.com/r/MachineLearning/comments/45fa9o/why_montezuma_revenge_doesnt_work_in_deepmind/
- https://news.ycombinator.com/item?id=11862027
- https://gist.github.com/Itsukara/5d9e3bc6163ee8202d33b7bc48ec6da1
- https://github.com/EthanMacdonald/h-DQN
- https://github.com/steveKapturowski/tensorflow-rl
- Follow the instructions to install nvidia-docker
- Clone this repository
- Run the container with `nvidia-docker run -it -v <absolute-path>/paac:/root/paac -p 6006:6006 alfredvc/tf1-ale`.

A CPU version of the docker container is also provided and can be run with `docker run -it -v <absolute-path>/paac:/root/paac -p 6006:6006 alfredvc/tf1-ale:cpu`.
When running on the CPU, pass the device flag `-d '/cpu:0'` to the training script.
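For example, to train Pong on the CPU (an illustrative combination of the flags shown above, not a command from the docs):

`python3 train.py -g pong -df logs/pong/ -d '/cpu:0'`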
If you use Anaconda, you can try `conda env create -f environment.yml`.
Requirements
- Python 3.4+
- TensorFlow 1.0+ (choose the GPU build if you have a GPU)
- Arcade-Learning-Environment
- cython (pip3 package)
- scikit-image (pip3 package)
- python3-tk
- opencv (opencv-python)
To train an agent to play, for example, Pong, run
python3 train.py -g <game-name> -df logs/<game-name>/
1. Open a new terminal
2. Attach to the running docker container with `docker exec -it CONTAINER_NAME bash`
3. Run `tensorboard --logdir=<absolute-path>/paac/logs/tf`
4. In your browser navigate to localhost:6006/

If running locally, skip step 2.
To test the performance of a trained agent, run `python3 test.py -f logs/ -tc 5`
Output:
Performed 5 tests for seaquest.
Mean: 1704.00
Min: 1680.00
Max: 1720.00
Std: 14.97
Gifs of a trained agent playing can be generated with `python3 test.py -f logs/<game-name>/ -gn breakout`. This may take a few minutes.
- segmentation -> object detection
- attention model
- transfer learning
- life -> give larger penalty for losing a life
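A minimal sketch of the last idea (a larger penalty for losing a life), assuming a gym-style environment API with a `lives()` accessor; the wrapper name and penalty value are hypothetical and not part of this repo:

```python
LIFE_LOSS_PENALTY = -1.0  # hypothetical penalty magnitude; would need tuning


class LifeLossPenaltyWrapper:
    """Subtracts a penalty from the reward whenever the agent loses a life."""

    def __init__(self, env):
        self.env = env
        self.prev_lives = None

    def reset(self):
        obs = self.env.reset()
        self.prev_lives = self.env.lives()
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lives = self.env.lives()
        if self.prev_lives is not None and lives < self.prev_lives:
            reward += LIFE_LOSS_PENALTY   # shaped penalty for the lost life
        self.prev_lives = lives
        return obs, reward, done, info
```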