
Reinforcement Learning for Mega Man (NES)

This repository contains a fully reproducible pipeline for training a Deep Reinforcement Learning agent to complete the Cut Man stage of the first game of the Mega Man series on the NES.

I started it intending to make a full run of the game using AI only (except for a few parts where I planned to hardcode some commands, like selecting a stage). It turned out to be much harder than I expected. I have always worked on it alone, and I don't have the time or the energy to make a full run of the game by myself.

That being said, contributions are very welcome! I'd be glad to answer any questions and work on solutions with other people. If you're interested, contact me in the stable-retro channel of the Farama Foundation Discord server (my handle is @el_sevs).

🇧🇷: I will post a video about the project as soon as it's finished; in the meantime, check out other videos on my YouTube channel!

Environment specifications

Observations

The observation space is a Box(0, 255, (3, 84, 84), uint8) gymnasium space instance.

Observation preprocessing is done in this order:

  1. We use "classical" frame skipping (as in [1]), plus an additional dynamic frame skipping that kicks in when the character is not visible on the screen (right after taking damage) and lasts until we have a visual cue of where the character is
  2. The frames are warped to 84x84 greyscale images
  3. We stack 3 frames (as in [1]) to give the model a sense of movement: we show the agent the current frame and the two previous frames after frame skipping
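
For illustration, here is a minimal sketch of this preprocessing chain built with plain gymnasium wrappers. This is not the project's actual wrapper code: the wrapper names are hypothetical and the dynamic frame skip (which depends on reading game state to detect when the character is invisible) is left out.

```python
import collections

import cv2
import gymnasium as gym
import numpy as np


class SkipFrames(gym.Wrapper):
    """'Classical' frame skipping: repeat the chosen action for `skip` frames."""

    def __init__(self, env, skip=4):
        super().__init__(env)
        self.skip = skip

    def step(self, action):
        total_reward, terminated, truncated, info = 0.0, False, False, {}
        for _ in range(self.skip):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info


class WarpFrame(gym.ObservationWrapper):
    """Convert each frame to an 84x84 greyscale image."""

    def __init__(self, env, size=84):
        super().__init__(env)
        self.size = size
        self.observation_space = gym.spaces.Box(0, 255, (size, size), np.uint8)

    def observation(self, obs):
        gray = cv2.cvtColor(obs, cv2.COLOR_RGB2GRAY)
        return cv2.resize(gray, (self.size, self.size), interpolation=cv2.INTER_AREA)


class StackFrames(gym.Wrapper):
    """Stack the last `n` (already skipped) frames into a (n, 84, 84) observation."""

    def __init__(self, env, n=3):
        super().__init__(env)
        self.frames = collections.deque(maxlen=n)
        h, w = env.observation_space.shape
        self.observation_space = gym.spaces.Box(0, 255, (n, h, w), np.uint8)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)
        return np.stack(self.frames), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.frames.append(obs)
        return np.stack(self.frames), reward, terminated, truncated, info
```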

Actions

The action space is a MultiDiscrete([2 5 2]) gymnasium space instance, which means it has 3 axes: one for jumping, one for directionals, and one for shooting (for details, check megai_man/custom_integrations/MegaMan-v1-Nes/scenario.json).

The shooting action is masked during pre-training (when the agent is learning the environment without enemies), since the agent only needs it when there are shootable objects on the screen. This reduces the policy's entropy and speeds up learning.
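
As a rough illustration, the three axes and the masking can be pictured as below. The button mapping shown here is an assumption for readability; the real mapping is defined in megai_man/custom_integrations/MegaMan-v1-Nes/scenario.json.

```python
from gymnasium.spaces import MultiDiscrete

# 3 axes: jump (2 options), directionals (5 options), shoot (2 options)
action_space = MultiDiscrete([2, 5, 2])

# Hypothetical decoding of one action into NES buttons
JUMP = ["NOOP", "A"]
DIRECTIONS = ["NOOP", "UP", "DOWN", "LEFT", "RIGHT"]
SHOOT = ["NOOP", "B"]


def decode(action):
    jump, direction, shoot = action
    return JUMP[jump], DIRECTIONS[direction], SHOOT[shoot]


# During pre-training (no enemies) the shoot axis is masked, e.g. by forcing
# its component to 0 (or masking its logits), so the policy effectively
# explores only jump + direction.
action = action_space.sample()
action[2] = 0
print(decode(action))
```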

Rewards

This is where things get really messy. There are two mutually exclusive types of reward: stage reward and boss reward. The stage reward is applied during most of the stage, and the boss reward takes over as soon as the player enters the boss chamber.

  • Stage reward is a composition of two main factors: distance progress and damage taken. A distance map was made using megai_man/collision_mapper.py, which applies Wavefront Expansion with heuristics to calculate which paths the agent can take from the start of the stage to the boss chamber (the end of the stage) and how good each path is. The agent doesn't have direct access to the map, but it's used to give positive or negative rewards depending on how much closer the agent gets to the goal. There are lots of details that don't fit here (but were important to achieve stage completion); the most important one is that the positive reward for moving forwards is smaller in magnitude than the negative reward for moving backwards. This is critical for progress (without it, the agent gets stuck in an infinite loop going forwards and backwards). The idea was taken from [8]; a minimal sketch of this asymmetric distance term follows this list.

  • Boss reward is as simple as a positive reward for damaging the enemy and a negative reward for taking damage.
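
The sketch below shows only the asymmetric distance term of the stage reward, with made-up coefficients; the real reward also accounts for damage taken and several other details that live in the project's reward code.

```python
# Hypothetical coefficients; the real values (and the extra terms such as
# damage taken) live in the project's reward wrapper.
FORWARD_SCALE = 0.05   # reward per unit of distance gained towards the boss chamber
BACKWARD_SCALE = 0.06  # penalty per unit of distance lost (larger in magnitude)


def distance_reward(prev_distance, curr_distance):
    """Reward from the precomputed distance map (smaller distance = closer to the goal).

    Moving forwards pays less than moving backwards costs, so oscillating back
    and forth yields a net negative reward and is discouraged.
    """
    delta = prev_distance - curr_distance  # > 0 means the agent got closer
    if delta >= 0:
        return FORWARD_SCALE * delta
    return BACKWARD_SCALE * delta  # delta is negative, so this is a penalty
```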

Termination and Truncation

The episode is terminated if any of these conditions applies:

  • Player health drops to 0;
  • A life is lost;
  • The player touches a spike;
  • During pre-training, the agent falls from a higher screen back to a previously visited lower screen;
  • The player reaches the boss chamber (this was done to clearly separate the stage and the boss fight and train them separately).

The episode is truncated if any of these conditions applies:

  • After (60 * 60) // frameskip = 900 timesteps (roughly equivalent to 60 seconds of real gameplay) since the last time the agent reached its closest-ever point to the boss chamber in that episode (not applied during the boss fight);
  • After (60 * 360) // frameskip = 5400 timesteps (roughly equivalent to 360 seconds of real gameplay) since the start of the episode.
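
These two truncation rules can be pictured as a no-progress counter on top of an overall time limit. The snippet below is a sketch with assumed variable names (the frameskip of 4 is implied by the numbers above), not the project's actual wrapper.

```python
FRAMESKIP = 4
NO_PROGRESS_LIMIT = (60 * 60) // FRAMESKIP   # 900 steps, roughly 60 s of gameplay
EPISODE_LIMIT = (60 * 360) // FRAMESKIP      # 5400 steps, roughly 360 s of gameplay


class TruncationTracker:
    """Tracks the two truncation conditions described above."""

    def __init__(self):
        self.steps = 0
        self.steps_since_best = 0
        self.best_distance = float("inf")

    def update(self, distance_to_boss_chamber, in_boss_fight):
        """Returns True when the episode should be truncated."""
        self.steps += 1
        self.steps_since_best += 1
        if distance_to_boss_chamber < self.best_distance:
            # New closest-ever point in this episode: reset the no-progress timer.
            self.best_distance = distance_to_boss_chamber
            self.steps_since_best = 0
        no_progress = not in_boss_fight and self.steps_since_best >= NO_PROGRESS_LIMIT
        return no_progress or self.steps >= EPISODE_LIMIT
```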

How to reproduce

  1. Extract ROM image from your own Mega Man (US version) original NES cartridge and save it as megai_man/custom_integrations/MegaMan-v1-Nes/rom.nes
  2. Create a virtual environment and install dependencies from pyproject.toml with Poetry or install dependencies directly from requirements.txt with pip
  3. Make a modified version of the ROM using Rock and Roll Level Editor. You need to remove all the enemies from Cut Man's stage for the pre-training step. Save the new ROM as megai_man/custom_integrations/MegaMan-noEnemies-Nes/rom.nes
  4. Run megai_man/train.py for at least 25M steps (I ran it for 27M)
  5. Rename the resulting model to models/no_enemies_complete.zip
  6. Run megai_man/train_pretrained.py for at least 30M steps. The best model will be stored at models/cutman_pretrained_noTermBackScreen_gamma95_10spikepunish_enemies_curriculum_best/best_model.zip
  7. You can see it in action by running megai_man/test.py (see the evaluation sketch after this list). If you want to record a movie, change the record parameter of the make_venv function to record=".". The concept of a movie here comes from the Tool-Assisted Speedrun community: it's only a recording of button presses, not the actual frames of the game. It will be stored as a bk2 file
  8. If you want to render the movie into an actual video, run megai_man/playback_movie.py MegaMan-v1-Nes-CutMan-000000.bk2
  9. For the boss fight, run megai_man/train_boss.py for 400k steps. You can see it in action by running megai_man/test_boss.py
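
For reference, an evaluation loop in the spirit of step 7 might look like the sketch below. This is not the project's test.py: the import path and exact signature of make_venv are assumptions, and it assumes the checkpoint is a Stable-Baselines3 PPO model running in a vectorized environment.

```python
from stable_baselines3 import PPO

from megai_man.env import make_venv  # import path assumed

# record="." stores the button-press movie as a .bk2 file in the current directory
venv = make_venv(record=".")
model = PPO.load(
    "models/cutman_pretrained_noTermBackScreen_gamma95_10spikepunish_enemies_curriculum_best/best_model.zip",
    env=venv,
)

obs = venv.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, rewards, dones, infos = venv.step(action)
    done = dones[0]  # single-environment VecEnv
venv.close()
```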

Results of any training step will show up inside the logs directory. You can visualize them with TensorBoard by running tensorboard --logdir logs from the project root directory.

PS: you might be able to get away without the third step (modifying the original ROM) by skipping straight to step 6, which removes enemies from the early screens (and progressively adds more enemies as training goes on) by directly modifying RAM as the AI plays. I didn't test it this way and, even if it works, it will probably be much slower since there will be no action masking. This could be a future improvement. Feel free to contribute!

Credits

Images in images/bg were ripped by Rick N. Bruns (taken from here).

Tilesets in images/tiles were ripped by Mister Mike (taken from here).

References

  1. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1312.5602
  2. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1707.06347
  3. Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., De Oliveira Pinto, H. P., Raiman, J., Salimans, T., Schlatter, J., . . . Zhang, S. (2019). Dota 2 with Large Scale Deep Reinforcement Learning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1912.06680
  4. Kanervisto, A., Scheller, C., & Hautamäki, V. (2020). Action Space Shaping in Deep Reinforcement Learning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2004.00980
  5. Andrychowicz, M., Raichuk, A., Stańczyk, P., Orsini, M., Girgin, S., Marinier, R., Hussenot, L., Geist, M., Pietquin, O., Michalski, M., Gelly, S., & Bachem, O. (2020). What matters in On-Policy Reinforcement Learning? A Large-Scale Empirical Study. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2006.05990
  6. Huang, S., Fernand Julien Dossa, R., Raffin, A., Kanervisto, A., & Wang, W. (2022, March 25). The 37 Implementation Details of Proximal Policy Optimization. The ICLR Blog Track. Retrieved October 10, 2024, from https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
  7. Jones, A. (n.d.). Debugging reinforcement learning systems. andy jones. Retrieved October 10, 2024, from https://andyljones.com/posts/rl-debugging.html
  8. DarkAutumn/triforce: A deep learning agent for The Legend of Zelda (nes). (n.d.). GitHub. https://github.com/DarkAutumn/triforce
  9. Mega Man (NES)/RAM map - Data Crystal. (2024). In Mega Man (NES)/RAM map. Retrieved October 10, 2024, from https://datacrystal.tcrf.net/wiki/Mega_Man_(NES)/RAM_map
  10. Yliluoma, J. (2013, October 6). Rockman / Mega Man Source Code: Main disassembly (banks 5,6,7). Joel Yliluoma. Retrieved October 10, 2024, from https://bisqwit.iki.fi/jutut/megamansource/maincode.txt
