This repository contains a fully reproducible pipeline for training a Deep Reinforcement Learning agent to complete the Cut Man stage of the first Mega Man game for the NES.
I started it with the intention of making a full run of the game using AI only (except for a few parts I intended to hardcode, like selecting a stage). It turned out to be much harder than I expected. I have always worked on it alone, and I don't have the time or the energy to make a full run of the game by myself.

That said, contributions are very welcome! I'd be glad to answer any questions and work on solutions with other people. If you're interested, contact me on the stable-retro channel of the Farama Foundation Discord server (my handle is @el_sevs).
🇧🇷: I will provide a video about the project as soon as I finish it; in the meantime, check out other videos on my YouTube channel!
The observation space is a `Box(0, 255, (3, 84, 84), uint8)` gymnasium space instance.
Observation preprocessing is done in this order:
- We use "classical" frame skipping (as in [1]) and an additional dynamic frame skipping is applied when the character is not visible on the screen (when it just took damage) until we have a visual clue of where the character is
- The frames are warped to 84x84 greyscale images
- We stack 3 frames (as in [1]) to give the model a sense of movement: the agent sees the current frame and the two previous frames (after frame skipping)
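A minimal sketch of this preprocessing using off-the-shelf Stable-Baselines3 wrappers. This is an illustration only: the project's `make_venv` implements the dynamic frame skip itself, and the wrapper names below are standard SB3/stable-retro utilities, not the repository's code:

```python
import os

import retro
from stable_baselines3.common.atari_wrappers import MaxAndSkipEnv, WarpFrame
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack, VecTransposeImage

# register the custom integration shipped in this repository
retro.data.Integrations.add_custom_path(os.path.abspath("megai_man/custom_integrations"))

def make_env():
    env = retro.make(game="MegaMan-v1-Nes", inttype=retro.data.Integrations.ALL)
    env = MaxAndSkipEnv(env, skip=4)            # "classical" frame skipping
    env = WarpFrame(env, width=84, height=84)   # 84x84 greyscale warping
    # the dynamic frame skip (while Mega Man is invisible after taking damage)
    # is a custom wrapper in this repository and is not shown here
    return env

venv = DummyVecEnv([make_env])
venv = VecFrameStack(venv, n_stack=3)           # stack the last 3 processed frames
venv = VecTransposeImage(venv)                  # -> Box(0, 255, (3, 84, 84), uint8)
```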
The action space is a `MultiDiscrete([2 5 2])` gymnasium space instance, which means it has 3 axes: one for jumping, one for directionals, and one for shooting (for details, check `megai_man/custom_integrations/MegaMan-v1-Nes/scenario.json`).
The shooting action is masked during pre-training (when the agent is learning on the environment without enemies), since it is only needed when there are shootable objects on the screen. Masking it reduces entropy and speeds up learning.
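One simple way to picture that masking (a hedged sketch, not necessarily the mechanism used in this repository; the exact button mapping lives in `scenario.json`):

```python
import gymnasium as gym
import numpy as np

class MaskShoot(gym.ActionWrapper):
    """Force the shoot axis of the MultiDiscrete([2, 5, 2]) action to a no-op
    while pre-training on the enemy-free ROM."""

    def action(self, action):
        action = np.array(action, copy=True)
        action[2] = 0  # axis 2 = shoot; 0 is assumed to mean "don't press the button"
        return action

# axes: 0 = jump (2 options), 1 = directionals (5 options), 2 = shoot (2 options)
# pretrain_env = MaskShoot(env)
```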
This is where things get really messy. There are two types of reward that are mutually exclusive: stage reward and boss reward. Stage reward is applied during most of the stage, and boss reward takes over as soon as the player enters the boss chamber.

- Stage reward is a composition of two main factors: distance progress and damage taken. A distance map was built with `megai_man/collision_mapper.py`, which applies Wavefront Expansion with heuristics to calculate which paths the agent can take from the start of the stage to the boss chamber (the end of the stage) and how good each path is. The agent doesn't have direct access to the map, but it's used to give positive or negative rewards depending on how much closer the agent gets to the goal. There are lots of details that don't fit here (but were important to achieve stage completion); the most important one is that the positive reward for moving forward is smaller in magnitude than the negative reward for moving backward. This is critical for progress (without it, the agent gets stuck in an infinite loop going forward and backward). This idea was taken from [8]. A sketch of the idea follows this list.
- Boss reward is as simple as a positive reward for damaging the boss and a negative reward for taking damage.
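A minimal sketch of these two ideas combined: a BFS-style wavefront distance map plus the asymmetric progress reward. Every name and constant here is illustrative; the repository's actual scaling factors and heuristics live in `collision_mapper.py` and the environment wrappers:

```python
from collections import deque

import numpy as np

def wavefront_distance_map(walkable, goal):
    """BFS ("wavefront expansion") over a 2D grid of walkable tiles, returning
    each tile's distance to the goal tile (the boss chamber entrance)."""
    dist = np.full(walkable.shape, np.inf)
    dist[goal] = 0
    queue = deque([goal])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < walkable.shape[0] and 0 <= nx < walkable.shape[1]
                    and walkable[ny, nx] and np.isinf(dist[ny, nx])):
                dist[ny, nx] = dist[y, x] + 1
                queue.append((ny, nx))
    return dist

def stage_reward(prev_dist, curr_dist, damage_taken,
                 forward_scale=0.5, backward_scale=1.0, damage_scale=1.0):
    """Asymmetric progress reward: moving forward pays less than moving
    backward costs, which discourages back-and-forth loops."""
    delta = prev_dist - curr_dist            # > 0 means we got closer to the goal
    progress = forward_scale * delta if delta >= 0 else backward_scale * delta
    return progress - damage_scale * damage_taken

def boss_reward(damage_dealt, damage_taken):
    """Boss fight reward: hit the boss, don't get hit."""
    return damage_dealt - damage_taken
```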
The episode is terminated if any of these conditions apply:
- Player health drops to 0;
- A life is lost;
- The player touches a spike;
- During pre-training, the agent falls from a higher screen back down to a previously visited lower screen;
- The player reaches the boss chamber (this cleanly separates the two situations so they can be trained separately).
The episode is truncated if any of these conditions apply (a sketch of both the termination and truncation checks follows this list):
- After `(60 * 60) // frameskip = 900` timesteps (roughly 60 seconds of real gameplay) since the last time the agent improved its closest distance to the boss chamber in that episode (not applied during the boss fight);
- After `(60 * 360) // frameskip = 5400` timesteps (roughly 360 seconds of real gameplay) since the start of the episode.
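A minimal sketch of how these checks could be wired together. The `info` fields and counter names below are illustrative stand-ins, not the repository's actual variables (the real values come from the RAM addresses declared in the custom integration):

```python
FRAMESKIP = 4
STALL_LIMIT = (60 * 60) // FRAMESKIP      # 900 steps, roughly 60 s of gameplay
EPISODE_LIMIT = (60 * 360) // FRAMESKIP   # 5400 steps, roughly 360 s of gameplay

def episode_end(info, state, pretraining=False):
    """Return (terminated, truncated) for the current step.

    `info` is assumed to expose health/lives/spike/screen/boss-chamber values,
    `state` holds per-episode counters kept by the environment wrapper.
    """
    terminated = (
        info["health"] == 0
        or info["lives"] < state["start_lives"]
        or info["touched_spike"]
        or (pretraining and info["screen"] < state["prev_screen"])  # fell back a screen
        or info["in_boss_chamber"]
    )
    truncated = (
        (not info["in_boss_chamber"]
         and state["steps_since_best_distance"] >= STALL_LIMIT)
        or state["episode_steps"] >= EPISODE_LIMIT
    )
    return terminated, truncated
```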
1. Extract the ROM image from your own Mega Man (US version) original NES cartridge and save it as `megai_man/custom_integrations/MegaMan-v1-Nes/rom.nes`
2. Create a virtual environment and install dependencies from `pyproject.toml` with Poetry, or install dependencies directly from `requirements.txt` with pip
3. Make a modified version of the ROM using Rock and Roll Level Editor: remove all the enemies from Cut Man's stage for the pre-training step. Save the new ROM as `megai_man/custom_integrations/MegaMan-noEnemies-Nes/rom.nes`
4. Run `megai_man/train.py` for at least 25M steps (I ran it for 27M)
5. Rename the resulting model to `models/no_enemies_complete.zip`
6. Run `megai_man/train_pretrained.py` for at least 30M steps. The best model will be stored at `models/cutman_pretrained_noTermBackScreen_gamma95_10spikepunish_enemies_curriculum_best/best_model.zip`
7. You can see it in action by running `megai_man/test.py`. If you want to record a movie, change the `record` parameter of the `make_venv` function to `record="."`. The concept of a movie here comes from the Tool-Assisted Speedrun community: it's only a recording of button presses, not the actual frames from the game. It will be stored as a `.bk2` file
8. If you want to render the movie into an actual video, run `megai_man/playback_movie.py MegaMan-v1-Nes-CutMan-000000.bk2`
9. For the boss fight, run `megai_man/train_boss.py` for 400k steps. You can see it in action by running `megai_man/test_boss.py`
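For reference, a rough sketch of what an evaluation loop in the spirit of `test.py` could look like with stable-retro and Stable-Baselines3. This is a simplified illustration under the assumption that the saved model was trained on the preprocessing pipeline sketched earlier; the project's own `make_venv` handles all of this:

```python
import os

import retro
from stable_baselines3 import PPO
from stable_baselines3.common.atari_wrappers import MaxAndSkipEnv, WarpFrame
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack, VecTransposeImage

# register the custom integration shipped in this repository
retro.data.Integrations.add_custom_path(os.path.abspath("megai_man/custom_integrations"))

def make_env():
    env = retro.make(
        game="MegaMan-v1-Nes",
        inttype=retro.data.Integrations.ALL,
        use_restricted_actions=retro.Actions.MULTI_DISCRETE,  # MultiDiscrete actions
        record=".",  # writes a .bk2 button-press movie to the current directory
    )
    env = MaxAndSkipEnv(env, skip=4)
    env = WarpFrame(env, width=84, height=84)
    return env

# same observation preprocessing as during training
venv = VecTransposeImage(VecFrameStack(DummyVecEnv([make_env]), n_stack=3))

model = PPO.load("models/no_enemies_complete.zip", env=venv)

obs = venv.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, rewards, dones, infos = venv.step(action)
    done = dones[0]
venv.close()
```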
Results of any training step will show up inside the `logs` directory. You can visualize them with TensorBoard by running `tensorboard --logdir logs` from the project root directory.
PS: you might be able to skip the third step (modifying the original ROM) by jumping straight to step 6, which removes enemies from early screens (and progressively adds more enemies as training goes on) by directly modifying RAM as the AI plays. I didn't test it this way and, even if it works, it will probably be much slower since there will be no action masking. This could be a future improvement. Feel free to contribute!
Images in `images/bg` were ripped by Rick N. Bruns (taken from here).
Tilesets in `images/tiles` were ripped by Mister Mike (taken from here).
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1312.5602
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1707.06347
- Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., De Oliveira Pinto, H. P., Raiman, J., Salimans, T., Schlatter, J., . . . Zhang, S. (2019). Dota 2 with Large Scale Deep Reinforcement Learning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1912.06680
- Kanervisto, A., Scheller, C., & Hautamäki, V. (2020). Action Space Shaping in Deep Reinforcement Learning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2004.00980
- Andrychowicz, M., Raichuk, A., Stańczyk, P., Orsini, M., Girgin, S., Marinier, R., Hussenot, L., Geist, M., Pietquin, O., Michalski, M., Gelly, S., & Bachem, O. (2020). What matters in On-Policy Reinforcement Learning? A Large-Scale Empirical Study. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2006.05990
- Huang, S., Fernand Julien Dossa, R., Raffin, A., Kanervisto, A., & Wang, W. (2022, March 25). The 37 Implementation Details of Proximal Policy Optimization. The ICLR Blog Track. Retrieved October 10, 2024, from https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
- Jones, A. (n.d.). Debugging reinforcement learning systems. andy jones. Retrieved October 10, 2024, from https://andyljones.com/posts/rl-debugging.html
- DarkAutumn/triforce: A deep learning agent for The Legend of Zelda (nes). (n.d.). GitHub. https://github.com/DarkAutumn/triforce
- Mega Man (NES)/RAM map - Data Crystal. (2024). In Mega Man (NES)/RAM map. Retrieved October 10, 2024, from https://datacrystal.tcrf.net/wiki/Mega_Man_(NES)/RAM_map
- Yliluoma, J. (2013, October 6). Rockman / Mega Man Source Code: Main disassembly (banks 5,6,7). Joel Yliluoma. Retrieved October 10, 2024, from https://bisqwit.iki.fi/jutut/megamansource/maincode.txt