Update README.md, documentation, and data.
joeryjoery committed Feb 26, 2021
1 parent 322401a commit 196c8e7
Showing 24 changed files with 250,981 additions and 1,001,190 deletions.
2 changes: 1 addition & 1 deletion Games/hex/HexLogic.py
@@ -7,7 +7,7 @@
 :version: FINAL
 :date:
 :author: Aske Plaat
-:edited by: Joery de Vries
+:edited by: Joery de Vries and Ken Voskuil
 """
 import numpy as np
4 changes: 2 additions & 2 deletions Games/hex/legacy/experimenter.py
@@ -7,8 +7,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 :references: https://trueskill.org/
 """
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_game.py
@@ -4,8 +4,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

 from Games.hex.HexLogic import HexBoard
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_heuristics.py
@@ -11,8 +11,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

 import numpy as np
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_player.py
@@ -7,8 +7,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_policies.py
@@ -14,8 +14,8 @@
 :version: FINAL
 :date: 15-05-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

 from ast import literal_eval
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_search.py
@@ -5,8 +5,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 :bibliography:
     1. Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A Modern
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_utils.py
@@ -12,8 +12,8 @@
 :version: FINAL
 :date: 15-05-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

 from Games.hex.HexLogic import HexBoard
4 changes: 2 additions & 2 deletions Games/tictactoe/TicTacToeLogic.py
@@ -7,8 +7,8 @@
 :version: FINAL
 :date:
 :author: Aske Plaat
-:edited by: Joery de Vries
-:edited by: Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """
 import numpy as np
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2020 Ken Voskuil and Joery de Vries
+Copyright (c) 2020 Joery de Vries and Ken Voskuil

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
10 changes: 6 additions & 4 deletions README.md
@@ -5,7 +5,8 @@ The codebase provides a modular framework to design your own AlphaZero and MuZero
 This API also allows MuZero agents to more strongly rely on their learned model during interaction with the environment; the programmer can, e.g., specify the sparsity of observations to a *learned* MuZero agent during a trial.
 Our interface also provides sufficient abstraction to extend the MuZero or AlphaZero algorithm for research purposes.

-**beta phase**: Most of the codebase is done regarding development, we are currently working on finishing up this project and making it easily available for other users.
+Note that we did not perform extensive testing on the board games; we found this very time-intensive and difficult to tune.
+Well-tested environments include the Gym environments CartPole-v1, MountainCar-v0, and Pendulum-v0.

 ## How to run:
 In order to run experiments or train agents, you first need a .json configuration file (see [Configurations/ModelConfigs](Configurations/ModelConfigs)) specifying the agent's parameters.
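As a rough illustration of what such a configuration file could contain (the key names and values below are hypothetical, chosen only to mirror common MuZero hyperparameters, and are not the repository's actual schema):

```json
{
    "algorithm": "MuZero",
    "environment": "CartPole-v1",
    "num_simulations": 50,
    "num_selfplay_episodes": 20,
    "learning_rate": 0.001,
    "discount": 0.997
}
```

The actual keys accepted by the framework are defined by the files in Configurations/ModelConfigs.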
@@ -20,6 +21,7 @@ See the [wiki](https://github.com/kaesve/muzero/wiki) for a more elaborate overview
 * Python 3.7+
   - tensorflow
   - keras standalone (until tensorflow 2.3 is available on anaconda windows)
+  - tqdm

 #### Tested Versions (Windows and Linux)
 * Python 3.7.9
@@ -31,12 +33,12 @@ This codebase was designed for a Masters Course at Leiden University, we utilize
 We did this exclusively for MountainCar; the visualization tool can be viewed here: https://kaesve.nl/projects/muzero-model-inspector/#/; an example illustration of this is shown below.
 This figure illustrates the entire state space of MountainCar embedded by MuZero's encoding network, projected to the 3-PC space of the embedding's neural activation values.

-![example](publish/figures/MC_l4kl_MDPAbstractionCombined.png)
+![example](publish/figures/MC_MDP_l8_illustration.png)

 We also quantified the efficacy of our MuZero and AlphaZero implementations on the CartPole environment over numerous hyperparameters.
 The canonical MuZero can be quite unstable depending on the hyperparameters; the figure shows this through median and mean training rewards over 8 training runs.

-![example2](publish/figures/CP_NumericalResultsSplit.png)
+![example2](publish/figures/CP_NumericalResults.png)

 The figure below illustrates the efficacy of learned models on MountainCar when we provide the MuZero agent observations only every n-th environment step, along with the agent's learning progress with dense observations.
@@ -46,7 +48,7 @@ No boardgames were tested for MuZero as computation time quickly became an issue
 We did find that AlphaZero could learn good policies on board games, though this depended on the observation encoding.
 The heuristic encoding used in AlphaZero seemed less effective than the canonicalBoard representation used in AlphaZero-General.

-Our paper can be read for more details *here* (Will be added later).
+Our paper can be read for more details here: [arxiv:2102.12924](https://arxiv.org/abs/2102.12924).

 ## Our Contributions
 There are already a variety of MuZero and AlphaZero implementations available: