Update README.md, documentation, and data.
joeryjoery committed Feb 26, 2021
1 parent 322401a commit 196c8e7
Showing 24 changed files with 250,981 additions and 1,001,190 deletions.
2 changes: 1 addition & 1 deletion Games/hex/HexLogic.py
@@ -7,7 +7,7 @@
 :version: FINAL
 :date:
 :author: Aske Plaat
-:edited by: Joery de Vries
+:edited by: Joery de Vries and Ken Voskuil
 """
 import numpy as np
4 changes: 2 additions & 2 deletions Games/hex/legacy/experimenter.py
@@ -7,8 +7,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 :references: https://trueskill.org/
 """
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_game.py
@@ -4,8 +4,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

 from Games.hex.HexLogic import HexBoard
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_heuristics.py
@@ -11,8 +11,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

 import numpy as np
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_player.py
@@ -7,8 +7,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_policies.py
@@ -14,8 +14,8 @@
 :version: FINAL
 :date: 15-05-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

 from ast import literal_eval
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_search.py
@@ -5,8 +5,8 @@
 :version: FINAL
 :date: 07-02-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 :bibliography:
     1. Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A Modern
4 changes: 2 additions & 2 deletions Games/hex/legacy/hex_utils.py
@@ -12,8 +12,8 @@
 :version: FINAL
 :date: 15-05-2020
-:author: Joery de Vries
-:edited by: Joery de Vries, Oliver Konig, Siyuan Dong
+:author: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """

 from Games.hex.HexLogic import HexBoard
4 changes: 2 additions & 2 deletions Games/tictactoe/TicTacToeLogic.py
@@ -7,8 +7,8 @@
 :version: FINAL
 :date:
 :author: Aske Plaat
-:edited by: Joery de Vries
-:edited by: Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
+:edited by: Joery de Vries and Ken Voskuil
 """
 import numpy as np
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2020 Ken Voskuil and Joery de Vries
+Copyright (c) 2020 Joery de Vries and Ken Voskuil

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
10 changes: 6 additions & 4 deletions README.md
@@ -5,7 +5,8 @@ The codebase provides a modular framework to design your own AlphaZero and MuZero
 This API also allows MuZero agents to more strongly rely on their learned model during interaction with the environment; the programmer can, e.g., specify the sparsity of observations to a *learned* MuZero agent during a trial.
 Our interface also provides sufficient abstraction to extend the MuZero or AlphaZero algorithm for research purposes.

-**beta phase**: Most of the codebase is done regarding development, we are currently working on finishing up this project and making it easily available for other users.
+Note that we did not perform extensive testing on the board games; we found this very time-intensive and difficult to tune.
+Well-tested environments include the Gym environments CartPole-v1, MountainCar-v0, and Pendulum-v0.

 ## How to run:
 In order to run experiments or train agents, you first need a .json configuration file (see [Configurations/ModelConfigs](Configurations/ModelConfigs)) specifying the agent's parameters.
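As a rough illustration of what such a configuration file could contain (the key names and values below are hypothetical, chosen only to mirror common MuZero hyperparameters, and are not the repository's actual schema):

```json
{
    "algorithm": "MuZero",
    "environment": "CartPole-v1",
    "num_simulations": 50,
    "num_selfplay_episodes": 20,
    "learning_rate": 0.001,
    "discount": 0.997
}
```

The actual keys accepted by the framework are defined by the files in Configurations/ModelConfigs.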
@@ -20,6 +21,7 @@ See the [wiki](https://github.com/kaesve/muzero/wiki) for a more elaborate overview
 * Python 3.7+
   - tensorflow
   - keras standalone (until tensorflow 2.3 is available on anaconda windows)
+  - tqdm

 #### Tested Versions (Windows and Linux)
 * Python 3.7.9
@@ -31,12 +33,12 @@ This codebase was designed for a Masters Course at Leiden University, we utilize
 We did this exclusively for MountainCar; the visualization tool can be viewed here: https://kaesve.nl/projects/muzero-model-inspector/#/; an example illustration of this is shown below.
 This figure illustrates the entire state space of MountainCar embedded by MuZero's encoding network, projected to the 3-PC space of the embedding's neural activation values.

-![example](publish/figures/MC_l4kl_MDPAbstractionCombined.png)
+![example](publish/figures/MC_MDP_l8_illustration.png)

 We also quantified the efficacy of our MuZero and AlphaZero implementations on the CartPole environment over numerous hyperparameters.
 The canonical MuZero can be quite unstable depending on the hyperparameters; the figure shows this through median and mean training rewards over 8 training runs.

-![example2](publish/figures/CP_NumericalResultsSplit.png)
+![example2](publish/figures/CP_NumericalResults.png)

 The figure below illustrates the efficacy of learned models on MountainCar when we provide the MuZero agent observations only every n-th environment step, along with the agent's learning progress with dense observations.
@@ -46,7 +48,7 @@ No boardgames were tested for MuZero as computation time quickly became an issue
 We did find that AlphaZero could learn good policies on board games, though this depended on the observation encoding.
 The heuristic encoding used in AlphaZero seemed less effective than the canonicalBoard representation used in AlphaZero-General.

-Our paper can be read for more details *here* (Will be added later).
+Our paper can be read for more details here: [arxiv:2102.12924](https://arxiv.org/abs/2102.12924).

 ## Our Contributions
 There are already a variety of MuZero and AlphaZero implementations available: