Fix seeding interface
daochenzha committed Apr 29, 2020 · 1 parent cdcb9e7 · commit d328faf
Former-commit-id: 62c97f2
Showing 51 changed files with 106 additions and 167 deletions.
README.md: 19 changes (11 additions, 8 deletions)
@@ -11,7 +11,7 @@ RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports m
* Paper: [https://arxiv.org/abs/1910.04376](https://arxiv.org/abs/1910.04376)

**News:**
-* Now support environment local seeding and multiprocessing.
+* Now RLCard supports environment-local seeding and multiprocessing. Thanks to [@weepingwillowben](https://github.com/weepingwillowben) for the testing scripts.
* A human interface for No-Limit Hold'em is available. The action space of No-Limit Hold'em has been abstracted. Thanks to [@AdrianP-](https://github.com/AdrianP-) for the contribution.
* The new game Gin Rummy and a human GUI are available. Thanks to [@billh0420](https://github.com/billh0420) for the contribution.
* A PyTorch implementation is available. Thanks to [@mjudell](https://github.com/mjudell) for the contribution.
@@ -128,15 +128,23 @@ We provide a complexity estimation for the games on several aspects. **InfoSet N
## API Cheat Sheet
### How to create an environment
You can use the following interface. Some configurations can be specified with a dictionary.
-* **rlcard.make(env_id, config={}, env_num=1)**: Make an environment. `env_id` is a string of a environment; `env_num` is specifies how many environments running in parallel. If the number is larger than 1, then the tasks will be assigned to multiple processes for acceleration. `config` is a dictionary specifying some environment configurations, which are as follows.
+* **env = rlcard.make(env_id, config={})**: Make an environment. `env_id` is the string ID of an environment; `config` is a dictionary specifying some environment configurations, which are as follows.
+   * `seed`: Default `None`. Set an environment-local random seed for reproducing the results.
+   * `env_num`: Default `1`. It specifies how many environments run in parallel. If the number is larger than 1, the tasks will be assigned to multiple processes for acceleration.
   * `allow_step_back`: Default `False`. `True` if allowing the `step_back` function to traverse backward in the tree.
   * `allow_raw_data`: Default `False`. `True` if allowing raw data in the `state`.
   * `single_agent_mode`: Default `False`. `True` if using single-agent mode, i.e., a Gym-style interface with the other players as pretrained/rule models.
   * `active_player`: Default `0`. If `single_agent_mode` is `True`, `active_player` specifies which player to operate on in single-agent mode.
   * `record_action`: Default `False`. If `True`, a field `action_record` will be in the `state` to record the historical actions. This may be used for human-agent play.

+Once the environment is made, we can access some information of the game.
+* **env.action_num**: The number of actions.
+* **env.player_num**: The number of players.
+* **env.state_space**: The state space of the observations.
+* **env.timestep**: The number of timesteps stepped by the environment.
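For illustration, a minimal sketch of the new interface (using blackjack; the printed values depend on the game):

```python
import rlcard

# Make an environment with an environment-local random seed
env = rlcard.make('blackjack', config={'seed': 0})

# Basic information about the game
print(env.action_num)   # the number of actions
print(env.player_num)   # the number of players
print(env.state_space)  # the state space of the observations
print(env.timestep)     # timesteps stepped so far
```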

### What is state in RLCard
-State will always have observation `state['obs']` and legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, state will have raw observation `state['raw_obs']` and raw legal actions `state['raw_legal_actions']`.
+State is a Python dictionary. It will always have the observation `state['obs']` and the legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, the state will also have the raw observation `state['raw_obs']` and the raw legal actions `state['raw_legal_actions']`.
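In practice, a state can be inspected as below (a sketch assuming a Leduc Hold'em environment with `allow_raw_data` enabled; the raw fields appear only in that case):

```python
import rlcard

env = rlcard.make('leduc-holdem', config={'seed': 0, 'allow_raw_data': True})
state, player_id = env.reset()

print(state['obs'])                # encoded observation
print(state['legal_actions'])      # legal actions as IDs
print(state['raw_obs'])            # raw observation
print(state['raw_legal_actions'])  # raw legal actions
```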

### Basic interfaces
The following interfaces provide a basic usage. They are easy to use but make assumptions about the agent: the agent must follow the [agent template](docs/developping-algorithms.md).
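A sketch of the basic usage (assuming `env.set_agents` and `env.run` behave as in the bundled examples, with `run` returning the trajectories and payoffs):

```python
import rlcard
from rlcard.agents import RandomAgent

env = rlcard.make('blackjack', config={'seed': 0})
env.set_agents([RandomAgent(action_num=env.action_num)])

# Play one complete game with the registered agents
trajectories, payoffs = env.run(is_training=False)
```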
@@ -145,7 +153,6 @@

### Advanced interfaces
For advanced usage, the following interfaces allow flexible operations on the game tree. These interfaces do not make any assumptions about the agent.
-* **env.seed(seed)**: Set a environment local random seed for reproducing the results.
* **env.reset()**: Initialize a game. Return the state and the first player ID.
* **env.step(action, raw_action=False)**: Take one step in the environment. `action` can be a raw action or an integer; `raw_action` should be `True` if the action is a raw (string) action.
* **env.step_back()**: Available only when `allow_step_back` is `True`. Take one step backward. This can be used for algorithms that operate on the game tree, such as CFR.
@@ -154,10 +161,6 @@
* **env.get_state(player_id)**: Return the state corresponding to `player_id`.
* **env.get_payoffs()**: At the end of the game, return a list of payoffs for all the players.
* **env.get_perfect_information()**: (Currently only supported for some of the games) Obtain the perfect information at the current state.
-* **env.action_num**: The number of actions.
-* **env.player_num**: The number of players.
-* **env.state_space**: Ther state space of the observations.
-* **env.timestep**: The number of timesteps stepped by the environment.
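A sketch of a manual game-tree traversal with these interfaces (assuming `env.is_over()` marks the terminal state and that an agent's `step` maps a state to an action, as the random agents in the examples do):

```python
import rlcard
from rlcard.agents import RandomAgent

env = rlcard.make('leduc-holdem', config={'seed': 0, 'allow_step_back': True})
agent = RandomAgent(action_num=env.action_num)

state, player_id = env.reset()
while not env.is_over():
    action = agent.step(state)           # pick an action for the current state
    state, player_id = env.step(action)  # advance the game
payoffs = env.get_payoffs()              # payoffs once the game ends
```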

### Running with multiple processes
RLCard now supports acceleration with multiple processes. Simply set `env_num` when making the environment to indicate how many processes will be used. Currently only the `run()` function supports multiple processes. An example is [DQN on blackjack](docs/toy-examples.md#running-multiple-processes).
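A minimal multiprocessing sketch, mirroring examples/blackjack_random_multi_process.py from this commit (the `__main__` guard matters because worker processes are spawned):

```python
import rlcard
from rlcard.agents import RandomAgent

def main():
    # env_num=4 assigns the rollouts to four worker processes
    env = rlcard.make('blackjack', config={'seed': 0, 'env_num': 4})
    env.set_agents([RandomAgent(action_num=env.action_num)])
    trajectories, payoffs = env.run(is_training=False)

if __name__ == '__main__':
    main()
```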
examples/blackjack_dqn.py: 6 changes (2 additions, 4 deletions)
@@ -10,8 +10,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('blackjack')
-eval_env = rlcard.make('blackjack')
+env = rlcard.make('blackjack', config={'seed': 0})
+eval_env = rlcard.make('blackjack', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate performance
evaluate_every = 100
@@ -29,8 +29,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/blackjack_dqn_multi_process.py: 6 changes (2 additions, 4 deletions)
@@ -12,8 +12,8 @@

def main():
# Make environment
-env = rlcard.make('blackjack', env_num=4)
-eval_env = rlcard.make('blackjack', env_num=4)
+env = rlcard.make('blackjack', config={'env_num': 4, 'seed': 0})
+eval_env = rlcard.make('blackjack', config={'env_num': 4, 'seed': 0})

# Set the iterations numbers and how frequently we evaluate performance
evaluate_every = 100
@@ -31,8 +31,6 @@ def main():

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/blackjack_random.py: 3 changes (1 addition, 2 deletions)
@@ -6,12 +6,11 @@
from rlcard.utils import set_global_seed

# Make environment
-env = rlcard.make('blackjack')
+env = rlcard.make('blackjack', config={'seed': 0})
episode_num = 2

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent_0 = RandomAgent(action_num=env.action_num)
examples/blackjack_random_multi_process.py: 3 changes (1 addition, 2 deletions)
@@ -8,12 +8,11 @@

def main():
# Make environment
-env = rlcard.make('blackjack', env_num=4)
+env = rlcard.make('blackjack', config={'seed': 0, 'env_num': 4})
iterations = 1

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent = RandomAgent(action_num=env.action_num)
examples/doudizhu_dqn.py: 6 changes (2 additions, 4 deletions)
@@ -11,8 +11,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('doudizhu')
-eval_env = rlcard.make('doudizhu')
+env = rlcard.make('doudizhu', config={'seed': 0})
+eval_env = rlcard.make('doudizhu', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance
evaluate_every = 100
@@ -30,8 +30,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/doudizhu_nfsp.py: 6 changes (2 additions, 4 deletions)
@@ -11,8 +11,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('doudizhu')
-eval_env = rlcard.make('doudizhu')
+env = rlcard.make('doudizhu', config={'seed': 0})
+eval_env = rlcard.make('doudizhu', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance
evaluate_every = 1000
@@ -30,8 +30,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/doudizhu_random.py: 3 changes (1 addition, 2 deletions)
@@ -6,12 +6,11 @@
from rlcard.agents import RandomAgent

# Make environment
-env = rlcard.make('doudizhu')
+env = rlcard.make('doudizhu', config={'seed': 0})
episode_num = 2

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent = RandomAgent(action_num=env.action_num)
examples/gin_rummy_dqn.py: 6 changes (2 additions, 4 deletions)
@@ -17,8 +17,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('gin-rummy')
-eval_env = rlcard.make('gin-rummy')
+env = rlcard.make('gin-rummy', config={'seed': 0})
+eval_env = rlcard.make('gin-rummy', config={'seed': 0})
env.game.settings.print_settings()

# Set the iterations numbers and how frequently we evaluate/save plot
@@ -37,8 +37,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:
# Set agents
examples/gin_rummy_nfsp.py: 6 changes (2 additions, 4 deletions)
@@ -17,8 +17,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('gin-rummy')
-eval_env = rlcard.make('gin-rummy')
+env = rlcard.make('gin-rummy', config={'seed': 0})
+eval_env = rlcard.make('gin-rummy', config={'seed': 0})
env.game.settings.print_settings()

# Set the iterations numbers and how frequently we evaluate/save plot
@@ -37,8 +37,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:
# Initialize a global step
examples/gin_rummy_random.py: 3 changes (1 addition, 2 deletions)
@@ -15,13 +15,12 @@
from rlcard.games.gin_rummy.utils.move import DealHandMove

# Make environment
-env = rlcard.make('gin-rummy')
+env = rlcard.make('gin-rummy', config={'seed': 0})
episode_num = 1
env.game.settings.print_settings()

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agents = models.load("gin-rummy-novice-rule").agents # use novice agents rather than random agents
examples/leduc_holdem_cfr.py: 6 changes (2 additions, 4 deletions)
@@ -9,8 +9,8 @@
from rlcard.utils import Logger

# Make environment and allow step_back
-env = rlcard.make('leduc-holdem', config={'allow_step_back':True})
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0, 'allow_step_back':True})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance and save model
evaluate_every = 100
@@ -23,8 +23,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

# Initilize CFR Agent
agent = CFRAgent(env)
examples/leduc_holdem_dqn.py: 6 changes (2 additions, 4 deletions)
@@ -11,8 +11,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('leduc-holdem')
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance
evaluate_every = 100
@@ -30,8 +30,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/leduc_holdem_dqn_pytorch.py: 6 changes (2 additions, 4 deletions)
@@ -10,8 +10,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('leduc-holdem')
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance
evaluate_every = 100
@@ -29,8 +29,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

agent = DQNAgent(scope='dqn',
action_num=env.action_num,
examples/leduc_holdem_nfsp.py: 4 changes (2 additions, 2 deletions)
@@ -11,8 +11,8 @@
from rlcard.utils.logger import Logger

# Make environment
-env = rlcard.make('leduc-holdem')
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate/save plot
evaluate_every = 10000
examples/leduc_holdem_nfsp_load_model.py: 3 changes (1 addition, 2 deletions)
@@ -9,11 +9,10 @@
from rlcard.utils import set_global_seed, tournament

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Load pretrained model
graph = tf.Graph()
examples/leduc_holdem_nfsp_load_model_2.py: 3 changes (1 addition, 2 deletions)
@@ -7,11 +7,10 @@
from rlcard import models

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Here we directly load NFSP models from /models module
nfsp_agents = models.load('leduc-holdem-nfsp').agents
examples/leduc_holdem_nfsp_pytorch.py: 6 changes (2 additions, 4 deletions)
@@ -10,8 +10,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('leduc-holdem')
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate/save plot
evaluate_every = 10000
@@ -29,8 +29,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

# Set agents
agents = []
examples/leduc_holdem_nfsp_pytorch_load_model.py: 2 changes (1 addition, 1 deletion)
@@ -9,7 +9,7 @@
from rlcard.utils.utils import set_global_seed, tournament

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set a global seed
set_global_seed(0)
examples/leduc_holdem_nfsp_pytorch_load_model_2.py: 2 changes (1 addition, 1 deletion)
@@ -7,7 +7,7 @@
from rlcard import models

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set a global seed
set_global_seed(0)
examples/leduc_holdem_random.py: 3 changes (1 addition, 2 deletions)
@@ -6,12 +6,11 @@
from rlcard.utils import set_global_seed

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
episode_num = 2

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent = RandomAgent(action_num=env.action_num)
examples/leduc_holdem_random_multi_process.py: 3 changes (1 addition, 2 deletions)
@@ -8,12 +8,11 @@

def main():
# Make environment
-env = rlcard.make('leduc-holdem', env_num=4)
+env = rlcard.make('leduc-holdem', config={'seed': 0, 'env_num': 4})
iterations = 1

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent = RandomAgent(action_num=env.action_num)
(Diff truncated: the remaining changed files are not shown.)
