Fix seeding interface
daochenzha committed Apr 29, 2020 · 1 parent cdcb9e7 · commit d328faf
Former-commit-id: 62c97f2
Showing 51 changed files with 106 additions and 167 deletions.
README.md: 19 changes (11 additions, 8 deletions)
@@ -11,7 +11,7 @@ RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports m
* Paper: [https://arxiv.org/abs/1910.04376](https://arxiv.org/abs/1910.04376)

**News:**
-* Now support environment local seeding and multiprocessing.
+* Now RLCard supports environment-local seeding and multiprocessing. Thanks to [@weepingwillowben](https://github.com/weepingwillowben) for the testing scripts.
* A human interface for No-Limit Hold'em is available. The action space of No-Limit Hold'em has been abstracted. Thanks to [@AdrianP-](https://github.com/AdrianP-) for the contribution.
* The new game Gin Rummy and a human GUI are available. Thanks to [@billh0420](https://github.com/billh0420) for the contribution.
* A PyTorch implementation is available. Thanks to [@mjudell](https://github.com/mjudell) for the contribution.
@@ -128,15 +128,23 @@ We provide a complexity estimation for the games on several aspects. **InfoSet N
## API Cheat Sheet
### How to create an environment
You can use the following interface. Some configurations can be specified with a dictionary.
-* **rlcard.make(env_id, config={}, env_num=1)**: Make an environment. `env_id` is a string of a environment; `env_num` is specifies how many environments running in parallel. If the number is larger than 1, then the tasks will be assigned to multiple processes for acceleration. `config` is a dictionary specifying some environment configurations, which are as follows.
+* **env = rlcard.make(env_id, config={})**: Make an environment. `env_id` is the string ID of an environment; `config` is a dictionary specifying some environment configurations, which are as follows.
+   * `seed`: Default `None`. Set an environment-local random seed for reproducing the results.
+   * `env_num`: Default `1`. It specifies how many environments run in parallel. If the number is larger than 1, the tasks will be assigned to multiple processes for acceleration.
   * `allow_step_back`: Default `False`. `True` if allowing the `step_back` function to traverse backward in the tree.
   * `allow_raw_data`: Default `False`. `True` if allowing raw data in the `state`.
   * `single_agent_mode`: Default `False`. `True` if using single-agent mode, i.e., a Gym-style interface with the other players as pretrained/rule models.
   * `active_player`: Default `0`. If `single_agent_mode` is `True`, `active_player` specifies which player to operate on in single-agent mode.
   * `record_action`: Default `False`. If `True`, a field `action_record` will be in the `state` to record the historical actions. This may be used for human-agent play.

+Once the environment is made, we can access some information of the game.
+* **env.action_num**: The number of actions.
+* **env.player_num**: The number of players.
+* **env.state_space**: The state space of the observations.
+* **env.timestep**: The number of timesteps stepped by the environment.
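For illustration, a minimal sketch of the new interface (using blackjack; the printed values depend on the game):

```python
import rlcard

# Make an environment with an environment-local random seed
env = rlcard.make('blackjack', config={'seed': 0})

# Basic information about the game
print(env.action_num)   # the number of actions
print(env.player_num)   # the number of players
print(env.state_space)  # the state space of the observations
print(env.timestep)     # timesteps stepped so far
```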

### What is state in RLCard
-State will always have observation `state['obs']` and legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, state will have raw observation `state['raw_obs']` and raw legal actions `state['raw_legal_actions']`.
+State is a Python dictionary. It will always have the observation `state['obs']` and the legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, the state will also have the raw observation `state['raw_obs']` and the raw legal actions `state['raw_legal_actions']`.
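In practice, a state can be inspected as below (a sketch assuming a Leduc Hold'em environment with `allow_raw_data` enabled; the raw fields appear only in that case):

```python
import rlcard

env = rlcard.make('leduc-holdem', config={'seed': 0, 'allow_raw_data': True})
state, player_id = env.reset()

print(state['obs'])                # encoded observation
print(state['legal_actions'])      # legal actions as IDs
print(state['raw_obs'])            # raw observation
print(state['raw_legal_actions'])  # raw legal actions
```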

### Basic interfaces
The following interfaces provide a basic usage. They are easy to use but make assumptions about the agent: the agent must follow the [agent template](docs/developping-algorithms.md).
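A sketch of the basic usage (assuming `env.set_agents` and `env.run` behave as in the bundled examples, with `run` returning the trajectories and payoffs):

```python
import rlcard
from rlcard.agents import RandomAgent

env = rlcard.make('blackjack', config={'seed': 0})
env.set_agents([RandomAgent(action_num=env.action_num)])

# Play one complete game with the registered agents
trajectories, payoffs = env.run(is_training=False)
```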
@@ -145,7 +153,6 @@

### Advanced interfaces
For advanced usage, the following interfaces allow flexible operations on the game tree. These interfaces do not make any assumptions about the agent.
-* **env.seed(seed)**: Set a environment local random seed for reproducing the results.
* **env.reset()**: Initialize a game. Return the state and the first player ID.
* **env.step(action, raw_action=False)**: Take one step in the environment. `action` can be a raw action or an integer; `raw_action` should be `True` if the action is a raw (string) action.
* **env.step_back()**: Available only when `allow_step_back` is `True`. Take one step backward. This can be used for algorithms that operate on the game tree, such as CFR.
@@ -154,10 +161,6 @@
* **env.get_state(player_id)**: Return the state corresponding to `player_id`.
* **env.get_payoffs()**: At the end of the game, return a list of payoffs for all the players.
* **env.get_perfect_information()**: (Currently only supported for some of the games) Obtain the perfect information at the current state.
-* **env.action_num**: The number of actions.
-* **env.player_num**: The number of players.
-* **env.state_space**: Ther state space of the observations.
-* **env.timestep**: The number of timesteps stepped by the environment.
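A sketch of a manual game-tree traversal with these interfaces (assuming `env.is_over()` marks the terminal state and that an agent's `step` maps a state to an action, as the random agents in the examples do):

```python
import rlcard
from rlcard.agents import RandomAgent

env = rlcard.make('leduc-holdem', config={'seed': 0, 'allow_step_back': True})
agent = RandomAgent(action_num=env.action_num)

state, player_id = env.reset()
while not env.is_over():
    action = agent.step(state)           # pick an action for the current state
    state, player_id = env.step(action)  # advance the game
payoffs = env.get_payoffs()              # payoffs once the game ends
```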

### Running with multiple processes
RLCard now supports acceleration with multiple processes. Simply set `env_num` when making the environment to indicate how many processes will be used. Currently only the `run()` function supports multiple processes. An example is [DQN on blackjack](docs/toy-examples.md#running-multiple-processes).
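A minimal multiprocessing sketch, mirroring examples/blackjack_random_multi_process.py from this commit (the `__main__` guard matters because worker processes are spawned):

```python
import rlcard
from rlcard.agents import RandomAgent

def main():
    # env_num=4 assigns the rollouts to four worker processes
    env = rlcard.make('blackjack', config={'seed': 0, 'env_num': 4})
    env.set_agents([RandomAgent(action_num=env.action_num)])
    trajectories, payoffs = env.run(is_training=False)

if __name__ == '__main__':
    main()
```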
examples/blackjack_dqn.py: 6 changes (2 additions, 4 deletions)
@@ -10,8 +10,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('blackjack')
-eval_env = rlcard.make('blackjack')
+env = rlcard.make('blackjack', config={'seed': 0})
+eval_env = rlcard.make('blackjack', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate performance
evaluate_every = 100
@@ -29,8 +29,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/blackjack_dqn_multi_process.py: 6 changes (2 additions, 4 deletions)
@@ -12,8 +12,8 @@

def main():
# Make environment
-env = rlcard.make('blackjack', env_num=4)
-eval_env = rlcard.make('blackjack', env_num=4)
+env = rlcard.make('blackjack', config={'env_num': 4, 'seed': 0})
+eval_env = rlcard.make('blackjack', config={'env_num': 4, 'seed': 0})

# Set the iterations numbers and how frequently we evaluate performance
evaluate_every = 100
@@ -31,8 +31,6 @@ def main():

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/blackjack_random.py: 3 changes (1 addition, 2 deletions)
@@ -6,12 +6,11 @@
from rlcard.utils import set_global_seed

# Make environment
-env = rlcard.make('blackjack')
+env = rlcard.make('blackjack', config={'seed': 0})
episode_num = 2

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent_0 = RandomAgent(action_num=env.action_num)
examples/blackjack_random_multi_process.py: 3 changes (1 addition, 2 deletions)
@@ -8,12 +8,11 @@

def main():
# Make environment
-env = rlcard.make('blackjack', env_num=4)
+env = rlcard.make('blackjack', config={'seed': 0, 'env_num': 4})
iterations = 1

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent = RandomAgent(action_num=env.action_num)
examples/doudizhu_dqn.py: 6 changes (2 additions, 4 deletions)
@@ -11,8 +11,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('doudizhu')
-eval_env = rlcard.make('doudizhu')
+env = rlcard.make('doudizhu', config={'seed': 0})
+eval_env = rlcard.make('doudizhu', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance
evaluate_every = 100
@@ -30,8 +30,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/doudizhu_nfsp.py: 6 changes (2 additions, 4 deletions)
@@ -11,8 +11,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('doudizhu')
-eval_env = rlcard.make('doudizhu')
+env = rlcard.make('doudizhu', config={'seed': 0})
+eval_env = rlcard.make('doudizhu', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance
evaluate_every = 1000
@@ -30,8 +30,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/doudizhu_random.py: 3 changes (1 addition, 2 deletions)
@@ -6,12 +6,11 @@
from rlcard.agents import RandomAgent

# Make environment
-env = rlcard.make('doudizhu')
+env = rlcard.make('doudizhu', config={'seed': 0})
episode_num = 2

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent = RandomAgent(action_num=env.action_num)
examples/gin_rummy_dqn.py: 6 changes (2 additions, 4 deletions)
@@ -17,8 +17,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('gin-rummy')
-eval_env = rlcard.make('gin-rummy')
+env = rlcard.make('gin-rummy', config={'seed': 0})
+eval_env = rlcard.make('gin-rummy', config={'seed': 0})
env.game.settings.print_settings()

# Set the iterations numbers and how frequently we evaluate/save plot
@@ -37,8 +37,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:
# Set agents
examples/gin_rummy_nfsp.py: 6 changes (2 additions, 4 deletions)
@@ -17,8 +17,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('gin-rummy')
-eval_env = rlcard.make('gin-rummy')
+env = rlcard.make('gin-rummy', config={'seed': 0})
+eval_env = rlcard.make('gin-rummy', config={'seed': 0})
env.game.settings.print_settings()

# Set the iterations numbers and how frequently we evaluate/save plot
@@ -37,8 +37,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:
# Initialize a global step
examples/gin_rummy_random.py: 3 changes (1 addition, 2 deletions)
@@ -15,13 +15,12 @@
from rlcard.games.gin_rummy.utils.move import DealHandMove

# Make environment
-env = rlcard.make('gin-rummy')
+env = rlcard.make('gin-rummy', config={'seed': 0})
episode_num = 1
env.game.settings.print_settings()

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agents = models.load("gin-rummy-novice-rule").agents # use novice agents rather than random agents
examples/leduc_holdem_cfr.py: 6 changes (2 additions, 4 deletions)
@@ -9,8 +9,8 @@
from rlcard.utils import Logger

# Make environment and allow step_back
-env = rlcard.make('leduc-holdem', config={'allow_step_back':True})
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0, 'allow_step_back':True})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance and save model
evaluate_every = 100
@@ -23,8 +23,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

# Initilize CFR Agent
agent = CFRAgent(env)
examples/leduc_holdem_dqn.py: 6 changes (2 additions, 4 deletions)
@@ -11,8 +11,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('leduc-holdem')
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance
evaluate_every = 100
@@ -30,8 +30,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

with tf.Session() as sess:

examples/leduc_holdem_dqn_pytorch.py: 6 changes (2 additions, 4 deletions)
@@ -10,8 +10,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('leduc-holdem')
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate the performance
evaluate_every = 100
@@ -29,8 +29,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

agent = DQNAgent(scope='dqn',
action_num=env.action_num,
examples/leduc_holdem_nfsp.py: 4 changes (2 additions, 2 deletions)
@@ -11,8 +11,8 @@
from rlcard.utils.logger import Logger

# Make environment
-env = rlcard.make('leduc-holdem')
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate/save plot
evaluate_every = 10000
examples/leduc_holdem_nfsp_load_model.py: 3 changes (1 addition, 2 deletions)
@@ -9,11 +9,10 @@
from rlcard.utils import set_global_seed, tournament

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Load pretrained model
graph = tf.Graph()
examples/leduc_holdem_nfsp_load_model_2.py: 3 changes (1 addition, 2 deletions)
@@ -7,11 +7,10 @@
from rlcard import models

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Here we directly load NFSP models from /models module
nfsp_agents = models.load('leduc-holdem-nfsp').agents
examples/leduc_holdem_nfsp_pytorch.py: 6 changes (2 additions, 4 deletions)
@@ -10,8 +10,8 @@
from rlcard.utils import Logger

# Make environment
-env = rlcard.make('leduc-holdem')
-eval_env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
+eval_env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set the iterations numbers and how frequently we evaluate/save plot
evaluate_every = 10000
@@ -29,8 +29,6 @@

# Set a global seed
set_global_seed(0)
-env.seed(0)
-eval_env.seed(0)

# Set agents
agents = []
examples/leduc_holdem_nfsp_pytorch_load_model.py: 2 changes (1 addition, 1 deletion)
@@ -9,7 +9,7 @@
from rlcard.utils.utils import set_global_seed, tournament

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set a global seed
set_global_seed(0)
examples/leduc_holdem_nfsp_pytorch_load_model_2.py: 2 changes (1 addition, 1 deletion)
@@ -7,7 +7,7 @@
from rlcard import models

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})

# Set a global seed
set_global_seed(0)
examples/leduc_holdem_random.py: 3 changes (1 addition, 2 deletions)
@@ -6,12 +6,11 @@
from rlcard.utils import set_global_seed

# Make environment
-env = rlcard.make('leduc-holdem')
+env = rlcard.make('leduc-holdem', config={'seed': 0})
episode_num = 2

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent = RandomAgent(action_num=env.action_num)
examples/leduc_holdem_random_multi_process.py: 3 changes (1 addition, 2 deletions)
@@ -8,12 +8,11 @@

def main():
# Make environment
-env = rlcard.make('leduc-holdem', env_num=4)
+env = rlcard.make('leduc-holdem', config={'seed': 0, 'env_num': 4})
iterations = 1

# Set a global seed
set_global_seed(0)
-env.seed(0)

# Set up agents
agent = RandomAgent(action_num=env.action_num)
(Diff truncated: the remaining changed files are not shown.)
