RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports multiple card environments with easy-to-use interfaces. The goal of RLCard is to bridge reinforcement learning and imperfect information games, and push forward the research of reinforcement learning in domains with multiple agents, large state and action space, and sparse reward. RLCard is developed by DATA Lab at Texas A&M University.
- Official Website: http://www.rlcard.org
- Paper: https://arxiv.org/abs/1910.04376
News:
- PyTorch implementation available. Thanks to @mjudell for the contribution.
- We have just initialized a list of Awesome-Game-AI resources. Check it out!
Make sure that you have Python 3.5+ and pip installed. We recommend installing `rlcard` with `pip` as follows:
git clone https://github.com/datamllab/rlcard.git
cd rlcard
pip install -e .
or use PyPI with:
pip install rlcard
To use the TensorFlow implementation, run the following command:
pip install -e .[tensorflow]
To try out the PyTorch implementation of DQN and NFSP, please also run the following command:
pip install -e .[torch]
If you run into any problems installing PyTorch with the command above, you may follow the instructions on the official PyTorch website to install it manually.
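
After installing, it can help to confirm which optional backends are actually importable. The snippet below is a minimal sketch, not part of RLCard: `available_backends` is a hypothetical helper name, and it assumes the packages are importable under the module names `rlcard`, `tensorflow` and `torch`.

```python
import importlib.util


def available_backends(names=("rlcard", "tensorflow", "torch")):
    """Map each top-level module name to True if it can be found.

    importlib.util.find_spec locates a top-level module without
    importing it, so this check is cheap even for large packages.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_backends().items():
        print("{}: {}".format(name, "installed" if found else "missing"))
```

A `missing` entry for `tensorflow` or `torch` suggests the corresponding `pip install -e .[...]` command has not been run in the current environment.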
Please refer to `examples/`. A short example is shown below.
import rlcard
from rlcard.agents.random_agent import RandomAgent
env = rlcard.make('blackjack')
env.set_agents([RandomAgent()])
trajectories, payoffs = env.run()
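
Since `env.run()` returns the payoffs of one complete game, repeating it gives a rough estimate of how the agents perform on average. The following is a minimal sketch under stated assumptions: `average_payoffs` is a hypothetical helper (not an RLCard function), and it only assumes that `env.run()` returns a `(trajectories, payoffs)` pair with one payoff per player, as in the example above.

```python
def average_payoffs(env, num_games=1000):
    """Average each player's payoff over num_games calls to env.run().

    `env` is any object whose run() returns (trajectories, payoffs),
    where payoffs holds one number per player.
    """
    if num_games < 1:
        raise ValueError("num_games must be at least 1")

    totals = None
    for _ in range(num_games):
        _, payoffs = env.run()
        if totals is None:
            # Size the accumulator from the first game's payoffs.
            totals = [0.0] * len(payoffs)
        for player, payoff in enumerate(payoffs):
            totals[player] += payoff
    return [total / num_games for total in totals]
```

With the environment from the short example, `average_payoffs(env, num_games=1000)` would return a one-element list for Blackjack, since only one agent is set.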
We also recommend the following toy examples.
- Playing with random agents
- Deep-Q learning on Blackjack
- Running multiple processes
- Having fun with pretrained Leduc model
- Leduc Hold'em as single-agent environment
- Training CFR on Leduc Hold'em
With `tensorflow` installed, run `examples/leduc_holdem_human.py` to play with the pre-trained Leduc Hold'em model. Leduc Hold'em is a simplified version of Texas Hold'em. Rules can be found here.
>> Leduc Hold'em pre-trained model
>> Start a new game!
>> Agent 1 chooses raise
=============== Community Card ===============
┌─────────┐
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
└─────────┘
=============== Your Hand ===============
┌─────────┐
│J        │
│         │
│         │
│    ♥    │
│         │
│         │
│        J│
└─────────┘
=============== Chips ===============
Yours: +
Agent 1: +++
=========== Actions You Can Choose ===========
0: call, 1: raise, 2: fold
>> You choose action (integer):
- `rlcard.make(env_id, config={})`: Make an environment. `env_id` is a string naming an environment; `config` is a dictionary specifying environment configurations, which are as follows.
  - `allow_step_back`: default `False`. `True` if allowing the `step_back` function to traverse backward in the tree.
  - `allow_raw_data`: default `False`. `True` if allowing raw data in the `state`.
  - `single_agent_mode`: default `False`. `True` if using single agent mode, i.e., a Gym style interface with other players as pretrained/rule models.
  - `active_player`: default `0`. If `single_agent_mode` is `True`, `active_player` specifies which player to operate on in single agent mode.
  - `human_mode`: default `False`. `True` if using human mode.
- `env.step(action, raw_action=False)`: Take one step in the environment. `action` can be a raw action or an integer; `raw_action` should be `True` if the action is a raw action, i.e., a string.
- `env.init_game()`: Initialize a game. Return the state and the first player ID.
- `env.run()`: Run a complete game and return trajectories and payoffs. The function can be used after the agents are set up.
- `state`: State will always have the observation `state['obs']` and the legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, state will also have the raw observation `state['raw_obs']` and the raw legal actions `state['raw_legal_actions']`.
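
The calls above can be combined to step through a game by hand instead of using `env.run()`. This is a minimal sketch rather than the library's own loop. `play_one_game` is a hypothetical helper, and three things are assumptions not stated in the list above: that `env.step` returns a `(state, player_id)` pair like `env.init_game()` does, and that the environment exposes `env.is_over()` and `env.get_payoffs()`. It also assumes the default (non single-agent) mode.

```python
import random


def play_one_game(env, seed=None):
    """Play one game by picking a uniformly random legal action each turn.

    Returns (number_of_steps, payoffs).
    """
    rng = random.Random(seed)

    # init_game() returns the first state and the ID of the player to act.
    state, player_id = env.init_game()
    steps = 0

    # Assumption: is_over() reports whether the game has ended.
    while not env.is_over():
        # state['legal_actions'] holds the integer actions allowed now.
        action = rng.choice(list(state['legal_actions']))
        # Assumption: step() returns the next state and next player ID.
        state, player_id = env.step(action)
        steps += 1

    # Assumption: get_payoffs() returns one payoff per player.
    return steps, env.get_payoffs()
```

With RLCard installed, `play_one_game(rlcard.make('leduc-holdem'))` should play out a single hand under these assumptions. Passing `config={'allow_raw_data': True}` to `rlcard.make` would additionally expose `state['raw_obs']` and `state['raw_legal_actions']` inside the loop.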
Please refer to the Documents for a general introduction. API documentation is available on our website.
We provide a complexity estimation for the games on several aspects. InfoSet Number: the number of information sets; Avg. InfoSet Size: the average number of states in a single information set; Action Size: the size of the action space. Name: the name that should be passed to `rlcard.make` to create the game environment.
Game | InfoSet Number | Avg. InfoSet Size | Action Size | Name | Status |
---|---|---|---|---|---|
Blackjack (wiki, baike) | 10^3 | 10^1 | 10^0 | blackjack | Available |
Leduc Hold’em (paper) | 10^2 | 10^2 | 10^0 | leduc-holdem | Available |
Limit Texas Hold'em (wiki, baike) | 10^14 | 10^3 | 10^0 | limit-holdem | Available |
Dou Dizhu (wiki, baike) | 10^53 ~ 10^83 | 10^23 | 10^4 | doudizhu | Available |
Simple Dou Dizhu (wiki, baike) | - | - | - | simple-doudizhu | Available |
Mahjong (wiki, baike) | 10^121 | 10^48 | 10^2 | mahjong | Available |
No-limit Texas Hold'em (wiki, baike) | 10^162 | 10^3 | 10^4 | no-limit-holdem | Available |
UNO (wiki, baike) | 10^163 | 10^10 | 10^1 | uno | Available |
Sheng Ji (wiki, baike) | 10^157 ~ 10^165 | 10^61 | 10^11 | - | Developing |
Performance is measured by winning rates in tournaments. Example outputs are as follows:
If you find this repo useful, you may cite:
@article{zha2019rlcard,
title={RLCard: A Toolkit for Reinforcement Learning in Card Games},
author={Zha, Daochen and Lai, Kwei-Herng and Cao, Yuanpu and Huang, Songyi and Wei, Ruzhe and Guo, Junyu and Hu, Xia},
journal={arXiv preprint arXiv:1910.04376},
year={2019}
}
Contributions to this project are greatly appreciated! Please create an issue for feedback or bugs. If you want to contribute code, please refer to the Contributing Guide.
We would like to thank JJ World Network Technology Co.,LTD for the generous support.