This repository contains the project for the Deep learning class (course code: VITMAV45) at the Budapest University of Technology and Economics. Our project focuses on reinforcement learning with the aim of training an agent in a poker environment. After training, we can play against our pre-trained agent.
Team name: THE3
Team members: László Barak, Mónika Farsang, Ádám Szukics
The code presented for the first milestone is based on example code from the RLcard GitHub repository. It serves as a demonstration that the chosen environment works and that the agent is ready to be trained.
The code for the second milestone is a DQN agent in PyTorch. We used the RLcard DQN agent written in TensorFlow as a base and created a more powerful, more manageable, and easier-to-use implementation in PyTorch. This implementation extends basic Q-learning in two ways. First, it uses a replay buffer to store past experiences, from which training batches are sampled periodically. Second, to make training more stable, a second Q-network is used as a target network: it provides the bootstrapped target values against which the policy Q-network is trained, and only the policy network is updated by backpropagation. These features are described in the Nature paper Human-level control through deep reinforcement learning.
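For reference, a minimal sketch of these two components in PyTorch is shown below. This is an illustration rather than our actual agent: the network architecture, hyperparameters, and variable names are placeholders, and the buffer is assumed to hold (state, action, reward, next_state, done) tuples.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Small MLP mapping a state vector to one Q-value per action."""
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, x):
        return self.net(x)

# Replay buffer: past transitions are stored here and sampled periodically.
buffer = deque(maxlen=10000)

def train_step(policy_net, target_net, optimizer, batch_size=32, gamma=0.99):
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)
    states, actions, rewards, next_states, dones = (
        torch.tensor(x, dtype=torch.float32) for x in zip(*batch))

    # Q-values of the actions actually taken, from the policy network.
    q = policy_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)

    # Bootstrapped targets come from the frozen target network
    # (no gradients flow through it).
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q

    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()        # only the policy network is updated
    optimizer.step()

# Example wiring with the Leduc sizes from the tables below (36-dim state, 4 actions).
policy_net = QNetwork(state_dim=36, num_actions=4)
target_net = QNetwork(state_dim=36, num_actions=4)
target_net.load_state_dict(policy_net.state_dict())   # periodic synchronization
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-4)
```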
Furthermore, as an extra component, we added the option of a more aggressive playing strategy. If the action with the maximum Q-value matches the chosen strategy setting, the agent plays the Raise action instead, provided Raise is a valid action. The possible settings are displayed below, and a short sketch of the idea follows the table.
Strategy setting | Meaning |
---|---|
0 | Use the action with the maximum Q-value (default DQN behaviour) |
1 | If Call has the maximum Q-value, play Raise instead if possible |
2 | If Check has the maximum Q-value, play Raise instead if possible |
3 | If Fold has the maximum Q-value, play Raise instead if possible |
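The override itself is simple; a minimal sketch is shown below. It is an illustration only, using the action encoding from the Actions table further down; the function and variable names are ours, not necessarily those used in our code.

```python
import numpy as np

# Action encoding used by both environments (see the Actions table further down).
CALL, RAISE, FOLD, CHECK = 0, 1, 2, 3

# Strategy setting -> the action whose argmax triggers the Raise override.
STRATEGY_TRIGGER = {1: CALL, 2: CHECK, 3: FOLD}

def pick_action(q_values, legal_actions, strategy=0):
    """Greedy choice among the legal actions, optionally replaced by Raise."""
    legal = list(legal_actions)
    best = max(legal, key=lambda a: q_values[a])
    if strategy != 0 and best == STRATEGY_TRIGGER[strategy] and RAISE in legal:
        return RAISE   # play more aggressively whenever Raise is allowed
    return best

# Example: Check has the highest Q-value, strategy 2 turns it into a Raise.
print(pick_action(np.array([0.1, 0.2, -0.3, 0.9]), legal_actions=[0, 1, 3], strategy=2))
```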
The agent can be trained and evaluated against either a random agent or a pre-trained NFSP agent.
Opponent settings | Meaning |
---|---|
0 | Random agent |
1 | Pre-trained NFSP agent |
These can be set in the training code for the DQN agent.
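As an illustration, the opponents can be wired into the RLcard environment roughly as sketched below. This is a hedged example, not our training script: it assumes a recent RLcard API (`RandomAgent(num_actions=...)`, `env.num_actions`); older versions use `action_num` instead, and the pre-trained `leduc-holdem-nfsp` model is only available in RLcard versions that ship it.

```python
import rlcard
from rlcard import models
from rlcard.agents import RandomAgent

env = rlcard.make('leduc-holdem')

# Stand-in for our PyTorch DQN agent, so the snippet runs on its own.
dqn_agent = RandomAgent(num_actions=env.num_actions)

opponent_setting = 0   # 0: random agent, 1: pre-trained NFSP agent
if opponent_setting == 0:
    opponent = RandomAgent(num_actions=env.num_actions)
else:
    # Only works with RLcard versions that ship this pre-trained model.
    opponent = models.load('leduc-holdem-nfsp').agents[1]

env.set_agents([dqn_agent, opponent])
trajectories, payoffs = env.run(is_training=False)   # play one hand
print(payoffs)
```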
These references were used during the implementation of the DQN agent in PyTorch.
https://github.com/datamllab/rlcard/blob/master/rlcard/agents/dqn_agent.py
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
https://towardsdatascience.com/deep-q-network-dqn-ii-b6bf911b6b2c
In the final code, we saved the best agents after hyperparameter optimization. These pre-trained agents can be set as opponents in the Leduc and Limit Hold'em environments. The game-playing code runs in the Leduc Hold'em environment by default. You can choose between running the code with Docker or with the Colab notebooks. More details are given below.
The Dockerfile contains the list of system dependencies. After building the image, which provides a simple containerization of our application, the game runs inside its container.
To build the image, use the following command:
$ docker build --tag IMAGE_NAME:TAG .
e.g. $ docker build --tag poker-bot:1.0 .
To run the image (the game starts in the Leduc Hold'em environment by default):
$ docker run -ti IMAGE_NAME:TAG
or, equivalently,
$ docker run -ti IMAGE_NAME:TAG --env leduc
e.g. $ docker run -ti poker-bot:1.0
or $ docker run -ti poker-bot:1.0 --env leduc
To run it in the Limit Hold'em environment:
$ docker run -ti IMAGE_NAME:TAG --env limit
e.g. $ docker run -ti poker-bot:1.0 --env limit
A notebook version is presented in the repository as well. If you want to get a quick look at our first-milestone results, we recommend choosing this one.
For the second milestone, we present two versions, one in the Leduc Hold'em and the other in the Limit Hold'em environment. After training the DQN agent in the Leduc Hold'em environment, you can play against it.
Our final code is presented in notebook format as well. You can play against our pre-trained agents in the Leduc Hold'em and Limit Hold'em environments.
RLcard is an easy-to-use toolkit that provides both a Limit Hold'em and a Leduc Hold'em environment. The latter is a smaller version of Limit Texas Hold'em and was introduced in the 2005 research paper Bayes' Bluff: Opponent Modeling in Poker. The Limit Texas Hold'em environment has the following properties:
- 52 cards
- Each player has 2 hole cards (face-down cards)
- 5 community cards (3 phases: flop, turn, river)
- 4 betting rounds
- Each player can take at most 4 Raise actions in each round
The state is encoded as a vector of length 72. It can be split into two parts: the first part encodes the known cards (the hole cards plus the revealed community cards), while the second part encodes the number of Raise actions in each round. The indices and their meaning are presented below.
Index | Meaning |
---|---|
0-12 | Spade A - Spade K |
13-25 | Heart A - Heart K |
26-38 | Diamond A - Diamond K |
39-51 | Club A - Club K |
52-56 | Raise number in round 1 |
57-61 | Raise number in round 2 |
62-66 | Raise number in round 3 |
67-71 | Raise number in round 4 |
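A small sketch of how this 72-dimensional vector can be decoded is shown below. It only illustrates the index layout from the table above and assumes the 5-index raise blocks are one-hot encodings of 0-4 raises; the helper name is ours.

```python
import numpy as np

def describe_limit_state(state):
    """Decode the 72-dimensional Limit Hold'em observation using the index
    layout from the table above (index 0 = Spade A, ..., 51 = Club K)."""
    ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', 'T', 'J', 'Q', 'K']
    suits = ['Spade', 'Heart', 'Diamond', 'Club']

    cards = [f'{suits[i // 13]} {ranks[i % 13]}'
             for i in range(52) if state[i] == 1]

    raises = []
    for r in range(4):                            # four betting rounds
        block = state[52 + 5 * r : 52 + 5 * (r + 1)]
        raises.append(int(np.argmax(block)))      # assumed one-hot count, 0-4
    return cards, raises

# Example: Spade A and Heart K as known cards, one Raise in round 1.
obs = np.zeros(72)
obs[[0, 25]] = 1
obs[52:57] = np.eye(5)[1]
print(describe_limit_state(obs))
```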
The Leduc Hold'em environment is much smaller:
- 6 cards: two copies each of King, Queen and Jack
- 2 players
- 2 rounds
- Raise amounts of 2 in the first round and 4 in the second round
- 2-bet maximum
- 0-14 chips for the agent and for the opponent
First round: players put 1 unit in the pot and are dealt 1 card, then start betting.
Second round: 1 public card is revealed, then the players bet again.
End: the player whose card has the same rank as the public card wins; otherwise, the player holding the higher-ranked card wins.
The state representation differs from the Limit Hold'em environment; its length is 36. The indices and their meaning are presented below.
Index | Meaning |
---|---|
0 | Jack in hand |
1 | Queen in hand |
2 | King in hand |
3 | Jack as public card |
4 | Queen as public card |
5 | King as public card |
6-20 | 0-14 chips for the agent |
21-35 | 0-14 chips for the opponent |
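As with the Limit Hold'em state, the layout can be illustrated with a small decoding sketch (assuming the chip counts are one-hot encoded over 0-14; the helper name is ours).

```python
import numpy as np

def describe_leduc_state(state):
    """Decode the 36-dimensional Leduc Hold'em observation (layout as in the
    table above); chip counts are assumed to be one-hot over 0-14."""
    cards = ['Jack', 'Queen', 'King']
    hand = next((c for i, c in enumerate(cards) if state[i] == 1), None)
    public = next((c for i, c in enumerate(cards) if state[3 + i] == 1), None)
    my_chips = int(np.argmax(state[6:21]))     # indices 6-20
    opp_chips = int(np.argmax(state[21:36]))   # indices 21-35
    return hand, public, my_chips, opp_chips

# Example: Queen in hand, no public card yet, agent has 1 chip in, opponent 2.
obs = np.zeros(36)
obs[1] = 1
obs[6 + 1] = 1
obs[21 + 2] = 1
print(describe_leduc_state(obs))
```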
Actions are the same in the Limit and Leduc Hold'em environments. There are 4 action types, encoded as shown below.
Action | Meaning |
---|---|
0 | Call |
1 | Raise |
2 | Fold |
3 | Check |
The payoff is the same in the Limit and Leduc Hold'em environments. The reward is measured in big blinds per hand.
Reward | Meaning |
---|---|
R | the player wins R times the big blind |
-R | the player loses R times the big blind |