DQBS: Deep Q-Learning with Backward SARSA

2021 Deep Learning Course Project at ETH Zurich (paper: dqbs_paper.pdf)

Authors:

In alphabetical order of last name:
@Jingyu Liu
@Yunshu Ouyang
@Yilei Tu
@Yuyan Zhao

Abstract:

Deep Q-Network (DQN) with the Experience Replay (ER) mechanism was the first algorithm to achieve super-human performance in Atari games, and numerous works have sought to improve upon it. Most previous work focused on designing architectures or update rules for the Bellman updates to stabilize training. However, the bias of sampling transitions, either uniformly at random or weighted by some priority, was often ignored and its validity taken for granted, as such schemes usually perform well in practice. In this work, we designed a simple algorithm called Deep Q-Learning with Backward SARSA (DQBS), which splits a single standard DQN update step into multiple steps over transitions in reverse chronological order. DQBS exploits the Markovian property of transitions: the estimated Q-value of a state-action pair can gain information from the Q-values of the state-action pairs that immediately follow it within a trajectory. Each iteration now consists of a normal step, which computes the Bellman targets on transitions sampled as usual, and several backward steps, which compute SARSA targets on the transitions that precede the previous batch within their trajectories. We justified the intuition behind DQBS with illustrations, conducted ablation studies showing that each design choice contributes a performance gain, and showed that DQBS outperforms DQN with ER in several Gym environments.
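The two target computations described above can be sketched as follows. This is a minimal illustration, not the repo's actual API: the helper names and the convention that `q_target` maps a state to a list of per-action Q-values are assumptions.

```python
# Hypothetical sketch of the two targets used by DQBS.
# `q_target` is assumed to be any callable mapping a state to a list of
# Q-values, one entry per action (placeholder, not the repo's interface).

def bellman_target(q_target, reward, next_state, done, gamma=0.99):
    # Normal step (DQN): r + gamma * max_a' Q(s', a')
    if done:
        return reward
    return reward + gamma * max(q_target(next_state))

def sarsa_target(q_target, reward, next_state, next_action, done, gamma=0.99):
    # Backward step (SARSA): r + gamma * Q(s', a'), where a' is the action
    # that was actually taken at s' in the stored trajectory
    if done:
        return reward
    return reward + gamma * q_target(next_state)[next_action]
```

The only difference is the bootstrap term: the normal step maximizes over actions, while each backward step reuses the action recorded in the transition that follows it chronologically.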

Project Structure:

.
├── dqbs_paper.pdf
├── run.py <main script for experiments>
├── scripts <run experiments with the best parameters from ablation studies>
│   ├── run_cartpole.sh
│   ├── run_acrobot.sh
│   └── run_mountaincar.sh
├── plots <plots used in the paper>
│   ├── comparison
│   │   └── ...
│   └── ablation
│       └── ...
└── core <main implementation>
    ├── algorithms.py
    ├── models.py
    └── replay_buffer.py
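The backward steps require looking up, for each sampled transition, the transition that immediately precedes it in time. A toy sketch of a replay buffer supporting that lookup is shown below; the class name and interface are illustrative assumptions, not the contents of `core/replay_buffer.py` (in particular, a real buffer would also have to handle episode boundaries and eviction).

```python
import random

class BackwardReplayBuffer:
    """Toy sketch (assumed design): transitions are stored in insertion
    order, so the chronological predecessor of index i is index i - 1."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        # Evict the oldest transition once capacity is reached
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size):
        # Skip index 0 so every sampled transition has a predecessor
        idxs = random.sample(range(1, len(self.storage)), batch_size)
        return idxs, [self.storage[i] for i in idxs]

    def predecessors(self, idxs):
        # Transitions immediately preceding the given ones in time,
        # used as inputs to the backward SARSA steps
        prev = [i - 1 for i in idxs]
        return prev, [self.storage[i] for i in prev]
```

Calling `predecessors` repeatedly on the returned indices walks a batch backward through its trajectories, one step per backward SARSA update.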

How to Run Experiments:

Set up environment for the first time:

pip install -r requirements.txt

To run experiments, go to the root directory and type (the default parameters reproduce the reported results):

python run.py

The environment can be specified with --env={cartpole,acrobot,mountaincar} and the produced plots are stored in ./plots.

To test the best parameters for DQBS chosen by ablation studies:

bash scripts/run_{cartpole,acrobot,mountaincar}.sh

Contact information

We look forward to your feedback and advice:
{liujin, ouyangy, yileitu, yuyzhao}@student.ethz.ch
