README FILE
Author: Jianyuan (Jet) Yu
Affiliation: Wireless, ECE, Virginia Tech
Email : [email protected]
Date : April, 2018
Bibliography summary for the Deep Reinforcement Learning for Dynamic Channel Access project.
- bibliography.bib for conference or journal writing.
- Illustration Graphs for conference or journal writing.
- Equations, Algorithms & Tables for conference or journal writing.
-
- BibTex
- [luong2018applications]
    - (+) covers many other topics besides dynamic channel access, such as rate control, caching, offloading, and security; some of these could be our future work.
    - (+) covers practical details such as multi-agent settings
-
    - lists liquid state machines / echo state machines
    - (-) skips DQN
-
    - (+) achieves a 66.7% rate when coexisting with stochastic channels under the Gilbert-Elliott / 2-state Markov chain model (see the sketch below).
    - round-robin
    - (+) requires the transition probabilities p_{i,j} to be known in advance; the myopic policy is optimal when p11 >= p01 for any number of channels, and when p11 < p01 only for 2 or 3 channels.
    - (-) limited; only covers specific channel cases.
- [zhao2008myopic]
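A minimal sketch of the Gilbert-Elliott / 2-state Markov channel model and the myopic (highest-belief) access policy referenced above; the channel count and transition probabilities are assumed values, not the paper's.

```python
import numpy as np

# Gilbert-Elliott channel: each channel is a 2-state Markov chain with
# p01 = P(bad -> good) and p11 = P(good -> good). The myopic policy keeps a
# belief (probability each channel is good) and always accesses the channel
# with the highest belief. Assumed values: positively correlated (p11 >= p01).
N_CHANNELS = 4
p01 = np.full(N_CHANNELS, 0.2)
p11 = np.full(N_CHANNELS, 0.8)

rng = np.random.default_rng(0)
state = rng.random(N_CHANNELS) < 0.5      # true channel states (good = True)
belief = np.full(N_CHANNELS, 0.5)         # belief that each channel is good

successes, T = 0, 10_000
for _ in range(T):
    a = int(np.argmax(belief))            # myopic choice: highest belief
    observed_good = state[a]
    successes += observed_good
    # Belief update: observed channel becomes certain, then all beliefs
    # propagate one step through the Markov chain.
    belief[a] = 1.0 if observed_good else 0.0
    belief = belief * p11 + (1 - belief) * p01
    # Channels evolve according to their own chains.
    stay_good = np.where(state, p11, p01)
    state = rng.random(N_CHANNELS) < stay_good

print("myopic success rate:", successes / T)
```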
-
- [ahmad2009optimality]
- Liu, Keqin, and Qing Zhao. "Indexability of restless bandit problems and optimality of whittle index for dynamic multichannel access." IEEE Transactions on Information Theory 56.11 (2010): 5547-5567.
- [liu2010indexability]
- 45-page version
- Zhang, Yalin, et al. "Model free dynamic sensing order selection for imperfect sensing multichannel cognitive radio networks: A Q-learning approach." Communication Systems (ICCS), 2014 IEEE International Conference on. IEEE, 2014.
    - (+) imperfect-sensing analysis
    - (+) Q-learning (see the sketch below)
    - (-) the sense-then-transmit procedure is not true learning
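A minimal tabular Q-learning sketch of the kind this paper builds on; the state/action encoding for the sensing-order problem is paper-specific and is left abstract here.

```python
import numpy as np

# Tabular Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def choose_action(s):
    # epsilon-greedy exploration over the current Q estimates
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```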
-
-
First journal paper: IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 2, June 2018; the first paper to apply DQN to channel access.
- stochastic channel model -> stochastic-hopping channels
- dynamic environment -> automatically detect changes and re-learn
- stacked-memory DQN (see the sketch below)
- synchronization between the pair of DQNs -> emergency channel
- (-) sense first
-
USC
-
[wang2018deep]
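A minimal sketch of the stacked-memory idea as I read it: the DQN input is the last M (action, ACK) pairs stacked into one vector so a feed-forward Q-network can pick up the channel dynamics. M and the channel count below are assumptions, not the paper's values.

```python
from collections import deque
import numpy as np

# The DQN "state" is the last M (action, observation) pairs stacked into one
# vector; a feed-forward Q-network then infers the temporal channel dynamics.
N_CHANNELS, M = 4, 8   # assumed sizes

class StackedState:
    def __init__(self):
        # each entry: one-hot action (N_CHANNELS) + ACK/observation bit (1)
        self.buf = deque((np.zeros(N_CHANNELS + 1) for _ in range(M)), maxlen=M)

    def push(self, action, ack):
        entry = np.zeros(N_CHANNELS + 1)
        entry[action] = 1.0
        entry[-1] = float(ack)
        self.buf.append(entry)

    def vector(self):
        # flattened input of size M * (N_CHANNELS + 1) fed to the Q-network
        return np.concatenate(list(self.buf))

state = StackedState()
state.push(action=2, ack=True)
q_input = state.vector()
print(q_input.shape)   # (M * (N_CHANNELS + 1),)
```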
-
-
    - Lingjia Liu's work; applies the echo state network (reservoir computing), a type of RNN
    - DQN + reservoir computing vs. DQN + MLP: same performance, faster convergence (see the sketch below)
- (-) sense first
- [chang2018distributive]
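A minimal echo state network (reservoir computing) sketch of the component that replaces the MLP Q-network in this line of work: a fixed random recurrent reservoir with only a linear readout trained. Sizes, leak rate, and spectral-radius scaling are assumptions.

```python
import numpy as np

# Echo state network: input and recurrent weights are random and fixed;
# only the linear readout (here: the Q-value head) is trained.
rng = np.random.default_rng(0)
n_in, n_res, n_actions = 5, 100, 4

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # spectral radius < 1 (echo state property)
W_out = np.zeros((n_actions, n_res))        # trainable readout

def reservoir_step(x, u, leak=0.3):
    # leaky-integrator reservoir update
    return (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)

x = np.zeros(n_res)
u = rng.random(n_in)          # current observation
x = reservoir_step(x, u)
q_values = W_out @ x          # Q-value estimate for each action
```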
-
    - Time-slot access
- [yu2017deep]
-
- MIT
    - (+) reduces the state space analytically (in a mathematical way), rather than with a neural network
- [tsiligkaridis2017accelerated]
-
    - (+) first to handle multi-agent learning; first to implement DRQN + Double DQN with an LSTM, where the authors treat the distributed observations as partial observations (see the DRQN sketch below).
    - (-) poor verification
- 30 page version
- [naparstek2017deep]
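A minimal DRQN-style sketch, assuming each agent only sees its own observation: a hand-written LSTM step carries a hidden state across time so the Q-values condition on the observation history. This is an illustration, not the authors' implementation; all sizes are assumptions.

```python
import numpy as np

# DRQN idea: replace the feed-forward Q-network with a recurrent one, so the
# hidden state summarizes each agent's (partial) observation history.
rng = np.random.default_rng(0)
n_obs, n_hidden, n_actions = 5, 32, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One stacked weight matrix for the four LSTM gates (input, forget, cell, output).
W_gates = rng.normal(0, 0.1, size=(4 * n_hidden, n_obs + n_hidden))
b_gates = np.zeros(4 * n_hidden)
W_q = rng.normal(0, 0.1, size=(n_actions, n_hidden))

def drqn_step(obs, h, c):
    z = W_gates @ np.concatenate([obs, h]) + b_gates
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)     # new cell state
    h = o * np.tanh(c)             # new hidden state
    return W_q @ h, h, c           # Q-values + recurrent state to carry forward

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for t in range(3):                 # unroll over a few observations
    obs = rng.random(n_obs)
    q_values, h, c = drqn_step(obs, h, c)
```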
After July 2018
-
- BibTex
- [lu2018uav]
    - (+) applies transfer learning to quickly initialize the CNN
    - (+) claims to converge within 200 steps
    - (-) lacks technical details
-
- [wang2018cell]
- BibTex
    - (+) applies DRQN to handle partial observation
- (+) transfer learning
- Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
- BibTex
Method | Author | Affiliation | Comment | Bibtex | Paper | Abbreviation | Open source |
---|---|---|---|---|---|---|---|
DQN | Mnih | Google DeepMind | - | BibTex | paper | [mnih2015human] | DQN |
Double DQN | Van Hasselt | Google DeepMind | see the target sketch below | BibTex | paper | [van2016deep] | Double DQN |
Prioritized DQN | Tom Schaul | Google DeepMind | - | BibTex | paper | [schaul2015prioritized] | Pri DQN |
Dueling DQN | Wang, Ziyu | Google DeepMind | - | BibTex | paper | [wang2015dueling] | Duel DQN |
Asynchronous DQN | Mnih | Google DeepMind | Asynchronous Advantage Actor-Critic (A3C) + RNN with continuous action space | BibTex | paper | [mnih2016asynchronous] | Asyn DQN |
Distributional DQN | Marc G. Bellemare | Google DeepMind | - | BibTex | paper | [wang2015dueling] | |
Noisy Nets DQN | Meire Fortunato | Google DeepMind | - | BibTex | paper | [wang2015dueling] | |
Rainbow DQN | Matteo Hessel | Google DeepMind | - | BibTex | paper | [hessel2017rainbow] | |
Deep Deterministic Policy Gradient (DDPG) | David Silver | Google DeepMind | - | BibTex | paper | [silver2014deterministic] | DDPG |
Distributed Proximal Policy Optimization (DPPO) | John Schulman | OpenAI | - | BibTex | paper | [schulman2017proximal] | DPPO |
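A minimal sketch of the target computation that distinguishes DQN from Double DQN in the table above; the Q-value arrays stand in for the online and target networks' outputs at the next state.

```python
import numpy as np

def dqn_target(rewards, q_target_next, gamma=0.99, done=None):
    # DQN: the target network both selects and evaluates the next action.
    done = np.zeros_like(rewards) if done is None else done
    return rewards + gamma * (1 - done) * q_target_next.max(axis=1)

def double_dqn_target(rewards, q_online_next, q_target_next, gamma=0.99, done=None):
    # Double DQN: the online network selects the action, the target network evaluates it.
    done = np.zeros_like(rewards) if done is None else done
    best_a = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_a)), best_a]
    return rewards + gamma * (1 - done) * evaluated

# Tiny example batch of two transitions.
rewards = np.array([1.0, 0.0])
q_online_next = np.array([[0.2, 0.9], [0.5, 0.4]])
q_target_next = np.array([[0.3, 0.7], [0.6, 0.1]])
print(dqn_target(rewards, q_target_next))
print(double_dqn_target(rewards, q_online_next, q_target_next))
```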
- Wolpertinger architecture (similar to actor-critic); see the sketch below
- deals with large action spaces, up to ~1M actions
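A minimal Wolpertinger-style sketch under my assumptions: the actor's proto-action is matched to its k nearest discrete-action embeddings, and a critic re-ranks those candidates, keeping the per-step cost sublinear in the action count. The embeddings and critic below are random placeholders, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, emb_dim, k = 10_000, 8, 10   # scaled down from ~1M actions for the sketch
action_embeddings = rng.normal(size=(n_actions, emb_dim))

def wolpertinger_select(proto_action, critic):
    # k-nearest-neighbour lookup in the action embedding space
    d = np.linalg.norm(action_embeddings - proto_action, axis=1)
    candidates = np.argpartition(d, k)[:k]
    # critic(a) -> estimated Q-value; pick the best of the k candidates
    return max(candidates, key=critic)

proto = rng.normal(size=emb_dim)   # would come from the actor network
best = wolpertinger_select(proto, critic=lambda a: float(action_embeddings[a] @ proto))
```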
-
- DRQN
- BibTex
- [hausknecht2015deep]
-
Silver, David, and Joel Veness. "Monte-Carlo planning in large POMDPs." Advances in Neural Information Processing Systems, 2010.
- POMDP
- BibTex
- [silver2010monte]
- Morvan Zhou's GitHub repository for the DQN family
@misc{Mofan2013,
author = {Mofan Zhou},
title = {Reinforcement-learning-with-tensorflow},
year = {2016},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow}},
commit = {81fea33905c7f81719ec031eab51c68225eb7cce}
}