README FILE
Author: Jianyuan (Jet) Yu
Affiliation: Wireless, ECE, Virginia Tech
Email : [email protected]
Date : April, 2018
- Overview
- News
- ToDoList
- Notice
- Bugs
- Related Files:
- Reference:
- Tutorial of Deep Reinforcement Learning
- Ongoing Work - POMDP
- Configuration
- File Topology
- How to run
This project works on applying deep Q-networks (DQN) [1] to dynamic channel access.
It validates the performance of an intelligent node accessing channels without exchanging information
with other nodes (legacy, hopping, intermittent, dsa, etc.). It mainly concerns convergence speed
and scaling issues.
To be exact, we look into the following aspects:
- coexistence with other types of nodes
- legacy
- legacy with tx prob
- hopping
- intermittent (duty cycle)
- dsa (able to wait)
- poisson (arrival and service processes are Poisson, i.e. exponential inter-arrival and service times, as in the M/M/1 queue model; see the M/M/1 sketch after this list)
- mdp
- dqn
- learn to wait
- learn to occupy more than one channels
- learn to avoid hidden nodes
- learn to utilize spatial reuse (exposed nodes)
- select good channels (when several channels are available, some low-quality channels bring low reward).
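As a concrete illustration of the poisson item above, the sketch below shows one way an M/M/1-style traffic node could be simulated. The class and parameter names (`PoissonNode`, `arrival_rate`, `service_rate`) are illustrative assumptions, not the identifiers used in this repository, and the model ignores queued packets while busy, which simplifies the full M/M/1 queue.

```python
import numpy as np

class PoissonNode:
    """Toy M/M/1-style traffic source: exponential inter-arrival and service
    times, so the node alternates between idle and busy (transmitting) periods.
    Note: packets arriving while busy are dropped here, a simplification."""

    def __init__(self, arrival_rate=0.2, service_rate=0.5, seed=0):
        self.arrival_rate = arrival_rate   # lambda: mean arrivals per slot
        self.service_rate = service_rate   # mu: mean departures per slot
        self.rng = np.random.default_rng(seed)
        self.busy_until = 0.0
        self.next_arrival = self.rng.exponential(1.0 / arrival_rate)

    def occupies_channel(self, t):
        """Return True if the node transmits during slot t."""
        if t >= self.next_arrival and t >= self.busy_until:
            # a packet arrived and the server is free: start a service period
            self.busy_until = t + self.rng.exponential(1.0 / self.service_rate)
            self.next_arrival = t + self.rng.exponential(1.0 / self.arrival_rate)
        return t < self.busy_until

# example: a 20-slot occupancy trace on one channel
node = PoissonNode(arrival_rate=0.3, service_rate=0.8)
print([int(node.occupies_channel(t)) for t in range(20)])
```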
The inspiration comes from the SC2 competition, and some papers [2][3] have started work around this topic.
The project ports Chris's Matlab MDP-DCA simulator as the starting point with an MDP python solver [4], and then adopts a DQN python solver [5] (a minimal DQN sketch is given at the end of this overview).
Another repository [6], maintained by Yue, will be merged soon, and [7] is the technical report.
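To make the DQN part of the overview concrete, below is a minimal, self-contained sketch of an agent that picks one of N channels each slot. It uses PyTorch for brevity, and every name (`QNet`, `DQNAgent`, `obs_dim`, `n_channels`) is an illustrative assumption rather than a class from this repository. For simplicity it reuses the online network to form TD targets; the actual solver [5] may use a separate target network.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

class QNet(nn.Module):
    """Small MLP mapping the observation history to one Q-value per channel."""
    def __init__(self, obs_dim, n_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_channels),
        )

    def forward(self, x):
        return self.net(x)

class DQNAgent:
    def __init__(self, obs_dim, n_channels, gamma=0.9, eps=0.1, lr=1e-3):
        self.q = QNet(obs_dim, n_channels)
        self.opt = optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=10_000)          # replay memory
        self.gamma, self.eps, self.n_channels = gamma, eps, n_channels

    def act(self, obs):
        """Epsilon-greedy channel selection."""
        if random.random() < self.eps:
            return random.randrange(self.n_channels)
        with torch.no_grad():
            q = self.q(torch.as_tensor(obs, dtype=torch.float32))
        return int(q.argmax())

    def remember(self, obs, action, reward, next_obs):
        self.buffer.append((obs, action, reward, next_obs))

    def train_step(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        batch = random.sample(self.buffer, batch_size)
        obs, act, rew, nxt = map(list, zip(*batch))
        obs = torch.tensor(obs, dtype=torch.float32)
        act = torch.tensor(act, dtype=torch.int64).unsqueeze(1)
        rew = torch.tensor(rew, dtype=torch.float32)
        nxt = torch.tensor(nxt, dtype=torch.float32)
        # one-step TD target: r + gamma * max_a' Q(s', a')
        with torch.no_grad():
            target = rew + self.gamma * self.q(nxt).max(dim=1).values
        pred = self.q(obs).gather(1, act).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```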
- (Fri Aug 3) Coexistence of multiple learning nodes fixed; started running the scale-up case.
- (Tue Jul 17) 2-state Markov chain node added
- (Fri Jun 29) stack-DQN
- add partial observation node with the shortened observation as state
- add partial observation node with the shortened observation padded with zeros/ones to the full state size
- add partial observation node with stacked partial observations as state
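The three state constructions listed above could look roughly like the sketch below; the array shapes, the set of visible channels, and the stack depth are illustrative assumptions.

```python
import numpy as np

full_obs = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # occupancy of all 8 channels
visible  = [0, 1, 2]                             # channels this node can sense

# 1) shortened observation as state: keep only the sensed channels
state_short = full_obs[visible]                  # shape (3,)

# 2) shortened observation padded to the full state size with zeros (or ones)
state_padded = np.zeros_like(full_obs)
state_padded[visible] = full_obs[visible]        # shape (8,)

# 3) stack the last k partial observations into one state (stack-DQN)
k = 4
history = [full_obs[visible]] * k                # placeholder history of length k
state_stacked = np.concatenate(history)          # shape (3 * k,)
```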
- (Sun Jun 24) some new features
- add poisson node, modeled under M/M/1 theory, with configurable arrival rate & service rate
- add policy gradient learning node, namely the deep policy gradient (dpg) node (see the policy-gradient sketch below)
- rename and adapt string names: dumb nodes are numbered under 9, learning nodes start at 10 or more
- learn the poisson node
- learn the legacy node with fixed biased tx prob
- learn the long im node under limited memory and steps
- dynamic environment
- learn to greedily occupy all available channels
- efficient coexistence of multiple dsa nodes
- coexistence of multiple dqn nodes
- merge Yue's guess item & eligibility trace dqn node
- dpg
- vi
- pomcp
- poisson
- uniform
- 2-state Markov chain
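For the deep policy gradient (dpg) node mentioned in the Jun 24 entry, a minimal REINFORCE-style update could look like the sketch below. This is an assumption about what such a node does, written in PyTorch with illustrative names (`PolicyNet`, `DPGNode`, `finish_episode`), not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class PolicyNet(nn.Module):
    """Maps an observation to a probability distribution over channels."""
    def __init__(self, obs_dim, n_channels, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_channels),
        )

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)

class DPGNode:
    def __init__(self, obs_dim, n_channels, gamma=0.9, lr=1e-3):
        self.policy = PolicyNet(obs_dim, n_channels)
        self.opt = optim.Adam(self.policy.parameters(), lr=lr)
        self.gamma = gamma
        self.log_probs, self.rewards = [], []

    def act(self, obs):
        probs = self.policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        self.log_probs.append(dist.log_prob(action))
        return int(action)

    def finish_episode(self):
        """REINFORCE: push up log-probabilities of actions weighted by returns."""
        returns, g = [], 0.0
        for r in reversed(self.rewards):
            g = r + self.gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        loss = -(torch.stack(self.log_probs) * returns).sum()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        self.log_probs, self.rewards = [], []

# usage per slot:   a = node.act(obs); ...; node.rewards.append(reward)
# at episode end:   node.finish_episode()
```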
- When assigning a new number of channels while a DQN node exists, the IPython console needs to be restarted, otherwise a pop-size mismatch error appears. This does not happen in a raw terminal.
- mdpNode hits a computation constraint when the number of channels exceeds 10, resulting in a dead loop (stuck at stateSpaceCreate).
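A back-of-the-envelope check of why stateSpaceCreate blows up: with binary busy/idle occupancy per channel, the enumerated state space grows exponentially in the number of channels, so 10+ channels already means thousands of states. The exact encoding in mdpNode may differ; this is only an illustration.

```python
# rough illustration: binary busy/idle per channel gives 2**n states to enumerate
for n in (4, 8, 10, 12, 16):
    print(n, "channels ->", 2 ** n, "states")
```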
- The assignment of partial observation currently lives in the dqnNode.py file, which is a clumsy way to do it.
- Because `.pyc` files are git-ignored, they are not cloned on a pull. When running the code without the `.pyc` files, an error like `ImportError: No module named ddpgNode` may pop up. Never mind, just execute again and again (to generate more `.pyc` files) until it runs through.
- We assume all nodes detect and make decisions at the same time, hence multiple dsaNodes may collide (T.B.D.). -> create politeness among dsa nodes to avoid the ping-pong effect, an ugly way.
- Unstable performance when multiple dqnNodes work together (T.B.D.). -> assign priority to learning nodes so that they observe and act one by one, an ugly way.
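The priority workaround for multiple learning nodes could look roughly like the sketch below: nodes take turns in a fixed order within one slot, so each later node already sees the channels claimed earlier in that slot. The class and method names (`ToyLearningNode`, `one_slot`) are illustrative assumptions, not the repository's API.

```python
import random

class ToyLearningNode:
    """Stand-in for a learning node; a real node would run DQN/DPG here."""
    def __init__(self, name, priority, n_channels):
        self.name, self.priority, self.n_channels = name, priority, n_channels

    def act(self, taken):
        # pick a free channel if one exists, otherwise a random one
        free = [c for c in range(self.n_channels) if c not in taken]
        return random.choice(free) if free else random.randrange(self.n_channels)

def one_slot(nodes):
    """Nodes observe and act one by one, ordered by priority, so no two
    learning nodes pick simultaneously within the same slot."""
    taken, choices = set(), {}
    for node in sorted(nodes, key=lambda n: n.priority):
        ch = node.act(taken)
        choices[node.name] = ch
        taken.add(ch)
    return choices

nodes = [ToyLearningNode("dqn10", 0, 4), ToyLearningNode("dqn11", 1, 4)]
print(one_slot(nodes))
```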