An implementation of deep RL agents for both discrete and continuous control, with time-series measurements as inputs.
Supported network architectures: fully-connected, 1D-convolutional, and LSTM (on-policy algorithms only).
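For illustration, here is a minimal sketch of the three input-network choices applied to time-series measurements, written with `tf.keras`; the `build_trunk` helper, layer sizes, and window shape are assumptions for the example, not the repo's code.

```python
# Sketch only: three trunk architectures over time-series input of shape
# (window, n_features). Names and sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

def build_trunk(arch, window=32, n_features=8, hidden=64):
    inputs = tf.keras.Input(shape=(window, n_features))
    if arch == 'fully-connected':
        x = layers.Flatten()(inputs)
        x = layers.Dense(hidden, activation='relu')(x)
    elif arch == '1d-convolutional':
        x = layers.Conv1D(hidden, kernel_size=3, activation='relu')(inputs)
        x = layers.GlobalAveragePooling1D()(x)
    elif arch == 'lstm':  # used with on-policy algorithms only
        x = layers.LSTM(hidden)(inputs)
    else:
        raise ValueError(arch)
    return tf.keras.Model(inputs, x)

print(build_trunk('1d-convolutional').output_shape)  # (None, 64)
```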
Define all parameters in `config.ini`, then run `python3 main.py --config-path [path to config.ini]`. This command is also provided in `run.sh`.
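As a rough sketch of how the entry point might consume the `--config-path` argument, using only Python's standard `argparse` and `configparser`; the section and key names inside the actual `config.ini` are not shown here and the structure below is an assumption.

```python
# Sketch: parse --config-path and load config.ini (not the repo's main.py).
import argparse
import configparser

def load_config():
    parser = argparse.ArgumentParser()
    parser.add_argument('--config-path', required=True,
                        help='path to config.ini')
    args = parser.parse_args()

    config = configparser.ConfigParser()
    config.read(args.config_path)
    return config

if __name__ == '__main__':
    config = load_config()
    # e.g. print every section and key defined in config.ini
    for section in config.sections():
        print(section, dict(config[section]))
```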
A multi-processing implementation is available on the `multiprocess` branch, in which the global weights and local batches are maintained in queues. It is not as efficient as the multi-threading implementation, due to the potential lag between the generation and consumption of each local batch.
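A minimal sketch of the queue-based pattern described above, using `multiprocessing.Queue`: a worker process pulls the latest global weights, produces a local batch, and pushes it back for the learner to consume. The `worker`/`learner` names and the dummy update rule are illustrative, not the repo's implementation.

```python
# Sketch: global weights and local batches exchanged through queues.
import multiprocessing as mp

def worker(weight_q, batch_q, n_batches):
    for _ in range(n_batches):
        weights = weight_q.get()          # latest global weights (may lag)
        batch = [w + 1 for w in weights]  # stand-in for a rollout/local batch
        batch_q.put(batch)

def learner(weight_q, batch_q, n_batches):
    weights = [0.0]
    weight_q.put(weights)                 # publish initial global weights
    for _ in range(n_batches):
        batch = batch_q.get()             # consume a local batch
        weights = [w + b for w, b in zip(weights, batch)]  # dummy update
        weight_q.put(weights)             # publish updated global weights

if __name__ == '__main__':
    weight_q, batch_q = mp.Queue(), mp.Queue()
    n = 5
    p = mp.Process(target=worker, args=(weight_q, batch_q, n))
    p.start()
    learner(weight_q, batch_q, n)
    p.join()
```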
In the config file, the variable BASE DIR specifies the directory where results are written; on your machine this directory will contain the subfolders `log/` and `model/`.
To monitor progress with TensorBoard, run `python -m tensorflow.tensorboard --logdir=.`, which launches TensorBoard so training can be followed in a browser window. Some example plots are shown below.
Detailed config files are located under `./docs`.
continuous control | discrete control |
---|---|
Pendulum | Acrobot |
![]() | ![]() |
MountainCarContinuous | MountainCar |
![]() | ![]() |
- The default maximum episode length of 500 is too short for DDPG to explore a successful trace, so it is relaxed to 2000 in this comparison.
- The convergence comparison may not be meaningful, since it mostly depends on how fast the agent can explore a successful trace and obtain the sparse reward under a random behavior policy.
- DDPG training takes much longer, as the in-memory replay buffer grows during training (see the sketch below).
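For context on the last point, a minimal sketch of an in-memory replay buffer of the kind DDPG relies on; the capacity and tuple layout are assumptions for the example, not the repo's settings.

```python
# Sketch: a bounded in-memory replay buffer whose memory use grows with training.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # memory grows until capacity

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform random minibatch for the off-policy update
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```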