This repository implements a deep reinforcement learning algorithm that uses uncertainty in a robot's learned model dynamics to avoid future collisions.

It explores a new way to train policy gradient algorithms: the output of the self-supervised perturbation detection algorithm presented in my deep_dynamics repository serves as the reward for each action in a given state. A pre-trained collision anticipation model can be supplied when training this intrinsic-RL method. The motivation for this approach is to develop learning algorithms that minimize damaging interactions while a robot learns in a new environment.

The proposed algorithm is evaluated on a dodge ball task with eight projectiles. The Monte-Carlo (i.e. stochastic) policy gradient algorithm is used here, but the proposed intrinsic reward strategy is not specific to it: it is general to any policy gradient method and may be usable in other reinforcement learning frameworks. A visual and mathematical depiction of the intrinsic-RL method is shown below.
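As a concrete illustration, the update at the heart of this strategy can be sketched in PyTorch as follows. This is a minimal sketch, not the actual implementation in train.py: the names `reinforce_update`, `policy`, and `ca_model` are illustrative, and it assumes the collision anticipation model maps an observation to a collision probability in [0, 1].

```python
# Minimal sketch of a REINFORCE update driven by an intrinsic reward.
# Assumes `ca_model` (hypothetical name) outputs a collision probability
# per observation and `policy` outputs action logits.
import torch

def reinforce_update(policy, optimizer, ca_model, observations, actions, gamma=0.99):
    """One Monte-Carlo policy gradient step on a finished episode.

    observations: (T, ...) float tensor of states visited during the episode
    actions:      (T,) long tensor of action indices taken by the policy
    """
    with torch.no_grad():
        # Intrinsic reward: higher when the collision-anticipation model
        # predicts the state is safe (low collision probability).
        rewards = 1.0 - ca_model(observations).squeeze(-1)  # (T,)

    # Discounted returns G_t = sum_k gamma^k * r_{t+k}, computed backwards.
    returns = torch.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Normalizing returns is a common trick to reduce gradient variance.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE objective: maximize sum_t log pi(a_t | s_t) * G_t.
    log_probs = torch.log_softmax(policy(observations), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(chosen * returns).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```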
Train the intrinsic-RL policy gradient algorithm using collision anticipation and a stateful ConvIRNN network structure:
python train.py --use_ca=True --policy_inp_type=3
Demo the intrinsic-RL policy gradient algorithm with no collision anticipation and a ConvLSTM network structure:
python demo_model.py --use_ca=False --policy_inp_type=1
Before running, change BASE_DIR in config.ini to the absolute path of this repository's directory.
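For example, the entry might look like the following (the section name and path here are illustrative; keep whatever section heading config.ini already uses):

```ini
[DEFAULT]
BASE_DIR = /home/user/intrinsic_rl
```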
The following are required to run the code:
- Python 3
- PyTorch
- numpy
- scipy
- matplotlib
- V-REP (robot simulator)
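Assuming a standard Python 3 environment, the Python packages can be installed with pip; V-REP must be downloaded separately from Coppelia Robotics:

```
pip install torch numpy scipy matplotlib
```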