TensorFlow source code for POP3D, from the paper "Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization" (https://arxiv.org/abs/1807.00442)
- gym[mujoco,atari]
- scipy
- tqdm
- joblib
- zmq
- dill
- mpi4py
- cloudpickle
- tensorflow>=1.4.0
- opencv-python
- Atari
python -m baselines.ppo2.run_all_atari
- MuJoCo
python -m baselines.ppo2.run_all_mujoco
- Use PPO
python -m baselines.ppo2.run_atari --env AlienNoFrameskip-v4 --num-timesteps 10000000 --seed 10
- Use POP3D
python -m baselines.ppo2.run_atari --env AlienNoFrameskip-v4 --num-timesteps 10000000 --seed 10 --use-penal 1
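
To compare PPO and POP3D on the same game over several seeds, the two commands above can be generated in a loop. The snippet below is a minimal sketch, not part of this repository: the helper names `build_command` and `build_sweep` are hypothetical, and the seeds 10, 11, 12 are placeholders (only seed 10 appears in the commands above). It only prints the commands, using the same flags shown above.

```python
import shlex


def build_command(env, num_timesteps, seed, use_penal):
    """Return the argument list for one baselines.ppo2.run_atari run."""
    cmd = [
        "python", "-m", "baselines.ppo2.run_atari",
        "--env", env,
        "--num-timesteps", str(num_timesteps),
        "--seed", str(seed),
    ]
    if use_penal:
        # "--use-penal 1" selects POP3D instead of PPO, as in the command above.
        cmd += ["--use-penal", "1"]
    return cmd


def build_sweep(env, num_timesteps, seeds):
    """Return one PPO command and one POP3D command per seed."""
    return [
        build_command(env, num_timesteps, seed, use_penal)
        for seed in seeds
        for use_penal in (False, True)
    ]


if __name__ == "__main__":
    # Placeholder seeds (assumption); replace with the seeds you want to run.
    for cmd in build_sweep("AlienNoFrameskip-v4", 10000000, [10, 11, 12]):
        # Print only; paste the lines into a shell or pass cmd to subprocess.run.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each printed line can be run as is from the repository root, one after another or as separate jobs.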
Results for three seeds can be downloaded from Google Drive: https://drive.google.com/file/d/1c79TqWn74mHXhLjoTWaBKfKaQOsfD2hg/view?usp=sharing. We release them to make the paper's results easy to reproduce.
Our code is based on OpenAI's baselines (https://github.com/openai/baselines.git); thanks to its authors.