Our implementation of the deep Q-learning algorithm from the DQN paper. The agent reads the screen and the integer score of the Atari 2600 game Space Invaders, and outputs the same control commands a human would issue with a controller (albeit without the physical controller).
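As a rough illustration of the loop described above, here is a minimal sketch of the three pieces a deep Q-learning agent needs around its network: turning the score into a reward, choosing a control command, and building the one-step learning target. The function names (`score_delta_reward`, `epsilon_greedy`, `q_target`) and the default `gamma` are assumptions made for this example; they are not taken from this repository's code.

```python
import random


def score_delta_reward(previous_score, current_score):
    """Reward for one step, taken as the change in the integer game score.

    Assumption: the score difference is used directly. Any reward clipping
    or scaling done by the real code is not shown here.
    """
    return current_score - previous_score


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values holds one estimated value per control command.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward, next_q_values, terminal, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    On a terminal transition there is no next state, so the target is
    just the reward.
    """
    if terminal:
        return float(reward)
    return reward + gamma * max(next_q_values)
```

In a full agent, `q_values` and `next_q_values` would come from the convolutional network applied to preprocessed screen frames, and the network would be trained to move its prediction for the chosen action towards `q_target`.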
- Python 2.7
- Theano
- Lasagne
- pygame
- Arcade Learning Environment (ALE) 0.5.1
- Atari 2600 ROM of space_invaders.bin
See /provision/aws_installation.sh for a concise shell history of the commands used to install the environment.
Human-level control through deep reinforcement learning
Deep Reinforcement Learning with Double Q-learning - more stable learning through double Q-learning
Action-Conditional Video Prediction using Deep Networks in Atari Games - predicting future frames
Dueling Network Architectures for Deep Reinforcement Learning
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Recurrent Models of Visual Attention - applying reinforcement learning to decide which part of the image to look at.
Prioritized Experience Replay - a transition should be drawn from the replay memory more often the more surprising it is, i.e. the larger its TD error
Deep Recurrent Q-Learning for Partially Observable MDPs - by using an LSTM you can drop the frame stacking done in the DQN paper. "The recurrent net can better adapt at evaluation time if the quality of observations changes"
A fast learning algorithm for deep belief nets - Training one layer at a time
Reinforcement Learning and Automated Planning: A Survey
Autoregressive Neural Networks - Neural Networks applied to Time Series.
Deep Autoregressive Neural Networks - predicting future frames of an Atari Game.
Reinforcement Learning: An Introduction - very thorough introduction to reinforcement learning.
A survey of robot learning by demonstration - synonyms: learning by/from demonstration = learning by watching = learning from observation = programming by demonstration = behaviour cloning/imitation/mimicry
Deep Reinforcement Learning - nice summary of recent advances in deep Q-learning.
Concurrent Q-learning for Autonomous Mapping and Navigation - one-trial learning?
Using Reinforcement Learning to Adapt an Imitation Task - overcoming new obstacles?
On the importance of initialization and momentum in deep learning - classical momentum vs Nesterov's Accelerated Gradient
CNN Features off-the-shelf: an Astounding Baseline for Recognition - features generated by a neural network are better than hand-crafted ones
Prioritized Experience Replay - on Atari games
Network in Network - max pooling loses information, so let's keep some more of it.
Concurrent Reinforcement Learning - RL in time-dependent environments
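
Two of the notes in the list above, on Double Q-learning and on Prioritized Experience Replay, are easier to follow with a small example. The sketch below is illustrative only: the function names, the `alpha` and `eps` defaults, and the plain-list data layout are assumptions made here, not code from this repository. It shows the Double Q-learning target (one network selects the next action, another evaluates it) and the proportional variant of prioritized sampling; the importance-sampling correction from the paper is left out.

```python
import random


def double_q_target(reward, next_q_online, next_q_target, terminal, gamma=0.99):
    """Double Q-learning target.

    The online network's values select the next action; the target
    network's values evaluate it. This decoupling is what reduces the
    overestimation of plain Q-learning.
    """
    if terminal:
        return float(reward)
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization.

    P(i) = p_i**alpha / sum_k p_k**alpha, with p_i = |td_error_i| + eps.
    alpha = 0 gives uniform sampling; larger alpha favours surprising
    transitions more strongly. eps keeps zero-error transitions reachable.
    """
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_index(probabilities, rng=random):
    """Draw one replay-memory index according to the given probabilities."""
    threshold = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if threshold < cumulative:
            return i
    # Guard against floating-point rounding in the cumulative sum.
    return len(probabilities) - 1
```

A linear scan like `sample_index` is enough to show the idea; for a replay memory with many transitions, a sum-tree or similar structure would be the more practical choice.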