rl-painter

Reinforcement Learning to Paint

Brief

Each observation contains two images: the "target" and the "canvas". Given the observation, you should output a list of 8 scalars, each within the range [0,1], as the action. These scalars control where the next stroke is painted.
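To make the action format concrete, here is a minimal sketch that only builds valid actions. It does not call env.py, whose interface is not described here. The helper names random_action and clip_action are hypothetical, and NumPy is an assumed dependency.

```python
import numpy as np

ACTION_SIZE = 8  # the Brief asks for 8 scalars per action


def random_action(rng, n=ACTION_SIZE):
    """Sample n scalars uniformly from [0, 1) (hypothetical helper)."""
    return rng.uniform(0.0, 1.0, size=n)


def clip_action(raw):
    """Force an arbitrary policy output into the valid range [0, 1]."""
    return np.clip(np.asarray(raw, dtype=np.float64), 0.0, 1.0)


rng = np.random.default_rng(0)
action = random_action(rng)

# Out-of-range values are clamped to the nearest bound.
squashed = clip_action([-0.3, 0.5, 1.7, 0.0, 1.0, 0.25, 0.75, 2.0])
```

Clipping is only one way to keep a policy's output in range. A sigmoid on the final layer of the actor network is a common alternative.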

The reward is positive if your action decreased the difference between "target" and "canvas". The higher the total reward, the better your agent has painted.
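The reward described above can be sketched as the drop in image difference caused by one stroke. This is an illustration under assumptions: mean squared error is used as the difference measure here, but the metric actually used in env.py is not stated in this README, and the function names are hypothetical.

```python
import numpy as np


def difference(target, canvas):
    """Mean squared pixel difference (an assumed metric, not confirmed by env.py)."""
    t = np.asarray(target, dtype=np.float64)
    c = np.asarray(canvas, dtype=np.float64)
    return float(np.mean((t - c) ** 2))


def step_reward(target, canvas_before, canvas_after):
    """Positive when the stroke moved the canvas closer to the target."""
    return difference(target, canvas_before) - difference(target, canvas_after)


# Toy 4x4 grayscale example: the target is all white, the canvas starts black.
target = np.ones((4, 4))
before = np.zeros((4, 4))
after = before.copy()
after[:2, :] = 1.0  # a "stroke" that paints the top half white

reward = step_reward(target, before, after)    # improvement, so positive
penalty = step_reward(target, after, before)   # undoing the stroke, so negative
```

With this definition the rewards of an episode telescope: their sum equals the initial difference minus the final difference, which is why a higher total reward corresponds to a better final painting.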

Usage

Run python env.py to test the environment.

Run ipython -i ddpg2.py, then call r(10000) at the prompt to test the environment with a naive DDPG algorithm.

Dependencies

opencv-python with OpenCV 3.x
A few other helper libraries; please refer to the code.
Python 3.5+

RL-specific details

The observation is Markovian.
A human could accomplish this task by trial and error before finally committing to a stroke. The optimal policy might therefore involve some other classical algorithmic component (search, local optimization, or even modeling) on top of deep neural networks.
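The trial-and-error idea above can be sketched as a random local search around a proposed action. Everything here is hypothetical: toy_stroke is a stand-in renderer that fills a rectangle and is not the stroke model in env.py, mean squared error is an assumed difference measure, and refine_action is an illustrative name. In practice the proposal would come from a trained network rather than a constant.

```python
import numpy as np


def toy_stroke(canvas, action):
    """Hypothetical renderer: fill a rectangle. NOT the stroke model in env.py.

    action[0:2] give the x extent, action[2:4] the y extent, action[4] the
    gray level. The remaining scalars are ignored by this toy model.
    """
    h, w = canvas.shape
    x0, x1 = sorted((int(action[0] * w), int(action[1] * w)))
    y0, y1 = sorted((int(action[2] * h), int(action[3] * h)))
    out = canvas.copy()
    out[y0:y1 + 1, x0:x1 + 1] = action[4]
    return out


def mse(a, b):
    """Mean squared difference between two images (assumed metric)."""
    return float(np.mean((a - b) ** 2))


def refine_action(proposal, target, canvas, rng, n_candidates=64, sigma=0.1):
    """Try noisy variants of the proposal in simulation and keep the best one."""
    proposal = np.clip(np.asarray(proposal, dtype=np.float64), 0.0, 1.0)
    noise = rng.normal(0.0, sigma, size=(n_candidates, proposal.size))
    candidates = np.clip(proposal + noise, 0.0, 1.0)
    # Include the unmodified proposal so the result is never worse than it.
    candidates = np.vstack([proposal, candidates])
    errors = [mse(target, toy_stroke(canvas, a)) for a in candidates]
    return candidates[int(np.argmin(errors))]


rng = np.random.default_rng(0)
target = np.zeros((16, 16))
target[4:12, 4:12] = 1.0            # a white square to reproduce
canvas = np.zeros((16, 16))
proposal = np.full(8, 0.5)          # placeholder for a policy network's output
best = refine_action(proposal, target, canvas, rng)
```

This kind of search needs a way to simulate a stroke without committing it to the real canvas, which is where the "modeling" component mentioned above would come in.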


ctmakro/rl-painter

Folders and files

Latest commit

History

Repository files navigation

rl-painter

Brief

Usage

Dependencies

RL-specific details

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages