Reinforce.jl is an interface for Reinforcement Learning. It is intended to connect modular environments, policies, and solvers with a simple interface.
Packages which build on Reinforce:
- AtariAlgos: Environment which wraps Atari games using ArcadeLearningEnvironment
- OpenAIGym: Wrapper for OpenAI's Python package `gym`
New environments are created by subtyping `AbstractEnvironment` and implementing a few methods:

- `reset!(env)`
- `actions(env, s) --> A`
- `step!(env, s, a) --> r, s′`
- `finished(env, s′)`

and the optional overrides:

- `state(env) --> s`
- `reward(env) --> r`

which fall back to the `env.state` and `env.reward` fields when not overridden.
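To make the interface concrete, here is a minimal sketch of a custom environment. `WalkEnv` and its dynamics are hypothetical, invented here for illustration: a 1-D walk over the integers that pays a cost of 1 per step and ends when the walker reaches ±10.

```julia
using Reinforce

# Hypothetical example (not part of Reinforce): a 1-D walk on the integers
# that terminates when the walker reaches ±10.
mutable struct WalkEnv <: AbstractEnvironment
    state::Int
    reward::Float64
end
WalkEnv() = WalkEnv(0, 0.0)

Reinforce.reset!(env::WalkEnv) = (env.state = 0; env.reward = 0.0; env)

# Both actions (step left or step right) are always available.
Reinforce.actions(env::WalkEnv, s) = [-1, 1]

# Apply action `a` to state `s`; return the reward and the new state.
function Reinforce.step!(env::WalkEnv, s, a)
    env.state = s + a
    env.reward = -1.0    # constant cost per step encourages short episodes
    env.reward, env.state
end

Reinforce.finished(env::WalkEnv, s′) = abs(s′) >= 10
```

Because the struct carries `state` and `reward` fields, the default `state(env)` and `reward(env)` described above work without further overrides.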
Agents/policies are created by subtyping `AbstractPolicy` and implementing `action`. The built-in random policy is a short example:

```julia
struct RandomPolicy <: AbstractPolicy end
action(policy::RandomPolicy, r, s′, A′) = rand(A′)
```

The `action` method maps the last reward and current state to the next chosen action: `(r, s′) --> a′`.
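For a slightly richer sketch, here is what a learned policy might look like. `EpsilonGreedyPolicy` and its `Q` field are hypothetical and not part of Reinforce; the sketch only assumes the `action` signature above, that `A′` supports `rand` and iteration, and Julia ≥ 1.7 for the two-argument `argmax`.

```julia
# Hypothetical sketch: an ε-greedy policy over a user-supplied
# action-value function Q(s′, a).
struct EpsilonGreedyPolicy{F} <: AbstractPolicy
    ε::Float64
    Q::F    # Q(s′, a) --> estimated value of taking action `a` in state `s′`
end

function Reinforce.action(policy::EpsilonGreedyPolicy, r, s′, A′)
    if rand() < policy.ε
        rand(A′)                            # explore: uniformly random action
    else
        argmax(a -> policy.Q(s′, a), A′)    # exploit: highest estimated value
    end
end
```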
Iterate through episodes using the `Episode` iterator. The convenience method `episode!` demonstrates this:

```julia
function episode!(env, policy = RandomPolicy(); stepfunc = on_step, kw...)
    ep = Episode(env, policy; kw...)
    for sars in ep
        stepfunc(env, ep.niter, sars)
    end
    ep.total_reward, ep.niter
end
```
A 4-tuple `(s, a, r, s′)` is returned from each step of the episode. Whether we write `r` or `r′` is a matter of convention.
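Putting the pieces together, a usage sketch that reuses the hypothetical `WalkEnv` from above (the no-op `stepfunc` is passed explicitly so the snippet is self-contained):

```julia
# Run one full episode with the built-in random policy.
env = WalkEnv()
R, n = episode!(env, RandomPolicy(); stepfunc = (env, i, sars) -> nothing)
println("total reward = $R over $n steps")

# Or drive the iterator directly for full control over each step.
ep = Episode(env, RandomPolicy())
for (s, a, r, s′) in ep
    # inspect or log each (s, a, r, s′) transition here
end
@show ep.total_reward ep.niter
```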