
Applied Reinforcement Learning @ Facebook


<TODO: add stuff from dex here>



We have included Dockerfiles for the CPU-only and CUDA builds under the docker directory. The CUDA build requires nvidia-docker to run.

To build, cd into the respective directory and run:

docker build -t horizon:dev .

If the Horizon unit tests seem stuck, your Docker VM might not have enough memory. In that case, multiprocessing workers may be killed, leaving the tests hanging. Try increasing Docker's memory limit.

Linux (Ubuntu)

Clone repo:

git clone
cd Horizon/

Our project uses Thrift to define configuration and Spark to transform training data into the right format. Both require dependencies that are not managed by virtualenv. The following software must be installed on your system:

  • Thrift compiler version 0.11.0 or above. You will need to build from source. See 1, 2.
  • Oracle Java 8
  • Maven

To install them all, you can run ./ After it finishes, add this line to your .bash_profile:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Next, we recommend creating a virtualenv so that Python dependencies are contained within this project.

virtualenv -p python3 env
. env/bin/activate

First, install dependencies:

pip install -r requirements.txt

Then, install the appropriate PyTorch 1.0 nightly build into the virtual environment:

# For CPU build
pip install torch_nightly -f

# For CUDA 9.0 build
pip install torch_nightly -f

# For CUDA 9.2 build
pip install torch_nightly -f

After that, generate the Python code for the Thrift config definition. If you change the Thrift definition later, rerun this command.

thrift --gen py --out . ml/rl/thrift/core.thrift

And now, you are ready for installation.

pip install -e .

At this point, you should be able to run all unit tests:

python test


Online RL Training

Horizon supports online training environments for model testing. To train a model on OpenAI Gym, simply run:

python ml/rl/test/gym/ -p ml/rl/test/gym/discrete_dqn_cartpole_v0.json

Configs for different environments and algorithms can be found in ml/rl/test/gym/.
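Before editing one of these configs, it can help to see which top-level sections it contains. The following is a minimal sketch that assumes the config is a JSON object at the top level; the helper name top_level_sections is hypothetical and not part of Horizon.

```python
import json


def top_level_sections(config_text):
    """Return the sorted top-level keys of a JSON config given as text.

    Hypothetical helper for illustration; not part of Horizon.
    """
    config = json.loads(config_text)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(config)


# Example usage, run from the repository root:
# with open("ml/rl/test/gym/discrete_dqn_cartpole_v0.json") as f:
#     print(top_level_sections(f.read()))
```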

Batch RL Training Details

Horizon also supports training on offline data (Batch RL).

Quick Batch RL Examples

Discrete-Action DQN Workflow

cp ml/rl/workflow/sample_datasets/discrete_action/cartpole_training_data.json.gz ~
cp ml/rl/workflow/sample_datasets/discrete_action/state_features_norm.json.gz ~
gunzip ~/cartpole_training_data.json.gz ~/state_features_norm.json.gz

python ml/rl/workflow/ -p ml/rl/workflow/sample_configs/discrete_action/dqn_example.json

Parametric-Action DQN Workflow

cp ml/rl/workflow/sample_datasets/parametric_action/cartpole_training_data.json.gz ~
cp ml/rl/workflow/sample_datasets/parametric_action/state_features_norm.json.gz ~
cp ml/rl/workflow/sample_datasets/parametric_action/action_norm.json.gz ~
gunzip ~/cartpole_training_data.json.gz ~/state_features_norm.json.gz ~/action_norm.json.gz

python ml/rl/workflow/ -p ml/rl/workflow/sample_configs/parametric_action/parametric_dqn_example.json

DDPG Workflow

cp ml/rl/workflow/sample_datasets/continuous_action/pendulum_training_data.json.gz ~
cp ml/rl/workflow/sample_datasets/continuous_action/state_features_norm.json.gz ~
cp ml/rl/workflow/sample_datasets/continuous_action/action_norm.json.gz ~
gunzip ~/pendulum_training_data.json.gz ~/state_features_norm.json.gz ~/action_norm.json.gz

python ml/rl/workflow/ -p ml/rl/workflow/sample_configs/continuous_action/ddpg_example.json

Detailed Overview

For DQN training, we expect the input data to have the following schema:

<TODO: add schema>

An example data set with this schema is given in ml/rl/workflow/sample_datasets/discrete_action/cartpole_pre_timeline.json.gz.
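To get a feel for the data, you can look at the first few records of the example file. The sketch below assumes the file holds one JSON object per line, which is an assumption about the format rather than something stated here; peek_records is a hypothetical helper name.

```python
import gzip
import itertools
import json


def peek_records(path, n=3):
    """Return the first n records of a gzipped file.

    Assumes one JSON object per line (an assumption, not a documented format).
    Blank lines are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        non_empty = (line for line in f if line.strip())
        return [json.loads(line) for line in itertools.islice(non_empty, n)]


# Example usage, run from the repository root:
# for record in peek_records(
#     "ml/rl/workflow/sample_datasets/discrete_action/cartpole_pre_timeline.json.gz"
# ):
#     print(sorted(record))
```

Printing only the sorted keys of each record keeps the output short while still showing which fields are present.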

To train a DQN model on this data set we do the following:

Copy and unzip example dataset:

mkdir cartpole_discrete
cp ml/rl/workflow/sample_datasets/discrete_action/cartpole_pre_timeline.json.gz cartpole_discrete/
gunzip cartpole_discrete/cartpole_pre_timeline.json.gz

Models are trained on consecutive pairs of state/action tuples. To assist in creating this table, we provide RLTimelineOperator, a Spark operator. Build and run the timeline operator on the data. Make sure that you have Java (not the OpenJDK that sometimes ships with Linux) and Scala installed:

mvn -f preprocessing/pom.xml package

Next, download spark if you don't already have it:

tar xvf spark-2.3.1-bin-hadoop2.7.tgz
mv spark-2.3.1-bin-hadoop2.7 /usr/local/spark

Now run the timeline operator on the training data directory to generate the training data:

/usr/local/spark/bin/spark-submit --class com.facebook.spark.rl.Preprocessor preprocessing/target/rl-preprocessing-1.1.jar "$(cat ml/rl/workflow/sample_configs/discrete_action/timeline.json)"

This will create a directory cartpole_discrete_training_data that contains the (sharded) post-timeline data. An example of this data is given in ml/rl/workflow/sample_datasets/discrete_action/cartpole_training_data.json.gz. We will use this example to train our DQN model.

cp ml/rl/workflow/sample_datasets/discrete_action/cartpole_training_data.json.gz ~/
gunzip ~/cartpole_training_data.json.gz
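For intuition about what "consecutive pairs of state/action tuples" means, here is a minimal in-memory sketch of the pairing idea. It is not the RLTimelineOperator itself, which runs on Spark; the field names (episode_id, step, state, action, reward) are hypothetical and the real schema may differ.

```python
from collections import defaultdict


def build_timeline(rows):
    """Pair each logged step with the next step of the same episode.

    `rows` is an iterable of dicts with the hypothetical keys
    episode_id, step, state, action, and reward.
    """
    episodes = defaultdict(list)
    for row in rows:
        episodes[row["episode_id"]].append(row)

    transitions = []
    for episode_id, steps in episodes.items():
        steps.sort(key=lambda r: r["step"])
        # The last step of an episode has no successor, so it is marked terminal.
        for current, nxt in zip(steps, steps[1:] + [None]):
            transitions.append({
                "episode_id": episode_id,
                "state": current["state"],
                "action": current["action"],
                "reward": current["reward"],
                "next_state": nxt["state"] if nxt else None,
                "next_action": nxt["action"] if nxt else None,
                "terminal": nxt is None,
            })
    return transitions
```

Each output row carries both the current and the next state/action, which is the shape a Q-learning style update needs.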

Next, we will create our normalization metadata based on the features in the training data. We run a one-time normalization workflow that analyzes the dataset and determines the best normalization parameters.

python ml/rl/workflow/ -p ml/rl/workflow/sample_configs/discrete_action/dqn_example.json

This will create a file that contains our feature normalization meta-data at the path specified in dqn_example.json. Next we can run the DQN training workflow as follows:

python ml/rl/workflow/ -p ml/rl/workflow/sample_configs/discrete_action/dqn_example.json

This command trains the DQN model on the training data ~/cartpole_training_data using the normalization parameters ~/state_features_norm.json.
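For intuition about what feature normalization parameters can look like, the sketch below fits a per-feature mean and standard deviation and applies them to a row. It only illustrates plain standardization under that assumption; Horizon's normalization workflow chooses its own parameters, and the function names and the {"mean", "stddev"} layout here are hypothetical, not the format of state_features_norm.json.

```python
import statistics
from collections import defaultdict


def fit_normalization(rows):
    """Compute a mean and population standard deviation for each feature.

    `rows` is a list of dicts mapping feature name -> numeric value.
    """
    values = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            values[name].append(float(value))

    params = {}
    for name, vals in values.items():
        std = statistics.pstdev(vals)
        # Guard against constant features to avoid dividing by zero later.
        params[name] = {"mean": statistics.mean(vals), "stddev": std if std > 0 else 1.0}
    return params


def normalize(row, params):
    """Standardize one row using previously fitted parameters."""
    return {
        name: (float(value) - params[name]["mean"]) / params[name]["stddev"]
        for name, value in row.items()
    }
```

Fitting the parameters once and reusing them mirrors the one-time nature of the normalization step: training and inference then see features on the same scale.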

Upon completion of training, two models are written to file: a snapshot of the PyTorch trainer object (a Python object that holds everything necessary to resume training, such as neural nets and optimizers) and a Caffe2 model that can be used in production for inference across many devices. See test_read_c2_model_from_file in ml/rl/test/workflow/ for an example of how to use the output Caffe2 model in Python.


A platform for Applied Reinforcement Learning (Applied RL)


