Inspired by the article *An Automated Portfolio Trading System with Feature Preprocessing and Recurrent Reinforcement Learning* by Lin Li, we aim to implement a fully automated trading system that handles multiple assets and incorporates a portfolio weight rebalancing function. The trading bot is based on recurrent reinforcement learning (RRL).
To implement the article, a Python library named `rrl_trading` has been developed. It consists of 4 main sub-modules, introduced below:
| Module | Description |
|---|---|
| `data` | Data collection, indicators, preprocessing (PCA, DWT) |
| `model` | RRL model, training, validation |
| `metrics` | Cumulative returns/profits, Sharpe ratio |
| `backtest` | Run RRL strategy, visualisation |
The library can be used in a Jupyter notebook, as in `notebook.ipynb`, or from the command line. To use the library directly in your terminal:
- Indicate the data settings (assets, indicators, period, etc.) in the `data.yaml` configuration file
- Run the following command in your terminal
python -m rrl_trading -config_path "./config/data.yaml" -initial_invest [INITIAL_INVEST] -fees [FEES] -n_epochs [N_EPOCHS] -version [VERSION]
Notes:
- `-initial_invest` can be an integer or a float
- `-fees` can take "no_fees", "10bps", "30bps", etc. as value ("bps" stands for "basis points"). It is recommended to use integer values for the bps fees.
- `-n_epochs` must be an integer
- `-version` can be an integer or a string.
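As an illustration of the `-fees` convention described in the notes, the sketch below converts a fee label into a decimal rate. The helper name `fees_to_rate` is hypothetical and is not taken from `rrl_trading`; it only shows one plausible way to read such labels, assuming integer bps values.

```python
import re


def fees_to_rate(fees: str) -> float:
    """Convert a fee label such as "no_fees" or "30bps" into a decimal rate.

    Hypothetical helper: one basis point is 0.01%, i.e. 1 / 10_000.
    """
    if fees == "no_fees":
        return 0.0
    match = re.fullmatch(r"(\d+)bps", fees)
    if match is None:
        raise ValueError(f"unrecognised fee label: {fees!r}")
    return int(match.group(1)) / 10_000


rate_30 = fees_to_rate("30bps")   # 30 basis points as a decimal rate
rate_0 = fees_to_rate("no_fees")  # no transaction costs
```

Restricting the pattern to integers mirrors the recommendation to use integer bps values.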
The schema below briefly depicts the two main parts of the trading bot, namely the data preprocessing layer and the recurrent reinforcement learning model with its training and validation process. The following sections give more details about each step used to build the bot, as well as the results obtained during backtests.
Conceptual schema of the RRL-PCA-DWT trading system

In this part, we describe the data source used for the project and the preprocessing steps implemented to remove noise from the raw data and uncover the general pattern underlying the financial data set.
Since the trading system is supposed to run continuously on daily data, the `yfinance` library is useful to retrieve accurate financial data on multiple stocks. It is an open-source tool that uses Yahoo Finance's publicly available APIs, and is intended for research and educational purposes.
As in Lin Li's article, we use the following 8 financial assets as input to the RRL trading system. These stocks are listed in the S&P 500 index, which is representative of general stock market conditions in the US. When downloading the data from Yahoo Finance, Open, High, Low, Close and Volume are returned for each stock. The study covers the period from 2009/12/31 to 2017/12/29.
Ticker | Company |
---|---|
XOM | Exxon Mobil Corporation |
VZ | Verizon Communications Inc. |
NKE | Nike, Inc. |
AMAT | Applied Materials, Inc. |
MCD | McDonald's Corporation |
MSFT | Microsoft Corporation |
AAP | Advance Auto Parts, Inc. |
NOV | Nov, Inc. |
Technical indicators are heuristic or pattern-based signals produced by the price, volume, and/or open interest of a security or contract, and are used by traders who follow technical analysis. In other words, they summarize the general pattern of the financial time series. While 4 groups of technical indicators are mentioned in the article, we use only 3, as shown in the following table, because cycle indicators decreased the performance of the RRL trading system.
| Momentum | Volatility | Volume |
|---|---|---|
| Momentum (MOM) | Average True Range (ATR) | Chaikin Oscillator (CO) |
| Moving Average Convergence Divergence (MACD) | Normalized Average True Range (NATR) | On Balance Volume (OBV) |
| Money Flow Index (MFI) | | |
| Relative Strength Index (RSI) | | |
Both the `ta` and `TA-Lib` Python libraries are leveraged to compute the indicators without much difficulty.
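To make the idea of an indicator concrete, here is a minimal NumPy sketch of the simplest one in the table, Momentum (MOM), defined as the difference between the current close and the close `period` days earlier. This is a hand-written illustration, not the `ta` or `TA-Lib` implementation, and the default period of 10 is an assumption.

```python
import numpy as np


def momentum(close, period=10):
    """Momentum: close[t] - close[t - period], NaN where undefined.

    Illustrative re-implementation; the period default is an assumption.
    """
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    # The first `period` observations have no earlier reference price.
    out[period:] = close[period:] - close[:-period]
    return out


mom = momentum([1.0, 2.0, 4.0, 7.0], period=2)
```

The leading NaN values show why indicators need a warm-up window before they can be fed to the model.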
We note
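To avoid scaling issues, each technical indicator feature is normalized using the z-score:
$$ z = \frac{x - \mu}{\sigma} $$
where $x$ is the raw indicator value, and $\mu$ and $\sigma$ are the mean and standard deviation of that indicator.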
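The z-score normalization can be sketched as follows. The function names are hypothetical, and the split into a "fit" step on training data and an "apply" step on new data is an assumption chosen so that validation data never influences the statistics.

```python
import numpy as np


def fit_zscore(train):
    """Estimate per-feature mean and standard deviation on training data."""
    return train.mean(axis=0), train.std(axis=0)


def apply_zscore(x, mu, sigma):
    """Standardise features with previously fitted statistics."""
    return (x - mu) / sigma


# Toy data: 3 training rows and 1 validation row, 2 indicators each.
train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
val = np.array([[7.0, 70.0]])

mu, sigma = fit_zscore(train)
train_z = apply_zscore(train, mu, sigma)
val_z = apply_zscore(val, mu, sigma)  # reuses the training statistics
```

After the transformation both indicators share the same scale even though their raw magnitudes differ by a factor of ten.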
One of the main challenges that arise when training a machine learning model is the agent's ability to generalize to unseen data. In other words, the ML agent needs to learn the general pattern of the data, so noise has to be removed.
PCA is the first technique used in the preprocessing layer and aims at reducing the dimension of the input data. To that end, PCA identifies principal axes that represent the directions of maximum variance of the input. In our project, the normalized indicators in
The `sklearn` library is used to implement PCA.
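For readers who want to see what the PCA step does numerically, the sketch below projects data onto the directions of maximum variance using NumPy's SVD. This is a stand-in for illustration only: the project itself relies on `sklearn`, and the function names and the number of retained components here are assumptions.

```python
import numpy as np


def fit_pca(train, n_components):
    """Return the training mean and the top principal axes (as rows)."""
    mean = train.mean(axis=0)
    # Rows of vt are orthonormal axes sorted by decreasing explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(x, mean, components):
    """Project data onto previously fitted principal axes."""
    return (x - mean) @ components.T


rng = np.random.default_rng(0)
train = rng.normal(size=(50, 4))  # 50 days, 4 normalized indicators
val = rng.normal(size=(10, 4))

mean, components = fit_pca(train, n_components=2)
train_pc = apply_pca(train, mean, components)
val_pc = apply_pca(val, mean, components)  # fitted on training data only
```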
Although PCA is a powerful technique for dimension reduction, some local noise may persist in the reduced data. Consequently, the DWT method is applied on the principal components in
The `PyWavelets` library is used to implement the Discrete Wavelet Transform.
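The denoising idea behind the DWT can be illustrated with a hand-written single-level Haar transform: decompose the signal into approximation and detail coefficients, shrink the details, and reconstruct. This is only a sketch and not the `PyWavelets` code used in the project; the choice of the Haar wavelet, a single decomposition level and soft thresholding are all assumptions.

```python
import numpy as np


def haar_denoise(signal, threshold):
    """Single-level Haar DWT, soft-threshold the details, then invert.

    Illustrative sketch: wavelet, level and thresholding rule are assumptions.
    """
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        raise ValueError("this sketch expects an even number of samples")
    # Forward transform on consecutive pairs of samples.
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    # Soft thresholding: small (noisy) details are shrunk towards zero.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # Inverse transform.
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


unchanged = haar_denoise([1.0, 3.0, 2.0, 6.0], threshold=0.0)
smoothed = haar_denoise([1.0, 3.0, 2.0, 6.0], threshold=10.0)
```

With a zero threshold the signal is reconstructed exactly, while a very large threshold removes all detail and leaves pairwise averages.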
As shown by the conceptual schema, the data is divided into training and trading (validation) batches of length
$$ \begin{align*} \mathcal{B}_{\text{train}} & = \{ (X_b, \mathrm{r}_b) \}_{b=1}^{B-1} \\ \mathcal{B}_{\text{val}} & = \{ (X_b, \mathrm{r}_b) \}_{b=2}^{B} \end{align*} $$
where
It is relevant to note that normalization and PCA are only fitted on the training batches and the technical indicators are calculated on each batch separately. The idea behind this is to ensure that the model's performance is an accurate reflection of its ability to generalize to new data.
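The batching scheme can be sketched as a walk-forward split in which the model is trained on batch b and then traded on batch b+1. The function below is a hypothetical illustration and not the `Data` class API; it assumes non-overlapping batches of a fixed length `batch_len` and drops any incomplete final batch.

```python
def make_walk_forward_pairs(n_obs, batch_len):
    """Return (train, validation) index ranges for consecutive batches.

    Each range is a half-open (start, stop) pair over observations.
    Training uses batches 1..B-1 and validation uses batches 2..B.
    """
    bounds = [
        (start, start + batch_len)
        for start in range(0, n_obs - batch_len + 1, batch_len)
    ]
    # Pair each batch with the one that immediately follows it.
    return list(zip(bounds[:-1], bounds[1:]))


pairs = make_walk_forward_pairs(n_obs=10, batch_len=3)
```

In such a scheme the scaler and PCA would be refitted on each training range before being applied to the paired validation range.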
The `data` module from the `rrl_trading` library contains a dataclass named `Data` which helps the user create the dataset easily. This dataclass incorporates all the methods needed to split the data into batches, compute indicators and apply the PCA and DWT transformations.
Once the data is fully preprocessed and the training and trading batches are created, the recurrent reinforcement learning model can start its learning process.
Based on the preprocessed technical indicators, the RRL agent aims at rebalancing the portfolio which is composed of
$$ \mathrm{F}_t = [F_{1,t}, \dots, F_{m,t}]^{'} $$
where,
Given
$$ R_t = (1 + \mathrm{F}_{t-3}^{'}\mathrm{r}_{t})(1 - \delta \cdot \mathrm{e}_m^{'}|\mathrm{F}_{t-2} - \mathrm{F}_{t-3}|) - 1 \quad \text{where } \mathrm{r}_t = (r_{1, t}, \dots, r_{m, t})^{'} $$
Note we use positions computed at time
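The portfolio return with transaction costs can be sketched in NumPy as below. The function and argument names are hypothetical: `held` stands for the weights that earn the asset returns, `target` for the weights the portfolio is rebalanced to, and `delta` for the fee rate.

```python
import numpy as np


def portfolio_return(held, target, asset_returns, delta):
    """Net portfolio return for one period.

    gross growth : 1 + held . asset_returns
    cost factor  : 1 - delta * sum(|target - held|)
    """
    held = np.asarray(held, dtype=float)
    target = np.asarray(target, dtype=float)
    asset_returns = np.asarray(asset_returns, dtype=float)
    gross = 1.0 + held @ asset_returns
    cost = 1.0 - delta * np.abs(target - held).sum()
    return gross * cost - 1.0


# Two assets, 30 bps fees, shifting 10% of the portfolio between them.
R = portfolio_return(
    held=[0.5, 0.5],
    target=[0.6, 0.4],
    asset_returns=[0.02, -0.01],
    delta=0.003,
)
```

Here the gross growth is 1.005 and the cost factor is 0.9994, so the net return is slightly below the 0.5% gross return.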
Let us note
The feature matrix is defined as:
$$ X_t = [\mathrm{x}_{1, t}, \dots, \mathrm{x}_{m,t}]^{'} \quad \text{such that } \mathrm{x}_{i, t} = [1, x_{i, 1, t}, \dots, x_{i, n, t}, F_{i, t-1}] $$
Note that
The hidden values computed at time
where
We then use the hyperbolic tangent activation function such that:
$$ \mathrm{f}_t = [f_{1, t}, \dots, f_{m, t}]^{'} = \text{tanh}(Y_t) $$
Finally, we use the softmax function to normalize the outputs and obtain portfolio weights at time
It is important to note that the trader takes only long positions, which implies the sum of portfolio weights at time
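The forward pass can be sketched as follows for 2 assets and 2 preprocessed features. Each row of `X` follows the layout of the feature matrix (bias term, features, previous weight). The linear form of the hidden values, `X @ theta`, is an assumption made for this sketch, as are the function name and the toy numbers; refer to the `RRL` class for the actual computation.

```python
import numpy as np


def forward(X, theta):
    """Map a feature matrix to long-only portfolio weights.

    Assumed hidden value: one linear score per asset (X @ theta),
    followed by tanh and a softmax normalisation.
    """
    y = X @ theta                # hidden value per asset (assumption)
    f = np.tanh(y)               # squash scores into (-1, 1)
    e = np.exp(f - f.max())      # shift for numerical stability
    return e / e.sum()           # softmax: positive weights summing to 1


# Rows: [1, feature 1, feature 2, previous weight] for each asset.
X = np.array([
    [1.0, 0.2, -0.1, 0.5],
    [1.0, -0.3, 0.4, 0.5],
])
theta = np.array([0.1, 0.5, -0.2, 0.3])
weights = forward(X, theta)
```

Because the softmax output is strictly positive, the resulting allocation never contains short positions.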
The following schema is a simplified version of the shallow neural network used to build the trading system. Here, the portfolio is made up of 2 assets and there are 2 preprocessed features to learn the optimal portfolio weights. The dotted lines illustrate the network's recursive pattern. The dashed line stands for the assets' returns at time
The `model` sub-module from the `rrl_trading` library implements a Python version of the RRL model through the `RRL` class. The latter incorporates the forward pass and backpropagation, and computes portfolio returns for a given portfolio allocation.
When training the PCA-DWT-RRL trading agent, the objective function is the Sharpe ratio, so it needs to be maximized. To that end, the network's parameters are updated using the gradient ascent rule. The following image shows the training process for one batch over
More details on backpropagation can be found in the original article.
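As a rough illustration of the training objective, the sketch below computes a Sharpe ratio from a series of portfolio returns and applies one gradient ascent update. Both function names are hypothetical, the annualisation factor of 252 trading days and the zero risk-free rate are assumptions, and the gradient itself is taken as given since its derivation is covered in the article.

```python
import numpy as np


def sharpe_ratio(returns, periods_per_year=252):
    """Mean over standard deviation of returns, scaled to a yearly horizon.

    Assumes a zero risk-free rate and 252 periods per year by default.
    """
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std()


def gradient_ascent_step(theta, grad, learning_rate):
    """Move parameters along the gradient to increase the objective."""
    return theta + learning_rate * grad


raw_sharpe = sharpe_ratio([0.01, 0.03], periods_per_year=1)
theta_new = gradient_ascent_step(
    np.array([1.0, 2.0]), np.array([0.5, -1.0]), learning_rate=0.1
)
```

The plus sign in the update is what distinguishes gradient ascent from the more common gradient descent used to minimise a loss.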
The methods developed in the `optimize` sub-module from the `rrl_trading` library are used to train the RRL agent.
This final section presents backtesting results of different strategies based on the RRL trading system. The benchmark is the "Buy & hold" strategy, in which the investor buys the stock at the beginning of the trading period and sells it at the end.
Emphasis is placed on comparing cumulative profits and the annualized Sharpe ratio for trading strategies implemented on differently preprocessed data. We set the transaction fees to 30 bps, which is 0.3%.
The best performance is obtained when only the Discrete Wavelet Transform is applied in the preprocessing layer (RRL-DWT). The Sharpe ratio is higher for the RRL-PCA-DWT and RRL-DWT strategies than for the "Buy & hold" benchmark. The two other RRL-based strategies perform poorly.
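The cumulative profits compared in this section can be sketched as below: per-period returns are compounded and scaled by the initial investment. The function name is hypothetical and is not the `metrics` sub-module API; the numbers are toy values, not backtest results.

```python
import numpy as np


def cumulative_profits(returns, initial_invest):
    """Profit relative to the initial investment after each period."""
    r = np.asarray(returns, dtype=float)
    wealth = initial_invest * np.cumprod(1.0 + r)  # compounded portfolio value
    return wealth - initial_invest


profits = cumulative_profits([0.10, -0.50], initial_invest=100.0)
```

In this toy example a 10% gain followed by a 50% loss turns 100 into 110 and then 55, so the profit series ends negative.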
Note that the above graph was obtained using the `plot_cumulative_profits()` function from the `backtest` sub-module.
The `backtest` sub-module gives investors key insights about a specific strategy.
On the one hand, one can get a dynamic visualisation of cumulative returns at the end of each trading window using the `make_cumrets_barplot()` function. The first plot indicates that the RRL-DWT strategy obtains quite satisfying results over the different trading batches, as only 3 of them have negative cumulative returns.
On the other hand, the `plot_avg_portfolio_allocation()` function shows the average portfolio allocation over trading windows, which is useful to identify the assets favored by the trading agent.
From the above pie chart, it can be seen that AMAT, MSFT, MCD and AAP account for more than 60% of the portfolio optimized by the trading agent. It seems interesting to run another RRL-DWT strategy with these 4 assets only.
Here, the trading system's decisions seem to be efficient when fees are not too high.