The word "volatile" comes from the Latin *volatilis*, meaning "having wings" or "able to fly". In time, financial markets adopted it to describe the variability of asset prices. Here, Volatile becomes a "trading companion", designed to help you make unemotional, algorithm-based trading decisions every day.
If you expect Volatile to predict the unpredictable, you are in the wrong place. Be reasonable: this is swing-trading software, runnable on your laptop, aimed at quickly discovering out-of-trend opportunities by comparing current stock prices to their projections a few days ahead. If the current price is much lower than its future projection, it may be a good opportunity to buy; vice versa, if it is much higher, it may be a good moment to sell. This means neither that the projection will necessarily be met, nor that you will make a short-term profit on every single transaction. Anything can happen. However, running Volatile on a daily basis will put you in a position to survey the market very quickly, find good opportunities and base your trading decisions on models, algorithms and data.
Volatile estimates stock trends, predicts short-term future prices, then ranks and rates stocks accordingly. All you need to do to run Volatile is open your terminal and type
```
python volatile.py
```
Volatile will automatically analyse the list of stock symbols saved in `symbols_list.txt`. This list should be considered neither representative nor complete; feel free to update it as you please (do not worry if you happen to enter a symbol twice). Mind that it can take a while to access information for stock symbols that are either not in the list or that you pass for the first time. For this reason, relevant stock information is stored in `stock_info.csv` and will be fast to access from the second time onwards.
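As an illustration, reading and de-duplicating such a symbols file takes only a few lines of Python; the helper below is a hypothetical sketch, not part of Volatile's code:

```python
# Minimal sketch: read a whitespace-separated list of ticker symbols,
# dropping duplicates while preserving the order of first appearance.
def read_symbols(path):
    with open(path) as f:
        tokens = f.read().split()
    # dict.fromkeys keeps insertion order (Python 3.7+), so duplicate
    # symbols collapse onto their first occurrence.
    return list(dict.fromkeys(t.upper() for t in tokens))
```

This is also why entering a symbol twice in `symbols_list.txt` is harmless: duplicates can simply be dropped before any data is downloaded.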
When the run is complete, a prediction table like the following will be printed in your shell:
For each symbol, the table reports its sector and industry, then the last available price and finally a rating. Possible ratings are HIGHLY ABOVE TREND, ABOVE TREND, ALONG TREND, BELOW TREND and HIGHLY BELOW TREND. Symbols appear in the table ranked from the furthest above to the furthest below their respective trends. Ranking and rating are derived from a score metric that compares the predicted price in 5 trading days (usually, this corresponds to the price in one week) to the last available observed price, scaling by the standard deviation of the prediction; see the technical part below for more details. The prediction table can be saved in the current directory as `prediction_table.csv` by adding the flag `--save-table` to the command above.
In the current directory, several estimation plots will appear. `stock_estimation.png` is a visualisation of stock prices and their estimations over the last year, together with a notion of uncertainty. Only stocks rated either above or below their trends are plotted, ranked as in the prediction table. Notice how the estimation crucially attempts to reproduce the trend of a stock without learning its noise. The uncertainty, on the other hand, depends on the stock's volatility: the smaller the volatility, the more confident we are about our estimates, and the more a sudden shift from the trend will be regarded as significant. You can use this plot as a sanity check that the estimation procedure agrees with your intuition. Make sure to glance at it before any transaction.
`sector_estimation.png` and `industry_estimation.png` are plots that help you quickly visualise estimated sector and industry performance. A sector estimate can be thought of as the average behaviour of the industries belonging to it, which in turn should be regarded as the average behaviour of the stocks belonging to them. Both sectors and industries are ranked in alphabetical order.
Finally, `market_estimation.png` shows the overall estimated market trend, which can be considered as the average of the sector estimates. Use this plot to see at a glance what phase the stock market is currently in.
If you do not want plots to be saved in the current directory, you can disable them by adding the flag `--no-plots`.
You can also provide a list of symbols directly on the command line using the flag `-s`; for example, type `python volatile.py -s AAPL GOOGL`. In this case, Volatile will perform the analysis exclusively on AAPL and GOOGL. Mind that if the list of symbols is small, Volatile will not have enough exposure to the market to provide accurate results.
The easiest way to use Volatile is to:
- open this notebook;
- depending on your OS, press `ctrl+s` or `cmd+s` to save it as a `.ipynb` file (make sure not to save it as a `.txt` file, which is the default option);
- upload the notebook to Google Colab and run it.
Alternatively, you can download Volatile locally. First, open a terminal and go to the directory where you intend to install Volatile. On Mac or Linux, you can do so by typing

```
cd path/to/your/directory
```

If you are fine installing Volatile in your home directory, you can just type `cd` instead of the command above. Then, download Volatile from GitHub and move into its main directory by typing

```
git clone https://github.com/gianlucadetommaso/volatile.git
cd volatile
```
We recommend activating a virtual environment. Type

```
pip install virtualenv
virtualenv venv
source venv/bin/activate
```
Now that you are in your virtual environment, install the dependencies:

```
pip install tensorflow-cpu tensorflow-probability matplotlib yfinance
```
Alternatively, you can use the requirements file; type `pip install -r requirements.txt`.
Important: TensorFlow is currently supported only up to Python 3.8, not yet Python 3.9 (see here); make sure to activate the virtual environment with the right Python version.
Done! You're all set to use Volatile.
Volatile adopts a Bayesian hierarchical model based on adjusted closing prices and sector/industry information, estimating log-prices via polynomials in time.
Denote $t = 1,\dots,T$ the times at which observations arrive; $T$ corresponds to the number of days in the training dataset, which is taken to be the last year of data. Furthermore, denote $\gamma_j$ the prior scale parameters associated to the $j$-th order of a polynomial with degree $D$. Decreasing the scales as $j$ increases penalises deviations from zero of higher-order parameters, thereby encouraging simpler models. We describe below how the model complexity $D$ is selected.
We write:
- $\text{sec}(i)$ to indicate the sector that an industry $i \in \{1,\dots,I\}$ belongs to, where $I$ is the number of industries and $K$ the number of sectors;
- $\text{ind}(s)$ to indicate the industry that a stock $s \in \{1,\dots,N\}$ belongs to, where $N$ is the number of stocks.
Then, we construct the hierarchical model, schematically

$$
\begin{aligned}
\phi^{\text{mkt}}_j &\sim \mathcal N\!\left(0,\;\gamma_j^2\right),\\
\phi^{\text{sec}}_{k,j} &\sim \mathcal N\!\left(\phi^{\text{mkt}}_j,\;\gamma_j^2\right),\\
\phi^{\text{ind}}_{i,j} &\sim \mathcal N\!\left(\phi^{\text{sec}}_{\text{sec}(i),j},\;\gamma_j^2\right),\\
\phi_{s,j} &\sim \mathcal N\!\left(\phi^{\text{ind}}_{\text{ind}(s),j},\;\gamma_j^2\right),\\
y_{s,t} &\sim \mathcal N\!\Big(\textstyle\sum_{j=0}^{D} \phi_{s,j}\,(t/T)^j,\;\sigma_{s,t}^2\Big),
\end{aligned}
$$

with an analogous Gaussian hierarchy (market → sector → industry → stock) for the scale parameters $\psi$, where $\text{sec}(i)$ denotes the sector of industry $i$, $\text{ind}(s)$ the industry of stock $s$, and the likelihood scale $\sigma_{s,t}$ is an increasing function of $T - t$ parametrised by $\psi_s$. Parameters at market level are prior means for sector-level parameters, which in turn are prior means for industry-level parameters; finally, the latter are prior means for stock-level parameters. Components of the parameters at each level are assumed to be conditionally independent given the parameters at the level above in the hierarchy.
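In practice, memberships like these are conveniently encoded as integer index arrays, so that parameters at one level can be gathered for the level below in a vectorised way. A tiny made-up example (the universe below is illustrative, not Volatile's internals):

```python
import numpy as np

# Hypothetical universe: 4 stocks, 3 industries, 2 sectors.
# sector_of_industry[i] = sector index of industry i
sector_of_industry = np.array([0, 0, 1])
# industry_of_stock[s] = industry index of stock s
industry_of_stock = np.array([0, 1, 1, 2])

# Sector of each stock, obtained by composing the two membership maps.
sector_of_stock = sector_of_industry[industry_of_stock]
```

With such arrays, an expression like `phi_ind[industry_of_stock]` gathers each stock's industry-level prior mean in one vectorised operation.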
Whereas the parameters $\phi$ determine the coefficients of the polynomial model, the parameters $\psi$ determine the scales of the likelihood function. The likelihood, defined in the last line of the hierarchical model, is Gaussian, centred at the polynomial model, with scales that grow the further the time index $t$ gets from the current time $T$. In other words, recent data are weighted more than older data, which get less and less importance the older they get.
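A minimal NumPy sketch of such a likelihood mean and time-decaying scale; the linear form of the scale's time dependence is an illustrative assumption, not Volatile's exact parametrisation:

```python
import numpy as np

def poly_mean(phi, t, T):
    """Polynomial log-price model: sum_j phi[j] * (t / T)**j."""
    t = np.asarray(t, dtype=float)
    return sum(p * (t / T) ** j for j, p in enumerate(phi))

def likelihood_scale(psi, t, T):
    """Scale that grows as t moves away from the current time T, so that
    older observations are effectively down-weighted in the likelihood.
    The linear growth below is an illustrative choice."""
    t = np.asarray(t, dtype=float)
    return np.exp(psi) * (1.0 + (T - t) / T)
```

At $t = T$ the scale reduces to $e^{\psi}$, while at $t = 0$ it has doubled: a Gaussian likelihood with this scale trusts recent data the most.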
In order to estimate the parameters, we condition on adjusted closing log-prices $y_{s,t}$, for all stocks $s$ and times $t$, then we estimate the mode of the posterior distribution, also known as the Maximum-A-Posteriori (MAP) estimate. From a frequentist statistics perspective, this corresponds to a polynomial regression task where we minimise a regularised and weighted mean-squared-error loss. A plot showing the loss decay during training can be saved in the current directory as `loss_decay.png` by adding the flag `--plot-loss` in the command line.
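For fixed scales, minimising a regularised and weighted mean-squared-error loss over polynomial coefficients even has a closed-form solution. A toy single-stock sketch (Volatile itself trains by gradient descent; the weights and prior scales passed in here are illustrative):

```python
import numpy as np

def fit_map_poly(y, degree, weights, prior_scales):
    """Regularised weighted least squares: minimise
    sum_t w_t (y_t - X_t phi)^2 + sum_j (phi_j / gamma_j)^2
    over the polynomial coefficients phi."""
    T = len(y)
    t = np.arange(1, T + 1) / T
    X = np.vander(t, degree + 1, increasing=True)  # columns: t^0 ... t^degree
    W = np.diag(np.asarray(weights, dtype=float))
    Lam = np.diag(1.0 / np.asarray(prior_scales, dtype=float) ** 2)
    # Normal equations of the penalised, weighted least-squares problem.
    return np.linalg.solve(X.T @ W @ X + Lam, X.T @ W @ y)
```

With uniform weights and very wide priors, the fit recovers an ordinary polynomial regression; shrinking `prior_scales[j]` pulls the $j$-th coefficient towards zero, exactly the role of the $\gamma_j$ above.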
Having obtained the estimates $\hat\phi$ and $\hat\psi$, we can use the likelihood mean $\hat y_{s,t} = \sum_{j=0}^{D} \hat\phi_{s,j}\,(t/T)^j$ as an estimator of the log-prices for any time in the past, as well as a predictor for times in the near future. As a measure of uncertainty, we take the learned scale of the likelihood, that is $\hat\sigma_{s,t}$.
We use the estimates above to select the order of the polynomial. For each candidate order, we train the model with data up to 5 trading days before the current date, then predict the last 5 trading days and test against the actual observations. If the likelihood model fits the data well, the empirical second moment

$$
\frac{1}{5N} \sum_{s=1}^{N} \sum_{t=T-4}^{T} \left( \frac{y_{s,t} - \hat y_{s,t}}{\hat\sigma_{s,t}} \right)^{2}
$$

should be approximately 1, where $\hat y_{s,t}$ and $\hat\sigma_{s,t}$ are the estimators described above, while $y_{s,t}$ are the actual log-price observations. Thus, we first compute the absolute distance between the empirical second moment and 1, then select the polynomial order that makes it smallest.
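The selection rule can be sketched as follows, where each candidate order comes with its held-out observations, predictions and scales (the data layout is a hypothetical choice):

```python
import numpy as np

def empirical_second_moment(y_true, y_pred, sigma_pred):
    """Mean of squared standardised residuals over the held-out window."""
    z = (np.asarray(y_true) - np.asarray(y_pred)) / np.asarray(sigma_pred)
    return np.mean(z ** 2)

def select_order(candidates):
    """candidates: dict mapping order -> (y_true, y_pred, sigma_pred) over
    the last 5 trading days. Pick the order whose empirical second moment
    is closest to 1, i.e. whose predictive scales are best calibrated."""
    return min(
        candidates,
        key=lambda d: abs(empirical_second_moment(*candidates[d]) - 1.0),
    )
```

A moment far below 1 means the model is over-cautious (scales too large); far above 1, over-confident. Either way the order is penalised relative to a well-calibrated one.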
Given the selected model complexity, Volatile trains the model and provides a rating for each stock by introducing the following score:

$$
\text{score}_s = \frac{\hat y_{s,T+5} - y_{s,T}}{\hat\sigma_{s,T+5}},
$$

where $y_{s,T}$ is the last available log-price, $\hat y_{s,T+5}$ is its prediction in 5 trading days (usually, that corresponds to the log-price in one week) and $\hat\sigma_{s,T+5}$ is the estimated standard deviation of the prediction. If the future prediction is larger than the current price, the score will be positive; the larger the difference and the more confident we are about the prediction (or, equivalently, the smaller the standard deviation), the more positive the score will be. We can reason similarly when the score is negative. In other words, a large positive score indicates that the current price is undervalued with respect to its stock trend, therefore an opportunity to buy; a large negative score indicates, vice versa, that the current price is overvalued with respect to its stock trend, therefore a moment to sell.
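In code, the score and a thresholded rating might look like the sketch below; the cutoff values are illustrative placeholders, not Volatile's actual thresholds:

```python
def score(last_log_price, pred_log_price, pred_std):
    """Standardised gap between the 5-day-ahead prediction and the last price."""
    return (pred_log_price - last_log_price) / pred_std

def rate(s, cut=2.0, strong_cut=3.0):
    """Map a score to a rating. A positive score means the current price
    sits below its predicted trend (a potential buy), and vice versa.
    The cutoffs are illustrative placeholders."""
    if s > strong_cut:
        return "HIGHLY BELOW TREND"
    if s > cut:
        return "BELOW TREND"
    if s < -strong_cut:
        return "HIGHLY ABOVE TREND"
    if s < -cut:
        return "ABOVE TREND"
    return "ALONG TREND"
```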
Then, stocks are rated by comparing the score against fixed thresholds, from HIGHLY ABOVE TREND for the most negative scores to HIGHLY BELOW TREND for the most positive ones.
Because we model log-prices as Gaussian, the corresponding distribution of prices is log-Normal, whose mean and standard deviation can be derived in closed form from the estimators $\hat y_{s,t}$ and $\hat\sigma_{s,t}$. We use log-Normal distribution statistics at times $t = 1,\dots,T$ to produce the stock estimation plot, and at time $T+5$ to fill the prediction table. In order to produce the market, sector and industry estimation plots, we proceed analogously but with the estimators at the respective levels, that is $\hat y^{\text{mkt}}_t$ and $\hat\sigma^{\text{mkt}}_t$ for the market, $\hat y^{\text{sec}}_{k,t}$ and $\hat\sigma^{\text{sec}}_{k,t}$ for sectors, and $\hat y^{\text{ind}}_{i,t}$ and $\hat\sigma^{\text{ind}}_{i,t}$ for industries.
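The closed-form conversion is the standard one for the log-Normal distribution: if a log-price is Gaussian with mean $\mu$ and standard deviation $\sigma$, the price has mean $e^{\mu+\sigma^2/2}$ and standard deviation $e^{\mu+\sigma^2/2}\sqrt{e^{\sigma^2}-1}$:

```python
import numpy as np

def lognormal_stats(mu, sigma):
    """Mean and standard deviation of exp(X) for X ~ N(mu, sigma^2)."""
    mean = np.exp(mu + 0.5 * sigma ** 2)
    std = mean * np.sqrt(np.exp(sigma ** 2) - 1.0)
    return mean, std
```

Applying this to the log-price estimators at each time gives the price-space curves and uncertainty bands shown in the estimation plots.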