init commit
you-n-g committed Sep 22, 2020
1 parent aa51e5a commit 99ebd87
Showing 131 changed files with 20,218 additions and 0 deletions.
33 changes: 33 additions & 0 deletions .gitignore
@@ -0,0 +1,33 @@
# https://github.com/github/gitignore/blob/master/Python.gitignore
__pycache__/

*.pyc
*.so
*.ipynb
.ipynb_checkpoints
_build
build/
dist/


*.pkl
*.hd5
*.csv

.env
.vim
.nvimrc
.vscode

qlib/data/_libs/expanding.cpp
qlib/data/_libs/rolling.cpp
examples/estimator/estimator_example/

*.egg-info/


# special software
mlruns/

tags

152 changes: 152 additions & 0 deletions CHANGES.rst
@@ -0,0 +1,152 @@
Changelog
====================
Here you can see the full list of changes between each QLib release.

Version 0.1.0
--------------------
This is the initial release of the QLib library.

Version 0.1.1
--------------------
Optimize performance. Add more features and operators.

Version 0.1.2
--------------------
- Support operator syntax. Now ``High() - Low()`` is equivalent to ``Sub(High(), Low())``.
- Add more technical indicators.

Version 0.1.3
--------------------
Fix bugs and add an instrument filtering mechanism.

Version 0.2.0
--------------------
- Redesign ``LocalProvider`` database format for performance improvement.
- Support loading features as string fields.
- Add scripts for database construction.
- More operators and technical indicators.

Version 0.2.1
--------------------
- Support registering user-defined ``Provider``.
- Support using operators in string format, e.g. ``['Ref($close, 1)']`` is a valid field format.
- Support dynamic fields in ``$some_field`` format. Existing fields like ``Close()`` may be deprecated in the future.

Version 0.2.2
--------------------
- Add ``disk_cache`` for reusing features (enabled by default).
- Add ``qlib.contrib`` for experimental model construction and evaluation.


Version 0.2.3
--------------------
- Add ``backtest`` module
- Decouple the Strategy, Account, Position, and Exchange from the backtest module

Version 0.2.4
--------------------
- Add ``profit attribution`` module
- Add ``risk_control`` and ``cost_control`` strategies

Version 0.3.0
--------------------
- Add ``estimator`` module

Version 0.3.1
--------------------
- Add ``filter`` module

Version 0.3.2
--------------------
- Add real-price trading; if the ``factor`` field in the dataset is incomplete, use ``adj_price`` trading instead
- Refactor ``handler``, ``launcher``, and ``trainer`` code
- Support ``backtest`` configuration parameters in the configuration file
- Fix bug when position ``amount`` is 0
- Fix bug of ``filter`` module

Version 0.3.3
-------------------
- Fix bug of ``filter`` module

Version 0.3.4
--------------------
- Support for ``finetune model``
- Refactor ``fetcher`` code

Version 0.3.5
--------------------
- Support multi-label training: you can provide multiple labels in ``handler``. (LightGBM does not support this because of the algorithm itself.)
- Refactor ``handler`` code: dataset.py is no longer used, and you can deploy your own labels and features in ``feature_label_config``
- Handler only offers DataFrame. Also, ``trainer`` and model.py only receive DataFrame
- Change ``split_rolling_data``: the data is now rolled on the market calendar, not on normal dates
- Move some date config from ``handler`` to ``trainer``

Version 0.4.0
--------------------
- Add ``data`` package that holds all data-related code
- Reform the data provider structure
- Create a server for centralized data management: `qlib-server <https://amc-msra.visualstudio.com/trading-algo/_git/qlib-server>`_
- Add a ``ClientProvider`` to work with the server
- Add a pluggable cache mechanism
- Add a recursive backtracking algorithm to inspect the furthest reference date for an expression

.. note::
    The ``D.instruments`` function does not support the ``start_time``, ``end_time``, and ``as_list`` parameters. To get the results that previous versions of ``D.instruments`` returned, you can do this:

    >>> from qlib.data import D
    >>> instruments = D.instruments(market='csi500')
    >>> D.list_instruments(instruments=instruments, start_time='2015-01-01', end_time='2016-02-15', as_list=True)


Version 0.4.1
--------------------
- Add support for Windows
- Fix ``instruments`` type bug
- Fix bug when ``features`` is empty (it caused failures when updating)
- Fix ``cache`` lock and update bug
- Fix cache reuse so that the same field uses the same cache (the original space will add a new cache)
- Change "logger handler" from config
- Change model loading to support versions 0.4.0 and later
- The default value of the ``method`` parameter of ``risk_analysis`` function is changed from **ci** to **si**


Version 0.4.2
--------------------
- Refactor DataHandler
- Add ``ALPHA360`` DataHandler


Version 0.4.3
--------------------
- Implement the online inference and trading framework
- Refactor the interfaces of the backtest and strategy modules


Version 0.4.4
--------------------
- Optimize cache generation performance
- Add report module
- Fix bug when using ``ServerDatasetCache`` offline.
- Fix ``np.nan`` values that could appear in ``long_short`` in previous versions of ``long_short_backtest``. Because of this fix, ``long_short_backtest`` results in ``0.4.4`` differ from those of previous versions.
- In version ``0.4.2`` of the ``risk_analysis`` function, ``N`` is ``250``; from ``0.4.3`` onward, ``N`` is ``252``. As a result, the ``0.4.2`` result is ``0.002122`` smaller than the ``0.4.3`` result, so the backtest results differ slightly between ``0.4.2`` and ``0.4.3``.
- Refactor the arguments of the backtest function.
- **NOTE**:

  - The default arguments of the topk margin strategy are changed. Please pass the arguments explicitly if you want to get the same backtest result as the previous version.
  - The TopkWeightStrategy is changed slightly. It will try to sell the stocks beyond ``topk``. (The backtest result of TopkAmountStrategy remains the same.)
  - The margin ratio mechanism is supported in the Topk Margin strategies.


Version 0.4.5
--------------------
- Add multi-kernel implementation for both client and server.
- Support a new way to load data from the client that skips the dataset cache.
- Change the default dataset method from the single-kernel implementation to the multi-kernel implementation.
- Accelerate high-frequency data reading by optimizing the related modules.
- Support a new method to write the config file by using a dict.

Version 0.4.6
--------------------
- Some bugs are fixed:

  - The default config in ``Version 0.4.5`` is not friendly to daily frequency data.
  - Backtest error in TopkWeightStrategy when ``WithInteract=True``.
196 changes: 196 additions & 0 deletions README.md
@@ -1,3 +1,199 @@
Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment.

With Qlib, you can easily apply your favorite model to create a better Quant investment strategy.


- [Framework of Qlib](#framework-of-qlib)
- [Quick start](#quick-start)
- [Installation](#installation)
- [Get Data](#get-data)
- [Auto Quant research workflow with _estimator_](#auto-quant-research-workflow-with-estimator)
- [Customized Quant research workflow by code](#customized-quant-research-workflow-by-code)
- [More About Qlib](#more-about-qlib)
- [Offline mode and online mode](#offline-mode-and-online-mode)
- [Performance of Qlib Data Server](#performance-of-qlib-data-server)
- [Contributing](#contributing)



# Framework of Qlib
![framework](docs/_static/img/framework.png)

At the module level, Qlib is a platform that consists of the above components. Each component is loosely coupled and can be used stand-alone.

| Name | Description |
| ------ | ----- |
| _Data layer_ | _DataServer_ focuses on providing high-performance infrastructure for users to retrieve raw data. _DataEnhancement_ preprocesses the data and provides the best dataset to be fed into the models. |
| _Interday Model_ | _Interday model_ focuses on producing forecasting signals (a.k.a. _alpha_). Models are trained by _Model Creator_ and managed by _Model Manager_. Users can choose one or multiple models for forecasting. Multiple models can be combined with the _Ensemble_ module. |
| _Interday Strategy_ | _Portfolio Generator_ takes forecasting signals as input and outputs orders based on the current position to achieve the target portfolio. |
| _Intraday Trading_ | _Order Executor_ is responsible for executing the orders produced by _Interday Strategy_ and returning the executed results. |
| _Analysis_ | Users can get a detailed analysis report of the forecasting signals and the portfolio in this part. |

* The modules drawn in hand-drawn style are under development and will be released in the future.
* The modules with dashed borders are highly user-customizable and extensible.
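
To make the _Interday Strategy_ row more concrete, here is a minimal sketch of what a portfolio generator does: it turns forecasting signals and the current position into orders. This is not Qlib's API. The function `generate_orders`, its parameters `topk` and `amount_per_stock`, the equal-amount top-k rule, and the instrument codes other than `SH600000` are all assumptions made for illustration.

```python
def generate_orders(signals, current_position, topk=2, amount_per_stock=100):
    """Hypothetical helper: move the current position toward an
    equal-amount top-k target portfolio built from the signals."""
    # Rank instruments by forecast score, highest first, and keep the best `topk`.
    ranked = sorted(signals, key=signals.get, reverse=True)
    target = {inst: amount_per_stock for inst in ranked[:topk]}

    orders = {}
    for inst in sorted(set(target) | set(current_position)):
        delta = target.get(inst, 0) - current_position.get(inst, 0)
        if delta != 0:
            orders[inst] = delta  # positive: buy, negative: sell
    return orders


# Illustrative inputs (scores and holdings are made up).
signals = {"SH600000": 0.03, "SH600004": -0.01, "SH600009": 0.02}
current = {"SH600004": 100, "SH600009": 40}
orders = generate_orders(signals, current)
print(orders)
```

The sketch keeps signal ranking, target construction, and order generation as separate steps, which mirrors the loosely coupled design described above.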


# Quick start

## Installation

To install Qlib from source, you need _Cython_ in addition to the normal dependencies:

```bash
pip install numpy
pip install --upgrade cython
```

Clone the repository and then run:
```bash
python setup.py install
```


## Get Data
- Load and prepare the data: execute the following command to download the stock data:
```bash
python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
```
<!--
- Run the initialization code and get stock data:
```python
import qlib
from qlib.data import D
from qlib.config import REG_CN
# Initialization
mount_path = "~/.qlib/qlib_data/cn_data" # target_dir
qlib.init(mount_path=mount_path, region=REG_CN)
# Get stock data by Qlib
# Load trading calendar with the given time range and frequency
print(D.calendar(start_time='2010-01-01', end_time='2017-12-31', freq='day')[:2])
# Parse a given market name into a stockpool config
instruments = D.instruments('csi500')
print(D.list_instruments(instruments=instruments, start_time='2010-01-01', end_time='2017-12-31', as_list=True)[:6])
# Load features of certain instruments in given time range
instruments = ['SH600000']
fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
print(D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head())
```
-->

## Auto Quant research workflow with _estimator_
Qlib provides a tool named `estimator` to run the whole workflow automatically (including building the dataset, training models, backtesting, and analysis).

1. Run _estimator_ (config file: [estimator_config.yaml](examples/estimator/estimator_config.yaml)):

```bash
cd examples # Avoid running the program from a directory that contains `qlib`
estimator -c estimator/estimator_config.yaml
```

Estimator result:

```bash
                      risk
sub_bench mean    0.000662
          std     0.004487
          annual  0.166720
          sharpe  2.340526
          mdd    -0.080516
sub_cost  mean    0.000577
          std     0.004482
          annual  0.145392
          sharpe  2.043494
          mdd    -0.083584
```
See the full documents for [Use _Estimator_ to Start An Experiment](TODO:URL).

2. Analysis

Run `examples/estimator/analyze_from_estimator.ipynb` in `jupyter notebook`
1. forecasting signal analysis
- Cumulative Return

![Cumulative Return](docs/_static/img/analysis/analysis_model_cumulative_return.png)
![long_short](docs/_static/img/analysis/analysis_model_long_short.png)
- Information Coefficient(IC)

![Information Coefficient](docs/_static/img/analysis/analysis_model_IC.png)
![Monthly IC](docs/_static/img/analysis/analysis_model_monthly_IC.png)
![IC](docs/_static/img/analysis/analysis_model_NDQ.png)
- Auto Correlation

![Auto Correlation](docs/_static/img/analysis/analysis_model_auto_correlation.png)




2. portfolio analysis
- Report

![Report](docs/_static/img/analysis/report.png)
<!--
- Score IC
![Score IC](docs/_static/img/score_ic.png)
- Cumulative Return
![Cumulative Return](docs/_static/img/cumulative_return.png)
- Risk Analysis
![Risk Analysis](docs/_static/img/risk_analysis.png)
- Rank Label
![Rank Label](docs/_static/img/rank_label.png)
-->
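
The `risk` table in the estimator result above can be read with the help of a small sketch. The formulas below are an assumption, not taken from Qlib's source: they treat `annual` as `mean * N` and `sharpe` as `mean / std * sqrt(N)` with `N = 252` trading days, which reproduces the printed `sub_bench` numbers up to rounding of the inputs. The `mdd` definition (largest drop of the cumulative summed return from its running peak) and the function name `risk_summary` are likewise illustrative.

```python
import math
from itertools import accumulate

N = 252  # assumed number of trading days per year


def risk_summary(daily_returns, n=N):
    """Summarize daily returns (a sketch, not Qlib's implementation)."""
    count = len(daily_returns)
    mean = sum(daily_returns) / count
    # Sample standard deviation (divides by count - 1).
    std = math.sqrt(sum((r - mean) ** 2 for r in daily_returns) / (count - 1))
    cumulative = list(accumulate(daily_returns))  # summed return so far
    peak = list(accumulate(cumulative, max))      # running maximum
    mdd = min(c - p for c, p in zip(cumulative, peak))
    return {
        "mean": mean,
        "std": std,
        "annual": mean * n,
        "sharpe": mean / std * math.sqrt(n),
        "mdd": mdd,
    }


# Consistency check against the printed sub_bench row (mean and std are rounded).
annual_from_table = 0.000662 * N
sharpe_from_table = 0.000662 / 0.004487 * math.sqrt(N)
print(round(annual_from_table, 4), round(sharpe_from_table, 2))

summary = risk_summary([0.01, -0.02, 0.03])  # made-up returns
print(summary)
```

The two values computed from the table agree with the printed `annual` (0.166720) and `sharpe` (2.340526) to within the rounding error of the printed `mean` and `std`.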

## Customized Quant research workflow by code
The automatic workflow may not suit the research workflow of all Quant researchers. To support flexible Quant research workflows, Qlib also provides a modularized interface that allows researchers to build their own workflow. [Here](TODO_URL) is a demo of a customized Quant research workflow by code.



# More About Qlib
The detailed documents are organized in [docs](docs).
[Sphinx](http://www.sphinx-doc.org) and the readthedocs theme are required to build the documentation in HTML format.
```bash
cd docs/
conda install sphinx sphinx_rtd_theme -y
# Otherwise, you can install them with pip
# pip install sphinx sphinx_rtd_theme
make html
```
You can also view the [latest document](TODO_URL) online directly.

The roadmap is managed as a [github project](https://github.com/microsoft/qlib/projects/1).



## Offline mode and online mode
The data server of Qlib can be deployed in either offline mode or online mode. The default mode is offline mode.

Under offline mode, the data will be deployed locally.

Under online mode, the data will be deployed as a shared data service. The data and their cache will be shared by clients. Data retrieval performance is expected to improve due to a higher rate of cache hits. It will use less disk space, too. The documents of the online mode can be found in [Qlib-Server](TODO_link). The online mode can be deployed automatically with [Azure CLI based scripts](TODO_link).

## Performance of Qlib Data Server
The performance of data processing is important to data-driven methods like AI technologies. As an AI-oriented platform, Qlib provides a solution for data storage and data processing. To demonstrate the performance of Qlib, we compare it with several other solutions.

We evaluate the performance of these solutions on the same task: creating a dataset (14 features/factors) from the basic OHLCV daily data of a stock market (800 stocks each day from 2007 to 2020). The task involves data queries and processing.

| | HDF5 | MySQL | MongoDB | InfluxDB | Qlib -E -D | Qlib +E -D | Qlib +E +D |
| -- | ------ | ------ | -------- | --------- | ----------- | ------------ | ----------- |
| Total (1CPU) (seconds) | 184.4±3.7 | 365.3±7.5 | 253.6±6.7 | 368.2±3.6 | 147.0±8.8 | 47.6±1.0 | **7.4±0.3** |
| Total (64CPU) (seconds) | | | | | 8.8±0.6 | **4.2±0.2** | |
* `+(-)E` indicates with(out) `ExpressionCache`
* `+(-)D` indicates with(out) `DatasetCache`
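
To read the table as relative speedups, the single-CPU mean timings can be divided by the fastest configuration. The snippet below only does this arithmetic on the means from the table. It ignores the reported standard deviations, and the variable names are illustrative.

```python
# Mean single-CPU timings in seconds, copied from the table above.
timings = {
    "HDF5": 184.4,
    "MySQL": 365.3,
    "MongoDB": 253.6,
    "InfluxDB": 368.2,
    "Qlib -E -D": 147.0,
    "Qlib +E -D": 47.6,
    "Qlib +E +D": 7.4,
}

# How many times slower each solution is than Qlib with both caches enabled.
baseline = timings["Qlib +E +D"]
speedup = {name: seconds / baseline for name, seconds in timings.items()}

for name, ratio in speedup.items():
    print(f"{name:<12} {ratio:5.1f}x")
```

By this measure, Qlib with both caches enabled is roughly 25 times faster than the HDF5 baseline on a single CPU.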

Most general-purpose databases take too much time to load data. After looking into the underlying implementation, we find that data go through too many layers of interfaces and unnecessary format transformations in general-purpose database solutions.
Such overheads greatly slow down the data loading process.
Qlib data are stored in a compact format, which can be efficiently combined into arrays for scientific computation.
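
The idea of a compact, array-friendly format can be illustrated with the standard library alone. This README does not describe Qlib's actual on-disk layout, so the following is only a sketch of the general technique: one feature column is stored as raw 32-bit floats and read back into an array without any per-row parsing. The helper names `pack_column` and `unpack_column` and the sample prices are hypothetical.

```python
from array import array


def pack_column(values):
    """Serialize one feature column as raw float32 bytes (illustrative)."""
    return array("f", values).tobytes()


def unpack_column(blob):
    """Load the raw bytes straight back into a typed array."""
    column = array("f")
    column.frombytes(blob)
    return column


# Made-up closing prices, chosen to be exactly representable as float32.
close = [10.5, 10.75, 11.0, 10.25]
blob = pack_column(close)
restored = unpack_column(blob)

print(len(blob), list(restored))
```

Because the bytes already have the in-memory layout of the array, loading involves no text parsing or row-by-row conversion, which is the kind of overhead the paragraph above attributes to general-purpose databases.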





# Contributing
