init commit

lyrl · Sep 22, 2020 · 99ebd87
1 parent aa51e5a
commit 99ebd87
Showing 131 changed files with 20,218 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,33 @@
+# https://github.com/github/gitignore/blob/master/Python.gitignore
+__pycache__/
+
+*.pyc
+*.so
+*.ipynb
+.ipynb_checkpoints
+_build
+build/
+dist/
+
+
+*.pkl
+*.hd5
+*.csv
+
+.env
+.vim
+.nvimrc
+.vscode
+
+qlib/data/_libs/expanding.cpp
+qlib/data/_libs/rolling.cpp
+examples/estimator/estimator_example/
+
+*.egg-info/
+
+
+# special software
+mlruns/
+
+tags
+
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -0,0 +1,152 @@
+Changelog
+====================
+Here you can see the full list of changes between each QLib release.
+
+Version 0.1.0
+--------------------
+This is the initial release of the QLib library.
+
+Version 0.1.1
+--------------------
+Optimize performance. Add more features and operators.
+
+Version 0.1.2
+--------------------
+- Support operator syntax. Now ``High() - Low()`` is equivalent to ``Sub(High(), Low())``.   
+- Add more technical indicators.
+
+Version 0.1.3
+--------------------
+Fix bugs and add an instrument filtering mechanism.
+
+Version 0.2.0
+--------------------
+- Redesign ``LocalProvider`` database format for performance improvement.
+- Support loading features as string fields.
+- Add scripts for database construction.
+- More operators and technical indicators.
+
+Version 0.2.1
+--------------------
+- Support registering user-defined ``Provider``.
+- Support using operators in string format, e.g. ``['Ref($close, 1)']`` is a valid field format.
+- Support dynamic fields in ``$some_field`` format. Existing fields like ``Close()`` may be deprecated in the future.
+
+Version 0.2.2
+--------------------
+- Add ``disk_cache`` for reusing features (enabled by default).
+- Add ``qlib.contrib`` for experimental model construction and evaluation.
+
+
+Version 0.2.3
+--------------------
+- Add ``backtest`` module
+- Decouple the Strategy, Account, Position, and Exchange from the backtest module
+
+Version 0.2.4
+--------------------
+- Add ``profit attribution`` module
+- Add ``risk_control`` and ``cost_control`` strategies
+
+Version 0.3.0
+--------------------
+- Add ``estimator`` module
+
+Version 0.3.1
+--------------------
+- Add ``filter`` module
+
+Version 0.3.2
+--------------------
+- Add real-price trading; if the ``factor`` field in the dataset is incomplete, use ``adj_price`` trading
+- Refactor ``handler``, ``launcher``, and ``trainer`` code
+- Support ``backtest`` configuration parameters in the configuration file
+- Fix bug when position ``amount`` is 0
+- Fix bug of ``filter`` module
+
+Version 0.3.3
+-------------------
+- Fix bug of ``filter`` module
+
+Version 0.3.4
+--------------------
+- Support for ``finetune model``
+- Refactor ``fetcher`` code
+
+Version 0.3.5
+--------------------
+- Support multi-label training: you can provide multiple labels in ``handler``. (LightGBM does not support this because of the algorithm itself.)
+- Refactor ``handler`` code: dataset.py is no longer used, and you can deploy your own labels and features in ``feature_label_config``
+- Handler only offers DataFrame. Also, ``trainer`` and model.py only receive DataFrame
+- Change ``split_rolling_data``: the data is now rolled on the market calendar, not on normal dates
+- Move some date config from ``handler`` to ``trainer``
+
+Version 0.4.0
+--------------------
+- Add ``data`` package that holds all data-related code
+- Reform the data provider structure
+- Create a server for centralized data management: `qlib-server <https://amc-msra.visualstudio.com/trading-algo/_git/qlib-server>`_
+- Add a ``ClientProvider`` to work with the server
+- Add a pluggable cache mechanism
+- Add a recursive backtracking algorithm to inspect the furthest reference date for an expression
+
+.. note::
+    The ``D.instruments`` function does not support the ``start_time``, ``end_time``, and ``as_list`` parameters. To reproduce the results of previous versions of ``D.instruments``, you can do this:
+
+
+    >>> from qlib.data import D
+    >>> instruments = D.instruments(market='csi500')
+    >>> D.list_instruments(instruments=instruments, start_time='2015-01-01', end_time='2016-02-15', as_list=True)
+
+
+Version 0.4.1
+--------------------
+- Add Windows support
+- Fix ``instruments`` type bug
+- Fix bug when ``features`` is empty (it caused failures in updating)
+- Fix ``cache`` lock and update bug
+- Fix to use the same cache for the same field (the original space will add a new cache)
+- Change "logger handler" from config
+- Change model loading to support 0.4.0 and later
+- The default value of the ``method`` parameter of ``risk_analysis`` function is changed from **ci** to **si**
+
+
+Version 0.4.2
+--------------------
+- Refactor DataHandler
+- Add ``ALPHA360`` DataHandler
+
+
+Version 0.4.3
+--------------------
+- Implement the online inference and trading framework
+- Refactor the interfaces of the backtest and strategy modules.
+
+
+Version 0.4.4
+--------------------
+- Optimize cache generation performance
+- Add report module
+- Fix bug when using ``ServerDatasetCache`` offline.
+- In previous versions of ``long_short_backtest``, ``np.nan`` could appear in long_short. This is fixed in ``0.4.4``, so the results of ``long_short_backtest`` will differ from previous versions.
+- In version ``0.4.2`` of the ``risk_analysis`` function, ``N`` is ``250``; from ``0.4.3`` on, ``N`` is ``252``. As a result, the ``0.4.2`` value is ``0.002122`` smaller than the ``0.4.3`` value, and the backtest result differs slightly between ``0.4.2`` and ``0.4.3``.
+- Refactor the arguments of the backtest function.
+    - **NOTE**:
+      - The default arguments of the topk margin strategy are changed. Please pass the arguments explicitly if you want to get the same backtest result as the previous version.
+      - The TopkWeightStrategy is changed slightly. It will try to sell the stocks more than ``topk``. (The backtest result of TopkAmountStrategy remains the same)
+- The margin ratio mechanism is supported in the Topk Margin strategies.
+
+
+Version 0.4.5
+--------------------
+- Add multi-kernel implementation for both client and server.
+    - Support a new way to load data from client which skips dataset cache.
+    - Change the default dataset method from single kernel implementation to multi kernel implementation.
+- Accelerate high-frequency data reading by optimizing the relevant modules.
+- Support a new method to write config file by using dict.
+
+Version 0.4.6
+--------------------
+- Some bugs are fixed
+    - The default config in ``Version 0.4.5`` is not friendly to daily frequency data.
+    - Backtest error in TopkWeightStrategy when ``WithInteract=True``.
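
Reviewer note on the Version 0.1.2 changelog entry above: the statement that ``High() - Low()`` is equivalent to ``Sub(High(), Low())`` describes operator overloading. The sketch below shows the general idea with toy, hypothetical classes (`Expression`, `Feature`, and this `Sub` are illustrations written for this note, not Qlib's actual implementation), evaluated against a plain dict standing in for one row of data.

```python
# Toy sketch only: hypothetical classes, NOT Qlib's real operator implementation.
class Expression:
    """Base class; the minus operator builds a Sub node instead of computing."""

    def __sub__(self, other):
        return Sub(self, other)

    def evaluate(self, row):
        raise NotImplementedError


class Feature(Expression):
    """Leaf expression that reads one named field from a row (a dict)."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, row):
        return row[self.name]

    def __repr__(self):
        return f"{self.name.capitalize()}()"


class Sub(Expression):
    """Binary node that subtracts the right operand from the left one."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, row):
        return self.left.evaluate(row) - self.right.evaluate(row)

    def __repr__(self):
        return f"Sub({self.left!r}, {self.right!r})"


def High():
    return Feature("high")


def Low():
    return Feature("low")


row = {"high": 10.5, "low": 9.0}      # made-up sample values
spread_infix = High() - Low()          # operator syntax
spread_explicit = Sub(High(), Low())   # explicit constructor syntax
print(spread_infix, spread_infix.evaluate(row))
```

Both spellings build the same expression tree, so they evaluate to the same value; the infix form is only a more readable way to write it.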
diff --git a/README.md b/README.md
@@ -1,3 +1,199 @@
+Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment.
+
+With Qlib, you can easily apply your favorite model to create a better Quant investment strategy.
+
+
+- [Framework of Qlib](#framework-of-qlib)
+- [Quick start](#quick-start)
+  - [Installation](#installation)
+  - [Get Data](#get-data)
+  - [Auto Quant research workflow with _estimator_](#auto-quant-research-workflow-with-estimator)
+  - [Customized Quant research workflow by code](#customized-quant-research-workflow-by-code)
+- [More About Qlib](#more-about-qlib)
+  - [Offline mode and online mode](#offline-mode-and-online-mode)
+  - [Performance of Qlib Data Server](#performance-of-qlib-data-server)
+- [Contributing](#contributing)
+
+
+
+# Framework of Qlib
+![framework](docs/_static/img/framework.png)
+
+At the module level, Qlib is a platform that consists of the above components. Each component is loosely coupled and can be used stand-alone.
+
+| Name                | Description                                                                                                                                                                                                                                                   |
+| ------              | -----                                                                                                                                                                                                                                                         |
+| _Data layer_        | _DataServer_ focuses on providing high performance infrastructure for users to retrieve and get raw data. _DataEnhancement_ will preprocess the data and provide the best dataset to be fed into the models                                                   |
+| _Interday Model_    | _Interday model_ focuses on producing forecasting signals (aka _alpha_). Models are trained by _Model Creator_ and managed by _Model Manager_. Users can choose one or multiple models for forecasting. Multiple models can be combined with the _Ensemble_ module |
+| _Interday Strategy_ | _Portfolio Generator_ will take forecasting signals as input and output the orders based on current position to achieve target portfolio                                                                                                                      |
+| _Intraday Trading_  | _Order Executor_ is responsible for executing orders produced by _Interday Strategy_ and returning the executed results.                                                                                                                                        |
+| _Analysis_          | User could get detailed analysis report of forecasting signal and portfolio in this part.                                                                                                                                                                     |
+
+* The modules drawn in hand-drawn style are under development and will be released in the future.
+* The modules with dashed borders are highly user-customizable and extensible.
+
+
+# Quick start
+
+## Installation
+
+To install Qlib from source, you need _Cython_ in addition to the normal dependencies:
+
+```bash
+pip install numpy
+pip install --upgrade cython
+```
+
+Clone the repository and then run:
+```bash
+python setup.py install
+```
+
+
+## Get Data
+- Load and prepare the data: execute the following command to download the stock data:
+  ```bash
+  python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
+  ```
+<!-- 
+- Run the initialization code and get stock data:
+
+  ```python
+  import qlib
+  from qlib.data import D
+  from qlib.config import REG_CN
+
+  # Initialization
+  mount_path = "~/.qlib/qlib_data/cn_data"  # target_dir
+  qlib.init(mount_path=mount_path, region=REG_CN)
+
+  # Get stock data by Qlib
+  # Load trading calendar with the given time range and frequency
+  print(D.calendar(start_time='2010-01-01', end_time='2017-12-31', freq='day')[:2])
+
+  # Parse a given market name into a stockpool config
+  instruments = D.instruments('csi500')
+  print(D.list_instruments(instruments=instruments, start_time='2010-01-01', end_time='2017-12-31', as_list=True)[:6])
+
+  # Load features of certain instruments in given time range
+  instruments = ['SH600000']
+  fields = ['$close', '$volume', 'Ref($close, 1)', 'Mean($close, 3)', '$high-$low']
+  print(D.features(instruments, fields, start_time='2010-01-01', end_time='2017-12-31', freq='day').head())
+  ```
+ -->
+
+## Auto Quant research workflow with _estimator_
+Qlib provides a tool named `estimator` to run the whole workflow automatically (including building the dataset, training models, backtesting, and analysis).
+
+1. Run _estimator_ (config file: [estimator_config.yaml](examples/estimator/estimator_config.yaml)):
+
+    ```bash
+    cd examples  # Avoid running the program from a directory that contains `qlib`
+    estimator -c estimator/estimator_config.yaml
+    ```
+
+    Estimator result:
+
+    ```bash
+
+                          risk
+    sub_bench mean    0.000662
+              std     0.004487
+              annual  0.166720
+              sharpe  2.340526
+              mdd    -0.080516
+    sub_cost  mean    0.000577
+              std     0.004482
+              annual  0.145392
+              sharpe  2.043494
+              mdd    -0.083584
+    ```
+    See the full documentation: [Use _Estimator_ to Start An Experiment](TODO:URL).
+
+2. Analysis
+
+    Run `examples/estimator/analyze_from_estimator.ipynb` in `jupyter notebook`
+    1.  forecasting signal analysis
+        - Cumulative Return
+
+        ![Cumulative Return](docs/_static/img/analysis/analysis_model_cumulative_return.png)
+        ![long_short](docs/_static/img/analysis/analysis_model_long_short.png)
+        - Information Coefficient(IC)
+
+        ![Information Coefficient](docs/_static/img/analysis/analysis_model_IC.png)        
+        ![Monthly IC](docs/_static/img/analysis/analysis_model_monthly_IC.png)        
+        ![IC](docs/_static/img/analysis/analysis_model_NDQ.png)
+        - Auto Correlation
+
+        ![Auto Correlation](docs/_static/img/analysis/analysis_model_auto_correlation.png)
+
+
+
+
+    2.  portfolio analysis
+        - Report
+
+        ![Report](docs/_static/img/analysis/report.png)
+        <!-- 
+        - Score IC
+        ![Score IC](docs/_static/img/score_ic.png)
+        - Cumulative Return
+        ![Cumulative Return](docs/_static/img/cumulative_return.png)
+        - Risk Analysis
+        ![Risk Analysis](docs/_static/img/risk_analysis.png)
+        - Rank Label
+        ![Rank Label](docs/_static/img/rank_label.png)
+        -->
+
+## Customized Quant research workflow by code
+An automatic workflow may not suit the research workflow of every Quant researcher. To support flexible Quant research workflows, Qlib also provides a modularized interface that allows researchers to build their own workflow. [Here](TODO_URL) is a demo of a customized Quant research workflow by code.
+
+
+
+# More About Qlib
+The detailed documents are organized in [docs](docs).
+[Sphinx](http://www.sphinx-doc.org) and the readthedocs theme are required to build the documentation in HTML format.
+```bash
+cd docs/
+conda install sphinx sphinx_rtd_theme -y
+# Otherwise, you can install them with pip
+# pip install sphinx sphinx_rtd_theme
+make html
+```
+You can also view the [latest document](TODO_URL) online directly.
+
+The roadmap is managed as a [github project](https://github.com/microsoft/qlib/projects/1).
+
+
+
+## Offline mode and online mode
+The data server of Qlib can be deployed in either offline mode or online mode. The default mode is offline mode.
+
+Under offline mode, the data will be deployed locally. 
+
+Under online mode, the data will be deployed as a shared data service. The data and their cache will be shared by clients. Data retrieval performance is expected to improve due to a higher rate of cache hits. It will use less disk space, too. The documents of the online mode can be found in [Qlib-Server](TODO_link). The online mode can be deployed automatically with [Azure CLI based scripts](TODO_link).
+
+## Performance of Qlib Data Server
+The performance of data processing is important to data-driven methods like AI technologies. As an AI-oriented platform, Qlib provides a solution for data storage and data processing. To demonstrate the performance of Qlib, we
+compare Qlib with several other solutions.
+
+We evaluate the performance of several solutions by having each complete the same task,
+which creates a dataset (14 features/factors) from the basic OHLCV daily data of a stock market (800 stocks each day from 2007 to 2020). The task involves data queries and processing.
+
+|                         | HDF5      | MySQL     | MongoDB   | InfluxDB  | Qlib -E -D  | Qlib +E -D   | Qlib +E +D  |
+| --                      | ------    | ------    | --------  | --------- | ----------- | ------------ | ----------- |
+| Total (1CPU) (seconds)  | 184.4±3.7 | 365.3±7.5 | 253.6±6.7 | 368.2±3.6 | 147.0±8.8   | 47.6±1.0     | **7.4±0.3** |
+| Total (64CPU) (seconds) |           |           |           |           | 8.8±0.6     | **4.2±0.2**  |             |
+* `+(-)E` indicates with(out) `ExpressionCache`
+* `+(-)D` indicates with(out) `DatasetCache`
+
+Most general-purpose databases take too much time to load data. After looking into the underlying implementation, we find that data go through too many layers of interfaces and unnecessary format transformations in general-purpose database solutions.
+Such overheads greatly slow down the data loading process.
+Qlib data are stored in a compact format, which can be efficiently combined into arrays for scientific computation.
+
+
+
+
 
 # Contributing