CIKM2024-MATCC: A Novel Approach for Robust Stock Price Prediction Incorporating Market Trends and Cross-time Correlations
This repository is the official implementation of MATCC: A Novel Approach for Robust Stock Price Prediction Incorporating Market Trends and Cross-time Correlations.
MATCC is a novel framework for robust stock price prediction, which explicitly extracts market trends as guiding information, decomposes stock data into trend and fluctuation components, and employs a carefully designed structure for mining cross-time correlation.
- Install dependencies.
  - pandas == 1.5.3
  - torch == 1.13.0
- Install Qlib. We recommend installing Qlib from SJTU-Qlib.
- Get the CSI300 and CSI800 stock instruments from the Qlib dataset update and put them in your Qlib data directory.
- Use the dataset we have provided. You can follow the link to get our preprocessed dataset 2020_2023. If the download from our link fails, you can try the datasets from MASTER instead.
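Because the code is tested against specific library versions, it can help to confirm which pandas and torch versions Python actually imports before training. The following is a minimal sketch using only the standard-library importlib.metadata; the helper name check_pins is hypothetical, and treating a local build tag such as "+cu117" as a match is an assumption.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions pinned in this README.
PINNED = {"pandas": "1.5.3", "torch": "1.13.0"}


def check_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version_or_None} for every pin that is not satisfied."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            # Strip a local build tag such as "+cu117" (assumption: CUDA builds count as a match).
            installed: Optional[str] = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    if problems:
        for pkg, found in problems.items():
            print(f"{pkg}: expected {PINNED[pkg]}, found {found}")
    else:
        print("All pinned versions are installed.")
```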
If you want to design and implement your own data preprocessing methods, please refer to util/DropExtremeLabel.py. If you want to change the period of the train/validation/test datasets, modify util/2023.yaml. To get CSI300 or CSI800, change market in util/2023.yaml and universe in util/generate_dataset.py. Then run the following command:
python util/generate_dataset.py --universe csi300
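Switching the market in util/2023.yaml can also be scripted. This sketch assumes the file follows the Qlib-style convention of a top-level line such as market: &market csi300 (with or without the YAML anchor); the actual layout of util/2023.yaml may differ, and the helpers set_market and switch_market are hypothetical. It edits the text with a regular expression rather than a YAML parser so that anchors and comments elsewhere in the file are preserved.

```python
import re
from pathlib import Path

# Matches a top-level "market:" key, keeping an optional YAML anchor such as "&market".
# Indented (nested) "market:" keys are intentionally not matched.
_MARKET_LINE = re.compile(r"^(market:[ \t]*(?:&\S+[ \t]+)?)(\S+)", flags=re.MULTILINE)


def set_market(yaml_text: str, market: str) -> str:
    """Return yaml_text with the value of the first top-level market key replaced."""
    new_text, n = _MARKET_LINE.subn(lambda m: m.group(1) + market, yaml_text, count=1)
    if n == 0:
        raise ValueError("no top-level 'market:' key found")
    return new_text


def switch_market(path: str, market: str) -> None:
    """Rewrite the config file in place, e.g. switch_market("util/2023.yaml", "csi800")."""
    p = Path(path)
    p.write_text(set_market(p.read_text()))
```

After switching the config, the dataset-generation command would be run again with the matching universe.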
The published data have gone through the following preprocessing:
- Drop NA features, and perform robust daily Z-score normalization on each feature dimension.
- Drop NA labels and the 5% most extreme labels, then perform daily Z-score normalization on the labels.
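The two preprocessing steps can be sketched in pandas. This is an illustration under stated assumptions, not the code used to build the published data: it assumes a (datetime, instrument) MultiIndex as in Qlib, drops rows that contain any NaN feature, builds the robust Z-score from the daily median and 1.4826 * MAD with clipping at +/-3, and splits the 5% of extreme labels evenly between the two tails (roughly 2.5% each). util/DropExtremeLabel.py may implement these steps differently, and both function names are hypothetical.

```python
import pandas as pd


def robust_daily_zscore(features: pd.DataFrame, clip: float = 3.0, eps: float = 1e-12) -> pd.DataFrame:
    """Cross-sectional robust Z-score of every feature column, computed per trading day.

    Assumes a MultiIndex with a "datetime" level (Qlib convention).
    """
    features = features.dropna()  # assumption: rows with any NaN feature are dropped
    median = features.groupby(level="datetime").transform("median")
    # Median absolute deviation per day and column; 1.4826 makes it comparable to a std.
    mad = (features - median).abs().groupby(level="datetime").transform("median")
    z = (features - median) / (1.4826 * mad + eps)
    return z.clip(-clip, clip)  # clipping at +/- clip is an assumption


def drop_extreme_and_zscore_labels(label: pd.Series, drop_frac: float = 0.05, eps: float = 1e-12) -> pd.Series:
    """Drop NaN labels and roughly the drop_frac most extreme labels per day, then Z-score them per day."""
    label = label.dropna()
    pct = label.groupby(level="datetime").rank(pct=True)  # daily percentile rank in (0, 1]
    half = drop_frac / 2  # assumption: split evenly between the two tails
    kept = label[(pct > half) & (pct <= 1 - half)]
    by_day = kept.groupby(level="datetime")
    return (kept - by_day.transform("mean")) / (by_day.transform("std") + eps)
```

Grouping by the datetime level makes both normalizations cross-sectional: each stock is compared only with other stocks on the same trading day, so no information leaks across dates.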
To train the model in the paper, run this command:
python train_model_MATCC.py
To evaluate our model on the CSI300 or CSI800 dataset, change universe and model_param_path in test_model_MATCC.py and run:
python test_model_MATCC.py
You will then get xxxxx_label.pkl and xxxxx_pred.pkl, which contain the labels and the prediction results, respectively.
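The prediction and label pickles can be turned into the usual ranking metrics (daily IC, RankIC, and their ratios to their standard deviations, ICIR and RankICIR). This sketch assumes both pickles hold pandas Series indexed by (datetime, instrument); if they hold single-column DataFrames, convert them first with .squeeze("columns"). The helper ic_metrics is hypothetical, and RankIC is computed as the Pearson correlation of ranks, which equals the Spearman correlation without needing SciPy.

```python
import pandas as pd


def _ratio(daily: pd.Series) -> float:
    """Mean over standard deviation of a daily metric; NaN if the std is zero."""
    std = float(daily.std())
    return float(daily.mean()) / std if std > 0 else float("nan")


def ic_metrics(pred: pd.Series, label: pd.Series) -> dict:
    """Daily cross-sectional IC / RankIC between predictions and labels, averaged over days."""
    df = pd.concat({"pred": pred, "label": label}, axis=1).dropna()
    by_day = df.groupby(level="datetime")
    ic = by_day.apply(lambda d: d["pred"].corr(d["label"]))
    rank_ic = by_day.apply(lambda d: d["pred"].rank().corr(d["label"].rank()))
    return {
        "IC": float(ic.mean()),
        "ICIR": _ratio(ic),
        "RankIC": float(rank_ic.mean()),
        "RankICIR": _ratio(rank_ic),
    }
```

Usage, with xxxxx replaced by the actual file prefix: ic_metrics(pd.read_pickle("xxxxx_pred.pkl"), pd.read_pickle("xxxxx_label.pkl")).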
To compute the annualized return (AR), information ratio (IR), and other portfolio-based metrics, run:
python backtest.py
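As a rough illustration of what these portfolio metrics measure, the sketch below computes them from a series of daily excess returns of a portfolio over its benchmark. It is an assumption-laden sketch, not backtest.py: AR is taken as the mean daily excess return scaled by 252 trading days, IR as that annualized mean divided by the annualized standard deviation, and the maximum drawdown (MDD) is measured on the cumulative-sum return curve. backtest.py, or other tools such as Qlib's risk analysis, may use a different convention or annualization constant. portfolio_metrics is a hypothetical helper.

```python
import numpy as np


def portfolio_metrics(daily_excess_returns, periods_per_year: int = 252) -> dict:
    """AR, IR and max drawdown from daily excess returns (assumed simple, non-compounded)."""
    r = np.asarray(daily_excess_returns, dtype=float)
    mean = r.mean()
    std = r.std(ddof=1)
    # Cumulative-sum curve starting from 0, so an initial loss also counts as drawdown.
    curve = np.concatenate([[0.0], np.cumsum(r)])
    drawdown = curve - np.maximum.accumulate(curve)
    return {
        "AR": float(mean * periods_per_year),
        "IR": float(mean / std * np.sqrt(periods_per_year)) if std > 0 else float("nan"),
        "MDD": float(drawdown.min()),
    }
```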
Our model achieves the following performance on CSI300, CSI800, S&P500 (2020.07 - 2023.12.31) and NK225 (2022.07 - 2024.07):
Extreme Market environment
Normal Market environment
We thank the following GitHub repositories for their valuable code bases and datasets:
- MASTER: https://github.com/SJTU-Quant/MASTER
- AutoFormer: https://github.com/thuml/Autoformer
- DLinear: https://github.com/cure-lab/LTSF-Linear
- RWKV: https://github.com/BlinkDL/RWKV-LM
- Pytorch_linear_WarmUp_CosineAnnealing: https://github.com/saadnaeem-dev/pytorch-linear-warmup-cosine-annealing-warm-restarts-weight-decay