This is the official PyTorch implementation of the Area2Area forecasting formulation proposed in the following paper: Area2Area Forecasting: Looser Constraints, Better Predictions (manuscript submitted to the journal Information Sciences).
(1) Positive sampling: we treat input sequences, with or without data augmentation, that span the same timestamps as positive examples of each other. Meanwhile, all other input sequences are treated as negative examples of the currently selected input sequence, even if they differ by only one timestamp. (2) Negative sampling: we introduce an extra negative data augmentation into each input sequence, where we randomly shuffle all elements of the input sequence and treat the shuffled sequences as negative examples of the currently selected input sequence.
Figure 1. An overview of the final contrastive loss. Each block denotes the representation of an input sequence under a timespan of a variable. Blocks with different colors refer to representations of different variables. Only input sequences with data augmentation (parallelograms with light colors) that span the same timestamps as the initial sequences are treated as their positive examples (dotted lines); all other sequences are treated as their negative examples (solid lines).
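To make the two sampling rules concrete, here is a minimal, hypothetical sketch of how the positive and negative views could be generated (the function name, tensor shapes, and the `jitter` parameter are our assumptions, not the repository's API):

```python
import torch

def make_contrastive_views(x, jitter=0.1):
    # x: (batch, length, channels) input sequences.
    # Positive view: spans the same timestamps as x, with light jittering
    # augmentation, so it is treated as a positive example of x.
    positive = x + jitter * torch.randn_like(x)
    # Negative view: randomly shuffle all elements along the time axis;
    # the shuffled sequence is treated as a negative example of x.
    perm = torch.randperm(x.size(1))
    negative = x[:, perm, :]
    return positive, negative
```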
Area Loss (AL) replaces the exact ground truth of the prediction sequence with an area surrounding it, as shown in Figure 2. We choose a fixed upper bound as the maximum of the truth area and a fixed lower bound as its minimum. This means that the values of prediction elements will be treated as truth if they fall within the aforementioned area.
Figure 2. AL extends the truth (black line) to an area encompassed by two red dotted lines, which separately refer to the upper and lower bounds.
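As a rough illustration of this idea (not the paper's exact formulation; the symmetric margin `delta` and the squared penalty are assumptions), an area-tolerant loss could look like:

```python
import torch

def area_loss(pred, truth, delta=0.1):
    # Any prediction inside [truth - delta, truth + delta] is treated as
    # correct (zero loss); only the distance beyond the area is penalized.
    excess = torch.clamp(torch.abs(pred - truth) - delta, min=0.0)
    return (excess ** 2).mean()
```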
We propose Temporal ELAN, a simple and efficient feature extractor, and use it as the backbone in the first stage. It is a CNN-based network with at least two merits compared with the canonical Temporal Convolution Network (TCN):
- It abandons dilated convolution and uses only ordinary convolutions and pooling to extract universal feature maps of the input sequence.
- It is derived from ELAN, a 2D CNN that has proven powerful in Computer Vision (CV) tasks. Built upon ELAN, it not only inherits ELAN's merits but is also capable of extracting hierarchical multi-scale feature maps of the input sequence.
Figure 3. The architecture of Temporal ELAN. Each CPblock contains two convolutional layers and a max-pooling layer to shrink the sequence length.
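Based on the Figure 3 caption, a hypothetical sketch of one CPblock could look as follows (channel sizes, activation, and padding choices are our assumptions):

```python
import torch.nn as nn

class CPBlock(nn.Module):
    """Two 1D convolutional layers followed by max pooling that
    shrinks the sequence length, as described in Figure 3."""
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel, padding=kernel // 2),
            nn.GELU(),
            nn.Conv1d(out_ch, out_ch, kernel, padding=kernel // 2),
            nn.GELU(),
            nn.MaxPool1d(kernel_size=2),  # halve the sequence length
        )

    def forward(self, x):  # x: (batch, channels, length)
        return self.net(x)
```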
- Python 3.8
- matplotlib == 3.3.4
- numpy == 1.20.1
- pandas == 1.2.4
- scikit_learn == 0.24.1
- scipy == 1.9.0
- torch == 1.11.0
Dependencies can be installed using the following command:

```bash
pip install -r requirements.txt
```
The ETT, ECL, Traffic, Exchange and weather datasets can be acquired at: datasets.
After you acquire the raw data of all datasets, please place them in the corresponding folders under `./data`. Move the folders `./ETT-data` (ETT), `./electricity` (ECL), `./exchange_rate` (Exchange), `./traffic` (Traffic) and `./weather` (weather) from here (the folder tree in the link is shown below) into the folder `./data`, and rename them from `./ETT-data`, `./electricity`, `./exchange_rate`, `./traffic` and `./weather` to `./ETT`, `./ECL`, `./Exchange`, `./Traffic` and `./weather` respectively. We rename the data files of ECL/Exchange/Traffic from `electricity.csv`/`exchange_rate.csv`/`traffic.csv` to `ECL.csv`/`Exchange.csv`/`Traffic.csv`, and rename the last variable of ECL/Exchange/Traffic from `OT` back to its original name `MT_321`/`Singapore`/`Sensor_861`.
```
|-Autoformer
| |-ETT-data
| | |-ETTh1.csv
| | |-ETTh2.csv
| | |-ETTm1.csv
| | |-ETTm2.csv
| |
| |-electricity
| | |-electricity.csv
| |
| |-exchange_rate
| | |-exchange_rate.csv
| |
| |-traffic
| | |-traffic.csv
| |
| |-weather
| | |-weather.csv
```
Then you can get the folder tree shown below:
```
|-data
| |-ECL
| | |-ECL.csv
| |
| |-ETT
| | |-ETTh1.csv
| | |-ETTh2.csv
| | |-ETTm1.csv
| | |-ETTm2.csv
| |
| |-Exchange
| | |-Exchange.csv
| |
| |-Traffic
| | |-Traffic.csv
| |
| |-weather
| | |-weather.csv
```
Commands for training and testing Temporal ELAN-A2A on all datasets are in `./scripts/A2A.sh`. For more parameter information, please refer to `main.py`.
We provide a complete command for training and testing Temporal ELAN-A2A:
```bash
python -u main.py --model <model> --data <data> --root_path <root_path> --features <features> --label_len <label_len> --pred_list <pred_list> --d_model <d_model> --repr_dim <repr_dim> --kernel <kernel> --attn_nums <attn_nums> --pyramid <pyramid> --criterion <criterion> --dropout <dropout> --batch_size <batch_size> --cost_batch_size <cost_batch_size> --order_num <order_num> --aug_num <aug_num> --jitter <jitter> --activation <activation> --cost_epochs <cost_epochs> --cost_grow_epochs <cost_grow_epochs> --train_epochs <train_epochs> --itr <itr>
```
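For instance, a hypothetical invocation on ETTh1 might look like the following; every value below is an illustrative placeholder, and the exact settings (including how `--pred_list` expects multiple lengths) are defined in `./scripts/A2A.sh`:

```bash
python -u main.py --model A2A --data ETTh1 --root_path ./data/ETT/ --features M --label_len 96 --pred_list 96 192 336 720 --d_model 64 --dropout 0.1 --batch_size 32 --train_epochs 10 --itr 1
```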
Here we provide a more detailed and complete command description for training and testing the model:
| Parameter name | Description of parameter |
| --- | --- |
| model | The model of the experiment. This can be set to `A2A` |
| data | The dataset name |
| root_path | The root path of the data file |
| data_path | The data file name |
| features | The forecasting task. This can be set to `M`, `S` (`M`: multivariate forecasting, `S`: univariate forecasting) |
| target | Target feature in the `S` task |
| checkpoints | Location of model checkpoints |
| label_len | Input sequence length |
| pred_list | Group of prediction sequence lengths |
| enc_in | Input size |
| c_out | Output size |
| d_model | Dimension of the model |
| aug_num | The number of sequences with data augmentation, including the initial sequences |
| order_num | The number of shuffled sequences |
| jitter | Supremum of the data augmentation amplification |
| aug | Augmentation method used in Area Loss. This can be set to `Scaling`, `Jittering` |
| instance_loss | Whether to include the instance loss during the first stage |
| dropout | Dropout rate |
| kernel | The kernel size |
| criterion | Standardization criterion |
| itr | Number of experiment repetitions |
| activation | Activation function. This can be set to `Gelu`, `Tanh` |
| cost_epochs | Training epochs of the first stage |
| cost_grow_epochs | Additional first-stage training epochs per experiment |
| train_epochs | Training epochs of the second stage |
| cost_batch_size | The batch size of training input data in the first stage |
| batch_size | The batch size of training input data in the second stage |
| patience | Early stopping patience |
| learning_rate | Optimizer learning rate |
| pyramid | The number of Pyramid networks |
| attn_nums | The number of streams in the main Pyramid network |
| repr_dim | Dimension of the representation |
| loss | Loss function |
The experiment parameters for each dataset are formatted in the `A2A.sh` file in the directory `./scripts/`. You can refer to these parameters for experiments, and you can also adjust them to obtain better MSE and MAE results or to draw better prediction figures.
Figure 4. Quantitative results of multivariate forecasting with {ETTh1, ETTm2, ECL}
Figure 5. Quantitative results of multivariate forecasting with {Traffic, Exchange, Weather}
Figure 6. Quantitative results of univariate forecasting under ETTh1 and Weather
Figure 7. Univariate forecasting results
If you have any questions, feel free to contact Li Shen via email ([email protected]) or GitHub issues. Pull requests are highly welcome!