Our implementation of MTAD-GAT: Multivariate Time-series Anomaly Detection (MTAD) via Graph Attention Networks (GAT) by Zhao et al. (2020).
- This repo includes a complete framework for multivariate anomaly detection, using a model that is heavily inspired by MTAD-GAT.
- Our work does not aim to reproduce the original results in the paper.
- For contact, feel free to use [email protected] or [email protected]
- By default we use the recently proposed GATv2, but include the option to use the standard GAT.
- Instead of using a Variational Auto-Encoder (VAE) as the Reconstruction Model, we use a GRU-based decoder.
- We provide implementations of the following thresholding methods, but their parameters should be customized to different datasets:
  - peaks-over-threshold (POT) as in the MTAD-GAT paper
  - the thresholding method proposed by Hundman et al.
  - a brute-force method that searches through "all" possible thresholds and picks the one that gives the highest F1 score (a minimal sketch of this search follows the list below)
- All methods are applied, and their respective results are output together for comparison.
- Parts of our code should be credited to the following:
  - OmniAnomaly for preprocessing and evaluation methods and an implementation of POT
  - TelemAnom for plotting methods and thresholding method
  - pyGAT by Diego Antognini for inspiration on GAT-related methods
- Their respective licences are included in `licences`.
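As a rough sketch of the brute-force threshold search mentioned above (a hypothetical helper, not this repo's exact implementation):

```python
import numpy as np

def best_f1_threshold(scores: np.ndarray, labels: np.ndarray, steps: int = 400):
    """Scan candidate thresholds over the score range and keep the one with
    the highest point-wise F1. Hypothetical helper, not this repo's API."""
    best_thr, best_f1 = scores.min(), -1.0
    for thr in np.linspace(scores.min(), scores.max(), steps):
        preds = scores > thr                      # flag points above threshold
        tp = np.sum(preds & (labels == 1))
        fp = np.sum(preds & (labels == 0))
        fn = np.sum(~preds & (labels == 1))
        precision = tp / (tp + fp + 1e-12)
        recall = tp / (tp + fn + 1e-12)
        f1 = 2 * precision * recall / (precision + recall + 1e-12)
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1
```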
To clone the repo:
```bash
git clone https://github.com/ML4ITS/mtad-gat-pytorch.git && cd mtad-gat-pytorch
```
Get data:
```bash
cd datasets && wget https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip &&
cd data && wget https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv && cd .. && cd ..
```
This downloads the MSL and SMAP datasets. The SMD dataset is already included in the repo. We refer to TelemAnom and OmniAnomaly for detailed information regarding these three datasets.
Install dependencies (virtualenv is recommended):
```bash
pip install -r requirements.txt
```
Preprocess the data:
```bash
python preprocess.py --dataset <dataset>
```
where `<dataset>` is one of `MSL`, `SMAP` or `SMD`.
To train:
```bash
python train.py --dataset <dataset>
```
where `<dataset>` is one of `msl`, `smap` or `smd` (upper case also works). If training on SMD, you should specify which machine to use with the `--group` argument.
You can change the default configuration by adding more arguments. All arguments can be found in `args.py`. Some examples:
- Training machine-1-1 of SMD for 10 epochs, using a lookback (window size) of 150:
```bash
python train.py --dataset smd --group 1-1 --lookback 150 --epochs 10
```
- Training MSL for 10 epochs, using the standard GAT instead of GATv2 (the default), and a validation split of 0.2:
```bash
python train.py --dataset msl --epochs 10 --use_gatv2 False --val_split 0.2
```
Outputs are saved in `output/<dataset>/<ID>` (where the current datetime is used as ID) and include:
- `summary.txt`: performance on test set (precision, recall, F1, etc.)
- `config.txt`: the configuration used for model, training, etc.
- `train/test.pkl`: saved forecasts, reconstructions, actual values, thresholds, etc.
- `train/test_scores.npy`: anomaly scores
- `train/validation_losses.png`: plots of train and validation loss during training
- `model.pt`: parameters of the trained model
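To inspect these files programmatically, a minimal sketch assuming only the standard pickle/NumPy formats (the exact structure of the pickled object depends on the run):

```python
import pickle

import numpy as np

run_dir = "output/smd/<ID>"  # replace <ID> with the datetime folder of your run

# Anomaly scores are plain NumPy arrays
test_scores = np.load(f"{run_dir}/test_scores.npy")

# Forecasts, reconstructions, thresholds, etc. are stored as a pickle;
# the exact contents depend on the run
with open(f"{run_dir}/test.pkl", "rb") as f:
    test_output = pickle.load(f)

print(test_scores.shape, type(test_output))
```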
This repo includes example outputs for MSL, SMAP and SMD machine 1-1.
`result_visualizer.ipynb` provides a Jupyter notebook for visualizing results. To launch the notebook:
```bash
jupyter notebook result_visualizer.ipynb
```
Predicted anomalies are visualized using a blue rectangle.
Actual (true) anomalies are visualized using a red rectangle.
Thus, correctly predicted anomalies are visualized by a purple (blue + red) rectangle.
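This blending effect can be reproduced with semi-transparent spans; a minimal matplotlib sketch (standalone illustration, not the notebook's actual plotting code):

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(1000)
signal = np.sin(t / 25.0) + 0.1 * np.random.randn(t.size)

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(t, signal, lw=0.8)
# Predicted anomaly segment (blue) and true segment (red); where the two
# semi-transparent spans overlap, the blended colour reads as purple.
ax.axvspan(400, 520, color="blue", alpha=0.3, label="predicted anomaly")
ax.axvspan(470, 600, color="red", alpha=0.3, label="actual anomaly")
ax.legend()
plt.show()
```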
Some examples:
(Figures: SMD test set (feature 0) | SMD train set (feature 0))
Example from MSL test set (note that one anomaly segment is not detected):
(Figures: Feature-Oriented GAT layer | Time-Oriented GAT layer)
Left: The feature-oriented GAT layer views the input data as a complete graph where each node represents the values of one feature across all timestamps in the sliding window.
Right: The time-oriented GAT layer views the input data as a complete graph in which each node represents the values for all features at a specific timestamp.
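In implementation terms, the two orientations differ mainly in which axis of the sliding window becomes the node dimension. A minimal sketch of the reshaping (tensor names are illustrative, not this repo's exact code):

```python
import torch

batch, window, n_features = 32, 100, 38
x = torch.randn(batch, window, n_features)  # sliding-window input

# Feature-oriented view: n_features nodes, each holding that feature's
# values across all `window` timestamps
feature_nodes = x.permute(0, 2, 1)  # (batch, n_features, window)

# Time-oriented view: `window` nodes, each holding all feature values
# at one timestamp
time_nodes = x  # (batch, window, n_features)

# Each view is fed to its own GAT layer over a complete graph,
# i.e. every node attends to every other node.
```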
Recently, Brody et al. (2021) proposed GATv2, a modified version of the standard GAT.
They argue that the original GAT can only compute a restricted kind of attention (which they refer to as static), where the ranking of attended nodes is unconditioned on the query node. That is, the ranking of attention weights is global for all nodes in the graph, a property which the authors claim severely hinders the expressiveness of GAT. To address this, they introduce a simple fix by modifying the order of operations and propose GATv2, a dynamic attention variant that is strictly more expressive than GAT. We refer to the paper for further reading. The difference between GAT and GATv2 is depicted below:
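In code, the difference amounts to swapping the order of the attention vector and the nonlinearity. A simplified single-head, single-pair sketch following Brody et al. (2021) (hypothetical function names, not this repo's exact implementation):

```python
import torch
import torch.nn.functional as F

def gat_score(h_i, h_j, W, a):
    """Standard GAT: e(h_i, h_j) = LeakyReLU(a^T [W h_i || W h_j]).
    W: (d_out, d_in), a: (2 * d_out,). The score decomposes as
    a1.(W h_i) + a2.(W h_j); with a monotone LeakyReLU, the ranking over j
    is identical for every query i ("static" attention)."""
    z = torch.cat([W @ h_i, W @ h_j], dim=-1)
    return F.leaky_relu(torch.dot(a, z), negative_slope=0.2)

def gatv2_score(h_i, h_j, W, a):
    """GATv2: e(h_i, h_j) = a^T LeakyReLU(W [h_i || h_j]).
    W: (d_out, 2 * d_in), a: (d_out,). Applying a after the nonlinearity
    lets the ranking over j depend on the query i ("dynamic" attention)."""
    z = W @ torch.cat([h_i, h_j], dim=-1)
    return torch.dot(a, F.leaky_relu(z, negative_slope=0.2))
```

Both functions return a scalar score for a single node pair; in a full layer these scores are softmax-normalized over each node's neighbors to obtain the attention weights.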