Merge remote-tracking branch 'origin/master'
yzhao062 authored and yzhao062 committed Apr 4, 2022
2 parents a9d0ed1 + da91abb commit adf0341
Showing 20 changed files with 596 additions and 25 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/testing-cron.yml
@@ -16,7 +16,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
-        python-version: [3.6, 3.9]
+        python-version: [3.6, 3.7, 3.8, 3.9]

steps:
- uses: actions/checkout@v2
6 changes: 4 additions & 2 deletions .github/workflows/testing.yml
@@ -9,7 +9,9 @@ on:
- master
- development
pull_request:
-    branches: [ master ]
+    branches:
+      - master
+      - development

jobs:
build:
@@ -19,7 +21,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
-        python-version: [3.6, 3.9]
+        python-version: [3.6, 3.7, 3.8, 3.9]

steps:
- uses: actions/checkout@v2
4 changes: 4 additions & 0 deletions CHANGES.txt
@@ -150,6 +150,10 @@ v<0.9.6>, <12/24/2021> -- Model persistence doc improvement.
v<0.9.7>, <01/03/2022> -- Add ECOD.
v<0.9.8>, <02/23/2022> -- Add Feature Importance for iForest.
v<0.9.8>, <03/05/2022> -- Update ECOD (TKDE 2022).
+v<0.9.9>, <03/20/2022> -- Renovate documentation.
+v<0.9.9>, <03/23/2022> -- Add example for COPOD interpretability.
+v<0.9.9>, <03/23/2022> -- Add outlier detection by Cook’s distances.
+v<0.9.9>, <04/04/2022> -- Various community fix.



3 changes: 3 additions & 0 deletions README.rst
@@ -315,6 +315,7 @@ Probabilistic MAD Median Absolute Deviation (MAD)
Probabilistic SOS Stochastic Outlier Selection 2012 [#Janssens2012Stochastic]_
Linear Model PCA Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) 2003 [#Shyu2003A]_
Linear Model MCD Minimum Covariance Determinant (use the mahalanobis distances as the outlier scores) 1999 [#Hardin2004Outlier]_ [#Rousseeuw1999A]_
+Linear Model CD Use Cook's distance for outlier detection 1977 [#Cook1977Detection]_
Linear Model OCSVM One-Class Support Vector Machines 2001 [#Scholkopf2001Estimating]_
Linear Model LMDD Deviation-based Outlier Detection (LMDD) 1996 [#Arning1996A]_
Proximity-Based LOF Local Outlier Factor 2000 [#Breunig2000LOF]_
@@ -548,6 +549,8 @@ Reference
.. [#Burgess2018Understanding] Burgess, Christopher P., et al. "Understanding disentangling in beta-VAE." arXiv preprint arXiv:1804.03599 (2018).
.. [#Cook1977Detection] Cook, R.D., 1977. Detection of influential observation in linear regression. Technometrics, 19(1), pp.15-18.
.. [#Goldstein2012Histogram] Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In *KI-2012: Poster and Demo Track*\ , pp.59-63.
.. [#Gopalan2019PIDForest] Gopalan, P., Sharan, V. and Wieder, U., 2019. PIDForest: Anomaly Detection via Partial Identification. In Advances in Neural Information Processing Systems, pp. 15783-15793.
3 changes: 3 additions & 0 deletions TODO.txt
@@ -0,0 +1,3 @@
1. ECOD parallelization and interpretability
2. Add latest deep learning algorithms.
3. finish the wrapping for cook distance detector
12 changes: 6 additions & 6 deletions docs/conf.py
@@ -25,7 +25,7 @@
# -- Project information -----------------------------------------------------

project = 'pyod'
-copyright = '2021, Yue Zhao'
+copyright = '2022, Yue Zhao'
author = 'Yue Zhao'

# The short X.Y version
@@ -50,8 +50,8 @@
'sphinx.ext.imgmath',
'sphinx.ext.viewcode',
'sphinxcontrib.bibtex',
-    'sphinx.ext.napoleon',
-    'sphinx_rtd_theme',
+    # 'sphinx.ext.napoleon',
+    # 'sphinx_rtd_theme',
]

bibtex_bibfiles = ['zreferences.bib']
@@ -90,7 +90,7 @@
#
# html_theme = 'default'

-html_theme = "sphinx_rtd_theme"
+html_theme = "furo"

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
@@ -112,8 +112,8 @@
# 'searchbox.html']``.
#
# html_sidebars = {}
-html_sidebars = {'**': ['globaltoc.html', 'relations.html', 'sourcelink.html',
-                        'searchbox.html']}
+# html_sidebars = {'**': ['globaltoc.html', 'relations.html', 'sourcelink.html',
+#                         'searchbox.html']}

# -- Options for HTMLHelp output ---------------------------------------------

1 change: 1 addition & 0 deletions docs/index.rst
@@ -154,6 +154,7 @@ Probabilistic MAD Median Absolute Deviation (MAD)
Probabilistic SOS Stochastic Outlier Selection 2012 :class:`pyod.models.sos.SOS` :cite:`a-janssens2012stochastic`
Linear Model PCA Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) 2003 :class:`pyod.models.pca.PCA` :cite:`a-shyu2003novel`
Linear Model MCD Minimum Covariance Determinant (use the mahalanobis distances as the outlier scores) 1999 :class:`pyod.models.mcd.MCD` :cite:`a-rousseeuw1999fast,a-hardin2004outlier`
+Linear Model CD Use Cook's distance for outlier detection 1977 :class:`pyod.models.cd.CD` :cite:`a-cook1977detection`
Linear Model OCSVM One-Class Support Vector Machines 2001 :class:`pyod.models.ocsvm.OCSVM` :cite:`a-scholkopf2001estimating`
Linear Model LMDD Deviation-based Outlier Detection (LMDD) 1996 :class:`pyod.models.lmdd.LMDD` :cite:`a-arning1996linear`
Proximity-Based LOF Local Outlier Factor 2000 :class:`pyod.models.lof.LOF` :cite:`a-breunig2000lof`
10 changes: 10 additions & 0 deletions docs/pyod.models.rst
@@ -57,6 +57,16 @@ pyod.models.combination module
:show-inheritance:
:inherited-members:

+pyod.models.cd module
+---------------------
+
+.. automodule:: pyod.models.cd
+    :members:
+    :exclude-members:
+    :undoc-members:
+    :show-inheritance:
+    :inherited-members:

pyod.models.copod module
------------------------

1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -1,4 +1,5 @@
combo
+furo
joblib
keras
matplotlib
15 changes: 13 additions & 2 deletions docs/zreferences.bib
@@ -316,7 +316,7 @@ @article{pevny2016loda


@article{burgess2018understanding,
-  title={Understanding disentangling in beta-VAE},
+  title={Understanding disentangling in betVAE},
author={Burgess, Christopher P and Higgins, Irina and Pal, Arka and Matthey, Loic and Watters, Nick and Desjardins, Guillaume and Lerchner, Alexander},
journal={arXiv preprint arXiv:1804.03599},
year={2018}
@@ -379,10 +379,21 @@ @inproceedings{perini2020quantifying
publisher={Springer}
}

-@article{Li2021ecod,
+@article{li2021ecod,
title={ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions},
author={Li, Zheng and Zhao, Yue and Hu, Xiyang and Botta, Nicola and Ionescu, Cezar and Chen, H. George},
journal={IEEE Transactions on Knowledge and Data Engineering},
year={2022},
publisher={IEEE}
}

+@article{cook1977detection,
+  title={Detection of influential observation in linear regression},
+  author={Cook, R Dennis},
+  journal={Technometrics},
+  volume={19},
+  number={1},
+  pages={15--18},
+  year={1977},
+  publisher={Taylor \& Francis}
+}
58 changes: 58 additions & 0 deletions examples/cd_example.py
@@ -0,0 +1,58 @@
"""Example of using Cook's distance (CD) for
outlier detection
"""
# Author: D Kulik
# License: BSD 2 clause

from __future__ import division
from __future__ import print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname("__file__"), '..')))

import numpy as np
from pyod.models.cd import CD
from pyod.utils.data import generate_data
from pyod.utils.data import evaluate_print
from pyod.utils.example import visualize

if __name__ == "__main__":
    contamination = 0.1  # percentage of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points

    # Generate sample data
    X_train, y_train, X_test, y_test = \
        generate_data(n_train=n_train,
                      n_test=n_test,
                      n_features=2,
                      contamination=contamination,
                      random_state=42)

    # train CD detector
    clf_name = 'CD'
    clf = CD()
    clf.fit(X_train, y_train)

    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores

    # get the prediction on the test data
    y_test_pred = clf.predict(np.append(X_test, y_test.reshape(-1, 1), axis=1))  # outlier labels (0 or 1)
    y_test_scores = clf.decision_function(np.append(X_test, y_test.reshape(-1, 1), axis=1))  # outlier scores

    # evaluate and print the results
    print("\nOn Training Data:")
    evaluate_print(clf_name, y_train, y_train_scores)
    print("\nOn Test Data:")
    evaluate_print(clf_name, y_test, y_test_scores)

    # visualize the results
    visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
              y_test_pred, show_figure=True, save_figure=False)
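
For reference, the quantity the CD detector is named after can be sketched in plain NumPy. This is a minimal illustration under stated assumptions (ordinary least squares with an intercept; the helper name `cooks_distance` is hypothetical) and is not the code in `pyod.models.cd`, which may differ in how it fits the model and thresholds the scores.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance of every observation in an OLS fit of y on X.

    Hypothetical helper for illustration only; not pyod's implementation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    n = X.shape[0]

    # design matrix with an intercept column
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]

    # hat matrix H = A (A^T A)^+ A^T; its diagonal holds the leverages
    H = A @ np.linalg.pinv(A.T @ A) @ A.T
    leverage = np.diag(H)

    # residuals and the unbiased estimate of the residual variance
    residuals = y - H @ y
    mse = residuals @ residuals / (n - p)

    # D_i = e_i^2 / (p * mse) * h_ii / (1 - h_ii)^2
    return residuals ** 2 / (p * mse) * leverage / (1.0 - leverage) ** 2


if __name__ == "__main__":
    rng = np.random.RandomState(42)
    X = rng.normal(size=(50, 2))
    y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=50)
    y[0] += 10.0  # inject one influential observation
    d = cooks_distance(X, y)
    print("most influential index:", int(np.argmax(d)))
```

Observations with a large distance change the fitted coefficients the most when removed, which is why the score depends on both the features and the target.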
54 changes: 54 additions & 0 deletions examples/copod_interpretability.py
@@ -1,2 +1,56 @@
# -*- coding: utf-8 -*-
"""Example of using Copula Based Outlier Detector (COPOD) for outlier detection
Sample wise interpretation is provided here.
"""
# Author: Winston Li <[email protected]>
# License: BSD 2 clause

from __future__ import division
from __future__ import print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname("__file__"), '..')))

from scipy.io import loadmat
from sklearn.model_selection import train_test_split

from pyod.models.copod import COPOD
from pyod.utils.utility import standardizer

if __name__ == "__main__":
    # Define data file and read X and y
    # (this example expects data/cardio.mat to exist; no data is generated)
    mat_file = 'cardio.mat'

    mat = loadmat(os.path.join('data', mat_file))
    X = mat['X']
    y = mat['y'].ravel()

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                        random_state=1)

    # standardizing data for processing
    X_train_norm, X_test_norm = standardizer(X_train, X_test)

    # train COPOD detector
    clf_name = 'COPOD'
    clf = COPOD()

    # you could try parallel version as well.
    # clf = COPOD(n_jobs=2)
    clf.fit(X_train)

    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores

    print('The first sample is an outlier', y_train[0])
    clf.explain_outlier(0)

    # we can see that features 7, 16, and 20 are above the 0.99 cutoff
    # and play a more important role in deciding that it is an outlier.
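
To give a feel for what a per-feature explanation of this kind measures, the sketch below computes, for each sample and feature, the negative log of the smaller empirical tail probability. It is a simplified illustration, assuming only the general empirical-tail-probability idea behind COPOD; it omits the skewness correction, the helper names `ecdf` and `dimensional_tail_scores` are hypothetical, and it is not what `explain_outlier` runs internally.

```python
import numpy as np


def ecdf(column):
    """Fraction of entries in the column that are <= each entry."""
    ranks = np.searchsorted(np.sort(column), column, side="right")
    return ranks / len(column)


def dimensional_tail_scores(X):
    """Per-feature negative log tail probabilities (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # left tail: share of values <= x; right tail: share of values >= x
    left = np.column_stack([ecdf(X[:, j]) for j in range(n_features)])
    right = np.column_stack([ecdf(-X[:, j]) for j in range(n_features)])
    # a small tail probability on either side gives a large score
    return np.maximum(-np.log(left), -np.log(right))


if __name__ == "__main__":
    # three identical columns 0..99, then one extreme value in feature 2
    X = np.tile(np.arange(100.0), (3, 1)).T
    X[50, 2] = 1000.0
    scores = dimensional_tail_scores(X)
    print("scores for sample 50:", scores[50])
```

A feature whose score stands far above the others for a given sample is the one that makes that sample look extreme, which is the kind of reading the plot from `explain_outlier` supports.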