Commit 1ed157a

📝 add anomaly detection readme

1 parent 6532580 commit 1ed157a

8 files changed (+69, -1 lines changed)
File renamed without changes.

images/AnomalyDetection_02.png (18.2 KB)
images/AnomalyDetection_03.png (18.4 KB)
images/AnomalyDetection_04.png (9.39 KB)
images/AnomalyDetection_05.png (4.32 KB)
images/AnomalyDetection_06.png (4.42 KB)
images/AnomalyDetection_07.png (6.26 KB)

readme.md

Lines changed: 69 additions & 1 deletion
@@ -1049,6 +1049,18 @@ from sklearn.preprocessing import StandardScaler
- Parameter estimation: ![$${u_1},{u_2}, \cdots ,{u_n};\sigma _1^2,\sigma _2^2 \cdots ,\sigma _n^2$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7Bu_1%7D%2C%7Bu_2%7D%2C%20%5Ccdots%20%2C%7Bu_n%7D%3B%5Csigma%20_1%5E2%2C%5Csigma%20_2%5E2%20%5Ccdots%20%2C%5Csigma%20_n%5E2%24%24)
- Compute `p(x)`; if `p(x)<ε`, the example is treated as anomalous, where `ε` is the probability threshold (`threshold`) we have to pick (a short sketch of this check follows the parameter-estimation code below)
- This is only the **univariate Gaussian distribution** and assumes the `feature`s are independent; the **multivariate Gaussian distribution** covered below captures the relationships between `feature`s automatically
- **Parameter estimation** implementation code

```
import numpy as np

# Parameter estimation (i.e. compute the mean and variance of each feature)
def estimateGaussian(X):
    m, n = X.shape
    mu = np.zeros((n, 1))
    sigma2 = np.zeros((n, 1))

    mu = np.mean(X, axis=0)       # axis=0: mean of each column (feature)
    sigma2 = np.var(X, axis=0)    # variance of each column (feature)
    return mu, sigma2
```
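A minimal sketch of the `p(x) < ε` check from the bullet above, assuming `mu` and `sigma2` come from `estimateGaussian`; the helper name `univariateGaussian` and the variable `epsilon` are illustrative, not from the original notes.

```
import numpy as np

# Illustrative helper (not in the original notes): univariate Gaussian density per
# feature, multiplied across features since they are assumed independent.
def univariateGaussian(X, mu, sigma2):
    p = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.prod(p, axis=1)     # p(x) = p(x1) * p(x2) * ... * p(xn)

# Usage: flag the examples whose probability falls below the threshold epsilon
# mu, sigma2 = estimateGaussian(X)
# p = univariateGaussian(X, mu, sigma2)
# anomalies = X[p < epsilon]
```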

### 3. Evaluating how good `p(x)` is, and choosing `ε`
- Error metrics for **skewed data**
@@ -1064,6 +1076,56 @@ from sklearn.preprocessing import StandardScaler
- Choosing `ε`
- Try several values of `ε` and keep the one that gives the highest `F1Score`
- Implementation code (a usage sketch follows the code block)

```
# Select the best epsilon, i.e. the one that maximizes F1Score
def selectThreshold(yval, pval):
    '''Initialize the variables we need'''
    bestEpsilon = 0.
    bestF1 = 0.
    F1 = 0.
    step = (np.max(pval) - np.min(pval)) / 1000
    '''Search the candidate thresholds'''
    for epsilon in np.arange(np.min(pval), np.max(pval), step):
        cvPrecision = pval < epsilon                                  # predicted anomalies
        tp = np.sum((cvPrecision == 1) & (yval == 1)).astype(float)   # np.sum gives an int, convert to float
        fp = np.sum((cvPrecision == 1) & (yval == 0)).astype(float)
        fn = np.sum((cvPrecision == 0) & (yval == 1)).astype(float)
        if tp == 0:
            continue                            # no true positives, F1 would be 0 (also avoids division by zero)
        precision = tp / (tp + fp)              # precision
        recall = tp / (tp + fn)                 # recall
        F1 = (2 * precision * recall) / (precision + recall)          # F1Score formula
        if F1 > bestF1:                         # keep the best F1 Score so far
            bestF1 = F1
            bestEpsilon = epsilon
    return bestEpsilon, bestF1
```
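A brief usage sketch of the pipeline, assuming `estimateGaussian` and `selectThreshold` above plus the `univariateGaussian` helper sketched earlier; the toy data and variable names here are assumptions for illustration only.

```
import numpy as np

# Toy data: 2 features, with a few injected anomalies in the labeled CV set
rng = np.random.RandomState(0)
X = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(300, 2))      # unlabeled training set
Xval = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(100, 2))   # cross-validation set
yval = np.zeros(100)
Xval[:5] += 8.0                   # shift a few points far away ...
yval[:5] = 1                      # ... and label them as anomalies

mu, sigma2 = estimateGaussian(X)                 # fit on the training set
pval = univariateGaussian(Xval, mu, sigma2)      # p(x) on the CV set
epsilon, F1 = selectThreshold(yval, pval)        # threshold that maximizes F1Score
outliers = X[univariateGaussian(X, mu, sigma2) < epsilon]
```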

### 4. Choosing which `feature`s to use (univariate Gaussian)
- If some data do not follow a Gaussian distribution, transform them first, e.g. `log(x+C)` or `x^(1/2)` (see the sketch after this list)
- If `p(x)` is large for both normal and anomalous examples, try combining several `feature`s into a new one (since the `feature`s may be related to each other)
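A minimal sketch of such a transformation, assuming a single non-negative, right-skewed feature column `x`; the constant `C = 1` and the skewness check are illustrative choices, not from the original notes.

```
import numpy as np

# Transform a skewed toy feature so that it looks more Gaussian
rng = np.random.RandomState(0)
x = rng.exponential(scale=2.0, size=1000)   # right-skewed feature

x_log = np.log(x + 1.0)    # log(x + C) with C = 1
x_sqrt = x ** 0.5          # x^(1/2)

# Quick check: the transformed versions should be much less skewed
def skewness(v):
    return np.mean((v - v.mean()) ** 3) / np.std(v) ** 3

print(skewness(x), skewness(x_log), skewness(x_sqrt))
```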
### 5. Multivariate Gaussian distribution
- Problems with the univariate Gaussian distribution
  - In the figure below, the red point is an anomaly and the others are normal points (for example, how CPU and memory usage vary)
  ![enter description here][50]
  - The Gaussian distribution corresponding to x1:
  ![enter description here][51]
  - The Gaussian distribution corresponding to x2:
  ![enter description here][52]
  - The corresponding values of p(x1) and p(x2) hardly change, so the point is not considered an anomaly
  - Because we assume the `feature`s are independent of each other, the contours in the figure above expand as **perfect circles**
- Multivariate Gaussian distribution (a minimal implementation sketch follows this section)
  - ![$$x \in {R^n}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24x%20%5Cin%20%7BR%5En%7D%24%24): instead of building `p(x1),p(x2)...p(xn)` separately, build a single `p(x)`
  - Its parameters: ![$$\mu \in {R^n},\Sigma \in {R^{n \times {\rm{n}}}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%5Cmu%20%5Cin%20%7BR%5En%7D%2C%5CSigma%20%5Cin%20%7BR%5E%7Bn%20%5Ctimes%20%7B%5Crm%7Bn%7D%7D%7D%7D%24%24), where `Σ` is the **covariance matrix**
  - ![$$p(x) = {1 \over {{{(2\pi )}^{{n \over 2}}}|\Sigma {|^{{1 \over 2}}}}}{e^{ - {1 \over 2}{{(x - u)}^T}{\Sigma ^{ - 1}}(x - u)}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24p%28x%29%20%3D%20%7B1%20%5Cover%20%7B%7B%7B%282%5Cpi%20%29%7D%5E%7B%7Bn%20%5Cover%202%7D%7D%7D%7C%5CSigma%20%7B%7C%5E%7B%7B1%20%5Cover%202%7D%7D%7D%7D%7D%7Be%5E%7B%20-%20%7B1%20%5Cover%202%7D%7B%7B%28x%20-%20u%29%7D%5ET%7D%7B%5CSigma%20%5E%7B%20-%201%7D%7D%28x%20-%20u%29%7D%7D%24%24)
  - As before, the smaller `|Σ|` is, the more peaked `p(x)` becomes
  - For example:
  ![enter description here][53]
  means x1 and x2 are **positively correlated**, i.e. the larger x1 is, the larger x2 is
  ![enter description here][54]
  whereas if:
  ![enter description here][55]
  then x1 and x2 are **negatively correlated**
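A minimal numpy sketch of the density formula above; the function name `multivariateGaussian` and the parameter-estimation lines in the comments are assumptions, not code from the original notes.

```
import numpy as np

# Multivariate Gaussian density:
# p(x) = (2*pi)^(-n/2) * |Sigma|^(-1/2) * exp(-0.5 * (x - mu)^T * Sigma^-1 * (x - mu))
def multivariateGaussian(X, mu, Sigma):
    n = mu.shape[0]
    diff = X - mu                                        # shape (m, n)
    exponent = -0.5 * np.sum(diff @ np.linalg.inv(Sigma) * diff, axis=1)
    return np.exp(exponent) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

# Parameter estimation for the multivariate case:
# mu = np.mean(X, axis=0)
# Sigma = np.cov(X, rowvar=False)    # covariance matrix, captures feature correlations
# p = multivariateGaussian(X, mu, Sigma)
```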
[1]: ./images/LinearRegression_01.png "LinearRegression_01.png"
@@ -1114,4 +1176,10 @@ from sklearn.preprocessing import StandardScaler
[46]: ./images/PCA_06.png "PCA_06.png"
[47]: ./images/PCA_07.png "PCA_07.png"
[48]: ./images/PCA_08.png "PCA_08.png"
[49]: ./images/AnomalyDetection.png "AnomalyDetection.png"
[49]: ./images/AnomalyDetection_01.png "AnomalyDetection_01.png"
[50]: ./images/AnomalyDetection_04.png "AnomalyDetection_04.png"
[51]: ./images/AnomalyDetection_02.png "AnomalyDetection_02.png"
[52]: ./images/AnomalyDetection_03.png "AnomalyDetection_03.png"
[53]: ./images/AnomalyDetection_05.png "AnomalyDetection_05.png"
[54]: ./images/AnomalyDetection_07.png "AnomalyDetection_07.png"
[55]: ./images/AnomalyDetection_06.png "AnomalyDetection_06.png"
