@@ -1032,19 +1032,24 @@ from sklearn.preprocessing import StandardScaler
## 7. Anomaly Detection
- [Full code](/AnomalyDetection/AnomalyDetection.py)
- ### 1. Anomaly detection workflow
- - ![$${x^{(i)}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7Bx%5E%7B%28i%29%7D%7D%24%24) denotes the feature vector of the i-th example
- - Build a model `P(x)`
- - If `P(x) < ε`, the example is flagged as an anomaly, where `ε` is the probability cutoff (threshold) that we must choose
-
- ### 2. Gaussian (normal) distribution
+ ### 1. Gaussian (normal) distribution
- Probability density function: ![$$p(x) = {1 \over {\sqrt {2\pi } \sigma }}{e^{ - {{{{(x - u)}^2}} \over {2{\sigma ^2}}}}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24p%28x%29%20%3D%20%7B1%20%5Cover%20%7B%5Csqrt%20%7B2%5Cpi%20%7D%20%5Csigma%20%7D%7D%7Be%5E%7B%20-%20%7B%7B%7B%7B%28x%20-%20u%29%7D%5E2%7D%7D%20%5Cover%20%7B2%7B%5Csigma%20%5E2%7D%7D%7D%7D%7D%24%24)
- where `u` is the **mean** of the data and `σ` is its **standard deviation**
- The **smaller** `σ` is, the more sharply **peaked** the curve
- Parameter estimation
- ![$$u = {1 \over m}\sum\limits_{i = 1}^m {{x^{(i)}}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24u%20%3D%20%7B1%20%5Cover%20m%7D%5Csum%5Climits_%7Bi%20%3D%201%7D%5Em%20%7B%7Bx%5E%7B%28i%29%7D%7D%7D%20%24%24)
- ![$${\sigma ^2} = {1 \over m}\sum\limits_{i = 1}^m {{{({x^{(i)}} - u)}^2}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7B%5Csigma%20%5E2%7D%20%3D%20%7B1%20%5Cover%20m%7D%5Csum%5Climits_%7Bi%20%3D%201%7D%5Em%20%7B%7B%7B%28%7Bx%5E%7B%28i%29%7D%7D%20-%20u%29%7D%5E2%7D%7D%20%24%24)
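A minimal NumPy sketch of the density function and the two parameter estimates above, assuming a 1-D array of samples. The names `estimate_gaussian` and `gaussian_pdf` are hypothetical and are not taken from the repository's AnomalyDetection.py. The last line also illustrates the note on `σ`: the peak height at `x = u` is 1/(√(2π)σ), so a smaller `σ` gives a sharper curve.

```python
import numpy as np


def estimate_gaussian(x):
    """Estimate u and sigma^2 from 1-D samples, dividing by m as in the formulas above."""
    x = np.asarray(x, dtype=float)
    u = x.mean()
    sigma2 = ((x - u) ** 2).mean()  # 1/m normalisation, same as np.var(x)
    return u, sigma2


def gaussian_pdf(x, u, sigma2):
    """Normal density p(x; u, sigma^2) = exp(-(x - u)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    x = np.asarray(x, dtype=float)
    sigma = np.sqrt(sigma2)
    return np.exp(-((x - u) ** 2) / (2 * sigma2)) / (np.sqrt(2 * np.pi) * sigma)


if __name__ == "__main__":
    u, sigma2 = estimate_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
    print(u, sigma2)  # 3.0 2.0
    # Peak height at x = u is 1 / (sqrt(2 pi) sigma): smaller sigma, sharper curve.
    print(gaussian_pdf(0.0, 0.0, 0.25) > gaussian_pdf(0.0, 0.0, 4.0))  # True
```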
+ ### 2. Anomaly detection algorithm
+ - Example
+ - Training set: ![$$\{ {x^{(1)}},{x^{(2)}}, \cdots {x^{(m)}}\}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%5C%7B%20%7Bx%5E%7B%281%29%7D%7D%2C%7Bx%5E%7B%282%29%7D%7D%2C%20%5Ccdots%20%7Bx%5E%7B%28m%29%7D%7D%5C%7D%20%24%24), where ![$$x \in {R^n}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24x%20%5Cin%20%7BR%5En%7D%24%24)
+ - Assuming the features ![$${x_1},{x_2} \cdots {x_n}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7Bx_1%7D%2C%7Bx_2%7D%20%5Ccdots%20%7Bx_n%7D%24%24) are mutually independent, build the model: ![$$p(x) = p({x_1};{u_1},\sigma _1^2)p({x_2};{u_2},\sigma _2^2) \cdots p({x_n};{u_n},\sigma _n^2) = \prod\limits_{j = 1}^n {p({x_j};{u_j},\sigma _j^2)}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24p%28x%29%20%3D%20p%28%7Bx_1%7D%3B%7Bu_1%7D%2C%5Csigma%20_1%5E2%29p%28%7Bx_2%7D%3B%7Bu_2%7D%2C%5Csigma%20_2%5E2%29%20%5Ccdots%20p%28%7Bx_n%7D%3B%7Bu_n%7D%2C%5Csigma%20_n%5E2%29%20%3D%20%5Cprod%5Climits_%7Bj%20%3D%201%7D%5En%20%7Bp%28%7Bx_j%7D%3B%7Bu_j%7D%2C%5Csigma%20_j%5E2%29%7D%20%24%24)
+ - Procedure
+ - Choose features `x_j` that are indicative of anomalies
+ - Estimate the parameters ![$${u_1},{u_2}, \cdots ,{u_n};\sigma _1^2,\sigma _2^2 \cdots ,\sigma _n^2$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7Bu_1%7D%2C%7Bu_2%7D%2C%20%5Ccdots%20%2C%7Bu_n%7D%3B%5Csigma%20_1%5E2%2C%5Csigma%20_2%5E2%20%5Ccdots%20%2C%5Csigma%20_n%5E2%24%24) from the training set
+ - Compute `p(x)`; if `p(x) < ε`, flag the example as an anomaly, where `ε` is the probability cutoff (threshold) that we must choose
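The algorithm above can be sketched in NumPy as follows. This is an illustrative sketch, not the repository's AnomalyDetection.py: `fit`, `density` and `is_anomaly` are hypothetical names, the toy training matrix is made up, and `ε = 0.01` is an arbitrary value chosen only for the demo (in practice `ε` is usually tuned on a labelled validation set).

```python
import numpy as np


def fit(X):
    """Estimate u_j and sigma_j^2 for every feature of an (m, n) training matrix."""
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), X.var(axis=0)  # np.var divides by m by default (ddof=0)


def density(X, u, sigma2):
    """p(x) = prod_j p(x_j; u_j, sigma_j^2), assuming the n features are independent."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    per_feature = np.exp(-((X - u) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return per_feature.prod(axis=1)  # one p(x) per row


def is_anomaly(X, u, sigma2, epsilon):
    """Flag rows whose density p(x) falls below the threshold epsilon."""
    return density(X, u, sigma2) < epsilon


if __name__ == "__main__":
    # Hypothetical toy training set: m = 4 examples, n = 2 features.
    X_train = np.array([[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [2.0, 11.0]])
    u, sigma2 = fit(X_train)
    X_new = np.array([[2.0, 11.0], [8.0, 20.0]])
    print(is_anomaly(X_new, u, sigma2, epsilon=0.01).tolist())  # [False, True]
```

Multiplying per-feature densities is exactly the product model above; for many features the product can underflow, so summing log-densities and comparing against `log ε` is a common alternative.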
+
+
[1]: ./images/LinearRegression_01.png "LinearRegression_01.png"
[2]: ./images/LogisticRegression_01.png "LogisticRegression_01.png"