Commit 6532580

📝 add anomaly detection readme
1 parent d6e6f30 commit 6532580

2 files changed

+13
-5
lines changed

images/AnomalyDetection.png

7.32 KB

readme.md

Lines changed: 13 additions & 5 deletions
@@ -1051,14 +1051,21 @@ from sklearn.preprocessing import StandardScaler
10511051
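- This is only the **univariate Gaussian distribution**, which assumes that the `feature`s are independent; the **multivariate Gaussian distribution**, covered below, automatically captures the relationships between `feature`s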
10521052

10531053
### 3. Evaluating `p(x)` and choosing `ε`
1054-
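- Because the data may be highly **skewed** (very few examples have `y=1`, where `y=1` denotes an anomaly), use `Precision/Recall` and compute the `F1Score` (on the **CV cross-validation set**). Formula: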
1055-
![$${F_1}Score = 2{{PR} \over {P + R}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7BF_1%7DScore%20%3D%202%7B%7BPR%7D%20%5Cover%20%7BP%20+%20R%7D%7D%24%24)
1054+
- Error metrics for **skewed data**
1055+
- Because the data may be highly **skewed** (very few examples have `y=1`, where `y=1` denotes an anomaly), use `Precision/Recall` and compute the `F1Score` (on the **CV cross-validation set**)
1056+
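- Example: cancer prediction. Suppose a model predicts correctly `99%` of the time, i.e. an error rate of `1%`. If the actual cancer rate is only `0.5%`, then always predicting no cancer (`y=0`) gives an even lower error rate of `0.5%`. Evaluating the model by `error rate` alone is therefore misleading.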
1057+
- Record the outcomes as in the figure below:
1058+
![enter description here][49]
1059+
- ![$$Precision = {{TP} \over {TP + FP}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%5CPr%20ecision%20%3D%20%7B%7BTP%7D%20%5Cover%20%7BTP%20+%20FP%7D%7D%24%24), i.e. **correctly predicted positives / all predicted positives**
1060+
- ![$$Recall = {{TP} \over {TP + FN}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7B%5Cmathop%7B%5Crm%20Re%7D%5Cnolimits%7D%20%7B%5Crm%7Bcall%7D%7D%20%3D%20%7B%7BTP%7D%20%5Cover%20%7BTP%20+%20FN%7D%7D%24%24), i.e. **correctly predicted positives / all actual positives**
1061+
- Always let `y=1` denote the rarer class when computing `Precision` and `Recall`
1062+
- ![$${F_1}Score = 2{{PR} \over {P + R}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7BF_1%7DScore%20%3D%202%7B%7BPR%7D%20%5Cover%20%7BP%20+%20R%7D%7D%24%24)
1063+
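- Returning to the cancer example: suppose every prediction is no-cancer, with TN=199, FN=1, TP=0, FP=0. Then Precision=0/0 (undefined) and Recall=0/1=0, so the accuracy of 199/200=99.5% looks high but cannot be trusted.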
1064+
10561065
- Choosing `ε`
10571066
- Try several values of `ε` and keep the one that gives the highest `F1Score`
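The `ε` search described in the readme text can be sketched roughly as follows. This is a minimal illustration, not the readme's own implementation: the function names `f1_score` and `select_epsilon`, the `n_steps` parameter, and the tiny CV arrays are all assumptions made for the example. An example is flagged as an anomaly when `p(x) < ε`.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """F1 for the positive class y=1; returns 0.0 when there are no true positives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        # Precision and/or recall is 0 or undefined (0/0); treat F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def select_epsilon(p_cv, y_cv, n_steps=1000):
    """Scan thresholds between min(p) and max(p) and keep the best F1 (hypothetical helper)."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), n_steps):
        y_pred = (p_cv < eps).astype(int)  # 1 = flagged as anomaly
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1


# Made-up cross-validation data: densities p(x) and labels (1 = anomaly).
p_cv = np.array([0.20, 0.15, 0.30, 0.001, 0.25, 0.002, 0.18, 0.22])
y_cv = np.array([0, 0, 0, 1, 0, 1, 0, 0])
best_eps, best_f1 = select_epsilon(p_cv, y_cv)
print(best_eps, best_f1)
```

Scanning an evenly spaced grid is only one possible choice; when `p(x)` spans many orders of magnitude, a logarithmic grid may separate the candidates better.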
10581067

10591068

1060-
1061-
10621069
[1]: ./images/LinearRegression_01.png "LinearRegression_01.png"
10631070
[2]: ./images/LogisticRegression_01.png "LogisticRegression_01.png"
10641071
[3]: ./images/LogisticRegression_02.png "LogisticRegression_02.png"
@@ -1106,4 +1113,5 @@ from sklearn.preprocessing import StandardScaler
11061113
[45]: ./images/PCA_05.png "PCA_05.png"
11071114
[46]: ./images/PCA_06.png "PCA_06.png"
11081115
[47]: ./images/PCA_07.png "PCA_07.png"
1109-
[48]: ./images/PCA_08.png "PCA_08.png"
1116+
[48]: ./images/PCA_08.png "PCA_08.png"
1117+
[49]: ./images/AnomalyDetection.png "AnomalyDetection.png"
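
The cancer example in the added readme text (TN=199, FN=1, TP=0, FP=0) can be checked numerically. The sketch below is illustrative only; the helper name `precision_recall_f1` and the choice to report undefined ratios as `None` are assumptions, not part of the readme.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1); a 0/0 ratio is reported as None (an assumed convention)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    if precision is None or recall is None or (precision + recall) == 0:
        f1 = None  # F1 is undefined when either input is undefined or both are 0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the readme's example: every case is predicted as no-cancer.
tn, fn, tp, fp = 199, 1, 0, 0
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(accuracy, precision, recall, f1)
```

Accuracy comes out at 0.995 while recall is 0 and precision is undefined, which is the mismatch the readme points to.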
