Commit 1ed157a

📝 add anomaly detection readme

1 parent 6532580 commit 1ed157a

8 files changed (+69, -1 lines changed)
File renamed without changes.

images/AnomalyDetection_02.png (18.2 KB)
images/AnomalyDetection_03.png (18.4 KB)
images/AnomalyDetection_04.png (9.39 KB)
images/AnomalyDetection_05.png (4.32 KB)
images/AnomalyDetection_06.png (4.42 KB)
images/AnomalyDetection_07.png (6.26 KB)

readme.md

Lines changed: 69 additions & 1 deletion
@@ -1049,6 +1049,18 @@ from sklearn.preprocessing import StandardScaler
- Parameter estimation: ![$${u_1},{u_2}, \cdots ,{u_n};\sigma _1^2,\sigma _2^2 \cdots ,\sigma _n^2$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7Bu_1%7D%2C%7Bu_2%7D%2C%20%5Ccdots%20%2C%7Bu_n%7D%3B%5Csigma%20_1%5E2%2C%5Csigma%20_2%5E2%20%5Ccdots%20%2C%5Csigma%20_n%5E2%24%24)
- Compute `p(x)`; if `p(x)<ε`, the example is treated as anomalous, where `ε` is the probability threshold (`threshold`) we have to pick (a short sketch of this check follows the parameter-estimation code below)
- This is only the **univariate Gaussian distribution** and assumes the `feature`s are independent; the **multivariate Gaussian distribution** covered below captures the relationships between `feature`s automatically
- **Parameter estimation** implementation code

```
import numpy as np

# Parameter estimation (i.e. compute the mean and variance of each feature)
def estimateGaussian(X):
    m, n = X.shape
    mu = np.zeros((n, 1))
    sigma2 = np.zeros((n, 1))

    mu = np.mean(X, axis=0)       # axis=0: mean of each column (feature)
    sigma2 = np.var(X, axis=0)    # variance of each column (feature)
    return mu, sigma2
```
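A minimal sketch of the `p(x) < ε` check from the bullet above, assuming `mu` and `sigma2` come from `estimateGaussian`; the helper name `univariateGaussian` and the variable `epsilon` are illustrative, not from the original notes.

```
import numpy as np

# Illustrative helper (not in the original notes): univariate Gaussian density per
# feature, multiplied across features since they are assumed independent.
def univariateGaussian(X, mu, sigma2):
    p = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.prod(p, axis=1)     # p(x) = p(x1) * p(x2) * ... * p(xn)

# Usage: flag the examples whose probability falls below the threshold epsilon
# mu, sigma2 = estimateGaussian(X)
# p = univariateGaussian(X, mu, sigma2)
# anomalies = X[p < epsilon]
```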

### 3. Evaluating how good `p(x)` is, and choosing `ε`
- Error metrics for **skewed data**
@@ -1064,6 +1076,56 @@ from sklearn.preprocessing import StandardScaler
- Choosing `ε`
- Try several values of `ε` and keep the one that gives the highest `F1Score`
- Implementation code (a usage sketch follows the code block)

```
# Select the best epsilon, i.e. the one that maximizes F1Score
def selectThreshold(yval, pval):
    '''Initialize the variables we need'''
    bestEpsilon = 0.
    bestF1 = 0.
    F1 = 0.
    step = (np.max(pval) - np.min(pval)) / 1000
    '''Search the candidate thresholds'''
    for epsilon in np.arange(np.min(pval), np.max(pval), step):
        cvPrecision = pval < epsilon                                  # predicted anomalies
        tp = np.sum((cvPrecision == 1) & (yval == 1)).astype(float)   # np.sum gives an int, convert to float
        fp = np.sum((cvPrecision == 1) & (yval == 0)).astype(float)
        fn = np.sum((cvPrecision == 0) & (yval == 1)).astype(float)
        if tp == 0:
            continue                            # no true positives, F1 would be 0 (also avoids division by zero)
        precision = tp / (tp + fp)              # precision
        recall = tp / (tp + fn)                 # recall
        F1 = (2 * precision * recall) / (precision + recall)          # F1Score formula
        if F1 > bestF1:                         # keep the best F1 Score so far
            bestF1 = F1
            bestEpsilon = epsilon
    return bestEpsilon, bestF1
```
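A brief usage sketch of the pipeline, assuming `estimateGaussian` and `selectThreshold` above plus the `univariateGaussian` helper sketched earlier; the toy data and variable names here are assumptions for illustration only.

```
import numpy as np

# Toy data: 2 features, with a few injected anomalies in the labeled CV set
rng = np.random.RandomState(0)
X = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(300, 2))      # unlabeled training set
Xval = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(100, 2))   # cross-validation set
yval = np.zeros(100)
Xval[:5] += 8.0                   # shift a few points far away ...
yval[:5] = 1                      # ... and label them as anomalies

mu, sigma2 = estimateGaussian(X)                 # fit on the training set
pval = univariateGaussian(Xval, mu, sigma2)      # p(x) on the CV set
epsilon, F1 = selectThreshold(yval, pval)        # threshold that maximizes F1Score
outliers = X[univariateGaussian(X, mu, sigma2) < epsilon]
```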

### 4. Choosing which `feature`s to use (univariate Gaussian)
- If some data do not follow a Gaussian distribution, transform them first, e.g. `log(x+C)` or `x^(1/2)` (see the sketch after this list)
- If `p(x)` is large for both normal and anomalous examples, try combining several `feature`s into a new one (since the `feature`s may be related to each other)
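A minimal sketch of such a transformation, assuming a single non-negative, right-skewed feature column `x`; the constant `C = 1` and the skewness check are illustrative choices, not from the original notes.

```
import numpy as np

# Transform a skewed toy feature so that it looks more Gaussian
rng = np.random.RandomState(0)
x = rng.exponential(scale=2.0, size=1000)   # right-skewed feature

x_log = np.log(x + 1.0)    # log(x + C) with C = 1
x_sqrt = x ** 0.5          # x^(1/2)

# Quick check: the transformed versions should be much less skewed
def skewness(v):
    return np.mean((v - v.mean()) ** 3) / np.std(v) ** 3

print(skewness(x), skewness(x_log), skewness(x_sqrt))
```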
### 5. Multivariate Gaussian distribution
- Problems with the univariate Gaussian distribution
  - In the figure below, the red point is an anomaly and the others are normal points (for example, how CPU and memory usage vary)
  ![enter description here][50]
  - The Gaussian distribution corresponding to x1:
  ![enter description here][51]
  - The Gaussian distribution corresponding to x2:
  ![enter description here][52]
  - The corresponding values of p(x1) and p(x2) hardly change, so the point is not considered an anomaly
  - Because we assume the `feature`s are independent of each other, the contours in the figure above expand as **perfect circles**
- Multivariate Gaussian distribution (a minimal implementation sketch follows this section)
  - ![$$x \in {R^n}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24x%20%5Cin%20%7BR%5En%7D%24%24): instead of building `p(x1),p(x2)...p(xn)` separately, build a single `p(x)`
  - Its parameters: ![$$\mu \in {R^n},\Sigma \in {R^{n \times {\rm{n}}}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%5Cmu%20%5Cin%20%7BR%5En%7D%2C%5CSigma%20%5Cin%20%7BR%5E%7Bn%20%5Ctimes%20%7B%5Crm%7Bn%7D%7D%7D%7D%24%24), where `Σ` is the **covariance matrix**
  - ![$$p(x) = {1 \over {{{(2\pi )}^{{n \over 2}}}|\Sigma {|^{{1 \over 2}}}}}{e^{ - {1 \over 2}{{(x - u)}^T}{\Sigma ^{ - 1}}(x - u)}}$$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24p%28x%29%20%3D%20%7B1%20%5Cover%20%7B%7B%7B%282%5Cpi%20%29%7D%5E%7B%7Bn%20%5Cover%202%7D%7D%7D%7C%5CSigma%20%7B%7C%5E%7B%7B1%20%5Cover%202%7D%7D%7D%7D%7D%7Be%5E%7B%20-%20%7B1%20%5Cover%202%7D%7B%7B%28x%20-%20u%29%7D%5ET%7D%7B%5CSigma%20%5E%7B%20-%201%7D%7D%28x%20-%20u%29%7D%7D%24%24)
  - As before, the smaller `|Σ|` is, the more peaked `p(x)` becomes
  - For example:
  ![enter description here][53]
  means x1 and x2 are **positively correlated**, i.e. the larger x1 is, the larger x2 is
  ![enter description here][54]
  whereas if:
  ![enter description here][55]
  then x1 and x2 are **negatively correlated**
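A minimal numpy sketch of the density formula above; the function name `multivariateGaussian` and the parameter-estimation lines in the comments are assumptions, not code from the original notes.

```
import numpy as np

# Multivariate Gaussian density:
# p(x) = (2*pi)^(-n/2) * |Sigma|^(-1/2) * exp(-0.5 * (x - mu)^T * Sigma^-1 * (x - mu))
def multivariateGaussian(X, mu, Sigma):
    n = mu.shape[0]
    diff = X - mu                                        # shape (m, n)
    exponent = -0.5 * np.sum(diff @ np.linalg.inv(Sigma) * diff, axis=1)
    return np.exp(exponent) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

# Parameter estimation for the multivariate case:
# mu = np.mean(X, axis=0)
# Sigma = np.cov(X, rowvar=False)    # covariance matrix, captures feature correlations
# p = multivariateGaussian(X, mu, Sigma)
```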
[1]: ./images/LinearRegression_01.png "LinearRegression_01.png"
@@ -1114,4 +1176,10 @@ from sklearn.preprocessing import StandardScaler
[46]: ./images/PCA_06.png "PCA_06.png"
[47]: ./images/PCA_07.png "PCA_07.png"
[48]: ./images/PCA_08.png "PCA_08.png"
[49]: ./images/AnomalyDetection.png "AnomalyDetection.png"
[49]: ./images/AnomalyDetection_01.png "AnomalyDetection_01.png"
[50]: ./images/AnomalyDetection_04.png "AnomalyDetection_04.png"
[51]: ./images/AnomalyDetection_02.png "AnomalyDetection_02.png"
[52]: ./images/AnomalyDetection_03.png "AnomalyDetection_03.png"
[53]: ./images/AnomalyDetection_05.png "AnomalyDetection_05.png"
[54]: ./images/AnomalyDetection_07.png "AnomalyDetection_07.png"
[55]: ./images/AnomalyDetection_06.png "AnomalyDetection_06.png"
