update and refactor documentation

ramcoelho · May 2, 2016 · 5950af6 · 5950af6
1 parent 55e73b4
commit 5950af6
Showing 18 changed files with 434 additions and 62 deletions.
diff --git a/README.md b/README.md
@@ -37,9 +37,23 @@ composer require php-ai/php-ml
 ## Features
 
 * Classification
+    * [k-Nearest Neighbors](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/k-nearest-neighbors/)
+    * [Naive Bayes](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/naive-bayes/)
 * Regression
+    * [Least Squares](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/least-squares/)
 * Clustering
+    * [k-Means](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/k-means)
+    * [DBSCAN](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/dbscan)
 * Cross Validation
+    * [Random Split](http://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split)
+* Datasets
+    * [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset)
+    * Ready to use:
+        * [Iris](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/demo/iris/)
+* Math
+    * [Distance](http://php-ml.readthedocs.io/en/latest/math/distance/)
+    * [Matrix](http://php-ml.readthedocs.io/en/latest/math/matrix/)
+
 
 ## Contribute
 

diff --git a/docs/index.md b/docs/index.md
@@ -1,11 +1,30 @@
-# PHP Machine Learning (PHP-ML)
+# PHP Machine Learning library
 
 [![Build Status](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/build.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/build-status/develop)
+[![Documentation Status](https://readthedocs.org/projects/php-ml/badge/?version=develop)](http://php-ml.readthedocs.org/en/develop/?badge=develop)
 [![Total Downloads](https://poser.pugx.org/php-ai/php-ml/downloads.svg)](https://packagist.org/packages/php-ai/php-ml)
 [![License](https://poser.pugx.org/php-ai/php-ml/license.svg)](https://packagist.org/packages/php-ai/php-ml)
 [![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/quality-score.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/?branch=develop)
 
-Fresh approach to machine learning in PHP. Note that at the moment PHP is not the best choice for machine learning but maybe this will change ...
+Fresh approach to Machine Learning in PHP. Note that at the moment PHP is not the best choice for machine learning but maybe this will change ...
+
+Simple example of classification:
+```php
+use Phpml\Classifier\KNearestNeighbors;
+
+$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
+$labels = ['a', 'a', 'a', 'b', 'b', 'b'];
+
+$classifier = new KNearestNeighbors();
+$classifier->train($samples, $labels);
+
+$classifier->predict([3, 2]); 
+// return 'b'
+```
+
+## Documentation
+
+To find out how to use PHP-ML, see the [documentation](http://php-ml.readthedocs.org/).
 
 ## Installation
 
@@ -15,14 +34,33 @@ Currently this library is in the process of developing, but You can install it w
 composer require php-ai/php-ml
 ```
 
-## To-Do
+## Features
+
+* Classification
+    * [k-Nearest Neighbors](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/k-nearest-neighbors/)
+    * [Naive Bayes](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/naive-bayes/)
+* Regression
+    * [Least Squares](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/least-squares/)
+* Clustering
+    * [k-Means](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/k-means)
+    * [DBSCAN](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/dbscan)
+* Cross Validation
+    * [Random Split](http://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split)
+* Datasets
+    * [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset)
+    * Ready to use:
+        * [Iris](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/demo/iris/)
+* Math
+    * [Distance](http://php-ml.readthedocs.io/en/latest/math/distance/)
+    * [Matrix](http://php-ml.readthedocs.io/en/latest/math/matrix/)
+
 
-* implements more algorithms
-* integration with Lavacharts for data visualization
+## Contribute
 
-## Testing
+- Issue Tracker: github.com/php-ai/php-ml/issues
+- Source Code: github.com/php-ai/php-ml
 
-After installation, you can launch the test suite in project root directory (you will need to install dev requirements with composer)
+After installation, you can launch the test suite from the project root directory (you will need to install the dev requirements with Composer):
 
 ```
 bin/phpunit
@@ -34,4 +72,4 @@ PHP-ML is released under the MIT Licence. See the bundled LICENSE file for detai
 
 ## Author
 
-Arkadiusz Kondas (@ArkadiuszKondas)
+Arkadiusz Kondas (@ArkadiuszKondas)
diff --git a/...rning/classification/knearestneighbors.md → ...ing/classification/k-nearest-neighbors.md b/...rning/classification/knearestneighbors.md → ...ing/classification/k-nearest-neighbors.md
@@ -5,7 +5,7 @@ Classifier implementing the k-nearest neighbors algorithm.
 ### Constructor Parameters
 
 * $k - number of nearest neighbors to scan (default: 3)
-* $distanceMetric - Distance class, default Euclidean (see Distance Metric  documentation)
+* $distanceMetric - Distance object, default Euclidean (see [distance documentation](math/distance/))
 
 ```
 $classifier = new KNearestNeighbors($k=4);
@@ -14,7 +14,7 @@ $classifier = new KNearestNeighbors($k=3, new Minkowski($lambda=4));
 
 ### Train
 
-To train a classifier simply provide train samples and labels (as `array`):
+To train a classifier, simply provide training samples and labels (as `array`). Example:
 
 ```
 $samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
@@ -26,7 +26,7 @@ $classifier->train($samples, $labels);
 
 ### Predict
 
-To predict sample class use `predict` method. You can provide one sample or array of samples:
+To predict a sample's label, use the `predict` method. You can provide one sample or an array of samples:
 
 ```
 $classifier->predict([3, 2]);

diff --git a/...ine-learning/classification/naivebayes.md → ...ne-learning/classification/naive-bayes.md b/...ine-learning/classification/naivebayes.md → ...ne-learning/classification/naive-bayes.md
@@ -4,7 +4,7 @@ Classifier based on applying Bayes' theorem with strong (naive) independence ass
 
 ### Train
 
-To train a classifier simply provide train samples and labels (as `array`):
+To train a classifier, simply provide training samples and labels (as `array`). Example:
 
 ```
 $samples = [[5, 1, 1], [1, 5, 1], [1, 1, 5]];
@@ -16,7 +16,7 @@ $classifier->train($samples, $labels);
 
 ### Predict
 
-To predict sample class use `predict` method. You can provide one sample or array of samples:
+To predict a sample's label, use the `predict` method. You can provide one sample or an array of samples:
 
 ```
 $classifier->predict([3, 1, 1]);

diff --git a/docs/machine-learning/clustering/dbscan.md b/docs/machine-learning/clustering/dbscan.md
@@ -0,0 +1,27 @@
+# DBSCAN clustering
+
+It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also most cited in scientific literature.
+*(source: wikipedia)*
+
+### Constructor Parameters
+
+* $epsilon - epsilon, maximum distance between two samples for them to be considered as in the same neighborhood
+* $minSamples - number of samples in a neighborhood for a point to be considered as a core point (this includes the point itself)
+* $distanceMetric - Distance object, default Euclidean (see [distance documentation](math/distance/))
+
+```
+$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3);
+$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3, new Minkowski($lambda=4));
+```
+
+### Clustering
+
+To divide the samples into clusters, simply use the `cluster` method. It returns an `array` of clusters with the samples inside.
+
+```
+$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
+
+$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3);
+$dbscan->cluster($samples);
+// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]] 
+```
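
Note on the `$epsilon` and `$minSamples` parameters documented above: the core-point rule can be sketched in plain PHP as follows. This is only an illustration, not the library's implementation. The helper names `euclidean` and `regionQuery` are hypothetical, and the inclusive boundary (`<=`) is an assumption, since the documentation does not say whether a distance exactly equal to epsilon counts.

```php
<?php
// Illustrative sketch only: plain PHP, not the Phpml DBSCAN implementation.

/** Euclidean distance between two points of equal dimension. */
function euclidean(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

/** Indices of all samples within $epsilon of sample $index (the sample itself included). */
function regionQuery(array $samples, int $index, float $epsilon): array
{
    $neighbors = [];
    foreach ($samples as $j => $other) {
        // Assumption: the boundary is inclusive (distance <= epsilon).
        if (euclidean($samples[$index], $other) <= $epsilon) {
            $neighbors[] = $j;
        }
    }
    return $neighbors;
}

$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$epsilon = 2;
$minSamples = 3;

// A sample is a core point when its neighborhood holds at least $minSamples samples.
$corePoints = [];
foreach (array_keys($samples) as $i) {
    if (count(regionQuery($samples, $i, $epsilon)) >= $minSamples) {
        $corePoints[] = $i;
    }
}
print_r($corePoints);
```

With these settings every sample in the documented example has at least three samples (itself included) within distance 2, which is consistent with the two clusters shown in the example output.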
diff --git a/docs/machine-learning/clustering/k-means.md b/docs/machine-learning/clustering/k-means.md
@@ -0,0 +1,37 @@
+# K-means clustering
+
+The K-Means algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.
+This algorithm requires the number of clusters to be specified.
+
+### Constructor Parameters
+
+* $clustersNumber - number of clusters to find
+* $initialization - initialization method, default kmeans++ (see below)
+
+```
+$kmeans = new KMeans(2);
+$kmeans = new KMeans(4, KMeans::INIT_RANDOM);
+```
+
+### Clustering
+
+To divide the samples into clusters, simply use the `cluster` method. It returns an `array` of clusters with the samples inside.
+
+```
+$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
+
+$kmeans = new KMeans(2);
+$kmeans->cluster($samples);
+// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]] 
+```
+
+### Initialization methods
+
+#### kmeans++ (default)
+
+The K-means++ method selects initial cluster centers for k-means clustering in a smart way to speed up convergence.
+It uses the DASV seeding method, which consists of finding good initial centroids for the clusters.
+
+#### random
+
+The random initialization method chooses completely random centroids. It takes the boundaries of the sample space into account to avoid placing cluster centroids too far from the sample data.
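
Note on the inertia criterion mentioned in the K-means description above: a minimal sketch of the within-cluster sum of squares in plain PHP. The function names `centroid` and `inertia` are hypothetical and are not part of the library. The grouping of the six samples is assumed from the example output documented above.

```php
<?php
// Illustrative sketch only: computes the within-cluster sum of squares (inertia).
// The function names are hypothetical and not part of the Phpml API.

/** Component-wise mean of a non-empty list of points. */
function centroid(array $points): array
{
    $dimensions = count($points[0]);
    $center = array_fill(0, $dimensions, 0.0);
    foreach ($points as $point) {
        foreach ($point as $d => $value) {
            $center[$d] += $value / count($points);
        }
    }
    return $center;
}

/** Sum of squared distances from every point to the centroid of its own cluster. */
function inertia(array $clusters): float
{
    $total = 0.0;
    foreach ($clusters as $points) {
        $center = centroid($points);
        foreach ($points as $point) {
            foreach ($point as $d => $value) {
                $total += ($value - $center[$d]) ** 2;
            }
        }
    }
    return $total;
}

// Assumed grouping of the documented samples into two clusters.
$clusters = [
    [[1, 1], [1, 2], [2, 1]],
    [[8, 7], [7, 8], [8, 9]],
];
printf("%.2f\n", inertia($clusters)); // prints 4.00
```

K-means looks for the assignment that makes this value as small as possible for the requested number of clusters.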
diff --git a/...-learning/cross-validation/randomsplit.md → ...learning/cross-validation/random-split.md b/...-learning/cross-validation/randomsplit.md → ...learning/cross-validation/random-split.md
diff --git a/docs/machine-learning/datasets/csv-dataset.md b/docs/machine-learning/datasets/csv-dataset.md
@@ -12,4 +12,4 @@ Helper class that loads data from CSV file. It extends the `ArrayDataset`.
 $dataset = new CsvDataset('dataset.csv', 2, true);
 ```
 
-See Array Dataset for more information.
+See [ArrayDataset](machine-learning/datasets/array-dataset/) for more information.
diff --git a/docs/machine-learning/datasets/demo/iris.md b/docs/machine-learning/datasets/demo/iris.md
@@ -17,7 +17,7 @@ To load Iris dataset simple use:
 $dataset = new Iris();
 ```
 
-### Several samples
+### Several samples example
 
 ```
 sepal length,sepal width,petal length,petal width,class

diff --git a/docs/machine-learning/metric/accuracy.md b/docs/machine-learning/metric/accuracy.md
@@ -4,7 +4,7 @@ Class for calculate classifier accuracy.
 
 ### Score
 
-To calculate classifier accuracy score use `score` static method. Parametrs:
+To calculate classifier accuracy score use `score` static method. Parameters:
 
 * $actualLabels - (array) true sample labels
 * $predictedLabels - (array) predicted labels (e.x. from test group)

diff --git a/docs/machine-learning/metric/distance/chebyshev.md b/docs/machine-learning/metric/distance/chebyshev.md
diff --git a/docs/machine-learning/metric/distance/euclidean.md b/docs/machine-learning/metric/distance/euclidean.md
diff --git a/docs/machine-learning/metric/distance/manhattan.md b/docs/machine-learning/metric/distance/manhattan.md
diff --git a/docs/machine-learning/metric/distance/minkowski.md b/docs/machine-learning/metric/distance/minkowski.md
diff --git a/docs/machine-learning/regression/least-squares.md b/docs/machine-learning/regression/least-squares.md
@@ -0,0 +1,51 @@
+# LeastSquares Linear Regression
+
+Linear model that uses the least squares method to approximate a solution.
+
+### Train
+
+To train a model, simply provide training samples and target values (as `array`). Example:
+
+```
+$samples = [[60], [61], [62], [63], [65]];
+$targets = [3.1, 3.6, 3.8, 4, 4.1];
+
+$regression = new LeastSquares();
+$regression->train($samples, $targets);
+```
+
+### Predict
+
+To predict the target value of a sample, use the `predict` method with the sample to check (as `array`). Example:
+
+```
+$regression->predict([64]);
+// return 4.06
+```
+
+### Multiple Linear Regression
+
+The term "multiple" attached to linear regression means that two or more sample parameters are used to predict the target.
+For example, you can use mileage and production year to predict the price of a car.
+
+```
+$samples = [[73676, 1996], [77006, 1998], [10565, 2000], [146088, 1995], [15000, 2001], [65940, 2000], [9300, 2000], [93739, 1996], [153260, 1994], [17764, 2002], [57000, 1998], [15000, 2000]];
+$targets = [2000, 2750, 15500, 960, 4400, 8800, 7100, 2550, 1025, 5900, 4600, 4400];
+
+$regression = new LeastSquares();
+$regression->train($samples, $targets);
+$regression->predict([60000, 1996]);
+// return 4094.82
+```
+
+### Intercept and Coefficients
+
+After you train your model, you can get the intercept and the coefficients array.
+
+```
+$regression->getIntercept();
+// return -7.9635135135131
+
+$regression->getCoefficients();
+// return [array(1) {[0]=>float(0.18783783783783)}]
+```
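
Note on the single-feature example above: the documented intercept, coefficient, and prediction can be checked with the textbook closed-form solution for simple linear regression. This is a sketch in plain PHP, not the library's implementation, and the variable names are chosen only for illustration.

```php
<?php
// Illustrative sketch only: closed-form simple linear regression in plain PHP.
// This is not the Phpml LeastSquares implementation.

$x = [60, 61, 62, 63, 65];
$y = [3.1, 3.6, 3.8, 4, 4.1];

$n = count($x);
$meanX = array_sum($x) / $n;
$meanY = array_sum($y) / $n;

// Accumulate the covariance and variance sums around the means.
$sxy = 0.0;
$sxx = 0.0;
for ($i = 0; $i < $n; $i++) {
    $sxy += ($x[$i] - $meanX) * ($y[$i] - $meanY);
    $sxx += ($x[$i] - $meanX) ** 2;
}

$slope = $sxy / $sxx;                  // coefficient of the single feature
$intercept = $meanY - $slope * $meanX; // value of the fitted line at x = 0

printf("slope: %.4f\n", $slope);                         // 0.1878
printf("intercept: %.4f\n", $intercept);                 // -7.9635
printf("predict(64): %.2f\n", $intercept + $slope * 64); // 4.06
```

The results agree with the values shown in the documentation above to the printed precision.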