Skip to content

Commit

Permalink
update and refactor documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
akondas committed May 2, 2016
1 parent 55e73b4 commit 5950af6
Show file tree
Hide file tree
Showing 18 changed files with 434 additions and 62 deletions.
14 changes: 14 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,23 @@ composer require php-ai/php-ml
## Features

* Classification
* [k-Nearest Neighbors](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/k-nearest-neighbors/)
* [Naive Bayes](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/naive-bayes/)
* Regression
* [Least Squares](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/least-squares/)
* Clustering
* [k-Means](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/k-means)
* [DBSCAN](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/dbscan)
* Cross Validation
* [Random Split](http://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split)
* Datasets
* [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset)
* Ready to use:
* [Iris](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/demo/iris/)
* Math
* [Distance](http://php-ml.readthedocs.io/en/latest/math/distance/)
* [Matrix](http://php-ml.readthedocs.io/en/latest/math/matrix/)


## Contribute

Expand Down
54 changes: 46 additions & 8 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,30 @@
# PHP Machine Learning (PHP-ML)
# PHP Machine Learning library

[![Build Status](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/build.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/build-status/develop)
[![Documentation Status](https://readthedocs.org/projects/php-ml/badge/?version=develop)](http://php-ml.readthedocs.org/en/develop/?badge=develop)
[![Total Downloads](https://poser.pugx.org/php-ai/php-ml/downloads.svg)](https://packagist.org/packages/php-ai/php-ml)
[![License](https://poser.pugx.org/php-ai/php-ml/license.svg)](https://packagist.org/packages/php-ai/php-ml)
[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/quality-score.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/?branch=develop)

Fresh approach to machine learning in PHP. Note that at the moment PHP is not the best choice for machine learning but maybe this will change ...
Fresh approach to Machine Learning in PHP. Note that at the moment PHP is not the best choice for machine learning but maybe this will change ...

Simple example of classification:
```php
use Phpml\Classifier\KNearestNeighbors;

$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];

$classifier = new KNearestNeighbors();
$classifier->train($samples, $labels);

$classifier->predict([3, 2]);
// return 'b'
```

## Documentation

To find out how to use PHP-ML follow [Documentation](http://php-ml.readthedocs.org/).

## Installation

Expand All @@ -15,14 +34,33 @@ Currently this library is in the process of developing, but You can install it w
composer require php-ai/php-ml
```

## To-Do
## Features

* Classification
* [k-Nearest Neighbors](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/k-nearest-neighbors/)
* [Naive Bayes](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/naive-bayes/)
* Regression
* [Least Squares](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/least-squares/)
* Clustering
* [k-Means](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/k-means)
* [DBSCAN](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/dbscan)
* Cross Validation
* [Random Split](http://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split)
* Datasets
* [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset)
* Ready to use:
* [Iris](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/demo/iris/)
* Math
* [Distance](http://php-ml.readthedocs.io/en/latest/math/distance/)
* [Matrix](http://php-ml.readthedocs.io/en/latest/math/matrix/)


* implements more algorithms
* integration with Lavacharts for data visualization
## Contribute

## Testing
- Issue Tracker: github.com/php-ai/php-ml/issues
- Source Code: github.com/php-ai/php-ml

After installation, you can launch the test suite in project root directory (you will need to install dev requirements with composer)
After installation, you can launch the test suite in project root directory (you will need to install dev requirements with Composer)

```
bin/phpunit
Expand All @@ -34,4 +72,4 @@ PHP-ML is released under the MIT Licence. See the bundled LICENSE file for detai

## Author

Arkadiusz Kondas (@ArkadiuszKondas)
Arkadiusz Kondas (@ArkadiuszKondas)
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ Classifier implementing the k-nearest neighbors algorithm.
### Constructor Parameters

* $k - number of nearest neighbors to scan (default: 3)
* $distanceMetric - Distance class, default Euclidean (see Distance Metric documentation)
* $distanceMetric - Distance object, default Euclidean (see [distance documentation](math/distance/))

```
$classifier = new KNearestNeighbors($k=4);
Expand All @@ -14,7 +14,7 @@ $classifier = new KNearestNeighbors($k=3, new Minkowski($lambda=4));

### Train

To train a classifier simply provide train samples and labels (as `array`):
To train a classifier simply provide train samples and labels (as `array`). Example:

```
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
Expand All @@ -26,7 +26,7 @@ $classifier->train($samples, $labels);

### Predict

To predict sample class use `predict` method. You can provide one sample or array of samples:
To predict sample label use `predict` method. You can provide one sample or array of samples:

```
$classifier->predict([3, 2]);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Classifier based on applying Bayes' theorem with strong (naive) independence ass

### Train

To train a classifier simply provide train samples and labels (as `array`):
To train a classifier simply provide train samples and labels (as `array`). Example:

```
$samples = [[5, 1, 1], [1, 5, 1], [1, 1, 5]];
Expand All @@ -16,7 +16,7 @@ $classifier->train($samples, $labels);

### Predict

To predict sample class use `predict` method. You can provide one sample or array of samples:
To predict sample label use `predict` method. You can provide one sample or array of samples:

```
$classifier->predict([3, 1, 1]);
Expand Down
27 changes: 27 additions & 0 deletions docs/machine-learning/clustering/dbscan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# DBSCAN clustering

It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also most cited in scientific literature.
*(source: wikipedia)*

### Constructor Parameters

* $epsilon - epsilon, maximum distance between two samples for them to be considered as in the same neighborhood
* $minSamples - number of samples in a neighborhood for a point to be considered as a core point (this includes the point itself)
* $distanceMetric - Distance object, default Euclidean (see [distance documentation](math/distance/))

```
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3);
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3, new Minkowski($lambda=4));
```

### Clustering

To divide the samples into clusters simply use `cluster` method. It's return the `array` of clusters with samples inside.

```
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3);
$dbscan->cluster($samples);
// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]]
```
37 changes: 37 additions & 0 deletions docs/machine-learning/clustering/k-means.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# K-means clustering

The K-Means algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.
This algorithm requires the number of clusters to be specified.

### Constructor Parameters

* $clustersNumber - number of clusters to find
* $initialization - initialization method, default kmeans++ (see below)

```
$kmeans = new KMeans(2);
$kmeans = new KMeans(4, KMeans::INIT_RANDOM);
```

### Clustering

To divide the samples into clusters simply use `cluster` method. It's return the `array` of clusters with samples inside.

```
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$kmeans = new KMeans(2);
$kmeans->cluster($samples);
// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]]
```

### Initialization methods

#### kmeans++ (default)

K-means++ method selects initial cluster centers for k-mean clustering in a smart way to speed up convergence.
It use the DASV seeding method consists of finding good initial centroids for the clusters.

#### random

Random initialization method chooses completely random centroid. It get the space boundaries to avoid placing clusters centroid too far from samples data.
2 changes: 1 addition & 1 deletion docs/machine-learning/datasets/csv-dataset.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,4 @@ Helper class that loads data from CSV file. It extends the `ArrayDataset`.
$dataset = new CsvDataset('dataset.csv', 2, true);
```

See Array Dataset for more information.
See [ArrayDataset](machine-learning/datasets/array-dataset/) for more information.
2 changes: 1 addition & 1 deletion docs/machine-learning/datasets/demo/iris.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ To load Iris dataset simple use:
$dataset = new Iris();
```

### Several samples
### Several samples example

```
sepal length,sepal width,petal length,petal width,class
Expand Down
2 changes: 1 addition & 1 deletion docs/machine-learning/metric/accuracy.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Class for calculate classifier accuracy.

### Score

To calculate classifier accuracy score use `score` static method. Parametrs:
To calculate classifier accuracy score use `score` static method. Parameters:

* $actualLabels - (array) true sample labels
* $predictedLabels - (array) predicted labels (e.x. from test group)
Expand Down
3 changes: 0 additions & 3 deletions docs/machine-learning/metric/distance/chebyshev.md

This file was deleted.

16 changes: 0 additions & 16 deletions docs/machine-learning/metric/distance/euclidean.md

This file was deleted.

16 changes: 0 additions & 16 deletions docs/machine-learning/metric/distance/manhattan.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/machine-learning/metric/distance/minkowski.md

This file was deleted.

51 changes: 51 additions & 0 deletions docs/machine-learning/regression/least-squares.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# LeastSquares Linear Regression

Linear model that use least squares method to approximate solution.

### Train

To train a model simply provide train samples and targets values (as `array`). Example:

```
$samples = [[60], [61], [62], [63], [65]];
$targets = [3.1, 3.6, 3.8, 4, 4.1];
$regression = new LeastSquares();
$regression->train($samples, $targets);
```

### Predict

To predict sample target value use `predict` method with sample to check (as `array`). Example:

```
$regression->predict([64]);
// return 4.06
```

### Multiple Linear Regression

The term multiple attached to linear regression means that there are two or more sample parameters used to predict target.
For example you can use: mileage and production year to predict price of a car.

```
$samples = [[73676, 1996], [77006, 1998], [10565, 2000], [146088, 1995], [15000, 2001], [65940, 2000], [9300, 2000], [93739, 1996], [153260, 1994], [17764, 2002], [57000, 1998], [15000, 2000]];
$targets = [2000, 2750, 15500, 960, 4400, 8800, 7100, 2550, 1025, 5900, 4600, 4400];
$regression = new LeastSquares();
$regression->train($samples, $targets);
$regression->predict([60000, 1996])
// return 4094.82
```

### Intercept and Coefficients

After you train your model you can get the intercept and coefficients array.

```
$regression->getIntercept();
// return -7.9635135135131
$regression->getCoefficients();
// return [array(1) {[0]=>float(0.18783783783783)}]
```
Loading

0 comments on commit 5950af6

Please sign in to comment.