forked from jorgecasas/php-ml
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
18 changed files
with
434 additions
and
62 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# DBSCAN clustering | ||
|
||
It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also most cited in scientific literature. | ||
*(source: wikipedia)* | ||
|
||
### Constructor Parameters | ||
|
||
* $epsilon - epsilon, maximum distance between two samples for them to be considered as in the same neighborhood | ||
* $minSamples - number of samples in a neighborhood for a point to be considered as a core point (this includes the point itself) | ||
* $distanceMetric - Distance object, default Euclidean (see [distance documentation](math/distance/)) | ||
|
||
``` | ||
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3); | ||
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3, new Minkowski($lambda=4)); | ||
``` | ||
|
||
### Clustering | ||
|
||
To divide the samples into clusters simply use `cluster` method. It's return the `array` of clusters with samples inside. | ||
|
||
``` | ||
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]]; | ||
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3); | ||
$dbscan->cluster($samples); | ||
// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]] | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# K-means clustering | ||
|
||
The K-Means algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares. | ||
This algorithm requires the number of clusters to be specified. | ||
|
||
### Constructor Parameters | ||
|
||
* $clustersNumber - number of clusters to find | ||
* $initialization - initialization method, default kmeans++ (see below) | ||
|
||
``` | ||
$kmeans = new KMeans(2); | ||
$kmeans = new KMeans(4, KMeans::INIT_RANDOM); | ||
``` | ||
|
||
### Clustering | ||
|
||
To divide the samples into clusters simply use `cluster` method. It's return the `array` of clusters with samples inside. | ||
|
||
``` | ||
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]]; | ||
$kmeans = new KMeans(2); | ||
$kmeans->cluster($samples); | ||
// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]] | ||
``` | ||
|
||
### Initialization methods | ||
|
||
#### kmeans++ (default) | ||
|
||
K-means++ method selects initial cluster centers for k-mean clustering in a smart way to speed up convergence. | ||
It use the DASV seeding method consists of finding good initial centroids for the clusters. | ||
|
||
#### random | ||
|
||
Random initialization method chooses completely random centroid. It get the space boundaries to avoid placing clusters centroid too far from samples data. |
File renamed without changes.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file was deleted.
Oops, something went wrong.
This file was deleted.
Oops, something went wrong.
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
# LeastSquares Linear Regression | ||
|
||
Linear model that use least squares method to approximate solution. | ||
|
||
### Train | ||
|
||
To train a model simply provide train samples and targets values (as `array`). Example: | ||
|
||
``` | ||
$samples = [[60], [61], [62], [63], [65]]; | ||
$targets = [3.1, 3.6, 3.8, 4, 4.1]; | ||
$regression = new LeastSquares(); | ||
$regression->train($samples, $targets); | ||
``` | ||
|
||
### Predict | ||
|
||
To predict sample target value use `predict` method with sample to check (as `array`). Example: | ||
|
||
``` | ||
$regression->predict([64]); | ||
// return 4.06 | ||
``` | ||
|
||
### Multiple Linear Regression | ||
|
||
The term multiple attached to linear regression means that there are two or more sample parameters used to predict target. | ||
For example you can use: mileage and production year to predict price of a car. | ||
|
||
``` | ||
$samples = [[73676, 1996], [77006, 1998], [10565, 2000], [146088, 1995], [15000, 2001], [65940, 2000], [9300, 2000], [93739, 1996], [153260, 1994], [17764, 2002], [57000, 1998], [15000, 2000]]; | ||
$targets = [2000, 2750, 15500, 960, 4400, 8800, 7100, 2550, 1025, 5900, 4600, 4400]; | ||
$regression = new LeastSquares(); | ||
$regression->train($samples, $targets); | ||
$regression->predict([60000, 1996]) | ||
// return 4094.82 | ||
``` | ||
|
||
### Intercept and Coefficients | ||
|
||
After you train your model you can get the intercept and coefficients array. | ||
|
||
``` | ||
$regression->getIntercept(); | ||
// return -7.9635135135131 | ||
$regression->getCoefficients(); | ||
// return [array(1) {[0]=>float(0.18783783783783)}] | ||
``` |
Oops, something went wrong.