LIBMF - large-scale sparse matrix factorization - for PHP
Check out Disco for higher-level collaborative filtering
Run:
composer require ankane/libmf
And download the shared library:
composer exec -- php -r "require 'vendor/autoload.php'; Libmf\Vendor::check();"
Prep your data in the format rowIndex, columnIndex, value
$data = new Libmf\Matrix();
$data->push(0, 0, 5.0);
$data->push(0, 2, 3.5);
$data->push(1, 1, 4.0);
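If your ratings already live in a nested PHP array, flattening them into (rowIndex, columnIndex, value) triples is straightforward. The sketch below is plain PHP and does not call the library; the helper name `toTriples` and the sample `$ratings` array are made up for illustration.

```php
<?php

// Hypothetical helper (not part of the library): flattens a nested
// [row => [column => value]] array into [row, column, value] triples.
function toTriples(array $ratings): array
{
    $triples = [];
    foreach ($ratings as $rowIndex => $columns) {
        foreach ($columns as $columnIndex => $value) {
            $triples[] = [(int) $rowIndex, (int) $columnIndex, (float) $value];
        }
    }
    return $triples;
}

// Illustrative data: the same three entries as in the example above.
$ratings = [
    0 => [0 => 5.0, 2 => 3.5],
    1 => [1 => 4.0],
];

$triples = toTriples($ratings);
foreach ($triples as [$row, $column, $value]) {
    echo "$row, $column, $value\n";
}
```

Each triple can then be passed to `$data->push($row, $column, $value)`.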
Create a model
$model = new Libmf\Model();
$model->fit($data);
Make predictions
$model->predict($rowIndex, $columnIndex);
Get the latent factors (these approximate the training matrix)
$model->p();
$model->q();
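To see how the factors approximate the training matrix, here is a rough sketch in plain PHP. It assumes that P holds one factor vector per row, Q holds one per column, and that an estimate for a real-valued entry is the dot product of the two vectors; the arrays below are made-up values, not output from the library, and the `dot` helper is hypothetical.

```php
<?php

// Dot product of two equal-length factor vectors.
function dot(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $k => $value) {
        $sum += $value * $b[$k];
    }
    return $sum;
}

// Made-up factors for a 2x3 matrix with 2 latent factors.
$p = [[1.0, 2.0], [0.5, 1.5]];             // one vector per row
$q = [[2.0, 1.0], [1.0, 0.0], [0.0, 2.0]]; // one vector per column

// Estimate for row 0, column 2: 1.0 * 0.0 + 2.0 * 2.0
$estimate = dot($p[0], $q[2]);
echo $estimate, "\n";
```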
Get the bias (average of all elements in the training matrix)
$model->bias();
Save the model to a file
$model->save('model.txt');
Load the model from a file
$model = Libmf\Model::load('model.txt');
Pass a validation set
$model->fit($data, $validSet);
Perform cross-validation
$model->cv($data);
Specify the number of folds
$model->cv($data, 5);
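For intuition, k-fold cross-validation splits the entries into folds, trains on all but one fold, and evaluates on the held-out fold. The sketch below shows only the splitting step in plain PHP; `kFoldSplit` is a hypothetical helper, and the library may shuffle or partition the data differently.

```php
<?php

// Hypothetical helper: assigns each triple to one of $folds groups
// in round-robin order.
function kFoldSplit(array $triples, int $folds): array
{
    $groups = array_fill(0, $folds, []);
    foreach (array_values($triples) as $i => $triple) {
        $groups[$i % $folds][] = $triple;
    }
    return $groups;
}

// Made-up [row, column, value] triples.
$triples = [
    [0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0],
    [1, 2, 2.0], [2, 0, 1.0], [2, 1, 4.5],
];

$groups = kFoldSplit($triples, 3);

foreach ($groups as $held => $validation) {
    // Training data is every fold except the held-out one.
    $rest = array_values(array_diff_key($groups, [$held => true]));
    $training = array_merge(...$rest);
    echo "fold $held: ", count($training), " training, ",
        count($validation), " validation\n";
}
```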
Pass parameters - default values below
use Libmf\Loss;
new Libmf\Model(
    loss: Loss::RealL2,     // loss function
    factors: 8,             // number of latent factors
    threads: 12,            // number of threads used
    bins: 25,               // number of bins
    iterations: 20,         // number of iterations
    lambdaP1: 0,            // coefficient of L1-norm regularization on P
    lambdaP2: 0.1,          // coefficient of L2-norm regularization on P
    lambdaQ1: 0,            // coefficient of L1-norm regularization on Q
    lambdaQ2: 0.1,          // coefficient of L2-norm regularization on Q
    learningRate: 0.1,      // learning rate
    alpha: 1,               // importance of negative entries
    c: 0.0001,              // desired value of negative entries
    nmf: false,             // perform non-negative MF (NMF)
    quiet: false            // no outputs to stdout
);
For real-valued matrix factorization
- Loss::RealL2 - squared error (L2-norm)
- Loss::RealL1 - absolute error (L1-norm)
- Loss::RealKL - generalized KL-divergence
For binary matrix factorization
- Loss::BinaryLog - logarithmic error
- Loss::BinaryL2 - squared hinge loss
- Loss::BinaryL1 - hinge loss
For one-class matrix factorization
- Loss::OneClassRow - row-oriented pair-wise logarithmic loss
- Loss::OneClassCol - column-oriented pair-wise logarithmic loss
- Loss::OneClassL2 - squared error (L2-norm)
Calculate RMSE (for real-valued MF)
$model->rmse($data);
Calculate MAE (for real-valued MF)
$model->mae($data);
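As a reminder of what these two metrics measure, here is a small plain-PHP sketch using the standard definitions of RMSE and MAE. The `rmse` and `mae` functions and the sample values are illustrative only and are not the library's implementation.

```php
<?php

// Root mean squared error between actual and predicted values.
function rmse(array $actual, array $predicted): float
{
    $sum = 0.0;
    foreach ($actual as $i => $value) {
        $sum += ($value - $predicted[$i]) ** 2;
    }
    return sqrt($sum / count($actual));
}

// Mean absolute error between actual and predicted values.
function mae(array $actual, array $predicted): float
{
    $sum = 0.0;
    foreach ($actual as $i => $value) {
        $sum += abs($value - $predicted[$i]);
    }
    return $sum / count($actual);
}

// Made-up values for illustration.
$actual = [5.0, 3.0, 4.0, 1.0];
$predicted = [4.0, 3.0, 2.0, 2.0];

echo rmse($actual, $predicted), "\n";
echo mae($actual, $predicted), "\n";
```

RMSE penalizes large errors more heavily than MAE, so comparing the two can hint at whether a few entries are predicted badly.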
Calculate generalized KL-divergence (for non-negative real-valued MF)
$model->gkl($data);
Calculate logarithmic loss (for binary MF)
$model->logloss($data);
Calculate accuracy (for binary MF)
$model->accuracy($data);
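For binary MF, both metrics compare a raw score against a label. The sketch below uses the common definitions and assumes labels are encoded as +1 or -1; whether the library uses exactly this encoding and averaging is an assumption here, and `logLoss` and `accuracy` are hypothetical helper names.

```php
<?php

// Logistic loss averaged over entries; labels are assumed to be +1 or -1.
function logLoss(array $labels, array $scores): float
{
    $sum = 0.0;
    foreach ($labels as $i => $label) {
        $margin = $label > 0 ? $scores[$i] : -$scores[$i];
        $sum += log(1.0 + exp(-$margin));
    }
    return $sum / count($labels);
}

// Fraction of entries where a positive label has a positive score
// and a negative label has a non-positive score.
function accuracy(array $labels, array $scores): float
{
    $correct = 0;
    foreach ($labels as $i => $label) {
        if (($label > 0) === ($scores[$i] > 0)) {
            $correct++;
        }
    }
    return $correct / count($labels);
}

// Made-up labels and scores for illustration.
$labels = [1, -1, 1, -1];
$scores = [2.0, -1.0, -0.5, 0.5];

echo logLoss($labels, $scores), "\n";
echo accuracy($labels, $scores), "\n";
```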
Calculate MPR (for one-class MF)
$model->mpr($data, $transpose);
Calculate AUC (for one-class MF)
$model->auc($data, $transpose);
Download the MovieLens 100K dataset and use:
$trainSet = new Libmf\Matrix();
$validSet = new Libmf\Matrix();
if (($handle = fopen('u.data', 'r')) !== false) {
    $i = 0;
    while (($row = fgetcsv($handle, separator: "\t")) !== false) {
        $data = $i < 80000 ? $trainSet : $validSet;
        $data->push($row[0], $row[1], $row[2]);
        $i++;
    }
    fclose($handle);
}
$model = new Libmf\Model(factors: 20);
$model->fit($trainSet, $validSet);
echo $model->rmse($validSet), "\n";
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/libmf-php.git
cd libmf-php
composer install
composer test