Skip to content
/ SAS Public

Python version of Smoothed Alternating Least Squares algorithm

Notifications You must be signed in to change notification settings

WeiqingYu/SAS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 

Repository files navigation

SAS

This is a Python realization of Smoothed Alternating Least Squares (SAS) algorithm.

Introduction

Based on Netflix Prize-winning algorithm, ALS, we propose SAS, an missing data recovery algorithm that is more suitable for time-series data.

SAS is designed to recover missing values for large-scale time-series data. Instead of estimating the missing values either column by column or row by row, as in traditional imputation methods, SAS recovers the missing values by extracting patterns of data based on observed entries and then estimating the missing entries with the patterns. SAS fully utilizes the information in the data to learn the pattern and make inferences, which guarantees the estimation accuracy.

SAS solves the matrix completion problem by implementing low-rank matrix decomposition. Specifically, it assumes there are certain patterns in the data, and that these patterns are described in a low-rank matrix. Apart from the low-rank assumption, SAS considers the smoothness of the data, which helps the algorithm better suit time-series data.

Usage

To implement SAS in data recovery tasks, one can simply download the sas.py file and put it in the working directory. Then import the SAS class with the following command.

from sas import SAS

Numpy package is necessary to implement the SAS class. Once imported the class, one can initialize the class with the following command.

sasobject = SAS(lam=1.5,mu=2,rank=20,thres = 0.001, maxit = 20)

Then use the command below to recover a partially observed matrix (numpy array).

res_A, res_B = sasobject.fit(data = pomat)

Parameters

One can specify five parameters at class initialization, i.e., lam, mu, rank, thres, and maxit.

  • lam: The regularization term in ALS to prevent general overfitting. Default: 1.5.

  • mu: The smoothness regularization term to penalize turbulence in the data. Default: 2.

  • rank: The rank of decomposition. Default: 20.

  • thres: Termination threshold for the algorithm. Default: 0.001.

  • maxit: Maximum number of iteration. Default: 20.

Input and Output

Input

The input data for _.fit() function should be a partially observed numpy array, and values at missing entries should be np.nan.

If the complete matrix (ground truth) is available, then one can set withtrue to be True anddata1 to be the complete matrix to observe the recovery accuracy.

The printed training and testing error for each iteration is measured by the mean absolute error.

Output

Output of the fit function is the matrix decomposition of partially observed matrix. To obtain the recovered matrix, simply use the following command to combine the decomposition matrices.

result = np.matmul(res_A.transpose(),res_B)

About

Python version of Smoothed Alternating Least Squares algorithm

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published