The collaborative filtering toolkit provides tools to identify patterns of user interests and make targeted recommendations. Learn more about collaborative filtering here.
Most of the algorithms take the rating matrix R, a sparse matrix holding the ratings given by users to movies, and build a linear model by finding two low-dimensional matrices U and V such that their product approximates R: R ≈ UV.
We implement multiple collaborative filtering algorithms: ALS (Alternating Least Squares), SGD (Stochastic Gradient Descent), Bias SGD, Weighted-ALS, Sparse-ALS, and SVD++.
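To make the factorization R ≈ UV concrete, here is a minimal sketch of the SGD variant in plain Python. It is not the toolkit's implementation: the function names (`factorize_sgd`, `rmse`), the hyperparameters (`rank`, `epochs`, `lr`, `reg`), and the toy ratings are all assumptions chosen for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def factorize_sgd(ratings, n_users, n_items, rank=2, epochs=1000,
                  lr=0.02, reg=0.02, seed=0):
    """Fit U (n_users x rank) and V (n_items x rank) so dot(U[u], V[i]) ~ r.

    ratings: list of (user, item, rating) triples, the observed entries of R.
    All default hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit observed ratings in random order
        for u, i, r in order:
            err = r - dot(U[u], V[i])
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V

def rmse(ratings, U, V):
    """Root mean squared error over the given (user, item, rating) triples."""
    se = sum((r - dot(U[u], V[i])) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

# Toy data: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
toy_ratings = [
    (0, 0, 5), (0, 1, 5), (0, 2, 1),
    (1, 0, 5), (1, 2, 1), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 5),
    (3, 1, 1), (3, 2, 5), (3, 3, 5),
]
U, V = factorize_sgd(toy_ratings, n_users=4, n_items=4)
print("training RMSE:", rmse(toy_ratings, U, V))
```

Only the observed entries of R are touched, which is why the approach suits a sparse rating matrix. A prediction for an unseen (user, item) pair is simply `dot(U[u], V[i])`.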
GraphLab PowerGraph has a fast, scalable implementation of the k-means++ algorithm: a robust method of grouping data points into clusters.
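As a rough, single-machine illustration of the idea, the sketch below shows k-means++ seeding followed by a few Lloyd iterations. The function names and parameters are hypothetical and unrelated to the toolkit's distributed implementation. It assumes the input contains at least `k` distinct points.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_seeds(points, k, seed=0):
    """Pick k initial centers; far-away points are more likely to be chosen."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[j] = squared distance from points[j] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Sample the next center with probability proportional to d2.
        nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers

def lloyd(points, centers, iters=10):
    """Standard k-means refinement: assign to nearest center, then re-average."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = kmeanspp_seeds(points, k=2)
print(sorted(lloyd(points, seeds)))
```

The seeding step is what distinguishes k-means++ from plain k-means: spreading the initial centers apart makes poor local optima less likely.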
The Computer Vision Toolkit aims to provide fully distributed wrappers to algorithms in OpenCV, an open-source library aimed at real-time computer vision. Currently, the only implemented algorithm is Image-Stitching, where the goal is to create a composite panoramic image from a collection of images. Learn more about computer vision here.
The GraphLab PowerGraph Computer Vision Toolkit has become its own spin-off project called CloudCV, a comprehensive system that aims to provide access to state-of-the-art computer vision algorithms on the cloud.
CloudCV: Large-Scale Parallel Computer Vision on the Cloud
Graphical models provide a compact interpretable representation of complex statistical phenomena by encoding random variables as vertices in a graph and relationships between those variables as edges. The Graphical Models toolkit provides a collection of methods to make predictions under uncertainty, and for reasoning about structured noisy data.
The main components of the Graphical Models toolkit are:
- Distributed Dual Decomposition: performs maximum a posteriori (MAP) inference in general Markov Random Fields via the Dual Decomposition algorithm. The MRF is assumed to be provided in the standard UAI file format. Maintained by Dhruv Batra.
- Structured Prediction: applies the Loopy Belief Propagation (LBP) algorithm to a pairwise Markov Random Field encoding the classic Potts model.
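The following is a small, hedged sketch of sum-product loopy belief propagation on a pairwise MRF with Potts edge potentials. It is not the toolkit's code. The potential form (`exp(beta)` for equal labels, 1 otherwise), the synchronous update schedule, and all function names are assumptions. An exhaustive-enumeration helper is included only so the result can be checked on tiny graphs.

```python
import math
from itertools import product

def potts(xi, xj, beta):
    """Potts edge potential: equal labels are favoured by a factor exp(beta)."""
    return math.exp(beta) if xi == xj else 1.0

def loopy_bp(unary, edges, beta, iters=50):
    """Synchronous sum-product BP; unary[v] lists non-negative label scores."""
    n_labels = len(unary[0])
    nbrs = {v: [] for v in range(len(unary))}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    # msgs[(i, j)][x] = message from i to j about j taking label x
    msgs = {(i, j): [1.0 / n_labels] * n_labels for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in range(n_labels):
                total = 0.0
                for xi in range(n_labels):
                    incoming = 1.0
                    for k in nbrs[i]:
                        if k != j:  # exclude the message coming back from j
                            incoming *= msgs[(k, i)][xi]
                    total += unary[i][xi] * potts(xi, xj, beta) * incoming
                out.append(total)
            z = sum(out)
            new[(i, j)] = [m / z for m in out]  # normalize for stability
        msgs = new
    beliefs = []
    for v in nbrs:
        b = list(unary[v])
        for k in nbrs[v]:
            b = [bx * msgs[(k, v)][x] for x, bx in enumerate(b)]
        z = sum(b)
        beliefs.append([bx / z for bx in b])
    return beliefs

def exact_marginals(unary, edges, beta):
    """Brute-force marginals by enumeration; only feasible for tiny models."""
    n, n_labels = len(unary), len(unary[0])
    marg = [[0.0] * n_labels for _ in range(n)]
    for assign in product(range(n_labels), repeat=n):
        w = 1.0
        for v, x in enumerate(assign):
            w *= unary[v][x]
        for a, b in edges:
            w *= potts(assign[a], assign[b], beta)
        for v, x in enumerate(assign):
            marg[v][x] += w
    return [[m / sum(row) for m in row] for row in marg]

# Three-vertex chain: BP is exact on trees, so both results should agree.
unary = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
edges = [(0, 1), (1, 2)]
print(loopy_bp(unary, edges, beta=1.0))
print(exact_marginals(unary, edges, beta=1.0))
```

On graphs with cycles the beliefs are only approximations of the true marginals, and convergence is not guaranteed.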
The Graph Analytics Toolkit aims to provide high performance, distributed tools for graph mining, for use in community detection, social network discovery, etc.
The toolkit currently implements the following tools:
Two triangle counting programs:
- Undirected Triangle Counting: counts the total number of triangles in a graph, or the number of triangles each vertex is in
- Directed Triangle Counting: counts, for each vertex, the number of triangles of each type it is in
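The undirected case can be sketched in a few lines on a single machine. This is an illustrative version with a hypothetical function name, not the distributed program shipped with the toolkit. It assumes vertex ids are mutually comparable (for example, integers).

```python
def count_triangles(edges):
    """Return (total, per_vertex) triangle counts for an undirected edge list."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    per_vertex = {v: 0 for v in adj}
    total = 0
    for a in adj:
        for b in adj[a]:
            if a < b:
                # A common neighbour c with a < b < c closes a triangle
                # that is counted exactly once.
                for c in adj[a] & adj[b]:
                    if b < c:
                        total += 1
                        per_vertex[a] += 1
                        per_vertex[b] += 1
                        per_vertex[c] += 1
    return total, per_vertex

edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
total, per_vertex = count_triangles(edges)
print(total, per_vertex)
```

The core step is the intersection of two neighbour sets: every shared neighbour of the endpoints of an edge closes one triangle.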
PageRank: a classical graph algorithm that assigns each vertex a numerical importance value based on random-walk properties. Learn more about PageRank here.
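For reference, the random-walk importance score described above can be sketched as a textbook power iteration. The damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from vertices without out-links are assumptions of this sketch. The toolkit's exact update rule is not stated here. The input is assumed to list every vertex as a key, including those with no outgoing edges.

```python
def pagerank(out_links, damping=0.85, iters=100):
    """Power iteration on a dict mapping each vertex to its list of out-links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by vertices without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                # Each vertex splits its rank evenly among its out-links.
                new[w] += damping * rank[v] / len(out_links[v])
        rank = new
    return rank

graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
print(pagerank(graph))
```

In this example `c` receives links from both `a` and `b`, so it ends up with the highest score, while `b`, which nothing links to, gets the lowest.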
Identifies a hierarchical ordering of the vertices in the graph, allowing discovery of the central components of the network.
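One common way to obtain such a hierarchical ordering is k-core decomposition, computed by repeatedly peeling off the lowest-degree vertex. Naming that algorithm is an assumption here, since the text above does not say which method the toolkit uses. The function name and the simple quadratic-time loop are illustrative only.

```python
def core_numbers(edges):
    """Return the core number of each vertex of an undirected edge list."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex with the smallest remaining degree.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# A triangle (0, 1, 2) with a two-vertex tail (3, 4) hanging off vertex 2.
print(core_numbers([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]))
```

Vertices with higher core numbers sit in more densely connected parts of the graph, which is what makes the ordering useful for finding central components.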
The topic modelling toolbox currently implements the Latent Dirichlet Allocation algorithm for deriving semantic topic information from a corpus of plain text.
GraphLab PowerGraph provides iterative solvers for linear systems of the form Ax = b.
Currently, the Jacobi method is implemented.
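A minimal dense-matrix sketch of the Jacobi iteration follows. The function signature, the stopping tolerance, and the list-of-lists matrix format are assumptions for illustration and do not reflect the toolkit's distributed solver. Convergence is only guaranteed for suitable matrices, for example strictly diagonally dominant ones, and the diagonal must be nonzero.

```python
def jacobi(A, b, iters=100, tol=1e-10):
    """Approximately solve Ax = b for a square list-of-lists matrix A."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # Each component is updated using only the previous iterate.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(p - q) for p, q in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

A = [[4.0, 1.0],
     [2.0, 5.0]]
b = [6.0, 12.0]
print(jacobi(A, b))
```

Because every component update depends only on the previous iterate, the updates within one sweep are independent of each other, which is what makes the method a natural fit for parallel execution.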