README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

set.seed(27)
```

# vsp

<!-- badges: start -->
[![Codecov test coverage](https://codecov.io/gh/RoheLab/vsp/branch/main/graph/badge.svg)](https://app.codecov.io/gh/RoheLab/vsp?branch=main)
[![CRAN status](https://www.r-pkg.org/badges/version/vsp)](https://CRAN.R-project.org/package=vsp)
[![R-CMD-check](https://github.com/RoheLab/vsp/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/RoheLab/vsp/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

The goal of `vsp` is to enable fast, spectral estimation of latent factors in random dot product graphs. Under mild assumptions, the `vsp` estimator is consistent for (degree-corrected) stochastic blockmodels, (degree-corrected) mixed-membership stochastic blockmodels, and degree-corrected overlapping stochastic blockmodels.

More generally, the `vsp` estimator is consistent for random dot product graphs that can be written in the form

```
E(A) = Z B Y^T
```

where `Z` and `Y` satisfy the varimax assumptions of [1]. `vsp` works on directed and undirected graphs, and on weighted and unweighted graphs. Note that `vsp` is a semi-parametric estimator.

## Installation

You can install the released version of `vsp` from CRAN with

``` r
install.packages("vsp")
```

You can install the development version of `vsp` with:

``` r
install.packages("devtools")
devtools::install_github("RoheLab/vsp")
```

## Example

Obtaining estimates from `vsp` is straightforward. We recommend representing networks as [`igraph`](https://igraph.org/r/) objects or sparse adjacency matrices using the [`Matrix`](https://cran.r-project.org/package=Matrix) package. Once you have your network in one of these formats, you can get estimates by calling the `vsp()` function. The result is a `vsp_fa` S3 object.

Here we demonstrate `vsp` usage on an `igraph` object, using the `enron` network from `igraphdata` package to demonstrate this functionality. First we peak at the graph:

```{r}
library(igraph)
data(enron, package = "igraphdata")

image(sign(get.adjacency(enron, sparse = FALSE)))
```

Now we estimate:

```{r}
library(vsp)

fa <- vsp(enron, rank = 30)
fa
```

```{r}
get_varimax_z(fa)
```

To visualize a screeplot of the singular value, use:

```{r}
screeplot(fa)
```

At the moment, we also enjoy using pairs plots of the factors as a diagnostic measure:

```{r}
plot_varimax_z_pairs(fa, 1:5)
```

```{r}
plot_varimax_y_pairs(fa, 1:5)
```

Similarly, an IPR pairs plot can be a good way to check for singular vector localization (and thus overfitting!).

```{r}
plot_ipr_pairs(fa)
```

```{r}
plot_mixing_matrix(fa)
```

## References

[1] Rohe, Karl, and Muzhe Zeng. “Vintage Factor Analysis with Varimax Performs Statistical Inference.” Journal of the Royal Statistical Society Series B: Statistical Methodology 85, no. 4 (September 29, 2023): 1037–60. https://doi.org/10.1093/jrsssb/qkad029.

Code to reproduce the results from the paper is [available here](https://github.com/RoheLab/vsp-paper).