We introduce Spatiotemporal Graph k-means (STGkM), a novel, unsupervised method to cluster vertices within a dynamic network. Drawing inspiration from traditional k-means, STGkM finds both short-term dynamic clusters and a "long-lived" partitioning of vertices within a network whose topology is evolving over time. One of the main advantages of STGkM is that it has only one required parameter, namely k; we therefore include an analysis of the range of this parameter and guidance on selecting its optimal value.
This repository implements Spatiotemporal Graph k-Means (STGkM) and provides scripts that run the method on synthetic and real datasets.
STGkM is implemented as an extension of the Rust k-medoids package. To run the code, first fork the following repository: https://github.com/OlgaD400/python-kmedoids. You will also need Cargo (the Rust package manager) installed. Then run the following commands to compile the k-medoids package from source:
pip install maturin
git clone https://github.com/kno10/python-kmedoids.git
cd python-kmedoids
# build and install the package:
maturin develop --release
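After the build finishes, a quick check that the package is visible to your Python environment can save debugging time later. The snippet below is a minimal sketch that uses only the standard library; the helper name `is_installed` is ours, and it assumes the compiled package is importable under the module name `kmedoids`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package built above is exposed as the module `kmedoids`.
print("kmedoids installed:", is_installed("kmedoids"))
```

If this prints `False`, the build most likely ran in a different virtual environment from the one you are using now.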
- stgkm/distance_functions.py: Implementation of s-journey, as described in the related paper.
- stgkm/graph_visualization.py: Code to visualize an evolving dynamic graph.
- stgkm/helper_functions.py: Helpful functions for running STGkM.
- stgkm/synthetic_graphs.py: Contains classes for all synthetic graphs.
- tests/tests.py: Contains tests for STGkM.
- stgkm_figures.py: Contains functions for generating all visualizations from experiments.
- DCDID/: Contains files for running DCDID (dynamic community detection based on information dynamics).
- clique_cross_clique_script.py: Script for clique-cross-clique experiments.
- compare_performance_script.py: Script for comparing the performance of different vertex clustering methods across various synthetic datasets.
- reddit_script.py: Script for running STGkM on Reddit data.
- roll_call_data_creation_script.py: Script to get data directly from the House of Representatives website and form dataframes.
- roll_call_data_clustering_script.py: Script to run STGkM on roll call dataset.
- semantic_scholar_script.py: Script to run STGkM on Semantic Scholar data.
- synthetic_three_cluster_script.py: Script to run STGkM on synthetic three cluster dataset.
- synthetic_two_cluster_script.py: Script to run STGkM on synthetic two cluster dataset.
- theseus_clique_script.py: Script to run STGkM on theseus clique.
All synthetic data can be created with the scripts. For the experimental data:
- Roll call: this can be generated by
scripts/roll_call_data_creation_script.py
- Semantic scholar: this data is generated by
scripts/semantic_scholar_script.py
- Reddit: this data is from SNAP at: https://snap.stanford.edu/data/soc-RedditHyperlinks.html
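For orientation, here is a minimal sketch of how a timestamped edge list like the Reddit hyperlink data could be split into monthly graph snapshots. It is an illustration under stated assumptions, not the preprocessing used in reddit_script.py: the column names (`SOURCE_SUBREDDIT`, `TARGET_SUBREDDIT`, `TIMESTAMP`) and the timestamp format are assumed from the SNAP dataset description, the function name `monthly_snapshots` is hypothetical, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def monthly_snapshots(tsv_file):
    """Group (source, target) edges by the (year, month) of their timestamp."""
    snapshots = defaultdict(set)
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        # Assumed column names and timestamp format; adjust to the real file.
        stamp = datetime.strptime(row["TIMESTAMP"], "%Y-%m-%d %H:%M:%S")
        edge = (row["SOURCE_SUBREDDIT"], row["TARGET_SUBREDDIT"])
        snapshots[(stamp.year, stamp.month)].add(edge)
    return dict(snapshots)


# Tiny made-up sample in the assumed layout (not real data).
sample = io.StringIO(
    "SOURCE_SUBREDDIT\tTARGET_SUBREDDIT\tTIMESTAMP\n"
    "a\tb\t2014-01-03 10:00:00\n"
    "b\tc\t2014-01-20 12:30:00\n"
    "a\tb\t2014-01-25 08:00:00\n"
    "a\tc\t2014-02-01 09:15:00\n"
)

snapshots = monthly_snapshots(sample)
for month in sorted(snapshots):
    print(month, sorted(snapshots[month]))
```

To use it on a downloaded file, pass an open file handle (opened with `newline=""`) instead of the in-memory sample. Duplicate edges within a month collapse into one because each snapshot is stored as a set.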