forked from betterenvi/gSpan

Python implementation of frequent subgraph mining algorithm gSpan. Directed graphs are supported.

geraore/gSpan

 
 


gSpan

For the Chinese README, see README-Chinese.

gSpan is an algorithm for mining frequent subgraphs.
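
To make "frequent" concrete: a pattern's support is usually taken to be the number of graphs in the database that contain it, and a pattern is frequent when its support reaches a chosen minimum. The sketch below counts support for one-edge patterns only, which is the usual starting point before larger patterns are grown. The function name `frequent_edges` and the triple-based graph representation are assumptions made for illustration; they are not taken from this repository.

```python
from collections import Counter

def frequent_edges(graphs, min_support):
    """Return one-edge patterns contained in at least `min_support` graphs.

    Each graph is a list of (vertex_label, edge_label, vertex_label) triples.
    This toy representation is hypothetical, not the repository's format.
    """
    support = Counter()
    for edges in graphs:
        # Undirected edges: normalize endpoint order, and use a set so that
        # a pattern is counted at most once per graph.
        patterns = {(min(a, b), e, max(a, b)) for a, e, b in edges}
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}

graphs = [
    [("C", "single", "C"), ("C", "double", "O")],
    [("C", "single", "C"), ("C", "single", "N")],
    [("O", "double", "C"), ("C", "single", "C"), ("C", "single", "C")],
]
result = frequent_edges(graphs, min_support=2)
print(result)
```

With a minimum support of 2, the C-N edge is dropped because it appears in only one graph.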

This program implements gSpan in Python. The repository on GitHub is https://github.com/betterenvi/gSpan. This implementation borrows some ideas from gboost.

Undirected Graphs

This program supports undirected graphs and produces the same results as gboost on the dataset graphdata/graph.data.

Directed Graphs

As of 2016-10-29, gboost does not support directed graphs. This program implements gSpan for directed graphs. More specifically, it can mine frequent directed subgraphs that have at least one node from which every other node in the subgraph is reachable. Correctness is not guaranteed, however, because the author has not tested the implementation extensively. Several runs on the datasets graphdata/graph.data.directed.1 and graph.data.simple.5 revealed no errors.
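
The restriction on directed subgraphs (at least one node that can reach the other nodes) can be checked with a breadth-first search from each candidate node. The following is a minimal sketch of that check, assuming a graph is given as a dictionary mapping each node to a list of its successors; the names `reachable_from` and `has_root` are hypothetical and do not come from this repository.

```python
from collections import deque

def reachable_from(adj, start):
    """Return the set of nodes reachable from `start` (including itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def has_root(adj):
    """True if some node can reach every other node of the directed graph."""
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    return any(reachable_from(adj, n) == nodes for n in nodes)

chain = {0: [1], 1: [2]}        # 0 -> 1 -> 2: node 0 reaches everything
two_sources = {0: [2], 1: [2]}  # 0 -> 2 <- 1: no single node reaches all

print(has_root(chain), has_root(two_sources))
```

Under this reading, a pattern such as `0 -> 2 <- 1` falls outside what the program can mine, because neither source node can reach the other.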

How to run

This program was developed under Python 2, so please run it with Python 2.

$ python main.py [-s min_support] [-n num_graph] [-l min_num_vertices] [-u max_num_vertices] [-d] [-v] [-h] [-p] [-w] database_file_name

Some examples

  • Read graph data from ./graphdata/graph.data and mine undirected subgraphs with a minimum support of 5000
$ python main.py -s 5000 ./graphdata/graph.data
  • Read graph data from ./graphdata/graph.data, mine undirected subgraphs with a minimum support of 5000, and visualize the frequent subgraphs (matplotlib and networkx are required)
$ python main.py -s 5000 -p ./graphdata/graph.data
  • Read graph data from ./graphdata/graph.data and mine directed subgraphs with a minimum support of 5000
$ python main.py -s 5000 -d ./graphdata/graph.data
  • Print help info
$ python main.py -h
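
When several runs with different settings are needed, the command lines above can be assembled programmatically. The helper below is a small sketch that uses only the flags shown in the examples (-s, -d, -p); `build_command` is a hypothetical name, and the sketch assumes that `python` resolves to a Python 2 interpreter and that the command is launched from the repository root.

```python
def build_command(database, min_support, directed=False, plot=False):
    """Build the argument list for main.py using flags from the examples."""
    cmd = ["python", "main.py", "-s", str(min_support)]
    if directed:
        cmd.append("-d")  # mine directed subgraphs
    if plot:
        cmd.append("-p")  # visualize results (needs matplotlib and networkx)
    cmd.append(database)
    return cmd

cmd = build_command("./graphdata/graph.data", 5000, directed=True)
print(" ".join(cmd))
# To actually launch the miner, pass the list to subprocess.call(cmd).
```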

The author also provides example code as a Jupyter Notebook, which presents mining results and visualizations. For details, see main.ipynb.

Running time

  • Environment

    • OS: Windows 10
    • Python version: Python 2.7.12
    • Processor: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz 3.60 GHz
    • RAM: 8.00 GB
  • Running time

    Running times on the dataset ./graphdata/graph.data are listed below:

Min support | Number of frequent subgraphs | Time
5000        | 26                           | 51.48 s
3000        | 52                           | 69.07 s
1000        | 455                          | 3 m 49 s
600         | 1235                         | 7 m 29 s
400         | 2710                         | 12 m 53 s
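
One way to read these figures is as time per frequent subgraph found. The sketch below converts the reported times to seconds and divides by the number of subgraphs; the helper `to_seconds` is a hypothetical name, and the data are copied from the listing above. By this measure the cost falls from roughly 2 s per subgraph at a minimum support of 5000 to under 0.3 s at 400, even though total running time grows.

```python
import re

def to_seconds(text):
    """Convert strings such as '51.48 s' or '3 m 49 s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)\s*m\s*)?(\d+(?:\.\d+)?)\s*s", text.strip())
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)

# (min support, number of frequent subgraphs, reported time)
rows = [
    (5000, 26, "51.48 s"),
    (3000, 52, "69.07 s"),
    (1000, 455, "3 m 49 s"),
    (600, 1235, "7 m 29 s"),
    (400, 2710, "12 m 53 s"),
]

per_pattern = {sup: to_seconds(t) / n for sup, n, t in rows}
for sup, cost in per_pattern.items():
    print("min support %5d: %.2f s per frequent subgraph" % (sup, cost))
```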

Reference

X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining. Proc. 2002 Int. Conf. on Data Mining (ICDM'02).

One C++ implementation of gSpan.
