forked from snap-stanford/snap
hysgen project, hypergraph data structure (THGraph), and corresponding changes to the files are added --- PART 2 + updated two gitignore files.
Showing 16 changed files with 5,804 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Prerequisites | ||
*.d | ||
|
||
# Compiled Object files | ||
*.slo | ||
*.lo | ||
*.o | ||
*.obj | ||
|
||
# Precompiled Headers | ||
*.gch | ||
*.pch | ||
|
||
# Compiled Dynamic libraries | ||
*.so | ||
*.dylib | ||
*.dll | ||
|
||
# Fortran module files | ||
*.mod | ||
*.smod | ||
|
||
# Compiled Static libraries | ||
*.lai | ||
*.la | ||
*.a | ||
*.lib | ||
|
||
# Executables | ||
*.exe | ||
*.out | ||
*.app |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
BSD 3-Clause License | ||
|
||
Copyright (c) 2022, Bahman Pedrood | ||
All rights reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without | ||
modification, are permitted provided that the following conditions are met: | ||
|
||
1. Redistributions of source code must retain the above copyright notice, this | ||
list of conditions and the following disclaimer. | ||
|
||
2. Redistributions in binary form must reproduce the above copyright notice, | ||
this list of conditions and the following disclaimer in the documentation | ||
and/or other materials provided with the distribution. | ||
|
||
3. Neither the name of the copyright holder nor the names of its | ||
contributors may be used to endorse or promote products derived from | ||
this software without specific prior written permission. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | ||
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | ||
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | ||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | ||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# | ||
# Makefile for this SNAP example | ||
# - modify Makefile.ex when creating a new SNAP example | ||
# | ||
# implements: | ||
# all (default), clean | ||
# | ||
|
||
include ../../Makefile.config | ||
include Makefile.ex | ||
include ../Makefile.exmain |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
## Main application file | ||
MAIN = hysgen_main | ||
DEPH = $(EXSNAPADV)/hysgen.h | ||
DEPCPP = $(EXSNAPADV)/hysgen.cpp | ||
#CXXFLAGS += $(CXXOPENMP) | ||
#CXXFLAGS += -g -rdynamic | ||
#CXXFLAGS += -ggdb | ||
#CXXFLAGS += -ggdb3 -rdynamic | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
======================================================================== | ||
Hypergraph Simultaneous Generators (HySGen) | ||
======================================================================== | ||
|
||
This repository contains three main components: | ||
|
||
1) HySGen: An efficient probabilistic generative model for discovering node | ||
clusters/communities in hypergraphs. For the details of the model and | ||
the community inference algorithm, please see our paper*. | ||
2) HGraph: A fast, reliable, and comprehensive C++ data structure for | ||
undirected, unweighted hypergraphs. | ||
3) Three hypergraphs extracted from real-world data, uploaded in the "Data" | ||
directory. Please see our paper* for more information. | ||
|
||
Please cite our paper upon using any of those components: | ||
* B. Pedrood, C. Domeniconi, and K. Laskey. "Hypergraph Simultaneous Generators." AISTATS 2022. | ||
|
||
///////////////////////////////////////////////////////////////////////////// | ||
|
||
The code in this project is developed on top of the SNAP [(c) 2007-2019, | ||
Jure Leskovec] open-source graph analysis library. To make the code easier | ||
for SNAP users to work with, I kept the structure and coding standards | ||
recommended by SNAP. The directory structure of this project is as follows: | ||
|
||
snap: | ||
An intact copy of the original SNAP library's source code, whose | ||
modules are used in this project. | ||
local_snap: | ||
We developed our classes in this directory. The subdirectories | ||
and file structures are chosen this way for maximum consistency | ||
with SNAP. | ||
local_snap/snap-adv: | ||
HySGen's implemented classes and functions for community inference. | ||
local_snap/snap-core: | ||
The HGraph data structure is implemented in this directory. For | ||
details of its functions and classes, see the files with | ||
"loc_graph" and "loc_subgraph" in their names. | ||
|
||
Like other SNAP projects, this code works under Windows (Cygwin with GCC), | ||
Mac OS X, Linux, and other Unix variants with GCC. To use it with Visual Studio, | ||
you have to create a new Visual Studio project for it. Make sure that a C++ | ||
compiler is installed on the system. Makefiles are provided, so you can compile | ||
the code from the command line with the following command: | ||
make all | ||
|
||
///////////////////////////////////////////////////////////////////////////// | ||
|
||
Parameters: | ||
-i: Input [hyper]edgelist file url. | ||
-o: Output file url + name prefix for the discovered communities. | ||
-c: The number of communities to detect. | ||
-op: Output file performance plot (Default: empty for no plot). | ||
-ci: Community initialization file url (Default: empty). | ||
-l: Url for node names file (Default: empty). | ||
-mc: Minimum size of the communities (Default: 3). | ||
-rs: Random Seed (Default: 0). | ||
-xi: Maximum number of iterations (Default: 1000). | ||
-ic: Initial membership value for the seed communities (Default: 0.1). | ||
-in: The default membership value of each node to all the communities (Default: 0.03). | ||
-rp: Ratio of initial memberships to be randomly perturbed (Default: 0.0). | ||
-rw: Weight for l-1 regularization on learning the model parameters (Default: 0.0). | ||
-sz: Initial step size for backtracking line search (Default: 0.5). | ||
-sa: Control parameter for backtracking line search (Default: 0.5). | ||
-sr: Step-size reduction ratio for backtracking line search (Default: 0.5). | ||
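
The three step-size options above (-sz, -sa, -sr) match the usual ingredients of a backtracking (Armijo) line search. The sketch below is a generic illustration of that technique for gradient ascent; it is not the code from hysgen.cpp, and the function name, its signature, and the mapping of the flags onto InitStep, Alpha, and Beta are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Generic backtracking (Armijo) line search for gradient ASCENT.
// Hypothetical helper for illustration; names are not taken from hysgen.cpp.
//   InitStep ~ "-sz" (initial step size)
//   Alpha    ~ "-sa" (control parameter of the sufficient-increase test)
//   Beta     ~ "-sr" (step-size reduction ratio)
double BacktrackingStep(const std::function<double(const std::vector<double>&)>& F,
                        const std::vector<double>& X,
                        const std::vector<double>& Grad,
                        double InitStep, double Alpha, double Beta,
                        int MaxTries = 50) {
  assert(X.size() == Grad.size());
  assert(InitStep > 0.0 && Alpha > 0.0 && Alpha < 1.0 && Beta > 0.0 && Beta < 1.0);
  double GradSq = 0.0;  // squared norm of the gradient
  for (double G : Grad) GradSq += G * G;
  const double FX = F(X);
  double Step = InitStep;
  std::vector<double> Trial(X.size());
  for (int T = 0; T < MaxTries; T++) {
    for (std::size_t i = 0; i < X.size(); i++) Trial[i] = X[i] + Step * Grad[i];
    // Accept once the objective increases by at least Alpha * Step * ||Grad||^2.
    if (F(Trial) >= FX + Alpha * Step * GradSq) return Step;
    Step *= Beta;  // otherwise shrink the step and try again
  }
  return 0.0;  // no acceptable step found within MaxTries
}
```

Under this reading, a larger -sa demands a bigger guaranteed improvement before a step is accepted, and a smaller -sr shrinks rejected steps more aggressively.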
|
||
///////////////////////////////////////////////////////////////////////////// | ||
|
||
Usage: | ||
|
||
Discover 309 communities from the NSF collaboration hypergraph. | ||
|
||
./hysgen_main -i:./Data/NSF/hypergraph.hyperedges -o:./out_communities -c:309 -mc:3 -ic:0.1 -in:0.001 -rw:0.0001 -sa:0.95 -sz:0.01 |
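
Every option in the command above is written as a prefix ending in a colon, followed directly by its value. The sketch below shows one way such prefix arguments can be matched. It is a hypothetical stand-alone helper written for this example, not SNAP's TEnv implementation, which does the real parsing in hysgen_main.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the text after `Prefix` in the first argument that starts with it,
// or `Default` when no argument carries that prefix.
// Hypothetical helper for illustration only.
std::string ArgPrefixStr(const std::vector<std::string>& Args,
                         const std::string& Prefix,
                         const std::string& Default) {
  assert(!Prefix.empty());
  for (const std::string& Arg : Args) {
    // compare() checks only the leading Prefix.size() characters of Arg.
    if (Arg.compare(0, Prefix.size(), Prefix) == 0) {
      return Arg.substr(Prefix.size());
    }
  }
  return Default;
}
```

Because the colon is part of the prefix, "-i:" does not collide with "-ic:" or "-in:" in this sketch.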
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
#include "hysgen.h" | ||
#include "agm.h" | ||
|
||
int main(int argc, char* argv[]) { | ||
Env = TEnv(argc, argv, TNotify::StdNotify); | ||
Env.PrepArgs(TStr::Fmt("HySGen. build: %s, %s. Time: %s", __TIME__, __DATE__, TExeTm::GetCurTm())); | ||
TExeTm ExeTm; | ||
|
||
Try | ||
|
||
const TStr InFNm = Env.GetIfArgPrefixStr("-i:", "./synthetic_data/synthetic.hyperedges", "Input [hyper]edgelist file url."); | ||
const TStr OutFPrx = Env.GetIfArgPrefixStr("-o:", "./synthetic_res", "Output file url + name prefix for the discovered communities."); | ||
int OptComs = Env.GetIfArgPrefixInt("-c:", 2, "The number of communities to detect."); | ||
const TStr OutPlt = Env.GetIfArgPrefixStr("-op:", "", "Output file performance plot (empty for no plot)."); | ||
const TStr InitComFNm = Env.GetIfArgPrefixStr("-ci:", "", "Community initialization file url."); | ||
const TStr LabelFNm = Env.GetIfArgPrefixStr("-l:", "", "Input file name for node names (Node ID, Node label) "); | ||
const int MinComSize = Env.GetIfArgPrefixInt("-mc:", 3, "Minimum size of the communities."); | ||
const int RndSeed = Env.GetIfArgPrefixInt("-rs:", 0, "Random Seed"); | ||
int MaxIter = Env.GetIfArgPrefixInt("-xi:", 1000, "Maximum number of iterations"); | ||
const double InitComS = Env.GetIfArgPrefixFlt("-ic:", 0.1, "Initial membership value for the initially assigned communities"); | ||
const double InitNulS = Env.GetIfArgPrefixFlt("-in:", 0.03, "The default membership value of each node to all the communities"); | ||
double PerturbDensity = Env.GetIfArgPrefixFlt("-rp:", 0.0, "Ratio of initial memberships to be randomly perturbed."); | ||
const double RegCoef = Env.GetIfArgPrefixFlt("-rw:", 0.0, "Weight for l-1 regularization on learning the model parameters"); | ||
const double StepSize = Env.GetIfArgPrefixFlt("-sz:", 0.5, "Initial step size for backtracking line search"); | ||
const double StepCtrlParam = Env.GetIfArgPrefixFlt("-sa:", 0.5, "Control parameter for backtracking line search"); | ||
const double StepReductionRatio = Env.GetIfArgPrefixFlt("-sr:", 0.5, "Step-size reduction ratio for backtracking line search"); | ||
|
||
|
||
PHGraph G; | ||
TIntStrH NIDNameH, NIDEdgelistnameH; | ||
TStrIntH NameNIdH, EdgelistnameNIdH; | ||
TStrHash<TInt> NodeNameH; | ||
TVec<TFltV> WckVV; | ||
TVec<TIntFltH> EstCmtyVH; | ||
TVec<TIntV> EstCmtyVV; | ||
if (InFNm.IsSuffix(".hgraph")) { | ||
TFIn GFIn(InFNm); | ||
G = THGraph::Load(GFIn); | ||
} else { | ||
G = THysgenUtil::LoadEdgeList(InFNm, NodeNameH); | ||
NIDNameH.Gen(NodeNameH.Len()); NIDEdgelistnameH.Gen(NodeNameH.Len()); | ||
NameNIdH.Gen(NodeNameH.Len()); EdgelistnameNIdH.Gen(NodeNameH.Len()); | ||
for (int s = 0; s < NodeNameH.Len(); s++) { | ||
NIDNameH.AddDat(s, NodeNameH.GetKey(s)); | ||
NIDEdgelistnameH.AddDat(s, NodeNameH.GetKey(s)); | ||
NameNIdH.AddDat(NodeNameH.GetKey(s), s); | ||
EdgelistnameNIdH.AddDat(NodeNameH.GetKey(s), s); | ||
} | ||
} | ||
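if (LabelFNm.Len() > 0) { | ||
// Optional label file: tab-separated (node name as used in the edge list, display label). | ||
TSsParser Ss(LabelFNm, ssfTabSep); | ||
while (Ss.Next()) { | ||
// Skip malformed rows and names absent from the input; GetDat on a missing key would fail. | ||
if (Ss.Len() > 1 && NameNIdH.IsKey(Ss[0])) { NIDNameH.AddDat(NameNIdH.GetDat(Ss[0]), Ss.GetFld(1)); } | ||
} | ||
} | ||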
printf("HyperGraph: %d Nodes %d Edges\n", G->GetNodes(), G->GetEdges()); | ||
|
||
TIntV NIDV; | ||
G->GetNIdV(NIDV); | ||
|
||
TExeTm RunTm; | ||
THysgen Optimizer(G, 5, RndSeed, InitComS, InitNulS); | ||
Optimizer.ComInit(OptComs, MinComSize, PerturbDensity); | ||
if (InitComFNm.Len() > 0) { | ||
Optimizer.LoadComInit(InitComFNm); | ||
} | ||
Optimizer.SetRegCoef(RegCoef); | ||
|
||
double Threshold = TFlt::EpsHalf; | ||
Optimizer.GetCmtyVV(EstCmtyVH, EstCmtyVV, WckVV, InitNulS, MinComSize); | ||
THysgenUtil::DumpCmtyVH(OutFPrx + "cmtyvv_init.txt", EstCmtyVH, NIDNameH); | ||
|
||
Optimizer.MLEGradAscent(0.01, MaxIter * G->GetNodes(), OutPlt, StepSize, StepCtrlParam, StepReductionRatio); | ||
|
||
Optimizer.GetCmtyVV(EstCmtyVH, EstCmtyVV, WckVV, Threshold, MinComSize); | ||
THysgenUtil::DumpCmtyVH(OutFPrx + "cmty_ById_values.tsv", EstCmtyVH, NIDEdgelistnameH); | ||
THysgenUtil::DumpCmtyVV(OutFPrx + "cmty_ById_members.txt", EstCmtyVV, NIDEdgelistnameH); | ||
THysgenUtil::DumpCmtyVH(OutFPrx + "cmty_values.txt", EstCmtyVH, NIDNameH); | ||
THysgenUtil::DumpCmtyVV(OutFPrx + "cmty_members.txt", EstCmtyVV, NIDNameH); | ||
|
||
Catch | ||
|
||
printf("\nrun time: %s (%s)\n", ExeTm.GetTmStr(), TSecTm::GetCurTm().GetTmStr().CStr()); | ||
|
||
return 0; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
// stdafx.cpp : source file that includes just the standard includes | ||
// cesna.pch will be the pre-compiled header | ||
// stdafx.obj will contain the pre-compiled type information | ||
|
||
#include "stdafx.h" | ||
|
||
// TODO: reference any additional headers you need in STDAFX.H | ||
// and not in this file |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
#pragma once | ||
|
||
#include "targetver.h" | ||
|
||
#include "Snap.h" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
1 2 3 4 5 11 12 13 14 15 21 22 23 24 25 | ||
7 8 9 10 17 18 19 20 27 28 29 30 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Supposed Scenario: | ||
|
||
It's the first day of an academic year. Student gatherings have been recorded since some point during the spring semester of the previous year. Assume there are two communities, CS students and history students, and that these are the only social communities to which the students belong. The members of these two communities are listed in the file "ground_truth_comms.txt". | ||
The hypergraph for this example is the network of gatherings recorded at the university over the period described above. Each gathering corresponds to a hyperedge that connects the attending students. The hypergraph is saved in "synthetic.hyperedges", where each line lists the IDs of the nodes in one hyperedge. | ||
A regular-graph equivalent of the hypergraph is given in "synthetic.edges", which stores the list of edges. This graph is created by replacing each hyperedge of size k with a k-clique. | ||
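
The clique mapping described above can be sketched in a few lines. The code below is an illustration only and is not part of this commit: the function names are hypothetical, and it assumes that node IDs are integers separated by whitespace, which the description of "synthetic.hyperedges" suggests but does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parses one hyperedge line: whitespace-separated integer node IDs (assumed format).
std::vector<int> ParseHyperedge(const std::string& Line) {
  std::istringstream In(Line);
  std::vector<int> Nodes;
  int Id;
  while (In >> Id) Nodes.push_back(Id);
  assert(In.eof() && "non-integer token in hyperedge line");
  return Nodes;
}

// Clique expansion: every pair of nodes sharing a hyperedge becomes one edge.
// The set keeps each undirected edge once, stored as (smaller ID, larger ID).
std::set<std::pair<int, int>> CliqueExpand(const std::vector<std::vector<int>>& Hyperedges) {
  std::set<std::pair<int, int>> Edges;
  for (const std::vector<int>& E : Hyperedges) {
    for (std::size_t i = 0; i < E.size(); i++) {
      for (std::size_t j = i + 1; j < E.size(); j++) {
        if (E[i] == E[j]) continue;  // ignore an ID repeated within one line
        Edges.emplace(std::min(E[i], E[j]), std::max(E[i], E[j]));
      }
    }
  }
  return Edges;
}
```

A hyperedge of size k contributes up to k(k-1)/2 edges, so the two large party hyperedges dominate the edge list of the expanded graph.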
|
||
|
||
################################# | ||
Community detection complication: | ||
|
||
Two large hyperedges in the hypergraph, corresponding to two outdoor welcome parties for the students, complicate the problem of discovering the communities. Nodes 36 through 71 in these hyperedges represent passersby who are not students and only joined the parties to enjoy the music, games, and free food. Nodes 76 to 87 are new students, half (6) CS and half (6) history. The new students are not expected to be correctly identified, because the only gatherings they have attended so far are an orientation, held with 3 senior students of each major to tell them about the department, and the welcome party. At the party, they are divided into party groups (4-8) that are independent of their major. |