nebula-algorithm is a Spark application based on GraphX. It currently provides the following algorithms:
Name | Use Case |
---|---|
PageRank | page ranking, important node mining |
Louvain | community mining, hierarchical clustering |
KCore | community detection, financial risk control |
LabelPropagation | community detection, information propagation, advertising recommendation |
Hanp | community detection, information propagation |
ConnectedComponent | community detection, isolated island detection |
StronglyConnectedComponent | community detection |
ShortestPath | path planning, network planning |
TriangleCount | network structure analysis |
GraphTriangleCount | network structure and tightness analysis |
BetweennessCentrality | important node mining, node influence calculation |
ClosenessCentrality | important node mining, node influence calculation |
DegreeStatic | graph structure analysis |
ClusteringCoefficient | recommendation, telecom fraud analysis |
Jaccard | similarity calculation, recommendation |
BFS | sequence traversal, shortest path planning |
Node2Vec | graph machine learning, recommendation |
You can submit the entire Spark application, or invoke the algorithms in the `lib` library to apply graph algorithms to a DataFrame.
- Build Nebula Algorithm

      $ git clone https://github.com/vesoft-inc/nebula-algorithm.git
      $ cd nebula-algorithm
      $ mvn clean package -Dgpg.skip -Dmaven.javadoc.skip=true -Dmaven.test.skip=true

  After the build completes, the target file `nebula-algorithm-3.0-SNAPSHOT.jar` is placed under `nebula-algorithm/target`.

- Download from Maven repo

  Alternatively, the package can be downloaded from the following Maven repo:
- Option 1: Submit nebula-algorithm package

  - Configuration

    Refer to the configuration example.

  - Submit Spark Application

        ${SPARK_HOME}/bin/spark-submit --master <mode> --class com.vesoft.nebula.algorithm.Main nebula-algorithm-3.0-SNAPSHOT.jar -p application.conf

  - Limitation

    The Nebula Algorithm jar does not encode string IDs, so during algorithm execution the source and target of each edge must be of type Int. (The `vid_type` of the Nebula space may be String, but the data itself must be of type Int.)

- Option 2: Call nebula-algorithm interface
  Currently, 10+ algorithms are provided in `lib` from `nebula-algorithm`, which can be invoked programmatically as follows:

  - Add the dependency in `pom.xml`.

        <dependency>
          <groupId>com.vesoft</groupId>
          <artifactId>nebula-algorithm</artifactId>
          <version>3.0.0</version>
        </dependency>
  - Instantiate the algorithm's config. Below is an example for `PageRank`.

        import com.vesoft.nebula.algorithm.config.{Configs, PRConfig, SparkConfig}
        import com.vesoft.nebula.algorithm.lib.PageRankAlgo
        import org.apache.spark.sql.{DataFrame, SparkSession}

        // Local Spark session for the example
        val spark = SparkSession.builder().master("local").getOrCreate()
        // Edge list read from a CSV file that has a header row
        val data = spark.read.option("header", true).csv("src/test/resources/edge.csv")
        val prConfig = new PRConfig(5, 1.0)
        // Run PageRank on the edge DataFrame
        val prResult = PageRankAlgo.apply(spark, data, prConfig, false)
If your vertex IDs are strings, see the PageRank example for how to encode and decode them.
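To make the idea of encoding and decoding concrete, here is a minimal sketch in plain Scala, assuming the edge list fits in memory. `VertexIdCodec` and its methods are hypothetical names used for illustration only and are not part of nebula-algorithm; the linked example remains the reference for how to do this on a DataFrame.

```scala
// Hypothetical helper, not part of nebula-algorithm: maps string vertex IDs
// to numeric IDs and back, using only the Scala standard library.
object VertexIdCodec {
  /** Assigns each distinct string ID a Long, in order of first appearance. */
  def buildIndex(edges: Seq[(String, String)]): Map[String, Long] =
    edges
      .flatMap { case (src, dst) => Seq(src, dst) }
      .distinct
      .zipWithIndex
      .map { case (id, idx) => id -> idx.toLong }
      .toMap

  /** Rewrites string edges as numeric edges using the index. */
  def encode(edges: Seq[(String, String)], index: Map[String, Long]): Seq[(Long, Long)] =
    edges.map { case (src, dst) => (index(src), index(dst)) }

  /** Maps numeric per-vertex results back to the original string IDs. */
  def decode[A](results: Seq[(Long, A)], index: Map[String, Long]): Seq[(String, A)] = {
    val reverse = index.map { case (id, n) => n -> id }
    results.map { case (n, value) => (reverse(n), value) }
  }
}

// Sample data invented for this sketch
val edges = Seq(("alice", "bob"), ("bob", "carol"), ("alice", "carol"))
val index = VertexIdCodec.buildIndex(edges)
val numericEdges = VertexIdCodec.encode(edges, index)
```

The numeric edges would be what the algorithm runs on, and `decode` would be applied to its per-vertex output to restore the original IDs.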
For examples of other algorithms, see examples.
Note: The first column of the DataFrame in the application represents the source vertices, the second represents the target vertices, and the third represents the edge weights.
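The column layout described in the note can be pictured as one record per edge. The sketch below is plain Scala, not Spark code: `EdgeRow` and `EdgeParser` are hypothetical names, and defaulting a missing weight to 1.0 is an assumption of this sketch rather than documented library behavior.

```scala
// Hypothetical model of one input row: source, target, weight.
final case class EdgeRow(src: Long, dst: Long, weight: Double)

object EdgeParser {
  /** Parses "src,dst[,weight]". A missing weight defaults to 1.0 (assumption of this sketch). */
  def parse(line: String): Either[String, EdgeRow] = {
    val cols = line.split(",").map(_.trim)
    if (cols.length < 2) Left(s"expected at least 2 columns: $line")
    else
      try {
        val weight = if (cols.length >= 3) cols(2).toDouble else 1.0
        Right(EdgeRow(cols(0).toLong, cols(1).toLong, weight))
      } catch {
        // Raised by toLong / toDouble when a column is not numeric
        case _: NumberFormatException => Left(s"non-numeric value in: $line")
      }
  }
}
```

Validating rows this way before loading them can surface non-numeric vertex IDs early, which matters given the numeric-ID limitation noted for the packaged jar.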
If you want to write the algorithm result into Nebula, make sure your tag has the corresponding property name.
| Algorithm | property name |property type|
|:------------------------:|:-----------------------:|:-----------:|
| pagerank | pagerank |double/string|
| louvain | louvain | int/string |
| kcore | kcore | int/string |
| labelpropagation | lpa | int/string |
| connectedcomponent | cc | int/string |
|stronglyconnectedcomponent| scc | int/string |
| betweenness | betweenness |double/string|
| shortestpath | shortestpath | string |
| degreestatic |degree,inDegree,outDegree| int/string |
| trianglecount | trianglecount | int/string |
| clusteringcoefficient | clustercoefficient |double/string|
| closeness | closeness |double/string|
| hanp | hanp | int/string |
| bfs | bfs | string |
| jaccard | jaccard | string |
| node2vec | node2vec | string |
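A pre-flight check against the table above might look like the following sketch. The property names are copied from the table; the `ResultProperties` object and its `missing` method are hypothetical and not part of nebula-algorithm, and how you obtain the tag's property names is left to you.

```scala
// Property names copied from the table above; the helper itself is hypothetical.
object ResultProperties {
  val expected: Map[String, Seq[String]] = Map(
    "pagerank"                   -> Seq("pagerank"),
    "louvain"                    -> Seq("louvain"),
    "kcore"                      -> Seq("kcore"),
    "labelpropagation"           -> Seq("lpa"),
    "connectedcomponent"         -> Seq("cc"),
    "stronglyconnectedcomponent" -> Seq("scc"),
    "betweenness"                -> Seq("betweenness"),
    "shortestpath"               -> Seq("shortestpath"),
    "degreestatic"               -> Seq("degree", "inDegree", "outDegree"),
    "trianglecount"              -> Seq("trianglecount"),
    "clusteringcoefficient"      -> Seq("clustercoefficient"),
    "closeness"                  -> Seq("closeness"),
    "hanp"                       -> Seq("hanp"),
    "bfs"                        -> Seq("bfs"),
    "jaccard"                    -> Seq("jaccard"),
    "node2vec"                   -> Seq("node2vec")
  )

  /** Properties the tag still lacks for the algorithm; None if the algorithm is not in the table. */
  def missing(algorithm: String, tagProperties: Set[String]): Option[Seq[String]] =
    expected.get(algorithm.toLowerCase).map(_.filterNot(tagProperties.contains))
}
```

For instance, `ResultProperties.missing("degreestatic", Set("degree"))` reports that `inDegree` and `outDegree` still need to be added to the tag.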
Nebula Algorithm Version | Nebula Version |
---|---|
2.0.0 | 2.0.0, 2.0.1 |
2.1.0 | 2.0.0, 2.0.1 |
2.5.0 | 2.5.0, 2.5.1 |
2.6.0 | 2.6.0, 2.6.1 |
2.6.1 | 2.6.0, 2.6.1 |
2.6.2 | 2.6.0, 2.6.1 |
3.0.0 | 3.0.x, 3.1.x |
3.0-SNAPSHOT | nightly |
Nebula Algorithm is open source, and you are more than welcome to contribute in the following ways:
- Discuss in the community via the forum or raise issues here.
- Compose or improve our documents.
- Submit pull requests to help improve the code itself here.