forked from thinkaurelius/titan
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathgenerating.txt
44 lines (35 loc) · 2.07 KB
/
generating.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
[[synthetic-graphs]]
Generating Artificial Natural Graphs
------------------------------------
[.tss-floatleft.tss-width-125]
image:splash-graph.png[]
Real-world graphs are not http://en.wikipedia.org/wiki/Random_graph[random]. Example real-world graphs include social graphs, word graphs, neural graphs, airline graphs, water flow graphs, etc. Interestingly enough, there is a simple statistical understanding of most natural graphs. In short, many vertices have few connections and very few vertices have many connections. A popular algorithm to generate a graph that has this connectivity pattern is known as the http://en.wikipedia.org/wiki/Preferential_attachment[preferential attachment] algorithm which can be simply described with the colloquial phrase: "the rich get richer." This section provides some simple code to artificially generate a natural looking graph in Titan using http://tinkerpop.incubator.apache.org/[Gremlin].
Generating a Graph with Natural Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first thing to do is to connect to a Cassandra cluster with Titan. In the example below, a local connection is used where `storage.batch-loading` ensures more speedy performance.
[source, gremlin]
conf = new BaseConfiguration()
conf.setProperty('storage.backend', 'cassandra')
conf.setProperty('storage.hostname', '127.0.0.1')
conf.setProperty('storage.batch-loading', 'true')
graph = TitanFactory.open(conf)
g = graph.traversal()
Next, the following script generates a graph with `size` number of edges.
[source, gremlin]
mgmt = graph.openManagement()
follow = mgmt.makeEdgeLabel('linked').multiplicity(MULTI).make()
mgmt.commit()
size = 100000; ids = [g.addVertex().id()]; rand = new Random()
(1..size).each {
v = graph.addVertex()
u = g.V(ids.get(rand.nextInt(ids.size()))).next()
v.addEdge('linked', u)
ids.add(u.id())
ids.add(v.id())
if (it % 1000 == 0)
graph.tx().commit()
}
Computing the In-Degree Distribution of the Graph
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[source, gremlin]
g.V().local(inE().count()).groupCount().order(local).by(valueDecr)