-
Notifications
You must be signed in to change notification settings - Fork 0
/
lesson_1.Rmd
220 lines (146 loc) · 6.37 KB
/
lesson_1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
---
title: "Network analaysis basics with iGraph"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
```
Network analysis allows us to take a qualitative look at a system
Network analysis is the study of interactions within a system. A network, or graph, is comprised of *nodes*, the objects of study, and *edges*, the interactions betwee nodes.
## iGraph basics
We're going to start learning about network theory with the [Zachary's Karate Club](https://en.wikipedia.org/wiki/Zachary%27s_karate_club) dataset:
```{r, load_packages.1}
library(igraph)
library(igraphdata)
library(tidyverse)
data(karate)
theme_set(theme_light())
src("src/utils.R")
```
(Note: you can reproduce Zachary's Karate network with `igraph::make_graph("Zachary")`, but it doesn't have as much metadata as the version that comes with `igraphdata`)
Let's do some basic igraph stuff. Let's see what an `igraph` object looks like:
```{r, karate.1}
karate
```
There's a lot of information here, but let's focus on the `attr` line. This tells us the following things:
* `name (g/c)` is an attribute of the graph itself, stored as characters. The same applies for `Citation` and `Author`.
* `Faction (v/n)` tells us that the vertices have a numeric attribute corresponding to the faction the vertex belongs to. Similarly, vertices have names, labels, and colors.
* `weight (e/n)` tells us that the edges have a number assigned to them according to some weight.
Let's start examining more attributes of the graph. We can get the *order* of the graph, which tells us the number of vertices it has:
```{r, karate.2}
gorder(karate)
```
Similarly, the *size* of the graph tells us how many edges there are:
```{r, karate.3}
gsize(karate)
```
Let's type in We can get a list of the nodes in the graph by doing the following:
```{r, karate.4}
V(karate)
```
We can access individual vertices by index...
```{r, karate.5}
V(karate)[1:5]
```
...or by name.
```{r, karate.6}
V(karate)["Mr Hi"]
```
We can also access multiple vertices at once by name:
```{r, karate.7}
V(karate)[c("Mr Hi", "John A", "Actor 7")]
```
We can see a list of the attributes of each vertex with `vertex.attributes`...
```{r, karate.8a}
vertex.attributes(karate)
```
...and we can access that attribute from a particular node by doing using `$`:
```{r, karate.8}
V(karate)[34]$name
```
We can access edges in a similar way:
```{r, karate.9}
E(karate)
```
And to find edge attributes:
```{r,karate.10}
edge.attributes(karate)
```
We can do a basic visualization:
```{r, karate.11}
plot.igraph(karate,
main = "Zachary's Karate Network")
```
We can have the size of the node reflect its degree:
```{r, karate.13.plot}
plot(karate,
vertex.size = degree(karate) * 1,
main = "Zachary's Karate Club"
)
```
Finally, we can try out other layouts:
```{r, karate.12}
plot(karate,
layout = layout_as_tree(karate),
main = "Zachary's Karate Club")
```
### Exercise 1.1
* What do the nodes represent?
* What do the edges represent? How are they weighted?
* What's the difference between a directed and an undirected graph? What kind of graph is `karate`?
* Enter `karate[]` into your console. What is that?
* Plot `karate` again using `layout_in_circle` and scaling the width of each edge by its weight.
## Graph connectivity
A common question in network analysis is to figure out how the degrees in a graph are distributed. We say two nodes are *connected* if there is an edge between them - that is, if the two objects represented by the nodes interact with each other. We define the *degree* of a node as the number of other nodes that the graph is connected it. We can look at the degrees of each node in the graph using `degree`:
```{r, karate.13}
degree(karate)
```
There are a few ways we can access the highest degree nodes. If we just want the single highest node, we can do:
```{r, karate.14}
which.max(degree(karate))
```
If we wanted, say, the 5 highest nodes, we can do:
```{r, karate.15}
sort(degree(karate)) %>%
tail(5)
```
Suppose we wanted all those nodes with degree higher than 4. We can than do:
```{r, karate.16}
degree(karate)[degree(karate) > 4]
```
`iGraph` has a `degree.distribution` function that returns the proportion of nodes of each degree.
```{r, karate.17}
degree.distribution(karate)
```
We can get a quick visualization of this distribution via `plot`:
```{r, karate.18}
plot(degree.distribution(karate))
```
We can use `plot_dd` from the `utils` file to make it a bit prettier.
```{r, karate.20}
plot_dd(karate)
```
The *neighborhood* of a node is the set of nodes that are connected to it. For instance, to see the neighborhood of `Mr Hi`, we can do:
```{r, karate.21}
neighbors(karate, "Actor 7")
```
### Exercise 1.2
* The general intuition is that those nodes that have a higher degree are more important than those nodes with a lower degree connectivity.
* Refer back to the plot of the degree distribution of the network. There are two clear outliers in terms of degree connectivity. Who are they? Does this make sense?
* Find the neighborhoods of the nodes with the highest degree. Use `intersect` to find the overlap between these two neighborhoods. What did you find? Does this make sense? What are some questions you may have about this overlap?
* For the `karate` network, it is easy to pick out those nodes that have a "high" degree connectivity. However, suppose that your degree distribution looked like the graph below. How would you decide a cutoff for a node to have "high" degree connectivity?
```{r, karate.22, echo = FALSE}
sim_graph <- erdos.renyi.game(70, 1/4)
plot_dd(sim_graph)
```
## Graph clustering
Another common technique in network analysis is *graph clustering*. The aim of *graph clustering* is to find subsets of nodes within a graph that are related. *`iGraph` comes with several clustering algorithms. For example, we can use `cluster_fast_greedy` to cluster the `karate` network:
```{r, karate.23}
karate.cfg <- cluster_fast_greedy(karate)
plot(karate.cfg,
karate)
```
### Exercise 1.3
* How do you interpret the previous plot? Does this make sense given the system of study?
* Try out other clustering algorithms and see if you can get equal or better results as above.
* It's easy to find an algorithm and apply it to your network to try and find some structure. What are some things you should keep in mind when applying clustering algorithms to a nentwork?