-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME
216 lines (153 loc) · 7.75 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
The files dht.c and dht.h implement the variant of the Kademlia Distributed
Hash Table (DHT) used in the Bittorrent network (``mainline'' variant).
The dht.c supports Elliptic Curve Cryptography within the DHT protocol.
The file dht-example.c is a stand-alone program that participates in the
DHT. Another example is a patch against Transmission, which you might or
might not be able to find somewhere.
The code is designed to work well in both event-driven and threaded code.
The caller, which is either an event-loop or a dedicated thread, must
periodically call the function dht_periodic. In addition, it must call
dht_periodic whenever any data has arrived from the network.
All functions return -1 in case of failure, in which case errno is set, or
a positive value in case of success.
Trying dht-example.c
********************
$ apt-get install build-essential libssl-dev # debian/ubuntu, ymmv
$ make
$ ./dht-example -4 6668
and on a secondary shell
$ ./dht-example -4 6669 8.9.12.3 6668 # replace 8.9.12.3 with your ip
Running the dht in-the-wild:
$ ./dht-example 6668 router.bittorrent.com 6881
This will bootstrap the node against the world-wide dht network and is
especially usefull to test interoperability.
Initialisation
**************
* dht_generate_key
This method gernerates a key / node ID that can be used to encrypt dht
traffic. Can be used without initialising the library.
* dht_init
This must be called before using the library. You pass it a bound IPv4
datagram socket, a bound IPv6 datagram socket, and your node key (generated
by dht_generate_key).
If you're on a multi-homed host, you should bind the sockets to one of your
addresses.
* dht_uninit
This may be called at the end of the session.
Bootstrapping
*************
The DHT needs to be taught a small number of contacts to begin functioning.
You can hard-wire a small number of stable nodes in your application, but
this obviously fails to scale. You may save the list of known good nodes
at shutdown, and restore it at restart. You may also grab nodes from
torrent files (the nodes field), and you may exchange contacts with other
Bittorrent peers using the PORT extension.
* dht_ping
This is the main bootstrapping primitive. You pass it an address at which
you believe that a DHT node may be living, and a query will be sent. If
a node replies, and if there is space in the routing table, it will be
inserted.
* dht_insert_node
This is a softer bootstrapping method, which doesn't actually send
a query -- it only stores the node in the routing table for later use. It
is a good idea to use that when e.g. restoring your routing table from
disk.
Note that dht_insert_node requires that you supply a node id. If the id
turns out to be wrong, the DHT will eventually recover; still, inserting
massive amounts of incorrect information into your routing table is
certainly not a good idea.
An additionaly difficulty with dht_insert_node is that, for various
reasons, a Kademlia routing table cannot absorb nodes faster than a certain
rate. Dumping a large number of nodes into a table using dht_insert_node
will probably cause most of these nodes to be discarded straight away.
(The tolerable rate is difficult to estimate; it is probably on the order
of one node every few seconds per node already in the table divided by 8,
for some suitable value of 8.)
Doing some work
***************
* dht_periodic
This function should be called by your main loop periodically, and also
whenever data is available on the socket. The time after which
dht_periodic should be called if no data is available is returned in the
parameter tosleep. (You do not need to be particularly accurate; actually,
it is a good idea to be late by a random value.)
The parameters buf, buflen, from and fromlen optionally carry a received
message. If buflen is 0, then no message was received.
Dht_periodic also takes a callback, which will be called whenever something
interesting happens (see below).
* dht_search
This schedules a search for information about the info-hash specified in
id. If port is not 0, it specifies the TCP port on which the current peer
is listening; in that case, when the search is complete it will be announced
to the network. The port is in host order, beware if you got it from
a struct sockaddr_in.
In either case, data is passed to the callback function as soon as it is
available, possibly in multiple pieces. The callback function will
additionally be called when the search is complete.
Up to DHT_MAX_SEARCHES (1024) searches can be in progress at a given time;
any more, and dht_search will return -1. If you specify a new search for
the same info hash as a search still in progress, the previous search is
combined with the new one -- you will only receive a completion indication
once.
Information queries
*******************
* dht_nodes
This returns the number of known good, dubious and cached nodes in our
routing table. This can be used to decide whether it's reasonable to start
a search; a search is likely to be successful as long as we have a few good
nodes; however, in order to avoid overloading your bootstrap nodes, you may
want to wait until good is at least 4 and good + doubtful is at least 30 or
so.
It also includes the number of nodes that recently send us an unsolicited
request; this can be used to determine if the UDP port used for the DHT is
firewalled.
If you want to display a single figure to the user, you should display
good + doubtful, which is the total number of nodes in your routing table.
Some clients try to estimate the total number of nodes, but this doesn't
make much sense -- since the result is exponential in the number of nodes
in the routing table, small variations in the latter cause huge jumps in
the former.
* dht_get_nodes
This retrieves the list of known good nodes, starting with the nodes in our
own bucket. It is a good idea to save the list of known good nodes at
shutdown, and ping them at startup.
* dht_dump_tables
* dht_debug
These are debugging aids.
Functions provided by you
*************************
* The callback function
The callback function is called with 5 arguments. Closure is simply the
value that you passed to dht_periodic. Event is one of DHT_EVENT_VALUES or
DHT_EVENT_VALUES6, which indicates that we have new values, or
DHT_EVENT_SEARCH_DONE or DHT_EVENT_SEARCH_DONE6, which indicates that
a search has completed. In either case, info_hash is set to the info-hash
of the search.
In the case of DHT_EVENT_VALUES, data is a list of nodes in ``compact''
format -- 6 or 18 bytes per node. Its length in bytes is in data_len.
* dht_blacklisted
This is a function that takes an IP address and returns true if this
address should be silently ignored. Do not use this feature unless you
really must -- Kademlia supposes transitive reachability.
* dht_hash
This should compute a reasonably strong cryptographic hash of the passed
values. It should map cleanly to your favourite crypto toolkit's MD5 or
SHA-1 function.
* dht_random_bytes
This should fill the supplied buffer with true random bytes.
Final notes
***********
* NAT
Nothing works well across NATs, but Kademlia is somewhat less impacted than
many other protocols. The implementation takes care to distinguish between
unidirectional and bidirectional reachability, and NATed nodes will
eventually fall out from other nodes' routing tables.
While there is no periodic pinging in this implementation, maintaining
a full routing table requires slightly more than one packet exchange per
minute, even in a completely idle network; this should be sufficient to
make most full cone NATs happy.
* Missing functionality
Some of the code has had very little testing. If it breaks, you get to
keep both pieces.
Juliusz Chroboczek