.\" Man page generated from reStructuredText.
.
.TH "CRUSHTOOL" "8" "January 12, 2014" "dev" "Ceph"
.SH NAME
crushtool \- CRUSH map manipulation tool
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.
.SH SYNOPSIS
.nf
\fBcrushtool\fP ( \-d \fImap\fP | \-c \fImap.txt\fP | \-\-build \-\-num_osds \fInumosds\fP
\fIlayer1\fP \fI\&...\fP | \-\-test ) [ \-o \fIoutfile\fP ]
.fi
.sp
.SH DESCRIPTION
.INDENT 0.0
.TP
.B \fBcrushtool\fP is a utility that lets you create, compile, decompile
and test CRUSH map files.
.UNINDENT
.sp
CRUSH is a pseudo\-random data distribution algorithm that efficiently
maps input values (typically data objects) across a heterogeneous,
hierarchically structured device map. The algorithm was originally
described in detail in the following paper (although it has evolved
somewhat since then):
.INDENT 0.0
.INDENT 3.5
\fI\%http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf\fP
.UNINDENT
.UNINDENT
.sp
The tool has four modes of operation.
.INDENT 0.0
.TP
.B \-\-compile|\-c map.txt
will compile a plaintext map.txt into a binary map file.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-decompile|\-d map
will take the compiled map and decompile it into a plaintext source
file, suitable for editing.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-build \-\-num_osds {num\-osds} layer1 ...
will create a map with the given layer structure. See below for a
detailed explanation.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-test
will perform a dry run of a CRUSH mapping for a range of input
object names. See below for a detailed explanation.
.UNINDENT
.sp
Unlike other Ceph tools, \fBcrushtool\fP does not accept generic options
such as \fB\-\-debug\-crush\fP from the command line. They can however be
provided via the CEPH_ARGS environment variable. For instance, to
silence all output from the CRUSH subsystem:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
CEPH_ARGS="\-\-debug\-crush 0" crushtool ...
.ft P
.fi
.UNINDENT
.UNINDENT
.SH RUNNING TESTS WITH \-\-TEST
.sp
The test mode uses the input crush map (as specified with \fB\-i
map\fP) and performs a dry run of CRUSH mapping, or of random placement
if \fB\-\-simulate\fP is set. On completion, two kinds of reports can be
created. The \fB\-\-show\-...\fP options output human\-readable information
on stderr. The \fB\-\-output\-csv\fP option creates CSV files that are
documented by the \fB\-\-help\-output\fP option.
.INDENT 0.0
.TP
.B \-\-show\-statistics
for each rule, display the mapping of each object. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
CRUSH rule 1 x 24 [11,6]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that object \fB24\fP is mapped to devices \fB[11,6]\fP by rule
\fB1\fP\&. At the end of the mapping details, a summary of the
distribution is displayed. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
rule 1 (metadata) num_rep 5 result size == 5: 1024/1024
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that rule \fB1\fP, which is named \fBmetadata\fP, successfully
mapped \fB1024\fP objects to \fBresult size == 5\fP devices when trying
to map them to \fBnum_rep 5\fP replicas. When it fails to provide the
required mapping, presumably because the number of \fBtries\fP must
be increased, a breakdown of the failures is displayed. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
rule 1 (metadata) num_rep 10 result size == 8: 4/1024
rule 1 (metadata) num_rep 10 result size == 9: 93/1024
rule 1 (metadata) num_rep 10 result size == 10: 927/1024
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that although \fBnum_rep 10\fP replicas were required, \fB4\fP
out of \fB1024\fP objects (\fB4/1024\fP) were mapped to only
\fBresult size == 8\fP devices.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-show\-bad\-mappings
display which objects failed to be mapped to the required number of
devices. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that when rule \fB1\fP was required to map \fB7\fP devices, it
could only map six: \fB[8,10,2,11,6,9]\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-show\-utilization
display the expected and actual utilization for each device, for
each number of replicas. For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
device 0: stored : 951 expected : 853.333
device 1: stored : 963 expected : 853.333
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that device \fB0\fP stored \fB951\fP objects and was expected to store about \fB853\fP\&.
Implies \fB\-\-show\-statistics\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-show\-utilization\-all
displays the same as \fB\-\-show\-utilization\fP but does not suppress
output when the weight of a device is zero.
Implies \fB\-\-show\-statistics\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-show\-choose\-tries
display how many attempts were needed to find a device mapping.
For instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
0: 95224
1: 3745
2: 2225
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
shows that \fB95224\fP mappings succeeded without retries, \fB3745\fP
mappings succeeded with one retry, etc. There are as many rows
as the value of the \fB\-\-set\-choose\-total\-tries\fP option.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-output\-csv
create CSV files (in the current directory) containing information
documented by \fB\-\-help\-output\fP\&. The files are named after the rule
used when collecting the statistics. For instance, if the rule
metadata is used, the CSV files will be:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
metadata\-absolute_weights.csv
metadata\-device_utilization.csv
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The first line of the file briefly explains the column layout. For
instance:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
metadata\-absolute_weights.csv
Device ID, Absolute Weight
0,1
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-\-output\-name NAME
prepend \fBNAME\fP to the file names generated when \fB\-\-output\-csv\fP
is specified. For instance, \fB\-\-output\-name FOO\fP will create the
files:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
FOO\-metadata\-absolute_weights.csv
FOO\-metadata\-device_utilization.csv
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
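.sp
The report lines shown above are regular enough to post\-process with a
short script. The Python sketch below is not part of \fBcrushtool\fP: it
assumes the summary and utilization lines look exactly like the samples
in this page, and the helper names (\fBparse_summary\fP,
\fBparse_utilization\fP, \fBshort_mappings\fP) are made up for the
illustration.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Illustrative only: line formats are assumed to match the samples above.

def parse_summary(line):
    """Parse 'rule 1 (metadata) num_rep 10 result size == 8: 4/1024'."""
    head, _, counts = line.rpartition(":")
    mapped, _, total = counts.strip().partition("/")
    words = head.split()
    return {
        "rule": int(words[1]),
        "name": words[2].strip("()"),
        "num_rep": int(words[words.index("num_rep") + 1]),
        "result_size": int(words[-1]),
        "mapped": int(mapped),
        "total": int(total),
    }

def parse_utilization(line):
    """Parse 'device 0: stored : 951 expected : 853.333'."""
    words = line.replace(":", " ").split()
    return {
        "device": int(words[1]),
        "stored": int(words[words.index("stored") + 1]),
        "expected": float(words[words.index("expected") + 1]),
    }

def short_mappings(summaries):
    """Count objects that received fewer devices than num_rep asked for."""
    return sum(s["mapped"] for s in summaries
               if s["result_size"] < s["num_rep"])

report = [
    "rule 1 (metadata) num_rep 10 result size == 8: 4/1024",
    "rule 1 (metadata) num_rep 10 result size == 9: 93/1024",
    "rule 1 (metadata) num_rep 10 result size == 10: 927/1024",
]
summaries = [parse_summary(line) for line in report]
short = short_mappings(summaries)

usage = parse_utilization("device 0: stored : 951 expected : 853.333")
# Relative difference between stored and expected object counts.
overload = usage["stored"] / usage["expected"] - 1.0

print(short, "of", summaries[0]["total"], "objects got too few devices")
print("device", usage["device"], "deviation:", round(overload * 100, 1), "%")
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
With the sample lines, 97 of the 1024 objects (4 + 93) end up on fewer
than the 10 requested devices.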
.sp
The \fB\-\-set\-...\fP options can be used to modify the tunables of the
input crush map. The input crush map is modified in
memory. For example, the bad mapping reported by:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ crushtool \-i mymap \-\-test \-\-show\-bad\-mappings
bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
could be fixed by increasing the \fBchoose\-total\-tries\fP as follows:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ crushtool \-i mymap \-\-test \-\-show\-bad\-mappings \-\-set\-choose\-total\-tries 500
.ft P
.fi
.UNINDENT
.UNINDENT
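.sp
The check and the retry can also be driven from a script. The Python
sketch below is an illustration rather than a supported interface: it
assumes that \fBcrushtool\fP is on the PATH and that the bad mapping
lines look like the sample above, and it captures both stdout and stderr
so the report is seen wherever it is written. The helper names are
hypothetical; only the command\-line options come from this page.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
import subprocess

def build_test_command(map_path, total_tries=None):
    """Build the crushtool command line used in the example above."""
    argv = ["crushtool", "-i", map_path, "--test", "--show-bad-mappings"]
    if total_tries is not None:
        argv += ["--set-choose-total-tries", str(total_tries)]
    return argv

def parse_bad_mapping(line):
    """Parse 'bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]'."""
    words = line.split()
    devices = words[-1].strip("[]")
    return {
        "rule": int(words[words.index("rule") + 1]),
        "x": int(words[words.index("x") + 1]),
        "num_rep": int(words[words.index("num_rep") + 1]),
        "result": [int(d) for d in devices.split(",") if d],
    }

def bad_mappings(map_path, total_tries=None):
    """Run crushtool (assumed to be on PATH) and collect bad mappings."""
    done = subprocess.run(build_test_command(map_path, total_tries),
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True)
    return [parse_bad_mapping(line) for line in done.stdout.splitlines()
            if line.startswith("bad mapping")]

# Parsing the sample line from this page needs no crushtool binary.
sample = parse_bad_mapping(
    "bad mapping rule 1 x 781 num_rep 7 result [8,10,2,11,6,9]")
missing = sample["num_rep"] - len(sample["result"])
print("object", sample["x"], "is short of", missing, "device(s)")
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Calling the hypothetical \fBbad_mappings("mymap", 500)\fP would run the
command shown above; an empty list would mean no bad mapping was
reported.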
.SH BUILDING A MAP WITH \-\-BUILD
.sp
The build mode will generate hierarchical maps. The first argument
specifies the number of devices (leaves) in the CRUSH hierarchy. Each
layer describes how the layer (or devices) preceding it should be
grouped.
.sp
Each layer consists of:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
bucket ( uniform | list | tree | straw ) size
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The \fBbucket\fP is the type of the buckets in the layer
(e.g. "rack"). Each bucket name will be built by appending a unique
number to the \fBbucket\fP string (e.g. "rack0", "rack1"...).
.sp
The second component is the type of bucket: \fBstraw\fP should be used
most of the time.
.sp
The third component is the maximum size of the bucket. A size of zero
means a bucket of infinite capacity.
.SH EXAMPLE
.sp
Suppose we have two rows with two racks each and 20 nodes per rack. Suppose
each node contains 4 storage devices for Ceph OSD Daemons. This configuration
allows us to deploy 320 Ceph OSD Daemons. Let\(aqs assume a 42U rack with 2U nodes,
leaving an extra 2U for a rack switch.
.sp
To reflect our hierarchy of devices, nodes, racks and rows, we would execute
the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ crushtool \-o crushmap \-\-build \-\-num_osds 320 \e
node straw 4 \e
rack straw 20 \e
row straw 2 \e
root straw 0
# id    weight  type name               reweight
\-87     320     root root
\-85     160         row row0
\-81     80              rack rack0
\-1      4                   node node0
0       1                       osd.0   1
1       1                       osd.1   1
2       1                       osd.2   1
3       1                       osd.3   1
\-2      4                   node node1
4       1                       osd.4   1
5       1                       osd.5   1
\&...
.ft P
.fi
.UNINDENT
.UNINDENT
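.sp
The number of buckets in each layer of this example can be checked with
a few lines of Python. The sketch below is an illustration, not the code
used by \fBcrushtool\fP; it assumes that a partially filled last bucket
still counts as one bucket (which does not arise here, since every layer
divides evenly), and the function name \fBlayer_sizes\fP is made up.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def layer_sizes(num_osds, layers):
    """Return [(type, bucket_count), ...] for --build style layers.

    Each layer is (type_name, size); a size of 0 means a single bucket
    that holds everything from the layer below.
    """
    counts = []
    items = num_osds
    for type_name, size in layers:
        # Ceiling division: an incomplete last bucket is assumed to count.
        buckets = 1 if size == 0 else -(-items // size)
        counts.append((type_name, buckets))
        items = buckets
    return counts

# The layers from the example: node straw 4, rack straw 20, ...
layers = [("node", 4), ("rack", 20), ("row", 2), ("root", 0)]
counts = layer_sizes(320, layers)
total = sum(n for _, n in counts)

for type_name, n in counts:
    print(type_name, n)
print("buckets in total:", total)
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The result is 80 nodes, 4 racks, 2 rows and 1 root, 87 buckets in total.
This agrees with the root id of \-87 in the listing if bucket ids are
handed out sequentially starting from \-1, as the listing suggests.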
.sp
CRUSH rulesets are created so the generated crushmap can be
tested. They are the same rulesets as the ones created by default when
creating a new Ceph cluster. They can be further edited with:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# decompile
crushtool \-d crushmap \-o map.txt
# edit
emacs map.txt
# recompile
crushtool \-c map.txt \-o crushmap
.ft P
.fi
.UNINDENT
.UNINDENT
.SH AVAILABILITY
.sp
\fBcrushtool\fP is part of the Ceph distributed storage system. Please
refer to the Ceph documentation at \fI\%http://ceph.com/docs\fP for more
information.
.SH SEE ALSO
.sp
\fBceph\fP(8),
\fBosdmaptool\fP(8)
.SH AUTHORS
.sp
John Wilkins, Sage Weil, Loic Dachary
.SH COPYRIGHT
2010-2014, Inktank Storage, Inc. and contributors. Licensed under Creative Commons BY-SA
.\" Generated by docutils manpage writer.
.