<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="Apache Forrest" name="Generator">
<meta name="Forrest-version" content="0.9">
<meta name="Forrest-skin-name" content="pelt">
<title>ZooKeeper Programmer's Guide</title>
<link type="text/css" href="skin/basic.css" rel="stylesheet">
<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
<link type="text/css" href="skin/profile.css" rel="stylesheet">
<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
<link rel="shortcut icon" href="images/favicon.ico">
</head>
<body onload="init()">
<script type="text/javascript">ndeSetTextSize();</script>
<div id="top">
<!--+
|breadtrail
+-->
<div class="breadtrail">
<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://zookeeper.apache.org/">ZooKeeper</a> &gt; <a href="http://zookeeper.apache.org/">ZooKeeper</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
</div>
<!--+
|header
+-->
<div class="header">
<!--+
|start group logo
+-->
<div class="grouplogo">
<a href="http://hadoop.apache.org/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Apache Hadoop"></a>
</div>
<!--+
|end group logo
+-->
<!--+
|start Project Logo
+-->
<div class="projectlogo">
<a href="http://zookeeper.apache.org/"><img class="logoImage" alt="ZooKeeper" src="images/zookeeper_small.gif" title="ZooKeeper: distributed coordination"></a>
</div>
<!--+
|end Project Logo
+-->
<!--+
|start Search
+-->
<div class="searchbox">
<form action="http://www.google.com/search" method="get" class="roundtopsmall">
<input value="zookeeper.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">
<input name="Search" value="Search" type="submit">
</form>
</div>
<!--+
|end search
+-->
<!--+
|start Tabs
+-->
<ul id="tabs">
<li>
<a class="unselected" href="http://zookeeper.apache.org/">Project</a>
</li>
<li>
<a class="unselected" href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/">Wiki</a>
</li>
<li class="current">
<a class="selected" href="index.html">ZooKeeper 3.4 Documentation</a>
</li>
</ul>
<!--+
|end Tabs
+-->
</div>
</div>
<div id="main">
<div id="publishedStrip">
<!--+
|start Subtabs
+-->
<div id="level2tabs"></div>
<!--+
|end Endtabs
+-->
<script type="text/javascript"><!--
document.write("Last Published: " + document.lastModified);
// --></script>
</div>
<!--+
|breadtrail
+-->
<div class="breadtrail">
</div>
<!--+
|start Menu, mainarea
+-->
<!--+
|start Menu
+-->
<div id="menu">
<div onclick="SwitchMenu('menu_1.1', 'skin/')" id="menu_1.1Title" class="menutitle">Overview</div>
<div id="menu_1.1" class="menuitemgroup">
<div class="menuitem">
<a href="index.html">Welcome</a>
</div>
<div class="menuitem">
<a href="zookeeperOver.html">Overview</a>
</div>
<div class="menuitem">
<a href="zookeeperStarted.html">Getting Started</a>
</div>
<div class="menuitem">
<a href="releasenotes.html">Release Notes</a>
</div>
</div>
<div onclick="SwitchMenu('menu_selected_1.2', 'skin/')" id="menu_selected_1.2Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Developer</div>
<div id="menu_selected_1.2" class="selectedmenuitemgroup" style="display: block;">
<div class="menuitem">
<a href="api/index.html">API Docs</a>
</div>
<div class="menupage">
<div class="menupagetitle">Programmer's Guide</div>
</div>
<div class="menuitem">
<a href="javaExample.html">Java Example</a>
</div>
<div class="menuitem">
<a href="zookeeperTutorial.html">Barrier and Queue Tutorial</a>
</div>
<div class="menuitem">
<a href="recipes.html">Recipes</a>
</div>
</div>
<div onclick="SwitchMenu('menu_1.3', 'skin/')" id="menu_1.3Title" class="menutitle">Admin &amp; Ops</div>
<div id="menu_1.3" class="menuitemgroup">
<div class="menuitem">
<a href="zookeeperAdmin.html">Administrator's Guide</a>
</div>
<div class="menuitem">
<a href="zookeeperQuotas.html">Quota Guide</a>
</div>
<div class="menuitem">
<a href="zookeeperJMX.html">JMX</a>
</div>
<div class="menuitem">
<a href="zookeeperObservers.html">Observers Guide</a>
</div>
<div class="menuitem">
<a href="zookeeperReconfig.html">Dynamic Reconfiguration</a>
</div>
</div>
<div onclick="SwitchMenu('menu_1.4', 'skin/')" id="menu_1.4Title" class="menutitle">Contributor</div>
<div id="menu_1.4" class="menuitemgroup">
<div class="menuitem">
<a href="zookeeperInternals.html">ZooKeeper Internals</a>
</div>
</div>
<div onclick="SwitchMenu('menu_1.5', 'skin/')" id="menu_1.5Title" class="menutitle">Miscellaneous</div>
<div id="menu_1.5" class="menuitemgroup">
<div class="menuitem">
<a href="https://cwiki.apache.org/confluence/display/ZOOKEEPER">Wiki</a>
</div>
<div class="menuitem">
<a href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ">FAQ</a>
</div>
<div class="menuitem">
<a href="http://zookeeper.apache.org/mailing_lists.html">Mailing Lists</a>
</div>
</div>
<div id="credit"></div>
<div id="roundbottom">
<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
<!--+
|alternative credits
+-->
<div id="credit2"></div>
</div>
<!--+
|end Menu
+-->
<!--+
|start content
+-->
<div id="content">
<div title="Portable Document Format" class="pdflink">
<a class="dida" href="zookeeperProgrammers.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
PDF</a>
</div>
<h1>ZooKeeper Programmer's Guide</h1>
<h3>Developing Distributed Applications that use ZooKeeper</h3>
<div id="front-matter">
<div id="minitoc-area">
<ul class="minitoc">
<li>
<a href="#_introduction">Introduction</a>
</li>
<li>
<a href="#ch_zkDataModel">The ZooKeeper Data Model</a>
<ul class="minitoc">
<li>
<a href="#sc_zkDataModel_znodes">ZNodes</a>
<ul class="minitoc">
<li>
<a href="#sc_zkDataMode_watches">Watches</a>
</li>
<li>
<a href="#Data+Access">Data Access</a>
</li>
<li>
<a href="#Ephemeral+Nodes">Ephemeral Nodes</a>
</li>
<li>
<a href="#Sequence+Nodes+--+Unique+Naming">Sequence Nodes -- Unique Naming</a>
</li>
</ul>
</li>
<li>
<a href="#sc_timeInZk">Time in ZooKeeper</a>
</li>
<li>
<a href="#sc_zkStatStructure">ZooKeeper Stat Structure</a>
</li>
</ul>
</li>
<li>
<a href="#ch_zkSessions">ZooKeeper Sessions</a>
</li>
<li>
<a href="#ch_zkWatches">ZooKeeper Watches</a>
<ul class="minitoc">
<li>
<a href="#sc_WatchSemantics">Semantics of Watches</a>
</li>
<li>
<a href="#sc_WatchRemoval">Remove Watches</a>
</li>
<li>
<a href="#sc_WatchGuarantees">What ZooKeeper Guarantees about Watches</a>
</li>
<li>
<a href="#sc_WatchRememberThese">Things to Remember about Watches</a>
</li>
</ul>
</li>
<li>
<a href="#sc_ZooKeeperAccessControl">ZooKeeper access control using ACLs</a>
<ul class="minitoc">
<li>
<a href="#sc_ACLPermissions">ACL Permissions</a>
<ul class="minitoc">
<li>
<a href="#sc_BuiltinACLSchemes">Builtin ACL Schemes</a>
</li>
<li>
<a href="#ZooKeeper+C+client+API">ZooKeeper C client API</a>
</li>
</ul>
</li>
</ul>
</li>
<li>
<a href="#sc_ZooKeeperPluggableAuthentication">Pluggable ZooKeeper authentication</a>
</li>
<li>
<a href="#ch_zkGuarantees">Consistency Guarantees</a>
</li>
<li>
<a href="#ch_bindings">Bindings</a>
<ul class="minitoc">
<li>
<a href="#Java+Binding">Java Binding</a>
</li>
<li>
<a href="#C+Binding">C Binding</a>
<ul class="minitoc">
<li>
<a href="#Installation">Installation</a>
</li>
<li>
<a href="#Using+the+C+Client">Using the C Client</a>
</li>
</ul>
</li>
</ul>
</li>
<li>
<a href="#ch_guideToZkOperations">Building Blocks: A Guide to ZooKeeper Operations</a>
<ul class="minitoc">
<li>
<a href="#sc_errorsZk">Handling Errors</a>
</li>
<li>
<a href="#sc_connectingToZk">Connecting to ZooKeeper</a>
</li>
<li>
<a href="#sc_readOps">Read Operations</a>
</li>
<li>
<a href="#sc_writeOps">Write Operations</a>
</li>
<li>
<a href="#sc_handlingWatches">Handling Watches</a>
</li>
<li>
<a href="#sc_miscOps">Miscellaneous ZooKeeper Operations</a>
</li>
</ul>
</li>
<li>
<a href="#ch_programStructureWithExample">Program Structure, with Simple Example</a>
</li>
<li>
<a href="#ch_gotchas">Gotchas: Common Problems and Troubleshooting</a>
</li>
</ul>
</div>
</div>
<a name="_introduction"></a>
<h2 class="h3">Introduction</h2>
<div class="section">
<p>This document is a guide for developers wishing to create
distributed applications that take advantage of ZooKeeper's coordination
services. It contains conceptual and practical information.</p>
<p>The first four sections of this guide present higher-level
discussions of various ZooKeeper concepts. These are necessary both for
understanding how ZooKeeper works and for learning how to work with it. They are
conceptual rather than code-focused, but they do assume a familiarity with the
problems associated with distributed computing. The sections in this first
group are:</p>
<ul>
<li>
<p>
<a href="#ch_zkDataModel">The ZooKeeper Data Model</a>
</p>
</li>
<li>
<p>
<a href="#ch_zkSessions">ZooKeeper Sessions</a>
</p>
</li>
<li>
<p>
<a href="#ch_zkWatches">ZooKeeper Watches</a>
</p>
</li>
<li>
<p>
<a href="#ch_zkGuarantees">Consistency Guarantees</a>
</p>
</li>
</ul>
<p>The next four sections provide practical programming
information. These are:</p>
<ul>
<li>
<p>
<a href="#ch_guideToZkOperations">Building Blocks: A Guide to ZooKeeper Operations</a>
</p>
</li>
<li>
<p>
<a href="#ch_bindings">Bindings</a>
</p>
</li>
<li>
<p>
<a href="#ch_programStructureWithExample">Program Structure, with Simple Example</a>
<em>[tbd]</em>
</p>
</li>
<li>
<p>
<a href="#ch_gotchas">Gotchas: Common Problems and Troubleshooting</a>
</p>
</li>
</ul>
<p>The book concludes with an <a href="#apx_linksToOtherInfo">appendix</a> containing links to other
useful, ZooKeeper-related information.</p>
<p>Most of the information in this document is written to be accessible as
stand-alone reference material. However, before starting your first
ZooKeeper application, you should probably at least read the chapters on
the <a href="#ch_zkDataModel">ZooKeeper Data Model</a> and <a href="#ch_guideToZkOperations">ZooKeeper Basic Operations</a>. Also,
the <a href="#ch_programStructureWithExample">Simple Programming
Example</a> <em>[tbd]</em> is helpful for understanding the basic
structure of a ZooKeeper client application.</p>
</div>
<a name="ch_zkDataModel"></a>
<h2 class="h3">The ZooKeeper Data Model</h2>
<div class="section">
<p>ZooKeeper has a hierarchical namespace, much like a distributed file
system. The only difference is that each node in the namespace can have
data associated with it as well as children. It is like having a file
system that allows a file to also be a directory. Paths to nodes are
always expressed as canonical, absolute, slash-separated paths; there are
no relative references. Any Unicode character can be used in a path, subject
to the following constraints:</p>
<ul>
<li>
<p>The null character (\u0000) cannot be part of a path name. (This
causes problems with the C binding.)</p>
</li>
<li>
<p>The following characters can't be used because they don't
display well, or render in confusing ways: \u0001 - \u001F and \u007F
- \u009F.</p>
</li>
<li>
<p>The following characters are not allowed: \ud800 - \uF8FF,
\uFFF0 - \uFFFF.</p>
</li>
<li>
<p>The "." character can be used as part of another name, but "."
and ".." cannot alone be used to indicate a node along a path,
because ZooKeeper doesn't use relative paths. The following would be
invalid: "/a/b/./c" or "/a/b/../c".</p>
</li>
<li>
<p>The token "zookeeper" is reserved.</p>
</li>
</ul>
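<p>As a rough illustration of the constraints above, the following Java sketch
checks a candidate path against them. The class <em>ZnodePathCheck</em> and its
method are hypothetical names introduced here; this is not the validation code
used by the ZooKeeper libraries, and it does not check the reserved
"zookeeper" token.</p>

```java
// Illustrative check of the path rules listed above. ZnodePathCheck is a
// hypothetical helper, not part of the ZooKeeper API.
final class ZnodePathCheck {

    /** Returns null if the path satisfies the listed rules, otherwise a reason. */
    static String invalidReason(String path) {
        if (path == null || path.isEmpty() || path.charAt(0) != '/') {
            return "path must be absolute (start with '/')";
        }
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == 0) {
                return "null character is not allowed";
            }
            if ((c >= 0x0001 && c <= 0x001F) || (c >= 0x007F && c <= 0x009F)) {
                return "control character is not allowed";
            }
            if ((c >= 0xD800 && c <= 0xF8FF) || c >= 0xFFF0) {
                return "character is in a disallowed range";
            }
        }
        // Inspect each slash-separated name; "." and ".." may not stand alone.
        for (String name : path.substring(1).split("/", -1)) {
            if (name.equals(".") || name.equals("..")) {
                return "relative path element is not allowed";
            }
        }
        return null;
    }
}
```

<p>With this sketch, "/a/b/c" and "/a/.hidden/c.d" are accepted, while
"/a/b/./c" and "/a/b/../c" are rejected.</p>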
<a name="sc_zkDataModel_znodes"></a>
<h3 class="h4">ZNodes</h3>
<p>Every node in a ZooKeeper tree is referred to as a
<em>znode</em>. Znodes maintain a stat structure that
includes version numbers for data changes and ACL changes. The stat
structure also has timestamps. The version number, together with the
timestamp, allows ZooKeeper to validate the cache and to coordinate
updates. Each time a znode's data changes, the version number increases.
For instance, whenever a client retrieves data, it also receives the
version of the data. And when a client performs an update or a delete,
it must supply the version of the data of the znode it is changing. If
the version it supplies doesn't match the actual version of the data,
the update will fail. (This behavior can be overridden. For more
information see... )<em>[tbd...]</em>
</p>
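<p>The version check described above behaves like a compare-and-set. The
following Java sketch models it with a small in-memory class;
<em>VersionedNode</em> is a hypothetical name used only for illustration and is
not the ZooKeeper client API.</p>

```java
// Toy in-memory model of a version-checked update. VersionedNode is a
// hypothetical class; it is not the ZooKeeper client API.
final class VersionedNode {
    private byte[] data;
    private int version = 0; // increases on every successful data change

    VersionedNode(byte[] initialData) {
        this.data = initialData.clone();
    }

    synchronized int version() {
        return version;
    }

    synchronized byte[] data() {
        return data.clone();
    }

    /**
     * Replaces the data only if expectedVersion equals the current version.
     * Returns true on success, false if the caller's version is stale.
     */
    synchronized boolean setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != version) {
            return false;
        }
        data = newData.clone();
        version++;
        return true;
    }
}
```

<p>A client that read the data at version 0 can update it once; a second
update that still supplies version 0 fails because the version has moved
on to 1.</p>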
<div class="note">
<div class="label">Note</div>
<div class="content">
<p>In distributed application engineering, the word
<em>node</em> can refer to a generic host machine, a
server, a member of an ensemble, a client process, etc. In the ZooKeeper
documentation, <em>znodes</em> refer to the data nodes.
<em>Servers</em> refer to machines that make up the
ZooKeeper service; <em>quorum peers</em> refer to the
servers that make up an ensemble; client refers to any host or process
which uses a ZooKeeper service.</p>
</div>
</div>
<p>Znodes are the main entity that a programmer accesses. They have
several characteristics that are worth mentioning here.</p>
<a name="sc_zkDataMode_watches"></a>
<h4>Watches</h4>
<p>Clients can set watches on znodes. A change to a watched znode triggers
the watch and then clears it. When a watch triggers, ZooKeeper
sends the client a notification. More information about watches can be
found in the section
<a href="#ch_zkWatches">ZooKeeper Watches</a>.</p>
<a name="Data+Access"></a>
<h4>Data Access</h4>
<p>The data stored at each znode in a namespace is read and written
atomically. Reads get all the data bytes associated with a znode and a
write replaces all the data. Each node has an Access Control List
(ACL) that restricts who can do what.</p>
<p>ZooKeeper was not designed to be a general database or large
object store. Instead, it manages coordination data. This data can
come in the form of configuration, status information, rendezvous, etc.
A common property of the various forms of coordination data is that
they are relatively small: measured in kilobytes.
The ZooKeeper client and the server implementations have sanity checks
to ensure that znodes have less than 1M of data, but the data should
be much less than that on average. Operating on relatively large data
sizes will cause some operations to take much more time than others and
will affect the latencies of some operations because of the extra time
needed to move more data over the network and onto storage media. If
large data storage is needed, the usual pattern for dealing with such
data is to store it on a bulk storage system, such as NFS or HDFS, and
store pointers to the storage locations in ZooKeeper.</p>
<a name="Ephemeral+Nodes"></a>
<h4>Ephemeral Nodes</h4>
<p>ZooKeeper also has the notion of ephemeral nodes. These znodes
exist as long as the session that created the znode is active. When
the session ends, the znode is deleted. Because of this behavior,
ephemeral znodes are not allowed to have children.</p>
<a name="Sequence+Nodes+--+Unique+Naming"></a>
<h4>Sequence Nodes -- Unique Naming</h4>
<p>When creating a znode you can also request that
ZooKeeper append a monotonically increasing counter to the end
of the path. This counter is unique to the parent znode. The
counter has a format of %010d -- that is, 10 digits with 0
(zero) padding (the counter is formatted in this way to
simplify sorting), e.g. "&lt;path&gt;0000000001". See
<a href="recipes.html#sc_recipes_Queues">Queue
Recipe</a> for an example use of this feature. Note: the
counter used to store the next sequence number is a signed int
(4 bytes) maintained by the parent node, so the counter will
overflow when incremented beyond 2147483647 (resulting in a
name "&lt;path&gt;-2147483648").</p>
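<p>The naming scheme and the overflow behavior can be reproduced with
ordinary Java formatting. The sketch below is only an illustration;
<em>SequenceName</em> is a hypothetical helper, not the code the server uses.</p>

```java
import java.util.Locale;

// Illustrates the %010d suffix and the effect of a signed 32-bit wrap-around.
// SequenceName is a hypothetical helper, not ZooKeeper server code.
final class SequenceName {

    /** Appends the counter to the path as 10 zero-padded digits. */
    static String withSuffix(String path, int counter) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return path + String.format(Locale.ROOT, "%010d", counter);
    }

    /** Next counter value; a Java int wraps to a negative number after 2147483647. */
    static int next(int counter) {
        return counter + 1;
    }
}
```

<p>Zero padding makes lexicographic order agree with numeric order for
non-negative counters, which is why sorting child names works until the
counter wraps.</p>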
<a name="sc_timeInZk"></a>
<h3 class="h4">Time in ZooKeeper</h3>
<p>ZooKeeper keeps track of time in multiple ways:</p>
<ul>
<li>
<p>
<strong>Zxid</strong>
</p>
<p>Every change to the ZooKeeper state receives a stamp in the
form of a <em>zxid</em> (ZooKeeper Transaction Id).
This exposes the total ordering of all changes to ZooKeeper. Each
change will have a unique zxid and if zxid1 is smaller than zxid2
then zxid1 happened before zxid2.</p>
</li>
<li>
<p>
<strong>Version numbers</strong>
</p>
<p>Every change to a node will cause an increase to one of the
version numbers of that node. The three version numbers are version
(number of changes to the data of a znode), cversion (number of
changes to the children of a znode), and aversion (number of changes
to the ACL of a znode).</p>
</li>
<li>
<p>
<strong>Ticks</strong>
</p>
<p>When using multi-server ZooKeeper, servers use ticks to define
timing of events such as status uploads, session timeouts,
connection timeouts between peers, etc. The tick time is only
indirectly exposed through the minimum session timeout (2 times the
tick time); if a client requests a session timeout less than the
minimum session timeout, the server will tell the client that the
session timeout is actually the minimum session timeout.</p>
</li>
<li>
<p>
<strong>Real time</strong>
</p>
<p>ZooKeeper doesn't use real time, or clock time, at all except
to put timestamps into the stat structure on znode creation and
znode modification.</p>
</li>
</ul>
<a name="sc_zkStatStructure"></a>
<h3 class="h4">ZooKeeper Stat Structure</h3>
<p>The Stat structure for each znode in ZooKeeper is made up of the
following fields:</p>
<ul>
<li>
<p>
<strong>czxid</strong>
</p>
<p>The zxid of the change that caused this znode to be
created.</p>
</li>
<li>
<p>
<strong>mzxid</strong>
</p>
<p>The zxid of the change that last modified this znode.</p>
</li>
<li>
<p>
<strong>ctime</strong>
</p>
<p>The time in milliseconds from epoch when this znode was
created.</p>
</li>
<li>
<p>
<strong>mtime</strong>
</p>
<p>The time in milliseconds from epoch when this znode was last
modified.</p>
</li>
<li>
<p>
<strong>version</strong>
</p>
<p>The number of changes to the data of this znode.</p>
</li>
<li>
<p>
<strong>cversion</strong>
</p>
<p>The number of changes to the children of this znode.</p>
</li>
<li>
<p>
<strong>aversion</strong>
</p>
<p>The number of changes to the ACL of this znode.</p>
</li>
<li>
<p>
<strong>ephemeralOwner</strong>
</p>
<p>The session id of the owner of this znode if the znode is an
ephemeral node. If it is not an ephemeral node, it will be
zero.</p>
</li>
<li>
<p>
<strong>dataLength</strong>
</p>
<p>The length of the data field of this znode.</p>
</li>
<li>
<p>
<strong>numChildren</strong>
</p>
<p>The number of children of this znode.</p>
</li>
</ul>
</div>
<a name="ch_zkSessions"></a>
<h2 class="h3">ZooKeeper Sessions</h2>
<div class="section">
<p>A ZooKeeper client establishes a session with the ZooKeeper
service by creating a handle to the service using a language
binding. Once created, the handle starts off in the CONNECTING state
and the client library tries to connect to one of the servers that
make up the ZooKeeper service, at which point it switches to the
CONNECTED state. During normal operation the handle will be in one of these
two states. If an unrecoverable error occurs, such as session
expiration or authentication failure, or if the application explicitly
closes the handle, the handle will move to the CLOSED state.
The following figure shows the possible state transitions of a
ZooKeeper client:</p>
<img alt="" src="images/state_dia.jpg"><p>To create a client session the application code must provide
a connection string containing a comma separated list of host:port pairs,
each corresponding to a ZooKeeper server (e.g. "127.0.0.1:4545" or
"127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"). The ZooKeeper
client library will pick an arbitrary server and try to connect to
it. If this connection fails, or if the client becomes
disconnected from the server for any reason, the client will
automatically try the next server in the list, until a connection
is (re-)established.</p>
<p>
<strong>Added in 3.2.0</strong>: An
optional "chroot" suffix may also be appended to the connection
string. This will run the client commands while interpreting all
paths relative to this root (similar to the unix chroot
command). If used, the example would look like
"127.0.0.1:4545/app/a" or
"127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a", where the
client would be rooted at "/app/a" and all paths would be relative
to this root; i.e., getting or setting "/foo/bar" would result
in operations being run on "/app/a/foo/bar" (from the server
perspective). This feature is particularly useful in multi-tenant
environments where each user of a particular ZooKeeper service
could be rooted differently. This makes re-use much simpler, as
each user can code their application as if it were rooted at
"/", while the actual location (say /app/a) could be determined at
deployment time.</p>
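<p>To make the path mapping concrete, the following Java sketch splits a
connection string into its host list and optional chroot suffix, then maps
a client path to the path the server would see. <em>ConnectString</em> is a
hypothetical class written for this illustration; it is not the parser used
by the client library.</p>

```java
import java.util.Arrays;
import java.util.List;

// Illustrative parsing of "host:port,host:port/chroot". ConnectString is a
// hypothetical helper, not the ZooKeeper client's own parser.
final class ConnectString {
    final List<String> hosts;
    final String chroot; // empty string when no chroot suffix was given

    ConnectString(String connectString) {
        int slash = connectString.indexOf('/');
        String hostPart = slash < 0 ? connectString : connectString.substring(0, slash);
        String root = slash < 0 ? "" : connectString.substring(slash);
        this.hosts = Arrays.asList(hostPart.split(","));
        // Assumption: a bare "/" suffix is treated the same as no chroot.
        this.chroot = root.equals("/") ? "" : root;
    }

    /** Maps a path as seen by the client to the path as seen by the server. */
    String serverPath(String clientPath) {
        if (chroot.isEmpty()) {
            return clientPath;
        }
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }
}
```

<p>For the connection string
"127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a", the sketch yields three
hosts and maps the client path "/foo/bar" to "/app/a/foo/bar".</p>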
<p>When a client gets a handle to the ZooKeeper service,
ZooKeeper creates a ZooKeeper session, represented as a 64-bit
number, that it assigns to the client. If the client connects to a
different ZooKeeper server, it will send the session id as a part
of the connection handshake. As a security measure, the server
creates a password for the session id that any ZooKeeper server
can validate. The password is sent to the client with the session
id when the client establishes the session. The client sends this
password with the session id whenever it reestablishes the session
with a new server.</p>
<p>One of the parameters to the ZooKeeper client library call
to create a ZooKeeper session is the session timeout in
milliseconds. The client sends a requested timeout, and the server
responds with the timeout that it can give the client. The current
implementation requires that the timeout be a minimum of 2 times
the tickTime (as set in the server configuration) and a maximum of
20 times the tickTime. The ZooKeeper client API allows access to
the negotiated timeout.</p>
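<p>The bounds above amount to clamping the requested value into the range
from 2 to 20 times the tickTime. The following Java sketch shows that
arithmetic; <em>SessionTimeouts</em> is a hypothetical helper, and the assumption
that requests above the maximum are lowered to the maximum is ours rather
than something this section states explicitly.</p>

```java
// Illustrative timeout negotiation. SessionTimeouts is a hypothetical helper,
// not server code. Assumption: out-of-range requests are clamped to the
// nearest bound.
final class SessionTimeouts {

    /** Returns the timeout granted for a request, given the server tickTime. */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }
}
```

<p>With a tickTime of 2000 ms, a request for 1000 ms is raised to 4000 ms, a
request for 30000 ms is granted unchanged, and a request for 60000 ms is
lowered to 40000 ms.</p>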
<p>When a client (session) becomes partitioned from the ZK
serving cluster it will begin searching the list of servers that
were specified during session creation. Eventually, when
connectivity between the client and at least one of the servers is
re-established, the session will either again transition to the
"connected" state (if reconnected within the session timeout
value) or it will transition to the "expired" state (if
reconnected after the session timeout). It is not advisable to
create a new session object (a new ZooKeeper.class or zookeeper
handle in the C binding) on disconnection. The ZK client library
will handle reconnection for you. In particular, we have heuristics
built into the client library to handle things like the "herd effect".
Only create a new session when you are notified of session
expiration (mandatory).</p>
<p>Session expiration is managed by the ZooKeeper cluster
itself, not by the client. When the ZK client establishes a
session with the cluster it provides a "timeout" value, detailed
above. This value is used by the cluster to determine when the
client's session expires. Expiration happens when the cluster
does not hear from the client within the specified session timeout
period (i.e. no heartbeat). At session expiration the cluster will
delete all ephemeral nodes owned by that session and
immediately notify all connected clients watching those znodes
of the change. At this point the client of the expired
session is still disconnected from the cluster; it will not be
notified of the session expiration until it is able to
re-establish a connection to the cluster. The client will stay in
the disconnected state until the TCP connection is re-established with
the cluster, at which point the watcher of the expired session
will receive the "session expired" notification.</p>
<p>Example state transitions for an expired session as seen by
the expired session's watcher:</p>
<ol>
<li>
<p>'connected' : session is established and client
is communicating with cluster (client/server communication is
operating properly)</p>
</li>
<li>
<p>.... client is partitioned from the
cluster</p>
</li>
<li>
<p>'disconnected' : client has lost connectivity
with the cluster</p>
</li>
<li>
<p>.... time elapses, after 'timeout' period the
cluster expires the session, nothing is seen by client as it is
disconnected from cluster</p>
</li>
<li>
<p>.... time elapses, the client regains network
level connectivity with the cluster</p>
</li>
<li>
<p>'expired' : eventually the client reconnects to
the cluster, it is then notified of the
expiration</p>
</li>
</ol>
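<p>The transitions above, together with the advice to create a new session
only on expiration, can be summarized in a small model. The Java sketch
below is a simplification under stated assumptions; <em>SessionModel</em> and its
nested types are hypothetical names, not part of the client library.</p>

```java
// Toy model of the session life cycle described above. SessionModel and its
// nested types are hypothetical; this is not the client library's code.
final class SessionModel {
    enum State { CONNECTED, DISCONNECTED, EXPIRED }
    enum Action { CONTINUE, WAIT_FOR_LIBRARY_RECONNECT, CREATE_NEW_SESSION }

    /** State the watcher observes once the client reaches a server again. */
    static State stateOnReconnect(long disconnectedMs, long sessionTimeoutMs) {
        // Assumption: reconnecting strictly inside the timeout keeps the session.
        return disconnectedMs < sessionTimeoutMs ? State.CONNECTED : State.EXPIRED;
    }

    /** What the application should do in each state, following the advice above. */
    static Action recommendedAction(State state) {
        switch (state) {
            case CONNECTED:
                return Action.CONTINUE;
            case DISCONNECTED:
                return Action.WAIT_FOR_LIBRARY_RECONNECT;
            case EXPIRED:
                return Action.CREATE_NEW_SESSION;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }
}
```

<p>The point of the model is that a disconnection alone never calls for a new
session object; only the expired state does.</p>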
<p>Another parameter to the ZooKeeper session establishment
call is the default watcher. Watchers are notified when any state
change occurs in the client; for example, the client is notified
if it loses connectivity to the server or if its
session expires. This watcher should consider the
initial state to be disconnected (i.e. before any state change
events are sent to the watcher by the client library). In the case of
a new connection, the first event sent to the watcher is typically
the session connection event.</p>
<p>The session is kept alive by requests sent by the client. If
the session is idle for a period of time that would time out the
session, the client will send a PING request to keep the session
alive. This PING request not only allows the ZooKeeper server to
know that the client is still active, but it also allows the
client to verify that its connection to the ZooKeeper server is
still active. The timing of the PING is conservative enough to
ensure reasonable time to detect a dead connection and reconnect
to a new server.</p>
<p>
Once a connection to the server is successfully established
(connected), there are basically two cases where the client library generates
connectionloss (a result code in the C binding, an exception in Java -- see
the API documentation for binding-specific details) when either a synchronous or
asynchronous operation is performed:
</p>
<ol>
<li>
<p>The application calls an operation on a session that is no
longer alive/valid</p>
</li>
<li>
<p>The ZooKeeper client disconnects from a server when there
are pending operations to that server, i.e., there is a pending asynchronous call.
</p>
</li>
</ol>
<p>
<strong>Added in 3.2.0 -- SessionMovedException</strong>. There is an internal
exception that is generally not seen by clients called the SessionMovedException.
This exception occurs because a request was received on a connection for a session
which has been reestablished on a different server. The normal cause of this error is
a client that sends a request to a server, but the network packet gets delayed, so
the client times out and connects to a new server. When the delayed packet arrives at
the first server, the old server detects that the session has moved, and closes the
client connection. Clients normally do not see this error since they do not read
from those old connections. (Old connections are usually closed.) One situation in which this
condition can be seen is when two clients try to reestablish the same connection using
a saved session id and password. One of the clients will reestablish the connection
and the second client will be disconnected (causing the pair to attempt to re-establish
their connections/sessions indefinitely).</p>
<p>
<strong>Updating the list of servers</strong>. We allow a client to
update the connection string by providing a new comma separated list of host:port pairs,
each corresponding to a ZooKeeper server. The function invokes a probabilistic load-balancing
algorithm which may cause the client to disconnect from its current host, with the goal
of achieving an expected uniform number of connections per server in the new list.
If the host to which the client is currently connected is not in the new list,
this call will always cause the connection to be dropped. Otherwise, the decision
is based on whether the number of servers has increased or decreased and by how much.
</p>
<p>
For example, if the previous connection string contained 3 hosts and now the list contains
these 3 hosts and 2 more hosts, 40% of clients connected to each of the 3 hosts will
move to one of the new hosts in order to balance the load. The algorithm will cause the client
to drop its connection to the current host to which it is connected with probability 0.4 and in this
case cause the client to connect to one of the 2 new hosts, chosen at random.
</p>
<p>
As another example, suppose we have 5 hosts and now update the list to remove 2 of the hosts.
The clients connected to the 3 remaining hosts will stay connected, whereas all clients connected
to the 2 removed hosts will need to move to one of the 3 hosts, chosen at random. If the connection
is dropped, the client moves to a special mode where it chooses a new server to connect to using the
probabilistic algorithm, and not just round robin.
</p>
<p>
In the first example, each client decides to disconnect with probability 0.4, but once the decision is
made, it will try to connect to a random new server, and only if it cannot connect to any of the new
servers will it try to connect to the old ones. After finding a server, or trying all servers in the
new list and failing to connect, the client moves back to the normal mode of operation, where it picks
an arbitrary server from the connectString and attempts to connect to it. If that fails, it will continue
trying different random servers in round robin (see above for the algorithm used to initially choose a server).
</p>
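<p>The two examples above can be captured by a small calculation. The Java
sketch below covers only the cases described in this section, where the new
list either adds hosts to the old list or removes hosts from it;
<em>RebalanceModel</em> is a hypothetical helper and not the client library's
implementation.</p>

```java
// Simplified model of the drop decision for the two examples above.
// RebalanceModel is a hypothetical helper. Assumption: the new list is either
// a superset or a subset of the old list.
final class RebalanceModel {

    /**
     * Probability that a client drops its current connection after the
     * server list is updated.
     */
    static double dropProbability(int oldCount, int newCount, boolean currentHostInNewList) {
        if (!currentHostInNewList) {
            return 1.0; // the current host was removed, so the client must move
        }
        if (newCount > oldCount) {
            // Move just enough clients to even out the expected load.
            return 1.0 - (double) oldCount / newCount;
        }
        return 0.0; // list shrank or stayed the same size and the host remains
    }
}
```

<p>Growing the list from 3 to 5 hosts gives 1 - 3/5 = 0.4, matching the first
example. Shrinking it from 5 to 3 gives 0 for clients on a remaining host
and 1 for clients on a removed host, matching the second.</p>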
</div>
<a name="ch_zkWatches"></a>
<h2 class="h3">ZooKeeper Watches</h2>
<div class="section">
<p>All of the read operations in ZooKeeper - <strong>getData()</strong>, <strong>getChildren()</strong>, and <strong>exists()</strong> - have the option of setting a watch as a
side effect. Here is ZooKeeper's definition of a watch: a watch event is a
one-time trigger, sent to the client that set the watch, which occurs when
the data for which the watch was set changes. There are three key points
to consider in this definition of a watch:</p>
<ul>
<li>
<p>
<strong>One-time trigger</strong>
</p>
<p>One watch event will be sent to the client when the data has changed.