<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script>
<!--#include virtual="js/templateData.js" -->
</script>
<script id="quickstart-template" type="text/x-handlebars-template">
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_download" href="#quickstart_download"></a>
<a href="#quickstart_download">Step 1: Get Kafka</a>
</h4>
<p>
<a href="https://www.apache.org/dyn/closer.cgi?path=/kafka/{{fullDotVersion}}/kafka_{{scalaVersion}}-{{fullDotVersion}}.tgz">Download</a>
the latest Kafka release and extract it:
</p>
<pre class="line-numbers"><code class="language-bash">$ tar -xzf kafka_{{scalaVersion}}-{{fullDotVersion}}.tgz
$ cd kafka_{{scalaVersion}}-{{fullDotVersion}}</code></pre>
</div>
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_startserver" href="#quickstart_startserver"></a>
<a href="#quickstart_startserver">Step 2: Start the Kafka environment</a>
</h4>
<p class="note">
NOTE: Your local environment must have Java 8+ installed.
</p>
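<p>
A quick way to check this requirement is to inspect the output of <code>java -version</code>. The sketch below is one possible approach, not part of Kafka: <code>java_major</code> is a hypothetical helper that assumes the version string follows either the legacy <code>1.8.0_292</code> scheme or the modern <code>17.0.2</code> scheme.
</p>

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): print the major version for a Java
# version string, covering both the legacy "1.8.0_292" and modern "17.0.2" schemes.
java_major() {
  v=$1
  case "$v" in
    1.*) v=${v#1.} ;;   # legacy scheme: "1.8.0_292" becomes "8.0_292"
  esac
  echo "${v%%[!0-9]*}"  # keep only the leading digits
}

if command -v java >/dev/null 2>&1; then
  # "java -version" writes to stderr; the version is the first quoted string.
  raw=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  major=$(java_major "$raw")
  if [ "${major:-0}" -ge 8 ]; then
    echo "Java $major found: OK"
  else
    echo "Java 8+ required, found: ${raw:-unknown}"
  fi
else
  echo "No java executable found on PATH"
fi
```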
<p>
Apache Kafka can be started using ZooKeeper or KRaft. To get started with either configuration, follow one of the sections below, but not both.
</p>
<h5>
Kafka with ZooKeeper
</h5>
<p>
Run the following commands to start all services in the correct order:
</p>
<pre class="line-numbers"><code class="language-bash"># Start the ZooKeeper service
$ bin/zookeeper-server-start.sh config/zookeeper.properties</code></pre>
<p>
Open another terminal session and run:
</p>
<pre class="line-numbers"><code class="language-bash"># Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties</code></pre>
<p>
Once all services have successfully launched, you will have a basic Kafka environment running and ready to use.
</p>
<h5>
Kafka with KRaft
</h5>
<p>
Generate a cluster UUID:
</p>
<pre class="line-numbers"><code class="language-bash">$ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"</code></pre>
<p>
Format the log directories:
</p>
<pre class="line-numbers"><code class="language-bash">$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties</code></pre>
<p>
Start the Kafka server:
</p>
<pre class="line-numbers"><code class="language-bash">$ bin/kafka-server-start.sh config/kraft/server.properties</code></pre>
<p>
Once the Kafka server has successfully launched, you will have a basic Kafka environment running and ready to use.
</p>
</div>
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_createtopic" href="#quickstart_createtopic"></a>
<a href="#quickstart_createtopic">Step 3: Create a topic to store your events</a>
</h4>
<p>
Kafka is a distributed <em>event streaming platform</em> that lets you read, write, store, and process
<a href="/documentation/#messages"><em>events</em></a> (also called <em>records</em> or
<em>messages</em> in the documentation)
across many machines.
</p>
<p>
Example events are payment transactions, geolocation updates from mobile phones, shipping orders, sensor measurements
from IoT devices or medical equipment, and much more. These events are organized and stored in
<a href="/documentation/#intro_concepts_and_terms"><em>topics</em></a>.
Put simply, a topic is similar to a folder in a filesystem, and the events are the files in that folder.
</p>
<p>
So before you can write your first events, you must create a topic. Open another terminal session and run:
</p>
<pre class="line-numbers"><code class="language-bash">$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092</code></pre>
<p>
All of Kafka's command line tools have additional options: run the <code>kafka-topics.sh</code> command without any
arguments to display usage information. For example, the <code>--describe</code> option shows
<a href="/documentation/#intro_concepts_and_terms">details such as the partition count</a>
of the new topic:
</p>
<pre class="line-numbers"><code class="language-bash">$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092
Topic: quickstart-events TopicId: NPmZHyhbR9y00wMglMH2sg PartitionCount: 1 ReplicationFactor: 1 Configs:
Topic: quickstart-events Partition: 0 Leader: 0 Replicas: 0 Isr: 0</code></pre>
</div>
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_send" href="#quickstart_send"></a>
<a href="#quickstart_send">Step 4: Write some events into the topic</a>
</h4>
<p>
A Kafka client communicates with the Kafka brokers via the network for writing (or reading) events.
Once received, the brokers will store the events in a durable and fault-tolerant manner for as long as you
need—even forever.
</p>
<p>
Run the console producer client to write a few events into your topic.
By default, each line you enter will result in a separate event being written to the topic.
</p>
<pre class="line-numbers"><code class="language-bash">$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
This is my first event
This is my second event</code></pre>
<p>
You can stop the producer client with <code>Ctrl-C</code> at any time.
</p>
</div>
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_consume" href="#quickstart_consume"></a>
<a href="#quickstart_consume">Step 5: Read the events</a>
</h4>
<p>Open another terminal session and run the console consumer client to read the events you just created:</p>
<pre class="line-numbers"><code class="language-bash">$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
This is my first event
This is my second event</code></pre>
<p>You can stop the consumer client with <code>Ctrl-C</code> at any time.</p>
<p>Feel free to experiment: for example, switch back to your producer terminal (previous step) to write
additional events, and see how the events immediately show up in your consumer terminal.</p>
<p>Because events are durably stored in Kafka, they can be read as many times and by as many consumers as you want.
You can easily verify this by opening yet another terminal session and re-running the previous command.</p>
</div>
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_kafkaconnect" href="#quickstart_kafkaconnect"></a>
<a href="#quickstart_kafkaconnect">Step 6: Import/export your data as streams of events with Kafka Connect</a>
</h4>
<p>
You probably have lots of data in existing systems like relational databases or traditional messaging systems,
along with many applications that already use these systems.
<a href="/documentation/#connect">Kafka Connect</a> allows you to continuously ingest
data from external systems into Kafka, and vice versa. It is an extensible tool that runs
<i>connectors</i>, which implement the custom logic for interacting with an external system.
It is thus very easy to integrate existing systems with Kafka. To make this process even easier,
there are hundreds of such connectors readily available.
</p>
<p>
In this quickstart we'll see how to run Kafka Connect with simple connectors that import data
from a file to a Kafka topic and export data from a Kafka topic to a file.
</p>
<p>
First, make sure to add <code class="language-bash">connect-file-{{fullDotVersion}}.jar</code> to the <code>plugin.path</code> property in the Connect worker's configuration.
For this quickstart we'll use a relative path and treat the connector package as an uber jar, which works when the quickstart commands are run from the installation directory.
For production deployments, however, absolute paths are preferable. See <a href="/documentation/#connectconfigs_plugin.path">plugin.path</a> for a detailed description of how to set this config.
</p>
<p>
Edit the <code class="language-bash">config/connect-standalone.properties</code> file, add or change the <code>plugin.path</code> configuration property to match the following, and save the file:
</p>
<pre class="brush: bash;">
> echo "plugin.path=libs/connect-file-{{fullDotVersion}}.jar"</pre>
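<p>
If you would rather script the edit than make it by hand, the sketch below shows one possible approach. <code>set_property</code> is a hypothetical helper, not a Kafka tool, and the demo runs against a scratch file with made-up contents so that it does not touch your installation.
</p>

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): set key=value in a properties file,
# replacing an existing or commented-out entry for the key, or appending a new one.
set_property() {
  file=$1
  key=$2
  value=$3
  # Escape dots so the key is matched literally in the regular expression.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  tmp=$(mktemp)
  # Copy every line except existing entries for this key, then append the new entry.
  grep -v -E "^#?[[:space:]]*${pattern}[[:space:]]*=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a scratch file whose contents are invented for illustration.
demo=$(mktemp)
printf '%s\n' 'bootstrap.servers=localhost:9092' '#plugin.path=' > "$demo"
set_property "$demo" plugin.path "libs/connect-file-{{fullDotVersion}}.jar"
cat "$demo"
```

<p>
To apply it to the real worker configuration, pass <code>config/connect-standalone.properties</code> as the first argument instead of the scratch file.
</p>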
<p>
Then, start by creating some seed data to test with:
</p>
<pre class="brush: bash;">
> echo -e "foo\nbar" > test.txt</pre>
<p>
Or on Windows:
</p>
<pre class="brush: bash;">
> echo foo> test.txt
> echo bar>> test.txt</pre>
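<p>
The Unix command above relies on <code>echo -e</code>, which not every shell supports: some <code>sh</code> implementations print the <code>-e</code> literally. If that happens, <code>printf</code> is a portable alternative. A minimal sketch that produces the same two-line file:
</p>

```shell
# Write two lines, "foo" and "bar", each terminated by a newline.
printf 'foo\nbar\n' > test.txt
# Confirm the file holds exactly two lines.
wc -l test.txt
```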
<p>
Next, we'll start two connectors running in <i>standalone</i> mode, which means they run in a single, local, dedicated
process. We provide three configuration files as parameters. The first is always the configuration for the Kafka Connect
process, containing common configuration such as the Kafka brokers to connect to and the serialization format for data.
The remaining configuration files each specify a connector to create. These files include a unique connector name, the connector
class to instantiate, and any other configuration required by the connector.
</p>
<pre class="brush: bash;">
> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties</pre>
<p>
These sample configuration files, included with Kafka, use the default local cluster configuration you started earlier
and create two connectors: the first is a source connector that reads lines from an input file and produces each to a Kafka topic,
and the second is a sink connector that reads messages from a Kafka topic and writes each as a line to an output file.
</p>
<p>
During startup you'll see a number of log messages, including some indicating that the connectors are being instantiated.
Once the Kafka Connect process has started, the source connector should start reading lines from <code>test.txt</code> and
producing them to the topic <code>connect-test</code>, and the sink connector should start reading messages from the topic <code>connect-test</code>
and writing them to the file <code>test.sink.txt</code>. We can verify the data has been delivered through the entire pipeline
by examining the contents of the output file:
</p>
<pre class="brush: bash;">
> more test.sink.txt
foo
bar</pre>
<p>
Note that the data is being stored in the Kafka topic <code>connect-test</code>, so we can also run a console consumer to see the
data in the topic (or use custom consumer code to process it):
</p>
<pre class="brush: bash;">
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
...</pre>
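<p>
Each record in the output above is a JSON envelope with a <code>schema</code> and a <code>payload</code> field, presumably because the sample worker configuration serializes values as JSON with schemas enabled. To see only the original lines, you can strip the envelope. The sketch below uses a hypothetical <code>extract_payload</code> helper built on <code>sed</code>; it assumes the simple string schema shown above and does not unescape JSON, so use a real JSON parser for anything beyond a quick look.
</p>

```shell
#!/bin/sh
# Hypothetical helper: print only the "payload" string of each JSON envelope on stdin.
# Assumes one envelope per line, with "payload" as the last field.
extract_payload() {
  sed -n 's/.*"payload":"\(.*\)"}$/\1/p'
}

# Sample records copied from the console consumer output above.
printf '%s\n' \
  '{"schema":{"type":"string","optional":false},"payload":"foo"}' \
  '{"schema":{"type":"string","optional":false},"payload":"bar"}' |
  extract_payload
```

<p>
Against the live topic, the console consumer command shown above can be piped into <code>extract_payload</code> in the same way.
</p>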
<p>The connectors continue to process data, so we can add data to the file and see it move through the pipeline:</p>
<pre class="brush: bash;">
> echo Another line>> test.txt</pre>
<p>You should see the line appear in the console consumer output and in the sink file.</p>
</div>
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_kafkastreams" href="#quickstart_kafkastreams"></a>
<a href="#quickstart_kafkastreams">Step 7: Process your events with Kafka Streams</a>
</h4>
<p>
Once your data is stored in Kafka as events, you can process the data with the
<a href="/documentation/streams">Kafka Streams</a> client library for Java/Scala.
It allows you to implement mission-critical real-time applications and microservices, where the input
and/or output data is stored in Kafka topics. Kafka Streams combines the simplicity of writing and deploying
standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster
technology to make these applications highly scalable, elastic, fault-tolerant, and distributed. The library
supports exactly-once processing, stateful operations and aggregations, windowing, joins, processing based
on event-time, and much more.
</p>
<p>To give you a first taste, here's how one would implement the popular <code>WordCount</code> algorithm:</p>
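<pre class="line-numbers"><code class="language-java">StreamsBuilder builder = new StreamsBuilder();
// Read each line of text from the input topic as a stream of records
KStream<String, String> textLines = builder.stream("quickstart-events");
// Split each line into lowercase words, regroup by word, and count occurrences
KTable<String, Long> wordCounts = textLines
.flatMapValues(line -> Arrays.asList(line.toLowerCase().split(" ")))
.groupBy((keyIgnored, word) -> word)
.count();
// Write the running counts to the output topic
wordCounts.toStream().to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));</code></pre>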
<p>
The <a href="/documentation/streams/quickstart">Kafka Streams demo</a>
and the <a href="/{{version}}/documentation/streams/tutorial">app development tutorial</a>
demonstrate how to code and run such a streaming application from start to finish.
</p>
</div>
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_kafkaterminate" href="#quickstart_kafkaterminate"></a>
<a href="#quickstart_kafkaterminate">Step 8: Terminate the Kafka environment</a>
</h4>
<p>
Now that you have reached the end of the quickstart, feel free to tear down the Kafka environment,
or continue playing around.
</p>
<ol>
<li>
Stop the producer and consumer clients with <code>Ctrl-C</code>, if you haven't done so already.
</li>
<li>
Stop the Kafka broker with <code>Ctrl-C</code>.
</li>
<li>
Lastly, if you followed the Kafka with ZooKeeper section, stop the ZooKeeper server with <code>Ctrl-C</code>.
</li>
</ol>
<p>
If you also want to delete all data from your local Kafka environment, including any events you have created
along the way, run the command:
</p>
<pre class="line-numbers"><code class="language-bash">$ rm -rf /tmp/kafka-logs /tmp/zookeeper /tmp/kraft-combined-logs</code></pre>
</div>
<div class="quickstart-step">
<h4 class="anchor-heading">
<a class="anchor-link" id="quickstart_kafkacongrats" href="#quickstart_kafkacongrats"></a>
<a href="#quickstart_kafkacongrats">Congratulations!</a>
</h4>
<p>You have successfully finished the Apache Kafka quickstart.</p>
<p>To learn more, we suggest the following next steps:</p>
<ul>
<li>
Read through the brief <a href="/intro">Introduction</a>
to learn how Kafka works at a high level, its main concepts, and how it compares to other
technologies. To understand Kafka in more detail, head over to the
<a href="/documentation/">Documentation</a>.
</li>
<li>
Browse through the <a href="/powered-by">Use Cases</a> to learn how
other users in our world-wide community are getting value out of Kafka.
</li>
<!--
<li>
Learn how _Kafka compares to other technologies_ you might be familiar with.
[note to design team: this new page is not yet written]
</li>
-->
<li>
Join a <a href="/events">local Kafka meetup group</a> and
<a href="https://kafka-summit.org/past-events/">watch talks from Kafka Summit</a>,
the main conference of the Kafka community.
</li>
</ul>
</div>
</script>
<div class="p-quickstart"></div>