From b294ebcacfad0f7a452f172f016a0c7ee0dbbebd Mon Sep 17 00:00:00 2001
From: Ewen Cheslack-Postava
Date: Tue, 14 Feb 2017 10:34:48 -0800
Subject: [PATCH] Add 0.10.2 docs from 0.10.2.0 RC2

---
 0102/generated/topic_config.html |  4 ++--
 0102/quickstart.html             |  5 +----
 0102/streams.html                | 29 +++++++++++++++++++++--------
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/0102/generated/topic_config.html b/0102/generated/topic_config.html
index 87eb7bd73..897414716 100644
--- a/0102/generated/topic_config.html
+++ b/0102/generated/topic_config.html
@@ -21,11 +21,11 @@
 flush.ms | This setting allows specifying a time interval at which we will force an fsync of data written to the log. For example if this was set to 1000 we would fsync after 1000 ms had passed. In general we recommend you not set this and use replication for durability and allow the operating system's background flush capabilities as it is more efficient. | long | 9223372036854775807 | [0,...] | log.flush.interval.ms | medium
-follower.replication.throttled.replicas | A list of replicas for which log replication should be throttled on the follower side. The list should describe a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively the wildcard '*' can be used to throttle all replicas for this topic. | list | "" | kafka.server.ThrottledReplicaListValidator$@6c503cb2 | follower.replication.throttled.replicas | medium
+follower.replication.throttled.replicas | A list of replicas for which log replication should be throttled on the follower side. The list should describe a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively the wildcard '*' can be used to throttle all replicas for this topic. | list | "" | kafka.server.ThrottledReplicaListValidator$@7be8c2a2 | follower.replication.throttled.replicas | medium
 index.interval.bytes | This setting controls how frequently Kafka adds an index entry to it's offset index. The default setting ensures that we index a message roughly every 4096 bytes. More indexing allows reads to jump closer to the exact position in the log but makes the index larger. You probably don't need to change this. | int | 4096 | [0,...] | log.index.interval.bytes | medium
-leader.replication.throttled.replicas | A list of replicas for which log replication should be throttled on the leader side. The list should describe a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively the wildcard '*' can be used to throttle all replicas for this topic. | list | "" | kafka.server.ThrottledReplicaListValidator$@6c503cb2 | leader.replication.throttled.replicas | medium
+leader.replication.throttled.replicas | A list of replicas for which log replication should be throttled on the leader side. The list should describe a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively the wildcard '*' can be used to throttle all replicas for this topic. | list | "" | kafka.server.ThrottledReplicaListValidator$@7be8c2a2 | leader.replication.throttled.replicas | medium
 max.message.bytes | This is largest message size Kafka will allow to be appended. Note that if you increase this size you must also increase your consumer's fetch size so they can fetch messages this large. | int | 1000012 | [0,...] | message.max.bytes | medium

diff --git a/0102/quickstart.html b/0102/quickstart.html
index bfc9af3f7..69f6a7a5d 100644
--- a/0102/quickstart.html
+++ b/0102/quickstart.html
@@ -359,7 +359,7 @@

Step 8: Use Kafka Streams to process data
-> cat file-input.txt | ./bin/kafka-console-producer --broker-list localhost:9092 --topic streams-file-input
+> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-file-input < file-input.txt
 

@@ -397,12 +397,9 @@

Step 8: Use Kafka Streams to process data
 all     1
-streams 1
 lead    1
 to      1
-kafka   1
 hello   1
-kafka   2
 streams 2
 join    1
 kafka   3
diff --git a/0102/streams.html b/0102/streams.html
index 19af2b326..94ce7a951 100644
--- a/0102/streams.html
+++ b/0102/streams.html
@@ -497,25 +497,31 @@ 

Duality of Streams and Tables

 The same mechanism is used, for example, to replicate databases via change data capture (CDC) and, within Kafka Streams, to replicate its so-called state stores across machines for fault-tolerance.
- The stream-table duality is such an important concept that Kafka Streams models it explicitly via the KStream and KTable interfaces, which we describe in the next sections.
+ The stream-table duality is such an important concept that Kafka Streams models it explicitly via the KStream, KTable, and GlobalKTable interfaces, which we describe in the next sections.

- KStream and KTable
- The DSL uses two main abstractions. A KStream is an abstraction of a record stream, where each data record represents a self-contained datum in the unbounded data set.
+ KStream, KTable, and GlobalKTable
+ The DSL uses three main abstractions. A KStream is an abstraction of a record stream, where each data record represents a self-contained datum in the unbounded data set.
  A KTable is an abstraction of a changelog stream, where each data record represents an update. More precisely, the value in a data record is considered to be an update of the last value for the same record key,
- if any (if a corresponding key doesn't exist yet, the update will be considered a create). To illustrate the difference between KStreams and KTables, let's imagine the following two data records are being sent to the stream:
+ if any (if a corresponding key doesn't exist yet, the update will be considered a create).
+ Like a KTable, a GlobalKTable is an abstraction of a changelog stream, where each data record represents an update.
+ However, a GlobalKTable is different from a KTable in that it is fully replicated on each KafkaStreams instance.
+ GlobalKTable also provides the ability to look up current values of data records by keys.
+ This table-lookup functionality is available through join operations.
+
+ To illustrate the difference between KStreams and KTables/GlobalKTables, let's imagine the following two data records are being sent to the stream:
             ("alice", 1) --> ("alice", 3)
         
- If these records a KStream and the stream processing application were to sum the values it would return 4. If these records were a KTable, the return would be 3, since the last record would be considered as an update.
+ If these records were a KStream and the stream processing application were to sum the values, it would return 4. If these records were a KTable or GlobalKTable, the return would be 3, since the last record would be considered as an update.
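As a minimal sketch of this difference in the 0.10.2 Java DSL (not part of the original text; the topic name, serdes, and store names are illustrative, and the usual Kafka Streams imports are assumed):

    KStreamBuilder builder = new KStreamBuilder();

    // Two alternative readings of the same "user-clicks" topic; a topology may register a topic
    // only once, so in practice you would pick one of these per application.

    // Record-stream view: every record counts, so ("alice", 1) followed by ("alice", 3) sums to 4.
    KStream<String, Integer> clicks = builder.stream(Serdes.String(), Serdes.Integer(), "user-clicks");
    KTable<String, Integer> summed = clicks
        .groupByKey(Serdes.String(), Serdes.Integer())
        .reduce((v1, v2) -> v1 + v2, "sum-store");

    // Changelog-stream view: the later record is treated as an update of the earlier one,
    // so the value for "alice" ends up as 3.
    KTable<String, Integer> latest = builder.table(Serdes.String(), Serdes.Integer(), "user-clicks", "latest-store");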

Create Source Streams from Kafka

- Either a record stream (defined as KStream) or a changelog stream (defined as KTable)
- can be created as a source stream from one or more Kafka topics (for KTable you can only create the source stream
+ Either a record stream (defined as KStream) or a changelog stream (defined as KTable or GlobalKTable)
+ can be created as a source stream from one or more Kafka topics (for KTable and GlobalKTable you can only create the source stream
  from a single topic).

@@ -524,6 +530,7 @@

 Create Source Streams
 KStream<String, GenericRecord> source1 = builder.stream("topic1", "topic2");
 KTable<String, GenericRecord> source2 = builder.table("topic3", "stateStoreName");
+GlobalKTable<String, GenericRecord> source3 = builder.globalTable("topic4", "globalStoreName");

Windowing a stream

@@ -551,7 +558,13 @@

 Join multiple streams
   • KStream-to-KStream Joins are always windowed joins, since otherwise the memory and state required to compute the join would grow infinitely in size. Here, a newly received record from one of the streams is joined with the other stream's records within the specified window interval to produce one result for each matching pair based on user-provided ValueJoiner. A new KStream instance representing the result stream of the join is returned from this operator.
  • KTable-to-KTable Joins are join operations designed to be consistent with the ones in relational databases. Here, both changelog streams are materialized into local state stores first. When a new record is received from one of the streams, it is joined with the other stream's materialized state stores to produce one result for each matching pair based on user-provided ValueJoiner. A new KTable instance representing the result stream of the join, which is also a changelog stream of the represented table, is returned from this operator.
-  • KStream-to-KTable Joins allow you to perform table lookups against a changelog stream (KTable) upon receiving a new record from another record stream (KStream). An example use case would be to enrich a stream of user activities (KStream) with the latest user profile information (KTable). Only records received from the record stream will trigger the join and produce results via ValueJoiner, not vice versa (i.e., records received from the changelog stream will be used only to update the materialized state store). A new KStream instance representing the result stream of the join is returned from this operator.
+  • KStream-to-KTable Joins allow you to perform table lookups against a changelog stream (KTable) upon receiving a new record from another record stream (KStream). An example use case would be to enrich a stream of user activities (KStream) with the latest user profile information (KTable). Only records received from the record stream will trigger the join and produce results via ValueJoiner, not vice versa (i.e., records received from the changelog stream will be used only to update the materialized state store). A new KStream instance representing the result stream of the join is returned from this operator.
+  • KStream-to-GlobalKTable Joins allow you to perform table lookups against a fully replicated changelog stream (GlobalKTable) upon receiving a new record from another record stream (KStream).
+    Joins with a GlobalKTable don't require repartitioning of the input KStream, as all partitions of the GlobalKTable are available on every KafkaStreams instance.
+    The KeyValueMapper provided with the join operation is applied to each KStream record to extract the join key used to look up the GlobalKTable, so non-record-key joins are possible.
+    An example use case would be to enrich a stream of user activities (KStream) with the latest user profile information (GlobalKTable).
+    Only records received from the record stream will trigger the join and produce results via ValueJoiner, not vice versa (i.e., records received from the changelog stream will be used only to update the materialized state store).
+    A new KStream instance representing the result stream of the join is returned from this operator; a usage sketch follows this list.
  • Depending on the operands the following join operations are supported: inner joins, outer joins and left joins. Their semantics are similar to the corresponding operators in relational databases.
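To make the KStream-to-GlobalKTable join above concrete, here is a minimal usage sketch against the 0.10.2 Java DSL; the topic names and the Activity, Profile, and EnrichedActivity types are invented for illustration and are not taken from the release:

    KStream<String, Activity> activities = builder.stream("user-activities");
    GlobalKTable<String, Profile> profiles = builder.globalTable("user-profiles", "profiles-store");

    // The KeyValueMapper derives the lookup key from each stream record (here, a user id carried
    // in the activity value), so the activity stream does not need to be keyed or repartitioned by it.
    KStream<String, EnrichedActivity> enriched = activities.join(
        profiles,
        (activityKey, activity) -> activity.getUserId(),                // KeyValueMapper: stream record -> table key
        (activity, profile) -> new EnrichedActivity(activity, profile)  // ValueJoiner
    );

Because every KafkaStreams instance holds a full copy of the GlobalKTable, the lookup is served from the local state store and no repartitioning of the input stream is needed.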