diff --git a/docs/js/templateData.js b/docs/js/templateData.js
index 3eca71ede69..50997bd6b26 100644
--- a/docs/js/templateData.js
+++ b/docs/js/templateData.js
@@ -20,4 +20,5 @@ var context={
   "version": "0110",
   "dotVersion": "0.11.0",
-  "fullDotVersion": "0.11.0.0"
+  "fullDotVersion": "0.11.0.0",
+  "scalaVersion": "2.11"
 };
diff --git a/docs/streams/quickstart.html b/docs/streams/quickstart.html
index 1c45e16a729..031a37516ec 100644
--- a/docs/streams/quickstart.html
+++ b/docs/streams/quickstart.html
@@ -40,10 +40,10 @@ of the
 As the first step, we will start Kafka (unless you already have it started) and then we
 will prepare input data to a Kafka topic, which will subsequently be processed by a Kafka Streams application.
+
-<h4>Step 1: Download the code</h4>
+<h4>Step 1: Download the code</h4>
-Download the {{fullDotVersion}} release and un-tar it.
+Download the {{fullDotVersion}} release and un-tar it.
+Note that there are multiple downloadable Scala versions and we choose to use the recommended version ({{scalaVersion}}) here:
-> tar -xzf kafka_2.11-{{fullDotVersion}}.tgz
-> cd kafka_2.11-{{fullDotVersion}}
+> tar -xzf kafka_{{scalaVersion}}-{{fullDotVersion}}.tgz
+> cd kafka_{{scalaVersion}}-{{fullDotVersion}}
 
 <h4>Step 2: Start the Kafka server</h4>
@@ -102,19 +104,9 @@ Kafka uses ZooKeeper so you need to
-<h4>Step 3: Prepare data</h4>
+<h4>Step 3: Prepare input topic and start Kafka producer</h4>
 > echo -e "all streams lead to kafka\nhello kafka streams\njoin kafka summit" > file-input.txt
@@ -126,41 +118,59 @@ Or on Windows:
 > echo|set /p=join kafka summit>> file-input.txt
 
-Next, we send this input data to the input topic named streams-file-input using the console producer,
-which reads the data from STDIN line-by-line, and publishes each line as a separate Kafka message with null key and value encoded a string to the topic (in practice,
-stream data will likely be flowing continuously into Kafka where the application will be up and running):
+-->
+
+Next, we create the input topic named streams-wordcount-input and the output topic named streams-wordcount-output:
 > bin/kafka-topics.sh --create \
     --zookeeper localhost:2181 \
     --replication-factor 1 \
     --partitions 1 \
-    --topic streams-file-input
+    --topic streams-wordcount-input
+Created topic "streams-wordcount-input".
+
+> bin/kafka-topics.sh --create \
+    --zookeeper localhost:2181 \
+    --replication-factor 1 \
+    --partitions 1 \
+    --topic streams-wordcount-output
+Created topic "streams-wordcount-output".
 
+The created topics can be described with the same kafka-topics tool:
-> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-file-input < file-input.txt
+> bin/kafka-topics.sh --zookeeper localhost:2181 --describe
+
+Topic:streams-wordcount-input	PartitionCount:1	ReplicationFactor:1	Configs:
+    Topic: streams-wordcount-input	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
+Topic:streams-wordcount-output	PartitionCount:1	ReplicationFactor:1	Configs:
+    Topic: streams-wordcount-output	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
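As an aside, the same two topics could also be created programmatically with the AdminClient API introduced in Kafka 0.11. The following is a minimal, hypothetical sketch (the class name is ours); the topic names, partition counts, and replication factors mirror the kafka-topics.sh commands above:

<pre class="brush: java;">
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateQuickstartTopics {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // One partition and replication factor 1, as in the CLI commands above.
            admin.createTopics(Arrays.asList(
                new NewTopic("streams-wordcount-input", 1, (short) 1),
                new NewTopic("streams-wordcount-output", 1, (short) 1)
            )).all().get(); // block until both topics have been created
        }
    }
}
</pre>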
 
-<h4>Step 4: Process data</h4>
+<h4>Step 4: Start the WordCount Application</h4>
+
+The following command starts the WordCount demo application:
 > bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
 

-The demo application will read from the input topic streams-file-input, perform the computations of the WordCount algorithm on each of the read messages,
-and continuously write its current results to the output topic streams-wordcount-output. Hence there won't be any STDOUT output except log entries as the results are written back into in Kafka.
-The demo will run for a few seconds and then, unlike typical stream processing applications, terminate automatically.
+The demo application will read from the input topic streams-wordcount-input, perform the computations of the WordCount algorithm on each of the read messages,
+and continuously write its current results to the output topic streams-wordcount-output. Hence there won't be any STDOUT output except log entries as the results are written back into Kafka.
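For reference, the core of what the demo computes can be sketched with the Streams DSL that ships with 0.11. This is a condensed, unofficial paraphrase of the WordCountDemo example (the class name here is ours; see the example's source in the Kafka codebase for the authoritative version):

<pre class="brush: java;">
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class WordCountSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();

        // Read each input line as a (null, line) record from the input topic.
        KStream<String, String> source = builder.stream("streams-wordcount-input");

        // Split lines into words, group by word, and keep a running count per word.
        KTable<String, Long> counts = source
            .flatMapValues(line -> Arrays.asList(line.toLowerCase(Locale.getDefault()).split(" ")))
            .groupBy((key, word) -> word)
            .count("Counts");

        // Emit every count update to the output topic as (word, count).
        counts.to(Serdes.String(), Serdes.Long(), "streams-wordcount-output");

        new KafkaStreams(builder, props).start();
    }
}
</pre>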

-We can now inspect the output of the WordCount demo application by reading from its output topic:
+Now we can start the console producer in a separate terminal to write some input data to this topic:
+
+> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-wordcount-input
+
+
+and inspect the output of the WordCount demo application by reading from its output topic with the console consumer in a separate terminal:
+
 > bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
     --topic streams-wordcount-output \
@@ -172,27 +182,115 @@ We can now inspect the output of the WordCount demo application by reading from
     --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
 
-
-with the following output data being printed to the console:
-
+
+<h4>Step 5: Process some data</h4>
+
+Now let's write a message with the console producer into the input topic streams-wordcount-input by entering a single line of text and then hitting <RETURN>.
+This will send a new message to the input topic, where the message key is null and the message value is the string-encoded text line that you just entered
+(in practice, input data for applications will typically be streaming continuously into Kafka, rather than being manually entered as we do in this quickstart):
-all     1
-lead    1
-to      1
-hello   1
-streams 2
-join    1
-kafka   3
-summit  1
+> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-wordcount-input
+all streams lead to kafka
 

-Here, the first column is the Kafka message key in java.lang.String format, and the second column is the message value in java.lang.Long format.
-Note that the output is actually a continuous stream of updates, where each data record (i.e. each line in the original output above) is
-an updated count of a single word, aka record key such as "kafka". For multiple records with the same key, each later record is an update of the previous one.
+This message will be processed by the WordCount application, and the following output data will be written to the streams-wordcount-output topic and printed by the console consumer:

+
+> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
+    --topic streams-wordcount-output \
+    --from-beginning \
+    --formatter kafka.tools.DefaultMessageFormatter \
+    --property print.key=true \
+    --property print.value=true \
+    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
+    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
+
+all      1
+streams  1
+lead     1
+to       1
+kafka    1
+
+
+Here, the first column is the Kafka message key in java.lang.String format and represents a word that is being counted, and the second column is the message value in java.lang.Long format, representing the word's latest count.
+
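As a side note, the console producer used above can also be replaced by a few lines of Java. The following minimal, hypothetical sketch publishes the same kind of record (null key, string value) to the input topic:

<pre class="brush: java;">
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class QuickstartProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Like the console producer: the key is null, the value is the text line.
            producer.send(new ProducerRecord<>("streams-wordcount-input", "all streams lead to kafka"));
        } // close() flushes any buffered records
    }
}
</pre>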

+
+Now let's continue by writing one more message with the console producer into the input topic streams-wordcount-input.
+Enter the text line "hello kafka streams" and hit <RETURN>.
+Your terminal should look as follows:
+
+> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-wordcount-input
+all streams lead to kafka
+hello kafka streams
+
+
+In the other terminal, where the console consumer is running, you will observe that the WordCount application wrote new output data:
+
+> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
+    --topic streams-wordcount-output \
+    --from-beginning \
+    --formatter kafka.tools.DefaultMessageFormatter \
+    --property print.key=true \
+    --property print.value=true \
+    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
+    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
+
+all      1
+streams  1
+lead     1
+to       1
+kafka    1
+hello    1
+kafka    2
+streams  2
+
+Here the last printed lines kafka 2 and streams 2 indicate updates to the keys kafka and streams whose counts have been incremented from 1 to 2.
+Whenever you write further input messages to the input topic, you will observe new messages being added to the streams-wordcount-output topic,
+representing the most recent word counts as computed by the WordCount application.
+Let's enter one final input text line "join kafka summit" and hit <RETURN> in the console producer for the input topic streams-wordcount-input before we wrap up this quickstart:
+
+> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-wordcount-input
+all streams lead to kafka
+hello kafka streams
+join kafka summit
+
+
+The streams-wordcount-output topic will subsequently show the corresponding updated word counts (see last three lines):
+
+> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
+    --topic streams-wordcount-output \
+    --from-beginning \
+    --formatter kafka.tools.DefaultMessageFormatter \
+    --property print.key=true \
+    --property print.value=true \
+    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
+    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
+
+all      1
+streams  1
+lead     1
+to       1
+kafka    1
+hello    1
+kafka    2
+streams  2
+join     1
+kafka    3
+summit   1
+
+As one can see, the output of the WordCount application is actually a continuous stream of updates, where each output record (i.e. each line in the original output above) is
+an updated count of a single word, i.e. a record key such as "kafka". For multiple records with the same key, each later record is an update of the previous one.
+

 The two diagrams below illustrate what is essentially happening behind the scenes.
 The first column shows the evolution of the current state of the KTable<String, Long> that is counting word occurrences for count.
@@ -217,13 +315,9 @@ And so on (we skip the illustration of how the third line is being processed). T
 Looking beyond the scope of this concrete example, what Kafka Streams is doing here is to leverage the duality between a table and
 a changelog stream (here: table = the KTable, changelog stream = the downstream KStream): you can publish every change of the table
 to a stream, and if you consume the entire changelog stream from beginning to end, you can reconstruct the contents of the table.
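To make the duality concrete, here is a small, self-contained sketch (plain Java, no Kafka dependencies) that uses the quickstart's own output data to show how replaying a changelog stream from the beginning reconstructs the table's final contents:

<pre class="brush: java;">
import java.util.LinkedHashMap;
import java.util.Map;

public class ChangelogReplay {

    public static void main(String[] args) {
        // The word-count update records from the quickstart, in the order they were emitted.
        String[][] changelog = {
            {"all", "1"}, {"streams", "1"}, {"lead", "1"}, {"to", "1"}, {"kafka", "1"},
            {"hello", "1"}, {"kafka", "2"}, {"streams", "2"},
            {"join", "1"}, {"kafka", "3"}, {"summit", "1"}
        };

        // Applying each update in order: later records overwrite earlier ones per key,
        // exactly like upserts into a table.
        Map<String, Long> table = new LinkedHashMap<>();
        for (String[] update : changelog) {
            table.put(update[0], Long.parseLong(update[1]));
        }

        // Prints: {all=1, streams=2, lead=1, to=1, kafka=3, hello=1, join=1, summit=1}
        System.out.println(table);
    }
}
</pre>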

-
-Now you can write more input messages to the streams-file-input topic and observe additional messages added
-to streams-wordcount-output topic, reflecting updated word counts (e.g., using the console producer and the
-console consumer, as described above).
-
+<h4>Step 6: Tear down the application</h4>
-You can stop the console consumer via Ctrl-C.
+You can now stop the console consumer, the console producer, the WordCount application, the Kafka broker and the ZooKeeper server, in that order, via Ctrl-C.