mirror of https://github.com/apache/kafka.git
Separate Streams documentation and setup docs with easy to set variables
- Separate Streams documentation out to a standalone page.
- Set up templates to use handlebars.js.
- Create template variables to swap in frequently updated values, like the version number, from a single file, templateData.js.

Author: Derrick Or <derrickor@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #2245 from derrickdoo/docTemplates
This commit is contained in:
parent 448f194c70
commit 53428694a6
@@ -15,6 +15,7 @@
limitations under the License.
-->

+<script id="api-template" type="text/x-handlebars-template">
Kafka includes four core apis:
<ol>
<li>The <a href="#producerapi">Producer</a> API allows applications to send streams of data to topics in the Kafka cluster.

@@ -30,7 +31,7 @@ Kafka exposes all its functionality over a language independent protocol which h
The Producer API allows applications to send streams of data to topics in the Kafka cluster.
<p>
Examples showing how to use the producer are given in the
-<a href="/0100/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html" title="Kafka 0.10.0 Javadoc">javadocs</a>.
+<a href="/{{version}}/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html" title="Kafka 0.10.0 Javadoc">javadocs</a>.
<p>
To use the producer, you can use the following maven dependency:

@@ -47,7 +48,7 @@ To use the producer, you can use the following maven dependency:
The Consumer API allows applications to read streams of data from topics in the Kafka cluster.
<p>
Examples showing how to use the consumer are given in the
-<a href="/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html" title="Kafka 0.10.0 Javadoc">javadocs</a>.
+<a href="/{{version}}/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html" title="Kafka 0.10.0 Javadoc">javadocs</a>.
<p>
To use the consumer, you can use the following maven dependency:
<pre>

@@ -63,7 +64,7 @@ To use the consumer, you can use the following maven dependency:
The <a href="#streamsapi">Streams</a> API allows transforming streams of data from input topics to output topics.
<p>
Examples showing how to use this library are given in the
-<a href="/0100/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html" title="Kafka 0.10.0 Javadoc">javadocs</a>
+<a href="/{{version}}/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html" title="Kafka 0.10.0 Javadoc">javadocs</a>
<p>
Additional documentation on using the Streams API is available <a href="/documentation.html#streams">here</a>.
<p>

@@ -83,7 +84,7 @@ The Connect API allows implementing connectors that continually pull from some s
<p>
Many users of Connect won't need to use this API directly, though, they can use pre-built connectors without needing to write any code. Additional information on using Connect is available <a href="/documentation.html#connect">here</a>.
<p>
-Those who want to implement custom connectors can see the <a href="/0100/javadoc/index.html?org/apache/kafka/connect" title="Kafka 0.10.0 Javadoc">javadoc</a>.
+Those who want to implement custom connectors can see the <a href="/{{version}}/javadoc/index.html?org/apache/kafka/connect" title="Kafka 0.10.0 Javadoc">javadoc</a>.
<p>

<h3><a id="legacyapis" href="#streamsapi">2.5 Legacy APIs</a></h3>

@@ -92,3 +93,6 @@ Those who want to implement custom connectors can see the <a href="/0100/javadoc
A more limited legacy producer and consumer api is also included in Kafka. These old Scala APIs are deprecated and only still available for compatibility purposes. Information on them can be found <a href="/081/documentation.html#producerapi" title="Kafka 0.8.1 Docs">here</a>.
</p>
+</script>
+
+<div class="p-api"></div>
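The <pre> blocks holding the actual Maven coordinates fall outside the hunks shown above. For reference, the producer and consumer clients are both pulled in through the kafka-clients artifact; a sketch of the dependency (the version shown is illustrative and should match the release being documented):

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.1.0</version>
    </dependency>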
@@ -15,6 +15,7 @@
limitations under the License.
-->

+<script id="configuration-template" type="text/x-handlebars-template">
Kafka uses key-value pairs in the <a href="http://en.wikipedia.org/wiki/.properties">property file format</a> for configuration. These values can be supplied either from a file or programmatically.

<h3><a id="brokerconfigs" href="#brokerconfigs">3.1 Broker Configs</a></h3>

@@ -251,3 +252,6 @@ Below is the configuration of the Kafka Connect framework.
<h3><a id="streamsconfigs" href="#streamsconfigs">3.5 Kafka Streams Configs</a></h3>
Below is the configuration of the Kafka Streams client library.
<!--#include virtual="generated/streams_config.html" -->
+</script>
+
+<div class="p-configuration"></div>
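As a quick illustration of the property file format mentioned in the first hunk above, a minimal broker configuration might look like the following sketch (values are only examples; the generated broker config table is the authoritative list):

    # server.properties (illustrative values)
    broker.id=0
    listeners=PLAINTEXT://:9092
    log.dirs=/tmp/kafka-logs
    zookeeper.connect=localhost:2181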
@@ -15,6 +15,7 @@
~ limitations under the License.
~-->

+<script id="connect-template" type="text/x-handlebars-template">
<h3><a id="connect_overview" href="#connect_overview">8.1 Overview</a></h3>

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define <i>connectors</i> that move large collections of data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline analysis.

@@ -423,3 +424,6 @@ In most cases, connector and task states will match, though they may be differen
<p>
It's sometimes useful to temporarily stop the message processing of a connector. For example, if the remote system is undergoing maintenance, it would be preferable for source connectors to stop polling it for new data instead of filling logs with exception spam. For this use case, Connect offers a pause/resume API. While a source connector is paused, Connect will stop polling it for additional records. While a sink connector is paused, Connect will stop pushing new messages to it. The pause state is persistent, so even if you restart the cluster, the connector will not begin message processing again until the task has been resumed. Note that there may be a delay before all of a connector's tasks have transitioned to the PAUSED state since it may take time for them to finish whatever processing they were in the middle of when being paused. Additionally, failed tasks will not transition to the PAUSED state until they have been restarted.
</p>
+</script>
+
+<div class="p-connect"></div>
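The pause/resume API mentioned in the hunk above is exposed through the Connect REST interface; a sketch of how it is typically invoked (the connector name is hypothetical):

    # pause and later resume a connector via the Connect REST API
    PUT /connectors/local-file-source/pause
    PUT /connectors/local-file-source/resume

    # check the resulting connector and task states
    GET /connectors/local-file-source/status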
@@ -15,6 +15,7 @@
limitations under the License.
-->

+<script id="design-template" type="text/x-handlebars-template">
<h3><a id="majordesignelements" href="#majordesignelements">4.1 Motivation</a></h3>
<p>
We designed Kafka to be able to act as a unified platform for handling all the real-time data feeds <a href="#introduction">a large company might have</a>. To do this we had to think through a fairly broad set of use cases.

@@ -464,7 +465,7 @@ situations where the upstream data source would not otherwise be replayable.

Here is a high-level picture that shows the logical structure of a Kafka log with the offset for each message.
<p>
-<img class="centered" src="images/log_cleaner_anatomy.png">
+<img class="centered" src="/{{version}}/images/log_cleaner_anatomy.png">
<p>
The head of the log is identical to a traditional Kafka log. It has dense, sequential offsets and retains all messages. Log compaction adds an option for handling the tail of the log. The picture above shows a log
with a compacted tail. Note that the messages in the tail of the log retain the original offset assigned when they were first written—that never changes. Note also that all offsets remain valid positions in

@@ -478,7 +479,7 @@ marked as the "delete retention point" in the above diagram.
The compaction is done in the background by periodically recopying log segments. Cleaning does not block reads and can be throttled to use no more than a configurable amount of I/O throughput to avoid impacting
producers and consumers. The actual process of compacting a log segment looks something like this:
<p>
-<img class="centered" src="images/log_compaction.png">
+<img class="centered" src="/{{version}}/images/log_compaction.png">
<p>
<h4><a id="design_compactionguarantees" href="#design_compactionguarantees">What guarantees does log compaction provide?</a></h4>

@@ -581,3 +582,6 @@ In fact, when running Kafka as a service this even makes it possible to enforce
Client byte rate is measured over multiple small windows (e.g. 30 windows of 1 second each) in order to detect and correct quota violations quickly. Typically, having large measurement windows
(e.g. 10 windows of 30 seconds each) leads to large bursts of traffic followed by long delays, which is not great in terms of user experience.
</p>
+</script>
+
+<div class="p-design"></div>
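The window sizing described in the last hunk corresponds to the broker's quota sampling settings; a sketch mirroring the "30 windows of 1 second each" example from the prose (values are illustrative, not the defaults):

    # number of quota samples retained and the span of each sample window
    quota.window.num=30
    quota.window.size.seconds=1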
@@ -15,6 +15,8 @@
limitations under the License.
-->

+<script><!--#include virtual="js/templateData.js" --></script>
+
<!--#include virtual="../includes/_header.htm" -->
<!--#include virtual="../includes/_top.htm" -->
@@ -15,6 +15,7 @@
limitations under the License.
-->

+<script id="implementation-template" type="text/x-handlebars-template">
<h3><a id="apidesign" href="#apidesign">5.1 API Design</a></h3>

<h4><a id="impl_producer" href="#impl_producer">Producer APIs</a></h4>

@@ -201,7 +202,7 @@ value : V bytes
<p>
The use of the message offset as the message id is unusual. Our original idea was to use a GUID generated by the producer, and maintain a mapping from GUID to offset on each broker. But since a consumer must maintain an ID for each server, the global uniqueness of the GUID provides no value. Furthermore, the complexity of maintaining the mapping from a random id to an offset requires a heavy weight index structure which must be synchronized with disk, essentially requiring a full persistent random-access data structure. Thus to simplify the lookup structure we decided to use a simple per-partition atomic counter which could be coupled with the partition id and node id to uniquely identify a message; this makes the lookup structure simpler, though multiple seeks per consumer request are still likely. However once we settled on a counter, the jump to directly using the offset seemed natural—both after all are monotonically increasing integers unique to a partition. Since the offset is hidden from the consumer API this decision is ultimately an implementation detail and we went with the more efficient approach.
</p>
-<img class="centered" src="images/kafka_log.png">
+<img class="centered" src="/{{version}}/images/kafka_log.png">
<h4><a id="impl_writes" href="#impl_writes">Writes</a></h4>
<p>
The log allows serial appends which always go to the last file. This file is rolled over to a fresh file when it reaches a configurable size (say 1GB). The log takes two configuration parameters: <i>M</i>, which gives the number of messages to write before forcing the OS to flush the file to disk, and <i>S</i>, which gives a number of seconds after which a flush is forced. This gives a durability guarantee of losing at most <i>M</i> messages or <i>S</i> seconds of data in the event of a system crash.

@@ -403,3 +404,6 @@ Each consumer does the following during rebalancing:
<p>
When rebalancing is triggered at one consumer, rebalancing should be triggered in other consumers within the same group about the same time.
</p>
+</script>
+
+<div class="p-implementation"></div>
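The <i>M</i> and <i>S</i> parameters described in the Writes hunk above map onto the broker's flush settings; a sketch with illustrative values:

    # flush after at most 10000 messages (M) or 1000 ms (S), whichever comes first
    log.flush.interval.messages=10000
    log.flush.interval.ms=1000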
@@ -14,6 +14,10 @@
See the License for the specific language governing permissions and
limitations under the License.
-->

+<script><!--#include virtual="js/templateData.js" --></script>
+
+<script id="introduction-template" type="text/x-handlebars-template">
<h3> Kafka is <i>a distributed streaming platform</i>. What exactly does that mean?</h3>
<p>We think of a streaming platform as having three key capabilities:</p>
<ol>

@@ -42,7 +46,7 @@
<li>The <a href="/documentation.html#streams">Streams API</a> allows an application to act as a <i>stream processor</i>, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
<li>The <a href="/documentation.html#connect">Connector API</a> allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
</ul>
-<img src="images/kafka-apis.png" style="float: right; width: 50%;">
+<img src="/{{version}}/images/kafka-apis.png" style="float: right; width: 50%;">
</div>
<p>
In Kafka the communication between the clients and the servers is done with a simple, high-performance, language agnostic <a href="https://kafka.apache.org/protocol.html">TCP protocol</a>. This protocol is versioned and maintains backwards compatibility with older versions. We provide a Java client for Kafka, but clients are available in <a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">many languages</a>.</p>

@@ -51,14 +55,14 @@ In Kafka the communication between the clients and the servers is done with a si
<p>Let's first dive into the core abstraction Kafka provides for a stream of records—the topic.</p>
<p>A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.</p>
<p> For each topic, the Kafka cluster maintains a partitioned log that looks like this: </p>
-<img class="centered" src="images/log_anatomy.png">
+<img class="centered" src="/{{version}}/images/log_anatomy.png">

<p> Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. The records in the partitions are each assigned a sequential id number called the <i>offset</i> that uniquely identifies each record within the partition.
</p>
<p>
The Kafka cluster retains all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
</p>
-<img class="centered" src="images/log_consumer.png" style="width:400px">
+<img class="centered" src="/{{version}}/images/log_consumer.png" style="width:400px">
<p>
In fact, the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from "now".
</p>

@@ -93,7 +97,7 @@ If all the consumer instances have the same consumer group, then the records wil
<p>
If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
</p>
-<img class="centered" src="images/consumer-groups.png">
+<img class="centered" src="/{{version}}/images/consumer-groups.png">
<p>
A two server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumer instances and group B has four.
</p>

@@ -197,3 +201,19 @@ Likewise for streaming data pipelines the combination of subscription to real-ti
<p>
For more information on the guarantees, apis, and capabilities Kafka provides see the rest of the <a href="/documentation.html">documentation</a>.
</p>
+</script>
+
+<!--#include virtual="../includes/_header.htm" -->
+<!--#include virtual="../includes/_top.htm" -->
+<div class="content documentation documentation--current">
+<!--#include virtual="../includes/_nav.htm" -->
+<div class="right">
+<div class="p-introduction"></div>
+</div>
+</div>
+<!--#include virtual="../includes/_footer.htm" -->
+
+<script>
+// Show selected style on nav item
+$(function() { $('.b-nav__intro').addClass('selected'); });
+</script>
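The two-day retention example in the hunk above corresponds to a broker/topic retention setting; a sketch (value illustrative):

    # retain records for two days before they become eligible for deletion
    log.retention.hours=48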
@@ -0,0 +1,21 @@
+/*
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+// Define variables for doc templates
+var context={
+	"version": "0101"
+};
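For context, the `context` object above is what gets handed to handlebars.js when a page renders one of the `x-handlebars-template` scripts into its placeholder div (e.g. `<div class="p-api">`). The loader script itself is not part of this excerpt, so the following is only a sketch of the mechanism, using ids and class names taken from the hunks above:

    <script>
    // Sketch: compile the api template and render it with the shared context
    // defined in js/templateData.js, then inject the result into the placeholder.
    var source   = document.getElementById('api-template').innerHTML;
    var template = Handlebars.compile(source);   // handlebars.js
    document.querySelector('.p-api').innerHTML = template(context);
    </script>

With this in place, bumping the docs to a new release only requires editing the "version" value in templateData.js rather than every hard-coded path.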
@@ -15,6 +15,7 @@
limitations under the License.
-->

+<script id="ops-template" type="text/x-handlebars-template">
Here is some information on actually running Kafka as a production system based on usage and experience at LinkedIn. Please send us any additional tips you know of.

<h3><a id="basic_ops" href="#basic_ops">6.1 Basic Kafka Operations</a></h3>

@@ -1357,3 +1358,6 @@ Operationally, we do the following for a healthy ZooKeeper installation:
<li>Don't overbuild the cluster: large clusters, especially in a write heavy usage pattern, means a lot of intracluster communication (quorums on the writes and subsequent cluster member updates), but don't underbuild it (and risk swamping the cluster). Having more servers adds to your read capacity.</li>
</ul>
Overall, we try to keep the ZooKeeper system as small as will handle the load (plus standard growth capacity planning) and as simple as possible. We try not to do anything fancy with the configuration or application layout as compared to the official release as well as keep it as self contained as possible. For these reasons, we tend to skip the OS packaged versions, since it has a tendency to try to put things in the OS standard hierarchy, which can be 'messy', for want of a better way to word it.
+</script>
+
+<div class="p-ops"></div>
@@ -15,6 +15,7 @@
limitations under the License.
-->

+<script id="security-template" type="text/x-handlebars-template">
<h3><a id="security_overview" href="#security_overview">7.1 Security Overview</a></h3>
In release 0.9.0.0, the Kafka community added a number of features that, used either separately or together, increase security in a Kafka cluster. These features are considered to be of beta quality. The following security measures are currently supported:
<ol>

@@ -744,3 +745,6 @@ It is also necessary to enable authentication on the ZooKeeper ensemble. To do i
<li><a href="http://zookeeper.apache.org/doc/r3.4.9/zookeeperProgrammers.html#sc_ZooKeeperAccessControl">Apache ZooKeeper documentation</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL">Apache ZooKeeper wiki</a></li>
</ol>
+</script>
+
+<div class="p-security"></div>
@@ -15,7 +15,26 @@
~ limitations under the License.
~-->

-<h3><a id="streams_overview" href="#streams_overview">9.1 Overview</a></h3>
+<script><!--#include virtual="js/templateData.js" --></script>
+
+<script id="streams-template" type="text/x-handlebars-template">
+<h1>Streams</h1>
+
+<ol class="toc">
+<li>
+<a href="#streams_overview">Overview</a>
+</li>
+<li>
+<a href="#streams_developer">Developer guide</a>
+<ul>
+<li><a href="#streams_concepts">Core concepts</a>
+<li><a href="#streams_processor">Low-level processor API</a>
+<li><a href="#streams_dsl">High-level streams DSL</a>
+</ul>
+</li>
+</ol>
+
+<h2><a id="streams_overview" href="#overview">Overview</a></h2>
<p>
Kafka Streams is a client library for processing and analyzing data stored in Kafka and either writes the resulting data back to Kafka or sends the final output to an external system. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state.

@@ -34,7 +53,7 @@ Some highlights of Kafka Streams:

</ul>

-<h3><a id="streams_developer" href="#streams_developer">9.2 Developer Guide</a></h3>
+<h2><a id="streams_developer" href="#streams_developer">Developer Guide</a></h2>

<p>
There is a <a href="#quickstart_kafkastreams">quickstart</a> example that shows how to run a stream processing program coded in the Kafka Streams library.

@@ -279,8 +298,8 @@ from a single topic).
<pre>
KStreamBuilder builder = new KStreamBuilder();

KStream<String, GenericRecord> source1 = builder.stream("topic1", "topic2");
KTable<String, GenericRecord> source2 = builder.table("topic3", "stateStoreName");
</pre>

<h5><a id="streams_dsl_windowing" href="#streams_dsl_windowing">Windowing a stream</a></h5>

@@ -301,7 +320,7 @@ A <b>join</b> operation merges two streams based on the keys of their data recor
</ul>

Depending on the operands the following join operations are supported: <b>inner joins</b>, <b>outer joins</b> and <b>left joins</b>. Their semantics are similar to the corresponding operators in relational databases.

<h5><a id="streams_dsl_transform" href="#streams_dsl_transform">Transform a stream</a></h5>

<p>

@@ -323,12 +342,12 @@ where users can usually pass a customized function to these functions as a param

<pre>
// written in Java 8+, using lambda expressions
KStream<String, GenericRecord> mapped = source1.mapValues(record -> record.get("category"));
</pre>

<p>
Stateless transformations, by definition, do not depend on any state for processing, and hence implementation-wise
-they do not require a state store associated with the stream processor; stateful transformations, on the other hand,
+they do not require a state store associated with the stream processor; Stateful transformations, on the other hand,
require accessing an associated state for processing and producing outputs.
For example, in <code>join</code> and <code>aggregate</code> operations, a windowing state is usually used to store all the received records
within the defined window boundary so far. The operators can then access these accumulated records in the store and compute

@@ -337,14 +356,14 @@ based on them.

<pre>
// written in Java 8+, using lambda expressions
KTable<Windowed<String>, Long> counts = source1.groupByKey().aggregate(
    () -> 0L, // initial value
    (aggKey, value, aggregate) -> aggregate + 1L, // aggregating value
    TimeWindows.of("counts", 5000L).advanceBy(1000L), // intervals in milliseconds
    Serdes.Long() // serde for aggregated value
);

KStream<String, String> joined = source1.leftJoin(source2,
    (record1, record2) -> record1.get("user") + "-" + record2.get("region")
);
</pre>

@@ -369,7 +388,7 @@ Kafka Streams provides a convenience method called <code>through</code>:
//
// joined.to("topic4");
// materialized = builder.stream("topic4");
KStream<String, String> materialized = joined.through("topic4");
</pre>

@@ -379,3 +398,27 @@ Besides defining the topology, developers will also need to configure their appl
in <code>StreamsConfig</code> before running it. A complete list of
Kafka Streams configs can be found <a href="#streamsconfigs"><b>here</b></a>.
</p>
+</script>
+
+<!--#include virtual="../includes/_header.htm" -->
+<!--#include virtual="../includes/_top.htm" -->
+<div class="content documentation documentation--current">
+<!--#include virtual="../includes/_nav.htm" -->
+<div class="right">
+<!--#include virtual="../includes/_docs_banner.htm" -->
+<ul class="breadcrumbs">
+<li><a href="/documentation">Documentation</a></li>
+</ul>
+<div class="p-streams"></div>
+</div>
+</div>
+<!--#include virtual="../includes/_footer.htm" -->
+
+<script>
+$(function() {
+  // Show selected style on nav item
+  $('.b-nav__streams').addClass('selected');
+
+  // Display docs subnav items
+  $('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
+});
+</script>
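As a follow-up to the StreamsConfig note in the last hunk, a minimal configuration sketch (the property names are the standard Streams settings; values are illustrative):

    # minimal Kafka Streams configuration (illustrative values)
    application.id=my-streams-app
    bootstrap.servers=localhost:9092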