Separate Streams documentation and set up docs with easy-to-set variables

- Separate Streams documentation out into a standalone page.
- Set up templates to use handlebars.js.
- Create template variables to swap in frequently updated values, such as the version number, from a single file, templateData.js.

Author: Derrick Or <derrickor@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #2245 from derrickdoo/docTemplates
Authored by Derrick Or on 2016-12-13 17:59:49 -08:00; committed by Guozhang Wang
Parent: 448f194c70
Commit: 53428694a6
11 changed files with 4062 additions and 3948 deletions


@@ -15,80 +15,84 @@
limitations under the License.
-->
Kafka includes four core APIs:
<ol>
<li>The <a href="#producerapi">Producer</a> API allows applications to send streams of data to topics in the Kafka cluster.
<li>The <a href="#consumerapi">Consumer</a> API allows applications to read streams of data from topics in the Kafka cluster.
<li>The <a href="#streamsapi">Streams</a> API allows transforming streams of data from input topics to output topics.
<li>The <a href="#connectapi">Connect</a> API allows implementing connectors that continually pull from some source system or application into Kafka or push from Kafka into some sink system or application.
</ol>
<script id="api-template" type="text/x-handlebars-template">
Kafka includes four core APIs:
<ol>
<li>The <a href="#producerapi">Producer</a> API allows applications to send streams of data to topics in the Kafka cluster.
<li>The <a href="#consumerapi">Consumer</a> API allows applications to read streams of data from topics in the Kafka cluster.
<li>The <a href="#streamsapi">Streams</a> API allows transforming streams of data from input topics to output topics.
<li>The <a href="#connectapi">Connect</a> API allows implementing connectors that continually pull from some source system or application into Kafka or push from Kafka into some sink system or application.
</ol>
Kafka exposes all its functionality over a language-independent protocol which has clients available in many programming languages. However, only the Java clients are maintained as part of the main Kafka project; the others are available as independent open source projects. A list of non-Java clients is available <a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">here</a>.
Kafka exposes all its functionality over a language-independent protocol which has clients available in many programming languages. However, only the Java clients are maintained as part of the main Kafka project; the others are available as independent open source projects. A list of non-Java clients is available <a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">here</a>.
<h3><a id="producerapi" href="#producerapi">2.1 Producer API</a></h3>
<h3><a id="producerapi" href="#producerapi">2.1 Producer API</a></h3>
The Producer API allows applications to send streams of data to topics in the Kafka cluster.
<p>
Examples showing how to use the producer are given in the
<a href="/0100/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html" title="Kafka 0.10.0 Javadoc">javadocs</a>.
<p>
To use the producer, you can use the following maven dependency:
The Producer API allows applications to send streams of data to topics in the Kafka cluster.
<p>
Examples showing how to use the producer are given in the
<a href="/{{version}}/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html" title="Kafka 0.10.0 Javadoc">javadocs</a>.
<p>
To use the producer, you can use the following maven dependency:
<pre>
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;0.10.0.0&lt;/version&gt;
&lt;/dependency&gt;
</pre>
<pre>
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;0.10.0.0&lt;/version&gt;
&lt;/dependency&gt;
</pre>
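To give a feel for the API, here is a minimal, untested sketch of sending a single record with this client; the broker address and topic name are placeholders:
<pre>
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer&lt;String, String&gt; producer = new KafkaProducer&lt;&gt;(props);
producer.send(new ProducerRecord&lt;&gt;("my-topic", "key", "value"));   // placeholder topic
producer.close();
</pre>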
<h3><a id="consumerapi" href="#consumerapi">2.2 Consumer API</a></h3>
<h3><a id="consumerapi" href="#consumerapi">2.2 Consumer API</a></h3>
The Consumer API allows applications to read streams of data from topics in the Kafka cluster.
<p>
Examples showing how to use the consumer are given in the
<a href="/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html" title="Kafka 0.10.0 Javadoc">javadocs</a>.
<p>
To use the consumer, you can use the following maven dependency:
<pre>
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;0.10.0.0&lt;/version&gt;
&lt;/dependency&gt;
</pre>
The Consumer API allows applications to read streams of data from topics in the Kafka cluster.
<p>
Examples showing how to use the consumer are given in the
<a href="/{{version}}/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html" title="Kafka 0.10.0 Javadoc">javadocs</a>.
<p>
To use the consumer, you can use the following maven dependency:
<pre>
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-clients&lt;/artifactId&gt;
&lt;version&gt;0.10.0.0&lt;/version&gt;
&lt;/dependency&gt;
</pre>
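As a rough, untested sketch (broker address, group id and topic name are placeholders), subscribing and polling with this client looks roughly like:
<pre>
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
props.put("group.id", "my-group");                  // placeholder consumer group
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer&lt;String, String&gt; consumer = new KafkaConsumer&lt;&gt;(props);
consumer.subscribe(Collections.singletonList("my-topic"));
while (true) {
    // poll(long timeoutMs) is the signature used by the 0.10.x consumer
    ConsumerRecords&lt;String, String&gt; records = consumer.poll(100);
    for (ConsumerRecord&lt;String, String&gt; record : records)
        System.out.printf("offset=%d, key=%s, value=%s%n", record.offset(), record.key(), record.value());
}
</pre>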
<h3><a id="streamsapi" href="#streamsapi">2.3 Streams API</a></h3>
<h3><a id="streamsapi" href="#streamsapi">2.3 Streams API</a></h3>
The <a href="#streamsapi">Streams</a> API allows transforming streams of data from input topics to output topics.
<p>
Examples showing how to use this library are given in the
<a href="/0100/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html" title="Kafka 0.10.0 Javadoc">javadocs</a>
<p>
Additional documentation on using the Streams API is available <a href="/documentation.html#streams">here</a>.
<p>
To use Kafka Streams you can use the following maven dependency:
The <a href="#streamsapi">Streams</a> API allows transforming streams of data from input topics to output topics.
<p>
Examples showing how to use this library are given in the
<a href="/{{version}}/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html" title="Kafka 0.10.0 Javadoc">javadocs</a>
<p>
Additional documentation on using the Streams API is available <a href="/documentation.html#streams">here</a>.
<p>
To use Kafka Streams you can use the following maven dependency:
<pre>
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams&lt;/artifactId&gt;
&lt;version&gt;0.10.0.0&lt;/version&gt;
&lt;/dependency&gt;
</pre>
<pre>
&lt;dependency&gt;
&lt;groupId&gt;org.apache.kafka&lt;/groupId&gt;
&lt;artifactId&gt;kafka-streams&lt;/artifactId&gt;
&lt;version&gt;0.10.0.0&lt;/version&gt;
&lt;/dependency&gt;
</pre>
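As an illustrative, untested sketch modeled on the 0.10-era Streams DSL (class and config names may differ in other releases; the application id, broker address and topic names are placeholders), a topology that upper-cases values could look roughly like:
<pre>
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-example");   // placeholder application id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker address
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

KStreamBuilder builder = new KStreamBuilder();
KStream&lt;String, String&gt; source = builder.stream("input-topic");        // placeholder topic names
source.mapValues(value -&gt; value.toUpperCase()).to("output-topic");

KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
</pre>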
<h3><a id="connectapi" href="#connectapi">2.4 Connect API</a></h3>
<h3><a id="connectapi" href="#connectapi">2.4 Connect API</a></h3>
The Connect API allows implementing connectors that continually pull from some source data system into Kafka or push from Kafka into some sink data system.
<p>
Many users of Connect won't need to use this API directly, though, they can use pre-built connectors without needing to write any code. Additional information on using Connect is available <a href="/documentation.html#connect">here</a>.
<p>
Those who want to implement custom connectors can see the <a href="/0100/javadoc/index.html?org/apache/kafka/connect" title="Kafka 0.10.0 Javadoc">javadoc</a>.
<p>
The Connect API allows implementing connectors that continually pull from some source data system into Kafka or push from Kafka into some sink data system.
<p>
Many users of Connect won't need to use this API directly, though, they can use pre-built connectors without needing to write any code. Additional information on using Connect is available <a href="/documentation.html#connect">here</a>.
<p>
Those who want to implement custom connectors can see the <a href="/{{version}}/javadoc/index.html?org/apache/kafka/connect" title="Kafka 0.10.0 Javadoc">javadoc</a>.
<p>
<h3><a id="legacyapis" href="#streamsapi">2.5 Legacy APIs</a></h3>
<h3><a id="legacyapis" href="#streamsapi">2.5 Legacy APIs</a></h3>
<p>
A more limited legacy producer and consumer API is also included in Kafka. These old Scala APIs are deprecated and remain available only for compatibility purposes. Information on them can be found <a href="/081/documentation.html#producerapi" title="Kafka 0.8.1 Docs">here</a>.
</p>
<p>
A more limited legacy producer and consumer API is also included in Kafka. These old Scala APIs are deprecated and remain available only for compatibility purposes. Information on them can be found <a href="/081/documentation.html#producerapi" title="Kafka 0.8.1 Docs">here</a>.
</p>
</script>
<div class="p-api"></div>


@@ -15,239 +15,243 @@
limitations under the License.
-->
Kafka uses key-value pairs in the <a href="http://en.wikipedia.org/wiki/.properties">property file format</a> for configuration. These values can be supplied either from a file or programmatically.
<script id="configuration-template" type="text/x-handlebars-template">
Kafka uses key-value pairs in the <a href="http://en.wikipedia.org/wiki/.properties">property file format</a> for configuration. These values can be supplied either from a file or programmatically.
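For example, a rough sketch of the programmatic side using plain <code>java.util.Properties</code> (the file path and keys below are placeholders):
<pre>
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

Properties props = new Properties();
// Load the key-value pairs from a .properties file...
try (FileInputStream in = new FileInputStream("config/consumer.properties")) {   // placeholder path
    props.load(in);
} catch (IOException e) {
    throw new RuntimeException("could not read config file", e);
}
// ...and/or set (or override) individual keys programmatically.
props.put("group.id", "my-group");
</pre>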
<h3><a id="brokerconfigs" href="#brokerconfigs">3.1 Broker Configs</a></h3>
<h3><a id="brokerconfigs" href="#brokerconfigs">3.1 Broker Configs</a></h3>
The essential configurations are the following:
<ul>
<li><code>broker.id</code>
<li><code>log.dirs</code>
<li><code>zookeeper.connect</code>
</ul>
The essential configurations are the following:
<ul>
<li><code>broker.id</code>
<li><code>log.dirs</code>
<li><code>zookeeper.connect</code>
</ul>
Topic-level configurations and defaults are discussed in more detail <a href="#topic-config">below</a>.
Topic-level configurations and defaults are discussed in more detail <a href="#topic-config">below</a>.
<!--#include virtual="generated/kafka_config.html" -->
<!--#include virtual="generated/kafka_config.html" -->
<p>More details about broker configuration can be found in the scala class <code>kafka.server.KafkaConfig</code>.</p>
<p>More details about broker configuration can be found in the scala class <code>kafka.server.KafkaConfig</code>.</p>
<a id="topic-config" href="#topic-config">Topic-level configuration</a>
<a id="topic-config" href="#topic-config">Topic-level configuration</a>
Configurations pertinent to topics have both a server default as well as an optional per-topic override. If no per-topic configuration is given the server default is used. The override can be set at topic creation time by giving one or more <code>--config</code> options. This example creates a topic named <i>my-topic</i> with a custom max message size and flush rate:
<pre>
<b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1
--replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1</b>
</pre>
Overrides can also be changed or set later using the alter configs command. This example updates the max message size for <i>my-topic</i>:
<pre>
<b> &gt; bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --add-config max.message.bytes=128000</b>
</pre>
Configurations pertinent to topics have both a server default as well as an optional per-topic override. If no per-topic configuration is given the server default is used. The override can be set at topic creation time by giving one or more <code>--config</code> options. This example creates a topic named <i>my-topic</i> with a custom max message size and flush rate:
<pre>
<b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1
--replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1</b>
</pre>
Overrides can also be changed or set later using the alter configs command. This example updates the max message size for <i>my-topic</i>:
<pre>
<b> &gt; bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --add-config max.message.bytes=128000</b>
</pre>
To check overrides set on the topic you can do
<pre>
<b> &gt; bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --describe</b>
</pre>
To check overrides set on the topic you can do
<pre>
<b> &gt; bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --describe</b>
</pre>
To remove an override you can do
<pre>
<b> &gt; bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --delete-config max.message.bytes</b>
</pre>
To remove an override you can do
<pre>
<b> &gt; bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --delete-config max.message.bytes</b>
</pre>
The following are the topic-level configurations. The server's default configuration for this property is given under the Server Default Property heading. A given server default config value only applies to a topic if it does not have an explicit topic config override.
The following are the topic-level configurations. The server's default configuration for this property is given under the Server Default Property heading. A given server default config value only applies to a topic if it does not have an explicit topic config override.
<!--#include virtual="generated/topic_config.html" -->
<!--#include virtual="generated/topic_config.html" -->
<h3><a id="producerconfigs" href="#producerconfigs">3.2 Producer Configs</a></h3>
<h3><a id="producerconfigs" href="#producerconfigs">3.2 Producer Configs</a></h3>
Below is the configuration of the Java producer:
<!--#include virtual="generated/producer_config.html" -->
Below is the configuration of the Java producer:
<!--#include virtual="generated/producer_config.html" -->
<p>
For those interested in the legacy Scala producer configs, information can be found <a href="http://kafka.apache.org/082/documentation.html#producerconfigs">
here</a>.
</p>
<p>
For those interested in the legacy Scala producer configs, information can be found <a href="http://kafka.apache.org/082/documentation.html#producerconfigs">
here</a>.
</p>
<h3><a id="consumerconfigs" href="#consumerconfigs">3.3 Consumer Configs</a></h3>
<h3><a id="consumerconfigs" href="#consumerconfigs">3.3 Consumer Configs</a></h3>
In 0.9.0.0 we introduced the new Java consumer as a replacement for the older Scala-based simple and high-level consumers.
The configs for both new and old consumers are described below.
In 0.9.0.0 we introduced the new Java consumer as a replacement for the older Scala-based simple and high-level consumers.
The configs for both new and old consumers are described below.
<h4><a id="newconsumerconfigs" href="#newconsumerconfigs">3.3.1 New Consumer Configs</a></h4>
Below is the configuration for the new consumer:
<!--#include virtual="generated/consumer_config.html" -->
<h4><a id="newconsumerconfigs" href="#newconsumerconfigs">3.3.1 New Consumer Configs</a></h4>
Below is the configuration for the new consumer:
<!--#include virtual="generated/consumer_config.html" -->
<h4><a id="oldconsumerconfigs" href="#oldconsumerconfigs">3.3.2 Old Consumer Configs</a></h4>
<h4><a id="oldconsumerconfigs" href="#oldconsumerconfigs">3.3.2 Old Consumer Configs</a></h4>
The essential old consumer configurations are the following:
<ul>
<li><code>group.id</code>
<li><code>zookeeper.connect</code>
</ul>
The essential old consumer configurations are the following:
<ul>
<li><code>group.id</code>
<li><code>zookeeper.connect</code>
</ul>
<table class="data-table">
<tbody><tr>
<th>Property</th>
<th>Default</th>
<th>Description</th>
</tr>
<tr>
<td>group.id</td>
<td colspan="1"></td>
<td>A string that uniquely identifies the group of consumer processes to which this consumer belongs. By setting the same group id multiple processes indicate that they are all part of the same consumer group.</td>
</tr>
<tr>
<td>zookeeper.connect</td>
<td colspan="1"></td>
<td>Specifies the ZooKeeper connection string in the form <code>hostname:port</code> where host and port are the host and port of a ZooKeeper server. To allow connecting through other ZooKeeper nodes when that ZooKeeper machine is down you can also specify multiple hosts in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
<p>
The server may also have a ZooKeeper chroot path as part of its ZooKeeper connection string which puts its data under some path in the global ZooKeeper namespace. If so the consumer should use the same chroot path in its connection string. For example to give a chroot path of <code>/chroot/path</code> you would give the connection string as <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>.</td>
</tr>
<tr>
<td>consumer.id</td>
<td colspan="1">null</td>
<td>
<p>Generated automatically if not set.</p>
</td>
</tr>
<tr>
<td>socket.timeout.ms</td>
<td colspan="1">30 * 1000</td>
<td>The socket timeout for network requests. The actual timeout set will be max.fetch.wait + socket.timeout.ms.</td>
</tr>
<tr>
<td>socket.receive.buffer.bytes</td>
<td colspan="1">64 * 1024</td>
<td>The socket receive buffer for network requests</td>
</tr>
<tr>
<td>fetch.message.max.bytes</td>
<td nowrap>1024 * 1024</td>
<td>The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request. These bytes will be read into memory for each partition, so this helps control the memory used by the consumer. The fetch request size must be at least as large as the maximum message size the server allows or else it is possible for the producer to send messages larger than the consumer can fetch.</td>
</tr>
<tr>
<td>num.consumer.fetchers</td>
<td colspan="1">1</td>
<td>The number of fetcher threads used to fetch data.</td>
</tr>
<tr>
<td>auto.commit.enable</td>
<td colspan="1">true</td>
<td>If true, periodically commit to ZooKeeper the offset of messages already fetched by the consumer. This committed offset will be used when the process fails as the position from which the new consumer will begin.</td>
</tr>
<tr>
<td>auto.commit.interval.ms</td>
<td colspan="1">60 * 1000</td>
<td>The frequency in ms that the consumer offsets are committed to zookeeper.</td>
</tr>
<tr>
<td>queued.max.message.chunks</td>
<td colspan="1">2</td>
<td>Max number of message chunks buffered for consumption. Each chunk can be up to fetch.message.max.bytes.</td>
</tr>
<tr>
<td>rebalance.max.retries</td>
<td colspan="1">4</td>
<td>When a new consumer joins a consumer group the set of consumers attempt to "rebalance" the load to assign partitions to each consumer. If the set of consumers changes while this assignment is taking place the rebalance will fail and retry. This setting controls the maximum number of attempts before giving up.</td>
</tr>
<tr>
<td>fetch.min.bytes</td>
<td colspan="1">1</td>
<td>The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request.</td>
</tr>
<tr>
<td>fetch.wait.max.ms</td>
<td colspan="1">100</td>
<td>The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy fetch.min.bytes</td>
</tr>
<tr>
<td>rebalance.backoff.ms</td>
<td>2000</td>
<td>Backoff time between retries during rebalance. If not set explicitly, the value in zookeeper.sync.time.ms is used.
<table class="data-table">
<tbody><tr>
<th>Property</th>
<th>Default</th>
<th>Description</th>
</tr>
<tr>
<td>group.id</td>
<td colspan="1"></td>
<td>A string that uniquely identifies the group of consumer processes to which this consumer belongs. By setting the same group id multiple processes indicate that they are all part of the same consumer group.</td>
</tr>
<tr>
<td>zookeeper.connect</td>
<td colspan="1"></td>
<td>Specifies the ZooKeeper connection string in the form <code>hostname:port</code> where host and port are the host and port of a ZooKeeper server. To allow connecting through other ZooKeeper nodes when that ZooKeeper machine is down you can also specify multiple hosts in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
<p>
The server may also have a ZooKeeper chroot path as part of its ZooKeeper connection string which puts its data under some path in the global ZooKeeper namespace. If so the consumer should use the same chroot path in its connection string. For example to give a chroot path of <code>/chroot/path</code> you would give the connection string as <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>.</td>
</tr>
<tr>
<td>consumer.id</td>
<td colspan="1">null</td>
<td>
<p>Generated automatically if not set.</p>
</td>
</tr>
<tr>
<td>refresh.leader.backoff.ms</td>
<td colspan="1">200</td>
<td>Backoff time to wait before trying to determine the leader of a partition that has just lost its leader.</td>
</tr>
<tr>
<td>auto.offset.reset</td>
<td colspan="1">largest</td>
<td>
<p>What to do when there is no initial offset in ZooKeeper or if an offset is out of range:<br/>* smallest : automatically reset the offset to the smallest offset<br/>* largest : automatically reset the offset to the largest offset<br/>* anything else: throw exception to the consumer</p>
</td>
</tr>
<tr>
<td>consumer.timeout.ms</td>
<td colspan="1">-1</td>
<td>Throw a timeout exception to the consumer if no message is available for consumption after the specified interval</td>
</tr>
<tr>
<td>exclude.internal.topics</td>
<td colspan="1">true</td>
<td>Whether messages from internal topics (such as offsets) should be exposed to the consumer.</td>
</tr>
<tr>
<td>client.id</td>
<td colspan="1">group id value</td>
<td>The client id is a user-specified string sent in each request to help trace calls. It should logically identify the application making the request.</td>
</tr>
<tr>
<td>zookeeper.session.timeout.ms </td>
<td colspan="1">6000</td>
<td>ZooKeeper session timeout. If the consumer fails to heartbeat to ZooKeeper for this period of time it is considered dead and a rebalance will occur.</td>
</tr>
<tr>
<td>zookeeper.connection.timeout.ms</td>
<td colspan="1">6000</td>
<td>The max time that the client waits while establishing a connection to zookeeper.</td>
</tr>
<tr>
<td>zookeeper.sync.time.ms </td>
<td colspan="1">2000</td>
<td>How far a ZK follower can be behind a ZK leader</td>
</tr>
<tr>
<td>offsets.storage</td>
<td colspan="1">zookeeper</td>
<td>Select where offsets should be stored (zookeeper or kafka).</td>
</tr>
<tr>
<td>offsets.channel.backoff.ms</td>
<td colspan="1">1000</td>
<td>The backoff period when reconnecting the offsets channel or retrying failed offset fetch/commit requests.</td>
</tr>
<tr>
<td>offsets.channel.socket.timeout.ms</td>
<td colspan="1">10000</td>
<td>Socket timeout when reading responses for offset fetch/commit requests. This timeout is also used for ConsumerMetadata requests that are used to query for the offset manager.</td>
</tr>
<tr>
<td>offsets.commit.max.retries</td>
<td colspan="1">5</td>
<td>Retry the offset commit up to this many times on failure. This retry count only applies to offset commits during shut-down. It does not apply to commits originating from the auto-commit thread. It also does not apply to attempts to query for the offset coordinator before committing offsets. i.e., if a consumer metadata request fails for any reason, it will be retried and that retry does not count toward this limit.</td>
</tr>
<tr>
<td>dual.commit.enabled</td>
<td colspan="1">true</td>
<td>If you are using "kafka" as offsets.storage, you can dual commit offsets to ZooKeeper (in addition to Kafka). This is required during migration from zookeeper-based offset storage to kafka-based offset storage. With respect to any given consumer group, it is safe to turn this off after all instances within that group have been migrated to the new version that commits offsets to the broker (instead of directly to ZooKeeper).</td>
</tr>
<tr>
<td>partition.assignment.strategy</td>
<td colspan="1">range</td>
<td><p>Select between the "range" or "roundrobin" strategy for assigning partitions to consumer streams.<p>The round-robin partition assignor lays out all the available partitions and all the available consumer threads. It then proceeds to do a round-robin assignment from partition to consumer thread. If the subscriptions of all consumer instances are identical, then the partitions will be uniformly distributed. (i.e., the partition ownership counts will be within a delta of exactly one across all consumer threads.) Round-robin assignment is permitted only if: (a) Every topic has the same number of streams within a consumer instance (b) The set of subscribed topics is identical for every consumer instance within the group.<p> Range partitioning works on a per-topic basis. For each topic, we lay out the available partitions in numeric order and the consumer threads in lexicographic order. We then divide the number of partitions by the total number of consumer streams (threads) to determine the number of partitions to assign to each consumer. If it does not evenly divide, then the first few consumers will have one extra partition.</td>
</tr>
</tbody>
</table>
</tr>
<tr>
<td>socket.timeout.ms</td>
<td colspan="1">30 * 1000</td>
<td>The socket timeout for network requests. The actual timeout set will be max.fetch.wait + socket.timeout.ms.</td>
</tr>
<tr>
<td>socket.receive.buffer.bytes</td>
<td colspan="1">64 * 1024</td>
<td>The socket receive buffer for network requests</td>
</tr>
<tr>
<td>fetch.message.max.bytes</td>
<td nowrap>1024 * 1024</td>
<td>The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request. These bytes will be read into memory for each partition, so this helps control the memory used by the consumer. The fetch request size must be at least as large as the maximum message size the server allows or else it is possible for the producer to send messages larger than the consumer can fetch.</td>
</tr>
<tr>
<td>num.consumer.fetchers</td>
<td colspan="1">1</td>
<td>The number of fetcher threads used to fetch data.</td>
</tr>
<tr>
<td>auto.commit.enable</td>
<td colspan="1">true</td>
<td>If true, periodically commit to ZooKeeper the offset of messages already fetched by the consumer. This committed offset will be used when the process fails as the position from which the new consumer will begin.</td>
</tr>
<tr>
<td>auto.commit.interval.ms</td>
<td colspan="1">60 * 1000</td>
<td>The frequency in ms that the consumer offsets are committed to zookeeper.</td>
</tr>
<tr>
<td>queued.max.message.chunks</td>
<td colspan="1">2</td>
<td>Max number of message chunks buffered for consumption. Each chunk can be up to fetch.message.max.bytes.</td>
</tr>
<tr>
<td>rebalance.max.retries</td>
<td colspan="1">4</td>
<td>When a new consumer joins a consumer group the set of consumers attempt to "rebalance" the load to assign partitions to each consumer. If the set of consumers changes while this assignment is taking place the rebalance will fail and retry. This setting controls the maximum number of attempts before giving up.</td>
</tr>
<tr>
<td>fetch.min.bytes</td>
<td colspan="1">1</td>
<td>The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request.</td>
</tr>
<tr>
<td>fetch.wait.max.ms</td>
<td colspan="1">100</td>
<td>The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy fetch.min.bytes</td>
</tr>
<tr>
<td>rebalance.backoff.ms</td>
<td>2000</td>
<td>Backoff time between retries during rebalance. If not set explicitly, the value in zookeeper.sync.time.ms is used.
</td>
</tr>
<tr>
<td>refresh.leader.backoff.ms</td>
<td colspan="1">200</td>
<td>Backoff time to wait before trying to determine the leader of a partition that has just lost its leader.</td>
</tr>
<tr>
<td>auto.offset.reset</td>
<td colspan="1">largest</td>
<td>
<p>What to do when there is no initial offset in ZooKeeper or if an offset is out of range:<br/>* smallest : automatically reset the offset to the smallest offset<br/>* largest : automatically reset the offset to the largest offset<br/>* anything else: throw exception to the consumer</p>
</td>
</tr>
<tr>
<td>consumer.timeout.ms</td>
<td colspan="1">-1</td>
<td>Throw a timeout exception to the consumer if no message is available for consumption after the specified interval</td>
</tr>
<tr>
<td>exclude.internal.topics</td>
<td colspan="1">true</td>
<td>Whether messages from internal topics (such as offsets) should be exposed to the consumer.</td>
</tr>
<tr>
<td>client.id</td>
<td colspan="1">group id value</td>
<td>The client id is a user-specified string sent in each request to help trace calls. It should logically identify the application making the request.</td>
</tr>
<tr>
<td>zookeeper.session.timeout.ms </td>
<td colspan="1">6000</td>
<td>ZooKeeper session timeout. If the consumer fails to heartbeat to ZooKeeper for this period of time it is considered dead and a rebalance will occur.</td>
</tr>
<tr>
<td>zookeeper.connection.timeout.ms</td>
<td colspan="1">6000</td>
<td>The max time that the client waits while establishing a connection to zookeeper.</td>
</tr>
<tr>
<td>zookeeper.sync.time.ms </td>
<td colspan="1">2000</td>
<td>How far a ZK follower can be behind a ZK leader</td>
</tr>
<tr>
<td>offsets.storage</td>
<td colspan="1">zookeeper</td>
<td>Select where offsets should be stored (zookeeper or kafka).</td>
</tr>
<tr>
<td>offsets.channel.backoff.ms</td>
<td colspan="1">1000</td>
<td>The backoff period when reconnecting the offsets channel or retrying failed offset fetch/commit requests.</td>
</tr>
<tr>
<td>offsets.channel.socket.timeout.ms</td>
<td colspan="1">10000</td>
<td>Socket timeout when reading responses for offset fetch/commit requests. This timeout is also used for ConsumerMetadata requests that are used to query for the offset manager.</td>
</tr>
<tr>
<td>offsets.commit.max.retries</td>
<td colspan="1">5</td>
<td>Retry the offset commit up to this many times on failure. This retry count only applies to offset commits during shut-down. It does not apply to commits originating from the auto-commit thread. It also does not apply to attempts to query for the offset coordinator before committing offsets. i.e., if a consumer metadata request fails for any reason, it will be retried and that retry does not count toward this limit.</td>
</tr>
<tr>
<td>dual.commit.enabled</td>
<td colspan="1">true</td>
<td>If you are using "kafka" as offsets.storage, you can dual commit offsets to ZooKeeper (in addition to Kafka). This is required during migration from zookeeper-based offset storage to kafka-based offset storage. With respect to any given consumer group, it is safe to turn this off after all instances within that group have been migrated to the new version that commits offsets to the broker (instead of directly to ZooKeeper).</td>
</tr>
<tr>
<td>partition.assignment.strategy</td>
<td colspan="1">range</td>
<td><p>Select between the "range" or "roundrobin" strategy for assigning partitions to consumer streams.<p>The round-robin partition assignor lays out all the available partitions and all the available consumer threads. It then proceeds to do a round-robin assignment from partition to consumer thread. If the subscriptions of all consumer instances are identical, then the partitions will be uniformly distributed. (i.e., the partition ownership counts will be within a delta of exactly one across all consumer threads.) Round-robin assignment is permitted only if: (a) Every topic has the same number of streams within a consumer instance (b) The set of subscribed topics is identical for every consumer instance within the group.<p> Range partitioning works on a per-topic basis. For each topic, we lay out the available partitions in numeric order and the consumer threads in lexicographic order. We then divide the number of partitions by the total number of consumer streams (threads) to determine the number of partitions to assign to each consumer. If it does not evenly divide, then the first few consumers will have one extra partition.</td>
</tr>
</tbody>
</table>
<p>More details about consumer configuration can be found in the scala class <code>kafka.consumer.ConsumerConfig</code>.</p>
<p>More details about consumer configuration can be found in the scala class <code>kafka.consumer.ConsumerConfig</code>.</p>
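For orientation only, a rough sketch of wiring the essential configs above into the deprecated high-level consumer (the group and ZooKeeper address are placeholders; the exact legacy API may vary by release):
<pre>
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

Properties props = new Properties();
props.put("group.id", "my-group");                  // placeholder group
props.put("zookeeper.connect", "localhost:2181");   // placeholder ZooKeeper address
props.put("auto.offset.reset", "largest");

ConsumerConnector consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
// ... create message streams, consume, then shut down:
consumer.shutdown();
</pre>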
<h3><a id="connectconfigs" href="#connectconfigs">3.4 Kafka Connect Configs</a></h3>
Below is the configuration of the Kafka Connect framework.
<!--#include virtual="generated/connect_config.html" -->
<h3><a id="connectconfigs" href="#connectconfigs">3.4 Kafka Connect Configs</a></h3>
Below is the configuration of the Kafka Connect framework.
<!--#include virtual="generated/connect_config.html" -->
<h3><a id="streamsconfigs" href="#streamsconfigs">3.5 Kafka Streams Configs</a></h3>
Below is the configuration of the Kafka Streams client library.
<!--#include virtual="generated/streams_config.html" -->
<h3><a id="streamsconfigs" href="#streamsconfigs">3.5 Kafka Streams Configs</a></h3>
Below is the configuration of the Kafka Streams client library.
<!--#include virtual="generated/streams_config.html" -->
</script>
<div class="p-configuration"></div>


@@ -15,411 +15,415 @@
~ limitations under the License.
~-->
<h3><a id="connect_overview" href="#connect_overview">8.1 Overview</a></h3>
<script id="connect-template" type="text/x-handlebars-template">
<h3><a id="connect_overview" href="#connect_overview">8.1 Overview</a></h3>
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define <i>connectors</i> that move large collections of data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline analysis.
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define <i>connectors</i> that move large collections of data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline analysis.
Kafka Connect features include:
<ul>
<li><b>A common framework for Kafka connectors</b> - Kafka Connect standardizes integration of other data systems with Kafka, simplifying connector development, deployment, and management</li>
<li><b>Distributed and standalone modes</b> - scale up to a large, centrally managed service supporting an entire organization or scale down to development, testing, and small production deployments</li>
<li><b>REST interface</b> - submit and manage connectors to your Kafka Connect cluster via an easy to use REST API</li>
<li><b>Automatic offset management</b> - with just a little information from connectors, Kafka Connect can manage the offset commit process automatically so connector developers do not need to worry about this error prone part of connector development</li>
<li><b>Distributed and scalable by default</b> - Kafka Connect builds on the existing group management protocol. More workers can be added to scale up a Kafka Connect cluster.</li>
<li><b>Streaming/batch integration</b> - leveraging Kafka's existing capabilities, Kafka Connect is an ideal solution for bridging streaming and batch data systems</li>
</ul>
Kafka Connect features include:
<ul>
<li><b>A common framework for Kafka connectors</b> - Kafka Connect standardizes integration of other data systems with Kafka, simplifying connector development, deployment, and management</li>
<li><b>Distributed and standalone modes</b> - scale up to a large, centrally managed service supporting an entire organization or scale down to development, testing, and small production deployments</li>
<li><b>REST interface</b> - submit and manage connectors to your Kafka Connect cluster via an easy to use REST API</li>
<li><b>Automatic offset management</b> - with just a little information from connectors, Kafka Connect can manage the offset commit process automatically so connector developers do not need to worry about this error prone part of connector development</li>
<li><b>Distributed and scalable by default</b> - Kafka Connect builds on the existing group management protocol. More workers can be added to scale up a Kafka Connect cluster.</li>
<li><b>Streaming/batch integration</b> - leveraging Kafka's existing capabilities, Kafka Connect is an ideal solution for bridging streaming and batch data systems</li>
</ul>
<h3><a id="connect_user" href="#connect_user">8.2 User Guide</a></h3>
<h3><a id="connect_user" href="#connect_user">8.2 User Guide</a></h3>
The quickstart provides a brief example of how to run a standalone version of Kafka Connect. This section describes how to configure, run, and manage Kafka Connect in more detail.
The quickstart provides a brief example of how to run a standalone version of Kafka Connect. This section describes how to configure, run, and manage Kafka Connect in more detail.
<h4><a id="connect_running" href="#connect_running">Running Kafka Connect</a></h4>
<h4><a id="connect_running" href="#connect_running">Running Kafka Connect</a></h4>
Kafka Connect currently supports two modes of execution: standalone (single process) and distributed.
Kafka Connect currently supports two modes of execution: standalone (single process) and distributed.
In standalone mode all work is performed in a single process. This configuration is simpler to set up and get started with and may be useful in situations where only one worker makes sense (e.g. collecting log files), but it does not benefit from some of the features of Kafka Connect such as fault tolerance. You can start a standalone process with the following command:
In standalone mode all work is performed in a single process. This configuration is simpler to set up and get started with and may be useful in situations where only one worker makes sense (e.g. collecting log files), but it does not benefit from some of the features of Kafka Connect such as fault tolerance. You can start a standalone process with the following command:
<pre>
&gt; bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties ...]
</pre>
<pre>
&gt; bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties ...]
</pre>
The first parameter is the configuration for the worker. This includes settings such as the Kafka connection parameters, serialization format, and how frequently to commit offsets. The provided example should work well with a local cluster running with the default configuration provided by <code>config/server.properties</code>. It will require tweaking to use with a different configuration or production deployment. All workers (both standalone and distributed) require a few configs:
<ul>
<li><code>bootstrap.servers</code> - List of Kafka servers used to bootstrap connections to Kafka</li>
<li><code>key.converter</code> - Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the keys in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro.</li>
<li><code>value.converter</code> - Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the values in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro.</li>
</ul>
The first parameter is the configuration for the worker. This includes settings such as the Kafka connection parameters, serialization format, and how frequently to commit offsets. The provided example should work well with a local cluster running with the default configuration provided by <code>config/server.properties</code>. It will require tweaking to use with a different configuration or production deployment. All workers (both standalone and distributed) require a few configs:
<ul>
<li><code>bootstrap.servers</code> - List of Kafka servers used to bootstrap connections to Kafka</li>
<li><code>key.converter</code> - Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the keys in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro.</li>
<li><code>value.converter</code> - Converter class used to convert between Kafka Connect format and the serialized form that is written to Kafka. This controls the format of the values in messages written to or read from Kafka, and since this is independent of connectors it allows any connector to work with any serialization format. Examples of common formats include JSON and Avro.</li>
</ul>
The important configuration options specific to standalone mode are:
<ul>
<li><code>offset.storage.file.filename</code> - File to store offset data in</li>
</ul>
The important configuration options specific to standalone mode are:
<ul>
<li><code>offset.storage.file.filename</code> - File to store offset data in</li>
</ul>
The remaining parameters are connector configuration files. You may include as many as you want, but all will execute within the same process (on different threads).
The remaining parameters are connector configuration files. You may include as many as you want, but all will execute within the same process (on different threads).
Distributed mode handles automatic balancing of work, allows you to scale up (or down) dynamically, and offers fault tolerance both in the active tasks and for configuration and offset commit data. Execution is very similar to standalone mode:
Distributed mode handles automatic balancing of work, allows you to scale up (or down) dynamically, and offers fault tolerance both in the active tasks and for configuration and offset commit data. Execution is very similar to standalone mode:
<pre>
&gt; bin/connect-distributed.sh config/connect-distributed.properties
</pre>
<pre>
&gt; bin/connect-distributed.sh config/connect-distributed.properties
</pre>
The difference is in the class which is started and the configuration parameters which change how the Kafka Connect process decides where to store configurations, how to assign work, and where to store offsets and task statuses. In distributed mode, Kafka Connect stores the offsets, configs and task statuses in Kafka topics. It is recommended to manually create the topics for offsets, configs and statuses in order to achieve the desired number of partitions and replication factor. If the topics are not yet created when starting Kafka Connect, the topics will be auto-created with a default number of partitions and replication factor, which may not be best suited for their usage.
The difference is in the class which is started and the configuration parameters which change how the Kafka Connect process decides where to store configurations, how to assign work, and where to store offsets and task statuses. In distributed mode, Kafka Connect stores the offsets, configs and task statuses in Kafka topics. It is recommended to manually create the topics for offsets, configs and statuses in order to achieve the desired number of partitions and replication factor. If the topics are not yet created when starting Kafka Connect, the topics will be auto-created with a default number of partitions and replication factor, which may not be best suited for their usage.
In particular, the following configuration parameters, in addition to the common settings mentioned above, are critical to set before starting your cluster:
<ul>
<li><code>group.id</code> (default <code>connect-cluster</code>) - unique name for the cluster, used in forming the Connect cluster group; note that this <b>must not conflict</b> with consumer group IDs</li>
<li><code>config.storage.topic</code> (default <code>connect-configs</code>) - topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated, compacted topic. You may need to manually create the topic to ensure the correct configuration as auto created topics may have multiple partitions or be automatically configured for deletion rather than compaction</li>
<li><code>offset.storage.topic</code> (default <code>connect-offsets</code>) - topic to use for storing offsets; this topic should have many partitions, be replicated, and be configured for compaction</li>
<li><code>status.storage.topic</code> (default <code>connect-status</code>) - topic to use for storing statuses; this topic can have multiple partitions, and should be replicated and configured for compaction</li>
</ul>
In particular, the following configuration parameters, in addition to the common settings mentioned above, are critical to set before starting your cluster:
<ul>
<li><code>group.id</code> (default <code>connect-cluster</code>) - unique name for the cluster, used in forming the Connect cluster group; note that this <b>must not conflict</b> with consumer group IDs</li>
<li><code>config.storage.topic</code> (default <code>connect-configs</code>) - topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated, compacted topic. You may need to manually create the topic to ensure the correct configuration as auto created topics may have multiple partitions or be automatically configured for deletion rather than compaction</li>
<li><code>offset.storage.topic</code> (default <code>connect-offsets</code>) - topic to use for storing offsets; this topic should have many partitions, be replicated, and be configured for compaction</li>
<li><code>status.storage.topic</code> (default <code>connect-status</code>) - topic to use for storing statuses; this topic can have multiple partitions, and should be replicated and configured for compaction</li>
</ul>
Note that in distributed mode the connector configurations are not passed on the command line. Instead, use the REST API described below to create, modify, and destroy connectors.
Note that in distributed mode the connector configurations are not passed on the command line. Instead, use the REST API described below to create, modify, and destroy connectors.
<h4><a id="connect_configuring" href="#connect_configuring">Configuring Connectors</a></h4>
<h4><a id="connect_configuring" href="#connect_configuring">Configuring Connectors</a></h4>
Connector configurations are simple key-value mappings. For standalone mode these are defined in a properties file and passed to the Connect process on the command line. In distributed mode, they will be included in the JSON payload for the request that creates (or modifies) the connector.
Connector configurations are simple key-value mappings. For standalone mode these are defined in a properties file and passed to the Connect process on the command line. In distributed mode, they will be included in the JSON payload for the request that creates (or modifies) the connector.
Most configurations are connector dependent, so they can't be outlined here. However, there are a few common options:
Most configurations are connector dependent, so they can't be outlined here. However, there are a few common options:
<ul>
<li><code>name</code> - Unique name for the connector. Attempting to register again with the same name will fail.</li>
<li><code>connector.class</code> - The Java class for the connector</li>
<li><code>tasks.max</code> - The maximum number of tasks that should be created for this connector. The connector may create fewer tasks if it cannot achieve this level of parallelism.</li>
<li><code>key.converter</code> - (optional) Override the default key converter set by the worker.</li>
<li><code>value.converter</code> - (optional) Override the default value converter set by the worker.</li>
</ul>
<ul>
<li><code>name</code> - Unique name for the connector. Attempting to register again with the same name will fail.</li>
<li><code>connector.class</code> - The Java class for the connector</li>
<li><code>tasks.max</code> - The maximum number of tasks that should be created for this connector. The connector may create fewer tasks if it cannot achieve this level of parallelism.</li>
<li><code>key.converter</code> - (optional) Override the default key converter set by the worker.</li>
<li><code>value.converter</code> - (optional) Override the default value converter set by the worker.</li>
</ul>
The <code>connector.class</code> config supports several formats: the full name or alias of the class for this connector. If the connector is org.apache.kafka.connect.file.FileStreamSinkConnector, you can either specify this full name or use FileStreamSink or FileStreamSinkConnector to make the configuration a bit shorter.
The <code>connector.class</code> config supports several formats: the full name or alias of the class for this connector. If the connector is org.apache.kafka.connect.file.FileStreamSinkConnector, you can either specify this full name or use FileStreamSink or FileStreamSinkConnector to make the configuration a bit shorter.
Sink connectors also have one additional option to control their input:
<ul>
<li><code>topics</code> - A list of topics to use as input for this connector</li>
</ul>
Sink connectors also have one additional option to control their input:
<ul>
<li><code>topics</code> - A list of topics to use as input for this connector</li>
</ul>
For any other options, you should consult the documentation for the connector.
For any other options, you should consult the documentation for the connector.
<h4><a id="connect_rest" href="#connect_rest">REST API</a></h4>
<h4><a id="connect_rest" href="#connect_rest">REST API</a></h4>
Since Kafka Connect is intended to be run as a service, it also provides a REST API for managing connectors. By default, this service runs on port 8083. The following are the currently supported endpoints:
Since Kafka Connect is intended to be run as a service, it also provides a REST API for managing connectors. By default, this service runs on port 8083. The following are the currently supported endpoints:
<ul>
<li><code>GET /connectors</code> - return a list of active connectors</li>
<li><code>POST /connectors</code> - create a new connector; the request body should be a JSON object containing a string <code>name</code> field and an object <code>config</code> field with the connector configuration parameters</li>
<li><code>GET /connectors/{name}</code> - get information about a specific connector</li>
<li><code>GET /connectors/{name}/config</code> - get the configuration parameters for a specific connector</li>
<li><code>PUT /connectors/{name}/config</code> - update the configuration parameters for a specific connector</li>
<li><code>GET /connectors/{name}/status</code> - get current status of the connector, including if it is running, failed, paused, etc., which worker it is assigned to, error information if it has failed, and the state of all its tasks</li>
<li><code>GET /connectors/{name}/tasks</code> - get a list of tasks currently running for a connector</li>
<li><code>GET /connectors/{name}/tasks/{taskid}/status</code> - get current status of the task, including if it is running, failed, paused, etc., which worker it is assigned to, and error information if it has failed</li>
<li><code>PUT /connectors/{name}/pause</code> - pause the connector and its tasks, which stops message processing until the connector is resumed</li>
<li><code>PUT /connectors/{name}/resume</code> - resume a paused connector (or do nothing if the connector is not paused)</li>
<li><code>POST /connectors/{name}/restart</code> - restart a connector (typically because it has failed)</li>
<li><code>POST /connectors/{name}/tasks/{taskId}/restart</code> - restart an individual task (typically because it has failed)</li>
<li><code>DELETE /connectors/{name}</code> - delete a connector, halting all tasks and deleting its configuration</li>
</ul>
<ul>
<li><code>GET /connectors</code> - return a list of active connectors</li>
<li><code>POST /connectors</code> - create a new connector; the request body should be a JSON object containing a string <code>name</code> field and an object <code>config</code> field with the connector configuration parameters</li>
<li><code>GET /connectors/{name}</code> - get information about a specific connector</li>
<li><code>GET /connectors/{name}/config</code> - get the configuration parameters for a specific connector</li>
<li><code>PUT /connectors/{name}/config</code> - update the configuration parameters for a specific connector</li>
<li><code>GET /connectors/{name}/status</code> - get current status of the connector, including if it is running, failed, paused, etc., which worker it is assigned to, error information if it has failed, and the state of all its tasks</li>
<li><code>GET /connectors/{name}/tasks</code> - get a list of tasks currently running for a connector</li>
<li><code>GET /connectors/{name}/tasks/{taskid}/status</code> - get current status of the task, including if it is running, failed, paused, etc., which worker it is assigned to, and error information if it has failed</li>
<li><code>PUT /connectors/{name}/pause</code> - pause the connector and its tasks, which stops message processing until the connector is resumed</li>
<li><code>PUT /connectors/{name}/resume</code> - resume a paused connector (or do nothing if the connector is not paused)</li>
<li><code>POST /connectors/{name}/restart</code> - restart a connector (typically because it has failed)</li>
<li><code>POST /connectors/{name}/tasks/{taskId}/restart</code> - restart an individual task (typically because it has failed)</li>
<li><code>DELETE /connectors/{name}</code> - delete a connector, halting all tasks and deleting its configuration</li>
</ul>
Kafka Connect also provides a REST API for getting information about connector plugins:
Kafka Connect also provides a REST API for getting information about connector plugins:
<ul>
<li><code>GET /connector-plugins</code> - return a list of connector plugins installed in the Kafka Connect cluster. Note that the API only checks for connectors on the worker that handles the request, which means you may see inconsistent results, especially during a rolling upgrade if you add new connector jars</li>
<li><code>PUT /connector-plugins/{connector-type}/config/validate</code> - validate the provided configuration values against the configuration definition. This API performs per config validation, returns suggested values and error messages during validation.</li>
</ul>
<ul>
<li><code>GET /connector-plugins</code> - return a list of connector plugins installed in the Kafka Connect cluster. Note that the API only checks for connectors on the worker that handles the request, which means you may see inconsistent results, especially during a rolling upgrade if you add new connector jars</li>
<li><code>PUT /connector-plugins/{connector-type}/config/validate</code> - validate the provided configuration values against the configuration definition. This API performs per config validation, returns suggested values and error messages during validation.</li>
</ul>
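Purely as an illustration of the endpoints above (the connector name and settings are made-up placeholders, and any HTTP client will do; this sketch happens to use the JDK 11 <code>java.net.http</code> client), creating a connector in distributed mode is a single POST with a JSON body:
<pre>
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Placeholder connector name and settings for a file sink.
String json = "{\"name\": \"local-file-sink\", \"config\": {"
    + "\"connector.class\": \"FileStreamSink\", "
    + "\"tasks.max\": \"1\", "
    + "\"topics\": \"my-topic\", "
    + "\"file\": \"/tmp/sink.txt\"}}";

HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(json))
    .build();
// send(...) declares IOException/InterruptedException; handle them in a real program.
HttpResponse&lt;String&gt; response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.statusCode() + " " + response.body());
</pre>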
<h3><a id="connect_development" href="#connect_development">8.3 Connector Development Guide</a></h3>
<h3><a id="connect_development" href="#connect_development">8.3 Connector Development Guide</a></h3>
This guide describes how developers can write new connectors for Kafka Connect to move data between Kafka and other systems. It briefly reviews a few key concepts and then describes how to create a simple connector.
This guide describes how developers can write new connectors for Kafka Connect to move data between Kafka and other systems. It briefly reviews a few key concepts and then describes how to create a simple connector.
<h4><a id="connect_concepts" href="#connect_concepts">Core Concepts and APIs</a></h4>
<h4><a id="connect_concepts" href="#connect_concepts">Core Concepts and APIs</a></h4>
<h5><a id="connect_connectorsandtasks" href="#connect_connectorsandtasks">Connectors and Tasks</a></h5>
<h5><a id="connect_connectorsandtasks" href="#connect_connectorsandtasks">Connectors and Tasks</a></h5>
To copy data between Kafka and another system, users create a <code>Connector</code> for the system they want to pull data from or push data to. Connectors come in two flavors: <code>SourceConnectors</code> import data from another system (e.g. <code>JDBCSourceConnector</code> would import a relational database into Kafka) and <code>SinkConnectors</code> export data (e.g. <code>HDFSSinkConnector</code> would export the contents of a Kafka topic to an HDFS file).
To copy data between Kafka and another system, users create a <code>Connector</code> for the system they want to pull data from or push data to. Connectors come in two flavors: <code>SourceConnectors</code> import data from another system (e.g. <code>JDBCSourceConnector</code> would import a relational database into Kafka) and <code>SinkConnectors</code> export data (e.g. <code>HDFSSinkConnector</code> would export the contents of a Kafka topic to an HDFS file).
<code>Connectors</code> do not perform any data copying themselves: their configuration describes the data to be copied, and the <code>Connector</code> is responsible for breaking that job into a set of <code>Tasks</code> that can be distributed to workers. These <code>Tasks</code> also come in two corresponding flavors: <code>SourceTask</code> and <code>SinkTask</code>.
<code>Connectors</code> do not perform any data copying themselves: their configuration describes the data to be copied, and the <code>Connector</code> is responsible for breaking that job into a set of <code>Tasks</code> that can be distributed to workers. These <code>Tasks</code> also come in two corresponding flavors: <code>SourceTask</code> and <code>SinkTask</code>.
With an assignment in hand, each <code>Task</code> must copy its subset of the data to or from Kafka. In Kafka Connect, it should always be possible to frame these assignments as a set of input and output streams consisting of records with consistent schemas. Sometimes this mapping is obvious: each file in a set of log files can be considered a stream with each parsed line forming a record using the same schema and offsets stored as byte offsets in the file. In other cases it may require more effort to map to this model: a JDBC connector can map each table to a stream, but the offset is less clear. One possible mapping uses a timestamp column to generate queries incrementally returning new data, and the last queried timestamp can be used as the offset.
With an assignment in hand, each <code>Task</code> must copy its subset of the data to or from Kafka. In Kafka Connect, it should always be possible to frame these assignments as a set of input and output streams consisting of records with consistent schemas. Sometimes this mapping is obvious: each file in a set of log files can be considered a stream with each parsed line forming a record using the same schema and offsets stored as byte offsets in the file. In other cases it may require more effort to map to this model: a JDBC connector can map each table to a stream, but the offset is less clear. One possible mapping uses a timestamp column to generate queries incrementally returning new data, and the last queried timestamp can be used as the offset.
<h5><a id="connect_streamsandrecords" href="#connect_streamsandrecords">Streams and Records</a></h5>
<h5><a id="connect_streamsandrecords" href="#connect_streamsandrecords">Streams and Records</a></h5>
Each stream should be a sequence of key-value records. Both the keys and values can have complex structure -- many primitive types are provided, but arrays, objects, and nested data structures can be represented as well. The runtime data format does not assume any particular serialization format; this conversion is handled internally by the framework.
Each stream should be a sequence of key-value records. Both the keys and values can have complex structure -- many primitive types are provided, but arrays, objects, and nested data structures can be represented as well. The runtime data format does not assume any particular serialization format; this conversion is handled internally by the framework.
In addition to the key and value, records (both those generated by sources and those delivered to sinks) have associated stream IDs and offsets. These are used by the framework to periodically commit the offsets of data that have been processed so that in the event of failures, processing can resume from the last committed offsets, avoiding unnecessary reprocessing and duplication of events.
In addition to the key and value, records (both those generated by sources and those delivered to sinks) have associated stream IDs and offsets. These are used by the framework to periodically commit the offsets of data that have been processed so that in the event of failures, processing can resume from the last committed offsets, avoiding unnecessary reprocessing and duplication of events.
<h5><a id="connect_dynamicconnectors" href="#connect_dynamicconnectors">Dynamic Connectors</a></h5>
<h5><a id="connect_dynamicconnectors" href="#connect_dynamicconnectors">Dynamic Connectors</a></h5>
Not all jobs are static, so <code>Connector</code> implementations are also responsible for monitoring the external system for any changes that might require reconfiguration. For example, in the <code>JDBCSourceConnector</code> example, the <code>Connector</code> might assign a set of tables to each <code>Task</code>. When a new table is created, it must discover this so it can assign the new table to one of the <code>Tasks</code> by updating its configuration. When it notices a change that requires reconfiguration (or a change in the number of <code>Tasks</code>), it notifies the framework and the framework updates any corresponding <code>Tasks</code>.
Not all jobs are static, so <code>Connector</code> implementations are also responsible for monitoring the external system for any changes that might require reconfiguration. For example, in the <code>JDBCSourceConnector</code> example, the <code>Connector</code> might assign a set of tables to each <code>Task</code>. When a new table is created, it must discover this so it can assign the new table to one of the <code>Tasks</code> by updating its configuration. When it notices a change that requires reconfiguration (or a change in the number of <code>Tasks</code>), it notifies the framework and the framework updates any corresponding <code>Tasks</code>.
<h4><a id="connect_developing" href="#connect_developing">Developing a Simple Connector</a></h4>
<h4><a id="connect_developing" href="#connect_developing">Developing a Simple Connector</a></h4>
Developing a connector only requires implementing two interfaces, the <code>Connector</code> and <code>Task</code>. A simple example is included with the source code for Kafka in the <code>file</code> package. This connector is meant for use in standalone mode and has implementations of a <code>SourceConnector</code>/<code>SourceTask</code> to read each line of a file and emit it as a record and a <code>SinkConnector</code>/<code>SinkTask</code> that writes each record to a file.
Developing a connector only requires implementing two interfaces, the <code>Connector</code> and <code>Task</code>. A simple example is included with the source code for Kafka in the <code>file</code> package. This connector is meant for use in standalone mode and has implementations of a <code>SourceConnector</code>/<code>SourceTask</code> to read each line of a file and emit it as a record and a <code>SinkConnector</code>/<code>SinkTask</code> that writes each record to a file.
The rest of this section will walk through some code to demonstrate the key steps in creating a connector, but developers should also refer to the full example source code as many details are omitted for brevity.
The rest of this section will walk through some code to demonstrate the key steps in creating a connector, but developers should also refer to the full example source code as many details are omitted for brevity.
<h5><a id="connect_connectorexample" href="#connect_connectorexample">Connector Example</a></h5>
<h5><a id="connect_connectorexample" href="#connect_connectorexample">Connector Example</a></h5>
We'll cover the <code>SourceConnector</code> as a simple example. <code>SinkConnector</code> implementations are very similar. Start by creating the class that inherits from <code>SourceConnector</code> and add a couple of fields that will store parsed configuration information (the filename to read from and the topic to send data to):
We'll cover the <code>SourceConnector</code> as a simple example. <code>SinkConnector</code> implementations are very similar. Start by creating the class that inherits from <code>SourceConnector</code> and add a couple of fields that will store parsed configuration information (the filename to read from and the topic to send data to):
<pre>
public class FileStreamSourceConnector extends SourceConnector {
    private String filename;
    private String topic;
</pre>
The easiest method to fill in is <code>getTaskClass()</code>, which defines the class that should be instantiated in worker processes to actually read the data:
<pre>
@Override
public Class&lt;? extends Task&gt; getTaskClass() {
    return FileStreamSourceTask.class;
}
</pre>
We will define the <code>FileStreamSourceTask</code> class below. Next, we add some standard lifecycle methods, <code>start()</code> and <code>stop()</code>:
<pre>
@Override
public void start(Map&lt;String, String&gt; props) {
    // The complete version includes error handling as well.
    filename = props.get(FILE_CONFIG);
    topic = props.get(TOPIC_CONFIG);
}

@Override
public void stop() {
    // Nothing to do since no background monitoring is required.
}
</pre>
Finally, the real core of the implementation is in <code>taskConfigs()</code>. In this case we are only
handling a single file, so even though we may be permitted to generate more tasks as per the
<code>maxTasks</code> argument, we return a list with only one entry:
<pre>
@Override
public List&lt;Map&lt;String, String&gt;&gt; taskConfigs(int maxTasks) {
    ArrayList&lt;Map&lt;String, String&gt;&gt; configs = new ArrayList&lt;&gt;();
    // Only one input stream makes sense.
    Map&lt;String, String&gt; config = new HashMap&lt;&gt;();
    if (filename != null)
        config.put(FILE_CONFIG, filename);
    config.put(TOPIC_CONFIG, topic);
    configs.add(config);
    return configs;
}
</pre>
Even with multiple tasks, this <code>taskConfigs()</code> implementation is usually pretty simple. It just has to determine the number of tasks and the set of input streams, which may require contacting the remote service it is pulling data from, and then divvy the streams up among the tasks. Because some patterns for splitting work among tasks are so common, some utilities are provided in <code>ConnectorUtils</code> to simplify these cases.
Note that this simple example does not include dynamic input. See the discussion in the next section for how to trigger updates to task configs.
Although not used in the example, <code>SourceTask</code> also provides two APIs to commit offsets in the source system: <code>commit</code> and <code>commitRecord</code>. The APIs are provided for source systems which have an acknowledgement mechanism for messages. Overriding these methods allows the source connector to acknowledge messages in the source system, either in bulk or individually, once they have been written to Kafka.
The <code>commit</code> API stores the offsets in the source system, up to the offsets that have been returned by <code>poll</code>. The implementation of this API should block until the commit is complete. The <code>commitRecord</code> API saves the offset in the source system for each <code>SourceRecord</code> after it is written to Kafka. As Kafka Connect will record offsets automatically, <code>SourceTask</code>s are not required to implement these APIs. In cases where a connector does need to acknowledge messages in the source system, only one of the APIs is typically required.
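For instance, a task reading from a queue-like system with per-message acknowledgements might override <code>commitRecord()</code> roughly as follows. This is only a sketch: <code>sourceClient</code> and its <code>ack()</code> method are hypothetical stand-ins for whatever client library the connector uses.
<pre>
@Override
public void commitRecord(SourceRecord record) throws InterruptedException {
    // Called once the record has been written to Kafka; acknowledge it upstream.
    // sourceClient and ack() are hypothetical, standing in for the source system's client.
    sourceClient.ack(record.sourceOffset());
}
</pre>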
<h5><a id="connect_taskexample" href="#connect_taskexample">Task Example - Source Task</a></h5>
Next we'll describe the implementation of the corresponding <code>SourceTask</code>. The implementation is short, but too long to cover completely in this guide. We'll use pseudo-code to describe most of the implementation, but you can refer to the source code for the full example.
Just as with the connector, we need to create a class inheriting from the appropriate base <code>Task</code> class. It also has some standard lifecycle methods:
<pre>
public class FileStreamSourceTask extends SourceTask {
    String filename;
    InputStream stream;
    String topic;

    @Override
    public void start(Map&lt;String, String&gt; props) {
        filename = props.get(FileStreamSourceConnector.FILE_CONFIG);
        stream = openOrThrowError(filename);
        topic = props.get(FileStreamSourceConnector.TOPIC_CONFIG);
    }

    @Override
    public synchronized void stop() {
        stream.close();
    }
</pre>
These are slightly simplified versions, but they show that these methods should be relatively simple and the only work they should perform is allocating or freeing resources. There are two points to note about this implementation. First, the <code>start()</code> method does not yet handle resuming from a previous offset, which will be addressed in a later section. Second, the <code>stop()</code> method is synchronized. This will be necessary because <code>SourceTasks</code> are given a dedicated thread which they can block indefinitely, so they need to be stopped with a call from a different thread in the Worker.
Next, we implement the main functionality of the task, the <code>poll()</code> method which gets events from the input system and returns a <code>List&lt;SourceRecord&gt;</code>:
<pre>
@Override
public List&lt;SourceRecord&gt; poll() throws InterruptedException {
    try {
        ArrayList&lt;SourceRecord&gt; records = new ArrayList&lt;&gt;();
        while (streamValid(stream) &amp;&amp; records.isEmpty()) {
            LineAndOffset line = readToNextLine(stream);
            if (line != null) {
                Map&lt;String, Object&gt; sourcePartition = Collections.singletonMap("filename", filename);
                Map&lt;String, Object&gt; sourceOffset = Collections.singletonMap("position", streamOffset);
                records.add(new SourceRecord(sourcePartition, sourceOffset, topic, Schema.STRING_SCHEMA, line));
            } else {
                Thread.sleep(1);
            }
        }
        return records;
    } catch (IOException e) {
        // Underlying stream was killed, probably as a result of calling stop. Allow to return
        // null, and driving thread will handle any shutdown if necessary.
    }
    return null;
}
</pre>
Again, we've omitted some details, but we can see the important steps: the <code>poll()</code> method is going to be called repeatedly, and for each call it will loop trying to read records from the file. For each line it reads, it also tracks the file offset. It uses this information to create an output <code>SourceRecord</code> with four pieces of information: the source partition (there is only one, the single file being read), source offset (byte offset in the file), output topic name, and output value (the line, and we include a schema indicating this value will always be a string). Other variants of the <code>SourceRecord</code> constructor can also include a specific output partition and a key.
Note that this implementation uses the normal Java <code>InputStream</code> interface and may sleep if data is not available. This is acceptable because Kafka Connect provides each task with a dedicated thread. While task implementations have to conform to the basic <code>poll()</code> interface, they have a lot of flexibility in how they are implemented. In this case, an NIO-based implementation would be more efficient, but this simple approach works, is quick to implement, and is compatible with older versions of Java.
<h5><a id="connect_sinktasks" href="#connect_sinktasks">Sink Tasks</a></h5>
<h5><a id="connect_sinktasks" href="#connect_sinktasks">Sink Tasks</a></h5>
The previous section described how to implement a simple <code>SourceTask</code>. Unlike <code>SourceConnector</code> and <code>SinkConnector</code>, <code>SourceTask</code> and <code>SinkTask</code> have very different interfaces because <code>SourceTask</code> uses a pull interface and <code>SinkTask</code> uses a push interface. Both share the common lifecycle methods, but the <code>SinkTask</code> interface is quite different:
The previous section described how to implement a simple <code>SourceTask</code>. Unlike <code>SourceConnector</code> and <code>SinkConnector</code>, <code>SourceTask</code> and <code>SinkTask</code> have very different interfaces because <code>SourceTask</code> uses a pull interface and <code>SinkTask</code> uses a push interface. Both share the common lifecycle methods, but the <code>SinkTask</code> interface is quite different:
<pre>
public abstract class SinkTask implements Task {
public void initialize(SinkTaskContext context) {
this.context = context;
}
<pre>
public abstract class SinkTask implements Task {
public void initialize(SinkTaskContext context) {
this.context = context;
}
public abstract void put(Collection&lt;SinkRecord&gt; records);
public abstract void flush(Map&lt;TopicPartition, Long&gt; offsets);
</pre>
public abstract void put(Collection&lt;SinkRecord&gt; records);
public abstract void flush(Map&lt;TopicPartition, Long&gt; offsets);
</pre>
The <code>SinkTask</code> documentation contains full details, but this interface is nearly as simple as the <code>SourceTask</code>. The <code>put()</code> method should contain most of the implementation, accepting sets of <code>SinkRecords</code>, performing any required translation, and storing them in the destination system. This method does not need to ensure the data has been fully written to the destination system before returning. In fact, in many cases internal buffering will be useful so an entire batch of records can be sent at once, reducing the overhead of inserting events into the downstream data store. The <code>SinkRecords</code> contain essentially the same information as <code>SourceRecords</code>: Kafka topic, partition, offset and the event key and value.
The <code>flush()</code> method is used during the offset commit process, which allows tasks to recover from failures and resume from a safe point such that no events will be missed. The method should push any outstanding data to the destination system and then block until the write has been acknowledged. The <code>offsets</code> parameter can often be ignored, but is useful in some cases where implementations want to store offset information in the destination store to provide exactly-once delivery. For example, an HDFS connector could do this and use atomic move operations to make sure the <code>flush()</code> operation atomically commits the data and offsets to a final location in HDFS.
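As a rough sketch (not the bundled <code>FileStreamSinkTask</code>), a task might buffer records in <code>put()</code> and write them out in <code>flush()</code>. The <code>writeToDestination()</code> helper is hypothetical, and the exact <code>flush()</code> signature varies across Kafka versions; this follows the interface as shown above.
<pre>
private final List&lt;SinkRecord&gt; buffer = new ArrayList&lt;&gt;();

@Override
public void put(Collection&lt;SinkRecord&gt; records) {
    // Translation to the destination format could happen here or in flush().
    buffer.addAll(records);
}

@Override
public void flush(Map&lt;TopicPartition, Long&gt; offsets) {
    // writeToDestination() is a hypothetical helper; it should block until the
    // destination system has acknowledged the write.
    writeToDestination(buffer);
    buffer.clear();
}
</pre>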
<h5><a id="connect_resuming" href="#connect_resuming">Resuming from Previous Offsets</a></h5>
<h5><a id="connect_resuming" href="#connect_resuming">Resuming from Previous Offsets</a></h5>
The <code>SourceTask</code> implementation included a stream ID (the input filename) and offset (position in the file) with each record. The framework uses this to commit offsets periodically so that in the case of a failure, the task can recover and minimize the number of events that are reprocessed and possibly duplicated (or to resume from the most recent offset if Kafka Connect was stopped gracefully, e.g. in standalone mode or due to a job reconfiguration). This commit process is completely automated by the framework, but only the connector knows how to seek back to the right position in the input stream to resume from that location.
The <code>SourceTask</code> implementation included a stream ID (the input filename) and offset (position in the file) with each record. The framework uses this to commit offsets periodically so that in the case of a failure, the task can recover and minimize the number of events that are reprocessed and possibly duplicated (or to resume from the most recent offset if Kafka Connect was stopped gracefully, e.g. in standalone mode or due to a job reconfiguration). This commit process is completely automated by the framework, but only the connector knows how to seek back to the right position in the input stream to resume from that location.
To correctly resume upon startup, the task can use the <code>SourceContext</code> passed into its <code>initialize()</code> method to access the offset data. In <code>initialize()</code>, we would add a bit more code to read the offset (if it exists) and seek to that position:
To correctly resume upon startup, the task can use the <code>SourceContext</code> passed into its <code>initialize()</code> method to access the offset data. In <code>initialize()</code>, we would add a bit more code to read the offset (if it exists) and seek to that position:
<pre>
stream = new FileInputStream(filename);
Map&lt;String, Object&gt; offset = context.offsetStorageReader().offset(Collections.singletonMap(FILENAME_FIELD, filename));
if (offset != null) {
    Long lastRecordedOffset = (Long) offset.get("position");
    if (lastRecordedOffset != null)
        seekToOffset(stream, lastRecordedOffset);
}
</pre>
Of course, you might need to read many keys for each of the input streams. The <code>OffsetStorageReader</code> interface also allows you to issue bulk reads to efficiently load all offsets, then apply them by seeking each input stream to the appropriate position.
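For example, a task that handles many files could bulk-load its offsets on startup. This is only a sketch; <code>filenames</code> is a hypothetical collection of the files assigned to this task:
<pre>
Collection&lt;Map&lt;String, String&gt;&gt; partitions = new ArrayList&lt;&gt;();
for (String file : filenames)
    partitions.add(Collections.singletonMap(FILENAME_FIELD, file));

// One call loads the committed offsets for every source partition this task owns.
Map&lt;Map&lt;String, String&gt;, Map&lt;String, Object&gt;&gt; offsets = context.offsetStorageReader().offsets(partitions);
</pre>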
<h4><a id="connect_dynamicio" href="#connect_dynamicio">Dynamic Input/Output Streams</a></h4>
<h4><a id="connect_dynamicio" href="#connect_dynamicio">Dynamic Input/Output Streams</a></h4>
Kafka Connect is intended to define bulk data copying jobs, such as copying an entire database rather than creating many jobs to copy each table individually. One consequence of this design is that the set of input or output streams for a connector can vary over time.
Kafka Connect is intended to define bulk data copying jobs, such as copying an entire database rather than creating many jobs to copy each table individually. One consequence of this design is that the set of input or output streams for a connector can vary over time.
Source connectors need to monitor the source system for changes, e.g. table additions/deletions in a database. When they pick up changes, they should notify the framework via the <code>ConnectorContext</code> object that reconfiguration is necessary. For example, in a <code>SourceConnector</code>:
Source connectors need to monitor the source system for changes, e.g. table additions/deletions in a database. When they pick up changes, they should notify the framework via the <code>ConnectorContext</code> object that reconfiguration is necessary. For example, in a <code>SourceConnector</code>:
<pre>
if (inputsChanged())
    this.context.requestTaskReconfiguration();
</pre>
The framework will promptly request new configuration information and update the tasks, allowing them to gracefully commit their progress before reconfiguring them. Note that in the <code>SourceConnector</code> this monitoring is currently left up to the connector implementation. If an extra thread is required to perform this monitoring, the connector must allocate it itself.
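A minimal sketch of such monitoring logic in a hypothetical JDBC-style <code>SourceConnector</code> is shown below. The connector would run <code>monitorSourceSystem()</code> on a thread it starts in <code>start()</code> and set <code>stopping</code> in <code>stop()</code>; <code>listTables()</code> is a placeholder for a call into the source system's client library.
<pre>
private volatile boolean stopping = false;

private void monitorSourceSystem() {
    Set&lt;String&gt; knownTables = new HashSet&lt;&gt;(listTables());
    while (!stopping) {
        Set&lt;String&gt; currentTables = new HashSet&lt;&gt;(listTables());
        if (!currentTables.equals(knownTables)) {
            knownTables = currentTables;
            // Ask the framework to re-run taskConfigs() and reconfigure the tasks.
            context.requestTaskReconfiguration();
        }
        try {
            Thread.sleep(60000);
        } catch (InterruptedException e) {
            return;
        }
    }
}
</pre>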
Ideally this code for monitoring changes would be isolated to the <code>Connector</code> and tasks would not need to worry about them. However, changes can also affect tasks, most commonly when one of their input streams is destroyed in the input system, e.g. if a table is dropped from a database. If the <code>Task</code> encounters the issue before the <code>Connector</code>, which will be common if the <code>Connector</code> needs to poll for changes, the <code>Task</code> will need to handle the subsequent error. Thankfully, this can usually be handled simply by catching and handling the appropriate exception.
<code>SinkConnectors</code> usually only have to handle the addition of streams, which may translate to new entries in their outputs (e.g., a new database table). The framework manages any changes to the Kafka input, such as when the set of input topics changes because of a regex subscription. <code>SinkTasks</code> should expect new input streams, which may require creating new resources in the downstream system, such as a new table in a database. The trickiest situation to handle in these cases may be conflicts between multiple <code>SinkTasks</code> seeing a new input stream for the first time and simultaneously trying to create the new resource. <code>SinkConnectors</code>, on the other hand, will generally require no special code for handling a dynamic set of streams.
<h4><a id="connect_configs" href="#connect_configs">Connect Configuration Validation</a></h4>
<h4><a id="connect_configs" href="#connect_configs">Connect Configuration Validation</a></h4>
Kafka Connect allows you to validate connector configurations before submitting a connector to be executed and can provide feedback about errors and recommended values. To take advantage of this, connector developers need to provide an implementation of <code>config()</code> to expose the configuration definition to the framework.
Kafka Connect allows you to validate connector configurations before submitting a connector to be executed and can provide feedback about errors and recommended values. To take advantage of this, connector developers need to provide an implementation of <code>config()</code> to expose the configuration definition to the framework.
The following code in <code>FileStreamSourceConnector</code> defines the configuration and exposes it to the framework.
The following code in <code>FileStreamSourceConnector</code> defines the configuration and exposes it to the framework.
<pre>
private static final ConfigDef CONFIG_DEF = new ConfigDef()
    .define(FILE_CONFIG, Type.STRING, Importance.HIGH, "Source filename.")
    .define(TOPIC_CONFIG, Type.STRING, Importance.HIGH, "The topic to publish data to");

public ConfigDef config() {
    return CONFIG_DEF;
}
</pre>
The <code>ConfigDef</code> class is used for specifying the set of expected configurations. For each configuration, you can specify the name, the type, the default value, the documentation, the group information, the order in the group, the width of the configuration value and the name suitable for display in the UI. Plus, you can provide special validation logic used for single configuration validation by overriding the <code>Validator</code> class. Moreover, there may be dependencies between configurations; for example, the valid values and visibility of a configuration may change according to the values of other configurations. To handle this, <code>ConfigDef</code> allows you to specify the dependents of a configuration and to provide an implementation of <code>Recommender</code> to get valid values and set visibility of a configuration given the current configuration values.
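For example, a connector with an additional numeric setting might attach a default value and a validator when defining it. This is only a sketch: the <code>poll.interval.ms</code> key here is an invented example, not part of the FileStream connectors.
<pre>
private static final ConfigDef CONFIG_DEF = new ConfigDef()
    .define(FILE_CONFIG, Type.STRING, Importance.HIGH, "Source filename.")
    .define(TOPIC_CONFIG, Type.STRING, Importance.HIGH, "The topic to publish data to")
    // Hypothetical setting: defaults to 1000 ms and must be at least 1; invalid
    // values are rejected during validation.
    .define("poll.interval.ms", Type.INT, 1000, ConfigDef.Range.atLeast(1), Importance.LOW,
            "How often to check the file for new data.");
</pre>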
Also, the <code>validate()</code> method in <code>Connector</code> provides a default validation implementation which returns a list of allowed configurations together with configuration errors and recommended values for each configuration. However, it does not use the recommended values for configuration validation. You may provide an override of the default implementation for customized configuration validation, which may use the recommended values.
<h4><a id="connect_schemas" href="#connect_schemas">Working with Schemas</a></h4>
<h4><a id="connect_schemas" href="#connect_schemas">Working with Schemas</a></h4>
The FileStream connectors are good examples because they are simple, but they also have trivially structured data -- each line is just a string. Almost all practical connectors will need schemas with more complex data formats.
The FileStream connectors are good examples because they are simple, but they also have trivially structured data -- each line is just a string. Almost all practical connectors will need schemas with more complex data formats.
To create more complex data, you'll need to work with the Kafka Connect <code>data</code> API. Most structured records will need to interact with two classes in addition to primitive types: <code>Schema</code> and <code>Struct</code>.
To create more complex data, you'll need to work with the Kafka Connect <code>data</code> API. Most structured records will need to interact with two classes in addition to primitive types: <code>Schema</code> and <code>Struct</code>.
The API documentation provides a complete reference, but here is a simple example creating a <code>Schema</code> and <code>Struct</code>:
The API documentation provides a complete reference, but here is a simple example creating a <code>Schema</code> and <code>Struct</code>:
<pre>
Schema schema = SchemaBuilder.struct().name(NAME)
    .field("name", Schema.STRING_SCHEMA)
    .field("age", Schema.INT32_SCHEMA)
    .field("admin", SchemaBuilder.bool().defaultValue(false).build())
    .build();

Struct struct = new Struct(schema)
    .put("name", "Barbara Liskov")
    .put("age", 75);
</pre>
If you are implementing a source connector, you'll need to decide when and how to create schemas. You should avoid recomputing them wherever possible. For example, if your connector is guaranteed to have a fixed schema, create it statically and reuse a single instance.
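For instance, a connector whose records always have the same shape could hold the schema in a static field and reference it from every record it produces. This is illustrative only; the schema name and fields are invented:
<pre>
// Built once when the class is loaded and reused for every SourceRecord.
private static final Schema VALUE_SCHEMA = SchemaBuilder.struct().name("com.example.Person")
    .field("name", Schema.STRING_SCHEMA)
    .field("age", Schema.INT32_SCHEMA)
    .build();
</pre>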
However, many connectors will have dynamic schemas. One simple example of this is a database connector. Considering even just a single table, the schema will not be predefined for the entire connector (as it varies from table to table). But it also may not be fixed for a single table over the lifetime of the connector since the user may execute an <code>ALTER TABLE</code> command. The connector must be able to detect these changes and react appropriately.
Sink connectors are usually simpler because they are consuming data and therefore do not need to create schemas. However, they should take just as much care to validate that the schemas they receive have the expected format. When the schema does not match -- usually indicating the upstream producer is generating invalid data that cannot be correctly translated to the destination system -- sink connectors should throw an exception to indicate this error to the system.
<h4><a id="connect_administration" href="#connect_administration">Kafka Connect Administration</a></h4>
<h4><a id="connect_administration" href="#connect_administration">Kafka Connect Administration</a></h4>
<p>
Kafka Connect's <a href="#connect_rest">REST layer</a> provides a set of APIs to enable administration of the cluster. This includes APIs to view the configuration of connectors and the status of their tasks, as well as to alter their current behavior (e.g. changing configuration and restarting tasks).
</p>
<p>
Kafka Connect's <a href="#connect_rest">REST layer</a> provides a set of APIs to enable administration of the cluster. This includes APIs to view the configuration of connectors and the status of their tasks, as well as to alter their current behavior (e.g. changing configuration and restarting tasks).
</p>
<p>
When a connector is first submitted to the cluster, the workers rebalance the full set of connectors in the cluster and their tasks so that each worker has approximately the same amount of work. This same rebalancing procedure is also used when connectors increase or decrease the number of tasks they require, or when a connector's configuration is changed. You can use the REST API to view the current status of a connector and its tasks, including the id of the worker to which each was assigned. For example, querying the status of a file source (using <code>GET /connectors/file-source/status</code>) might produce output like the following:
</p>
<p>
When a connector is first submitted to the cluster, the workers rebalance the full set of connectors in the cluster and their tasks so that each worker has approximately the same amount of work. This same rebalancing procedure is also used when connectors increase or decrease the number of tasks they require, or when a connector's configuration is changed. You can use the REST API to view the current status of a connector and its tasks, including the id of the worker to which each was assigned. For example, querying the status of a file source (using <code>GET /connectors/file-source/status</code>) might produce output like the following:
</p>
<pre>
{
    "name": "file-source",
    "connector": {
        "state": "RUNNING",
        "worker_id": "192.168.1.208:8083"
    },
    "tasks": [
        {
            "id": 0,
            "state": "RUNNING",
            "worker_id": "192.168.1.209:8083"
        }
    ]
}
</pre>
<p>
Connectors and their tasks publish status updates to a shared topic (configured with <code>status.storage.topic</code>) which all workers in the cluster monitor. Because the workers consume this topic asynchronously, there is typically a (short) delay before a state change is visible through the status API. The following states are possible for a connector or one of its tasks:
</p>
<ul>
    <li><b>UNASSIGNED:</b> The connector/task has not yet been assigned to a worker.</li>
    <li><b>RUNNING:</b> The connector/task is running.</li>
    <li><b>PAUSED:</b> The connector/task has been administratively paused.</li>
    <li><b>FAILED:</b> The connector/task has failed (usually by raising an exception, which is reported in the status output).</li>
</ul>
<p>
In most cases, connector and task states will match, though they may be different for short periods of time when changes are occurring or if tasks have failed. For example, when a connector is first started, there may be a noticeable delay before the connector and its tasks have all transitioned to the RUNNING state. States will also diverge when tasks fail since Connect does not automatically restart failed tasks. To restart a connector/task manually, you can use the restart APIs listed above. Note that if you try to restart a task while a rebalance is taking place, Connect will return a 409 (Conflict) status code. You can retry after the rebalance completes, but it might not be necessary since rebalances effectively restart all the connectors and tasks in the cluster.
</p>
<p>
It's sometimes useful to temporarily stop the message processing of a connector. For example, if the remote system is undergoing maintenance, it would be preferable for source connectors to stop polling it for new data instead of filling logs with exception spam. For this use case, Connect offers a pause/resume API. While a source connector is paused, Connect will stop polling it for additional records. While a sink connector is paused, Connect will stop pushing new messages to it. The pause state is persistent, so even if you restart the cluster, the connector will not begin message processing again until the task has been resumed. Note that there may be a delay before all of a connector's tasks have transitioned to the PAUSED state since it may take time for them to finish whatever processing they were in the middle of when being paused. Additionally, failed tasks will not transition to the PAUSED state until they have been restarted.
</p>
</script>
<div class="p-connect"></div>

File diff suppressed because it is too large


@@ -15,6 +15,8 @@
limitations under the License.
-->
<script><!--#include virtual="js/templateData.js" --></script>
<!--#include virtual="../includes/_header.htm" -->
<!--#include virtual="../includes/_top.htm" -->


@@ -15,391 +15,395 @@
limitations under the License.
-->
<h3><a id="apidesign" href="#apidesign">5.1 API Design</a></h3>
<script id="implementation-template" type="text/x-handlebars-template">
<h3><a id="apidesign" href="#apidesign">5.1 API Design</a></h3>
<h4><a id="impl_producer" href="#impl_producer">Producer APIs</a></h4>
<h4><a id="impl_producer" href="#impl_producer">Producer APIs</a></h4>
<p>
The Producer API that wraps the 2 low-level producers - <code>kafka.producer.SyncProducer</code> and <code>kafka.producer.async.AsyncProducer</code>.
<pre>
class Producer<T> {
<p>
The Producer API that wraps the 2 low-level producers - <code>kafka.producer.SyncProducer</code> and <code>kafka.producer.async.AsyncProducer</code>.
<pre>
class Producer<T> {
/* Sends the data, partitioned by key to the topic using either the */
/* synchronous or the asynchronous producer */
public void send(kafka.javaapi.producer.ProducerData&lt;K,V&gt; producerData);
/* Sends the data, partitioned by key to the topic using either the */
/* synchronous or the asynchronous producer */
public void send(kafka.javaapi.producer.ProducerData&lt;K,V&gt; producerData);
/* Sends a list of data, partitioned by key to the topic using either */
/* the synchronous or the asynchronous producer */
public void send(java.util.List&lt;kafka.javaapi.producer.ProducerData&lt;K,V&gt;&gt; producerData);
/* Sends a list of data, partitioned by key to the topic using either */
/* the synchronous or the asynchronous producer */
public void send(java.util.List&lt;kafka.javaapi.producer.ProducerData&lt;K,V&gt;&gt; producerData);
/* Closes the producer and cleans up */
public void close();
/* Closes the producer and cleans up */
public void close();
}
</pre>
}
</pre>
The goal is to expose all the producer functionality through a single API to the client.
The Kafka producer
<ul>
<li>can handle queueing/buffering of multiple producer requests and asynchronous dispatch of the batched data:
<p><code>kafka.producer.Producer</code> provides the ability to batch multiple produce requests (<code>producer.type=async</code>), before serializing and dispatching them to the appropriate kafka broker partition. The size of the batch can be controlled by a few config parameters. As events enter the queue, they are buffered until either <code>queue.time</code> or <code>batch.size</code> is reached. A background thread (<code>kafka.producer.async.ProducerSendThread</code>) dequeues the batch of data and lets the <code>kafka.producer.EventHandler</code> serialize and send the data to the appropriate kafka broker partition. A custom event handler can be plugged in through the <code>event.handler</code> config parameter. At various stages of this producer queue pipeline, it is helpful to be able to inject callbacks, either for plugging in custom logging/tracing code or custom monitoring logic. This is possible by implementing the <code>kafka.producer.async.CallbackHandler</code> interface and setting the <code>callback.handler</code> config parameter to that class.
</p>
</li>
<li>handles the serialization of data through a user-specified <code>Encoder</code>:
<pre>
interface Encoder&lt;T&gt; {
    public Message toMessage(T data);
}
</pre>
<p>The default is the no-op <code>kafka.serializer.DefaultEncoder</code>.</p>
</li>
<li>provides software load balancing through an optionally user-specified <code>Partitioner</code>:
<p>
The routing decision is influenced by the <code>kafka.producer.Partitioner</code>.
<pre>
interface Partitioner&lt;T&gt; {
    int partition(T key, int numPartitions);
}
</pre>
The partition API uses the key and the number of available broker partitions to return a partition id. This id is used as an index into a sorted list of broker_ids and partitions to pick a broker partition for the producer request. The default partitioning strategy is <code>hash(key)%numPartitions</code>. If the key is null, then a random broker partition is picked. A custom partitioning strategy can also be plugged in using the <code>partitioner.class</code> config parameter; a sketch of one appears after this list.
</p>
</li>
</ul>
</p>
<h4><a id="impl_consumer" href="#impl_consumer">Consumer APIs</a></h4>
<p>
We have 2 levels of consumer APIs. The low-level "simple" API maintains a connection to a single broker and has a close correspondence to the network requests sent to the server. This API is completely stateless, with the offset being passed in on every request, allowing the user to maintain this metadata however they choose.
</p>
<p>
The high-level API hides the details of brokers from the consumer and allows consuming off the cluster of machines without concern for the underlying topology. It also maintains the state of what has been consumed. The high-level API also provides the ability to subscribe to topics that match a filter expression (i.e., either a whitelist or a blacklist regular expression).
</p>
<h4><a id="impl_consumer" href="#impl_consumer">Consumer APIs</a></h4>
<p>
We have 2 levels of consumer APIs. The low-level "simple" API maintains a connection to a single broker and has a close correspondence to the network requests sent to the server. This API is completely stateless, with the offset being passed in on every request, allowing the user to maintain this metadata however they choose.
</p>
<p>
The high-level API hides the details of brokers from the consumer and allows consuming off the cluster of machines without concern for the underlying topology. It also maintains the state of what has been consumed. The high-level API also provides the ability to subscribe to topics that match a filter expression (i.e., either a whitelist or a blacklist regular expression).
</p>
<h5><a id="impl_lowlevel" href="#impl_lowlevel">Low-level API</a></h5>
<pre>
class SimpleConsumer {
<h5><a id="impl_lowlevel" href="#impl_lowlevel">Low-level API</a></h5>
<pre>
class SimpleConsumer {
/* Send fetch request to a broker and get back a set of messages. */
public ByteBufferMessageSet fetch(FetchRequest request);
/* Send a list of fetch requests to a broker and get back a response set. */
public MultiFetchResponse multifetch(List&lt;FetchRequest&gt; fetches);
/**
* Get a list of valid offsets (up to maxSize) before the given time.
* The result is a list of offsets, in descending order.
* @param time: time in millisecs,
* if set to OffsetRequest$.MODULE$.LATEST_TIME(), get from the latest offset available.
* if set to OffsetRequest$.MODULE$.EARLIEST_TIME(), get from the earliest offset available.
*/
public long[] getOffsetsBefore(String topic, int partition, long time, int maxNumOffsets);
}
</pre>
The low-level API is used to implement the high-level API as well as being used directly for some of our offline consumers which have particular requirements around maintaining state.
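<p>
As a rough usage sketch of the low-level API (the <code>SimpleConsumer</code> and <code>FetchRequest</code> constructors and the <code>validBytes()</code> helper are assumptions here, not part of the interface shown above), a stateless fetch loop in which the client owns the offset looks roughly like this:
</p>
<pre>
String topic = "my_topic";       // hypothetical values
int partition = 0;
int fetchSize = 1024 * 1024;
SimpleConsumer consumer = new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024);

// start from the earliest offset the broker still has for this partition
long offset = consumer.getOffsetsBefore(topic, partition,
    OffsetRequest$.MODULE$.EARLIEST_TIME(), 1)[0];

while (true) {
    ByteBufferMessageSet messages =
        consumer.fetch(new FetchRequest(topic, partition, offset, fetchSize));
    // process the message set, then advance the offset past the consumed bytes;
    // the broker keeps no per-consumer state, so the client tracks this itself
    offset += messages.validBytes();
}
</pre>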
<h5><a id="impl_highlevel" href="#impl_highlevel">High-level API</a></h5>
<pre>
/* create a connection to the cluster */
ConsumerConnector connector = Consumer.create(consumerConfig);
interface ConsumerConnector {
/**
* This method is used to get a list of KafkaStreams, which are iterators over
* MessageAndMetadata objects from which you can obtain messages and their
* associated metadata (currently only topic).
* Input: a map of &lt;topic, #streams&gt;
* Output: a map of &lt;topic, list of message streams&gt;
*/
public Map&lt;String,List&lt;KafkaStream&gt;&gt; createMessageStreams(Map&lt;String,Int&gt; topicCountMap);
/**
* You can also obtain a list of KafkaStreams, that iterate over messages
* from topics that match a TopicFilter. (A TopicFilter encapsulates a
* whitelist or a blacklist which is a standard Java regex.)
*/
public List&lt;KafkaStream&gt; createMessageStreamsByFilter(
TopicFilter topicFilter, int numStreams);
/* Commit the offsets of all messages consumed so far. */
public commitOffsets()
/* Shut down the connector */
public shutdown()
}
</pre>
<p>
This API is centered around iterators, implemented by the KafkaStream class. Each KafkaStream represents the stream of messages from one or more partitions on one or more servers. Each stream is used for single threaded processing, so the client can provide the number of desired streams in the create call. Thus a stream may represent the merging of multiple server partitions (to correspond to the number of processing threads), but each partition only goes to one stream.
</p>
<p>
The createMessageStreams call registers the consumer for the topic, which results in rebalancing the consumer/broker assignment. The API encourages creating many topic streams in a single call in order to minimize this rebalancing. The createMessageStreamsByFilter call (additionally) registers watchers to discover new topics that match its filter. Note that each stream that createMessageStreamsByFilter returns may iterate over messages from multiple topics (i.e., if multiple topics are allowed by the filter).
</p>
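<p>
Putting the calls above together, a consumer process typically requests one stream per processing thread and drains each stream on its own thread. This is only a sketch: the iterability of <code>KafkaStream</code> and the <code>message()</code> accessor on <code>MessageAndMetadata</code> are assumed here, and the topic name, thread count and <code>process()</code> handler are arbitrary.
</p>
<pre>
int numStreams = 3;                                    // one stream per processing thread
Map&lt;String, Integer&gt; topicCountMap = new HashMap&lt;String, Integer&gt;();
topicCountMap.put("my_topic", numStreams);

ConsumerConnector connector = Consumer.create(consumerConfig);
Map&lt;String, List&lt;KafkaStream&gt;&gt; streams = connector.createMessageStreams(topicCountMap);

for (final KafkaStream stream : streams.get("my_topic")) {
    new Thread(new Runnable() {
        public void run() {
            // each stream is consumed by exactly one thread
            for (MessageAndMetadata record : stream) {  // assumes KafkaStream is Iterable
                process(record.message());              // process() is the application's handler
            }
        }
    }).start();
}
// periodically, or on shutdown:
connector.commitOffsets();
connector.shutdown();
</pre>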
<h3><a id="networklayer" href="#networklayer">5.2 Network Layer</a></h3>
<p>
The network layer is a fairly straight-forward NIO server, and will not be described in great detail. The sendfile implementation is done by giving the <code>MessageSet</code> interface a <code>writeTo</code> method. This allows the file-backed message set to use the more efficient <code>transferTo</code> implementation instead of an in-process buffered write. The threading model is a single acceptor thread and <i>N</i> processor threads which handle a fixed number of connections each. This design has been pretty thoroughly tested <a href="http://sna-projects.com/blog/2009/08/introducing-the-nio-socketserver-implementation">elsewhere</a> and found to be simple to implement and fast. The protocol is kept quite simple to allow for future implementation of clients in other languages.
</p>
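<p>
To make the zero-copy path concrete, a file-backed message set's <code>writeTo</code> can delegate straight to <code>FileChannel.transferTo</code>. The snippet below is illustrative only (the <code>channel</code> and <code>start</code> fields are assumed) and is not Kafka's actual implementation:
</p>
<pre>
// Bytes flow from the page cache to the socket without a user-space copy.
public long writeTo(GatheringByteChannel destination, long offset, long size) throws IOException {
    return channel.transferTo(start + offset, size, destination);   // channel is a FileChannel
}
</pre>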
<h3><a id="messages" href="#messages">5.3 Messages</a></h3>
<p>
Messages consist of a fixed-size header, a variable length opaque key byte array and a variable length opaque value byte array. The header contains the following fields:
<ul>
<li> A CRC32 checksum to detect corruption or truncation. </li>
<li> A format version. </li>
<li> An attributes identifier. </li>
<li> A timestamp. </li>
</ul>
Leaving the key and value opaque is the right decision: there is a great deal of progress being made on serialization libraries right now, and any particular choice is unlikely to be right for all uses. Needless to say a particular application using Kafka would likely mandate a particular serialization type as part of its usage. The <code>MessageSet</code> interface is simply an iterator over messages with specialized methods for bulk reading and writing to an NIO <code>Channel</code>.
<h3><a id="messageformat" href="#messageformat">5.4 Message Format</a></h3>
<pre>
/**
* 1. 4 byte CRC32 of the message
* 2. 1 byte "magic" identifier to allow format changes, value is 0 or 1
* 3. 1 byte "attributes" identifier to allow annotations on the message independent of the version
* bit 0 ~ 2 : Compression codec.
* 0 : no compression
* 1 : gzip
* 2 : snappy
* 3 : lz4
* bit 3 : Timestamp type
* 0 : create time
* 1 : log append time
* bit 4 ~ 7 : reserved
* 4. (Optional) 8 byte timestamp only if "magic" identifier is greater than 0
* 5. 4 byte key length, containing length K
* 6. K byte key
* 7. 4 byte payload length, containing length V
* 8. V byte payload
*/
</pre>
</p>
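<p>
As a small worked example of the layout above, the "attributes" byte can be decoded with two masks (illustrative helper code, not taken from the Kafka source):
</p>
<pre>
int codec = attributes &amp; 0x07;                     // bits 0-2: 0=none, 1=gzip, 2=snappy, 3=lz4
boolean logAppendTime = (attributes &amp; 0x08) != 0;  // bit 3: 0=create time, 1=log append time
// bits 4-7 are reserved and expected to be zero
</pre>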
<h3><a id="log" href="#log">5.5 Log</a></h3>
<p>
A log for a topic named "my_topic" with two partitions consists of two directories (namely <code>my_topic_0</code> and <code>my_topic_1</code>) populated with data files containing the messages for that topic. The format of the log files is a sequence of "log entries""; each log entry is a 4 byte integer <i>N</i> storing the message length which is followed by the <i>N</i> message bytes. Each message is uniquely identified by a 64-bit integer <i>offset</i> giving the byte position of the start of this message in the stream of all messages ever sent to that topic on that partition. The on-disk format of each message is given below. Each log file is named with the offset of the first message it contains. So the first file created will be 00000000000.kafka, and each additional file will have an integer name roughly <i>S</i> bytes from the previous file where <i>S</i> is the max log file size given in the configuration.
</p>
<p>
The exact binary format for messages is versioned and maintained as a standard interface so message sets can be transferred between producer, broker, and client without recopying or conversion when desirable. This format is as follows:
</p>
<pre>
On-disk format of a message
offset : 8 bytes
message length : 4 bytes (value: 4 + 1 + 1 + 8(if magic value > 0) + 4 + K + 4 + V)
crc : 4 bytes
magic value : 1 byte
attributes : 1 byte
timestamp : 8 bytes (Only exists when magic value is greater than zero)
key length : 4 bytes
key : K bytes
value length : 4 bytes
value : V bytes
</pre>
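<p>
A sketch of a reader for one such entry, following the field order and sizes in the table above (illustrative only; it assumes a non-null key and value and is not Kafka's own parsing code):
</p>
<pre>
long offset      = buffer.getLong();   // 8 bytes
int messageSize  = buffer.getInt();    // 4 bytes
int crc          = buffer.getInt();    // 4 bytes, CRC32 of the message
byte magic       = buffer.get();       // format version, 0 or 1
byte attributes  = buffer.get();       // compression codec, timestamp type
long timestamp   = magic &gt; 0 ? buffer.getLong() : -1L;  // only present when magic &gt; 0
int keyLength    = buffer.getInt();
byte[] key       = new byte[keyLength];
buffer.get(key);
int valueLength  = buffer.getInt();
byte[] value     = new byte[valueLength];
buffer.get(value);
</pre>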
<p>
The use of the message offset as the message id is unusual. Our original idea was to use a GUID generated by the producer, and maintain a mapping from GUID to offset on each broker. But since a consumer must maintain an ID for each server, the global uniqueness of the GUID provides no value. Furthermore, the complexity of maintaining the mapping from a random id to an offset requires a heavy weight index structure which must be synchronized with disk, essentially requiring a full persistent random-access data structure. Thus to simplify the lookup structure we decided to use a simple per-partition atomic counter which could be coupled with the partition id and node id to uniquely identify a message; this makes the lookup structure simpler, though multiple seeks per consumer request are still likely. However once we settled on a counter, the jump to directly using the offset seemed natural&mdash;both after all are monotonically increasing integers unique to a partition. Since the offset is hidden from the consumer API this decision is ultimately an implementation detail and we went with the more efficient approach.
</p>
<img class="centered" src="images/kafka_log.png">
<h4><a id="impl_writes" href="#impl_writes">Writes</a></h4>
<p>
The log allows serial appends which always go to the last file. This file is rolled over to a fresh file when it reaches a configurable size (say 1GB). The log takes two configuration parameters: <i>M</i>, which gives the number of messages to write before forcing the OS to flush the file to disk, and <i>S</i>, which gives a number of seconds after which a flush is forced. This gives a durability guarantee of losing at most <i>M</i> messages or <i>S</i> seconds of data in the event of a system crash.
</p>
<h4><a id="impl_reads" href="#impl_reads">Reads</a></h4>
<p>
Reads are done by giving the 64-bit logical offset of a message and an <i>S</i>-byte max chunk size. This will return an iterator over the messages contained in the <i>S</i>-byte buffer. <i>S</i> is intended to be larger than any single message, but in the event of an abnormally large message, the read can be retried multiple times, each time doubling the buffer size, until the message is read successfully. A maximum message and buffer size can be specified to make the server reject messages larger than some size, and to give a bound to the client on the maximum it needs to ever read to get a complete message. It is likely that the read buffer ends with a partial message; this is easily detected by the size delimiting.
</p>
<p>
The actual process of reading from an offset requires first locating the log segment file in which the data is stored, calculating the file-specific offset from the global offset value, and then reading from that file offset. The search is done as a simple binary search variation against an in-memory range maintained for each file.
</p>
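<p>
Since each segment file is named by the offset of its first message, the lookup reduces to a floor search over the sorted base offsets. A minimal sketch (the variable names here are hypothetical):
</p>
<pre>
// find the last segment whose base offset is &lt;= targetOffset
int low = 0, high = baseOffsets.length - 1, segment = 0;
while (low &lt;= high) {
    int mid = (low + high) &gt;&gt;&gt; 1;
    if (baseOffsets[mid] &lt;= targetOffset) { segment = mid; low = mid + 1; }
    else { high = mid - 1; }
}
long fileOffset = targetOffset - baseOffsets[segment];  // file-specific offset within that segment
</pre>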
<p>
The log provides the capability of getting the most recently written message to allow clients to start subscribing as of "right now". This is also useful in the case the consumer fails to consume its data within its SLA-specified number of days. In this case when the client attempts to consume a non-existent offset it is given an OutOfRangeException and can either reset itself or fail as appropriate to the use case.
</p>
<p> The following is the format of the results sent to the consumer.
<h5><a id="impl_highlevel" href="#impl_highlevel">High-level API</a></h5>
<pre>
<pre>
MessageSetSend (fetch result)
total length : 4 bytes
error code : 2 bytes
message 1 : x bytes
...
message n : x bytes
</pre>
<pre>
MultiMessageSetSend (multiFetch result)
total length : 4 bytes
error code : 2 bytes
messageSetSend 1
...
messageSetSend n
</pre>
<h4><a id="impl_deletes" href="#impl_deletes">Deletes</a></h4>
<p>
Data is deleted one log segment at a time. The log manager allows pluggable delete policies to choose which files are eligible for deletion. The current policy deletes any log with a modification time of more than <i>N</i> days ago, though a policy which retained the last <i>N</i> GB could also be useful. To avoid locking reads while still allowing deletes that modify the segment list we use a copy-on-write style segment list implementation that provides consistent views to allow a binary search to proceed on an immutable static snapshot view of the log segments while deletes are progressing.
</p>
<h4><a id="impl_guarantees" href="#impl_guarantees">Guarantees</a></h4>
<p>
The log provides a configuration parameter <i>M</i> which controls the maximum number of messages that are written before forcing a flush to disk. On startup a log recovery process is run that iterates over all messages in the newest log segment and verifies that each message entry is valid. A message entry is valid if the sum of its size and offset is less than the length of the file AND the CRC32 of the message payload matches the CRC stored with the message. In the event corruption is detected the log is truncated to the last valid offset.
</p>
<p>
Note that two kinds of corruption must be handled: truncation in which an unwritten block is lost due to a crash, and corruption in which a nonsense block is ADDED to the file. The reason for this is that in general the OS makes no guarantee of the write order between the file inode and the actual block data so in addition to losing written data the file can gain nonsense data if the inode is updated with a new size but a crash occurs before the block containing that data is written. The CRC detects this corner case, and prevents it from corrupting the log (though the unwritten messages are, of course, lost).
</p>
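<p>
A sketch of the recovery scan described above; the <code>Entry</code> type and the <code>readEntry</code>, <code>crc32</code> and <code>truncate</code> helpers are hypothetical stand-ins for the real log reader:
</p>
<pre>
long validUpTo = 0;
while (validUpTo &lt; fileLength) {
    Entry entry = readEntry(file, validUpTo);                    // read one log entry
    if (entry == null
            || validUpTo + entry.sizeInBytes() &gt; fileLength      // truncated entry
            || entry.storedCrc() != crc32(entry.payload()))      // corrupted entry
        break;
    validUpTo += entry.sizeInBytes();
}
file.truncate(validUpTo);   // drop everything after the last valid entry
</pre>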
<h3><a id="distributionimpl" href="#distributionimpl">5.6 Distribution</a></h3>
<h4><a id="impl_offsettracking" href="#impl_offsettracking">Consumer Offset Tracking</a></h4>
<p>
The high-level consumer tracks the maximum offset it has consumed in each partition and periodically commits its offset vector so that it can resume from those offsets in the event of a restart. Kafka provides the option to store all the offsets for a given consumer group in a designated broker (for that group) called the <i>offset manager</i>; i.e., any consumer instance in that consumer group should send its offset commits and fetches to that offset manager (broker). The high-level consumer handles this automatically. If you use the simple consumer you will need to manage offsets manually. This is currently unsupported in the Java simple consumer, which can only commit or fetch offsets in ZooKeeper. If you use the Scala simple consumer you can discover the offset manager and explicitly commit or fetch offsets to the offset manager. A consumer can look up its offset manager by issuing a GroupCoordinatorRequest to any Kafka broker and reading the GroupCoordinatorResponse which will contain the offset manager. The consumer can then proceed to commit or fetch offsets from the offset manager broker. In case the offset manager moves, the consumer will need to rediscover the offset manager. If you wish to manage your offsets manually, you can take a look at these <a href="https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka">code samples that explain how to issue OffsetCommitRequest and OffsetFetchRequest</a>.
</p>
<p>
When the offset manager receives an OffsetCommitRequest, it appends the request to a special <a href="#compaction">compacted</a> Kafka topic named <i>__consumer_offsets</i>. The offset manager sends a successful offset commit response to the consumer only after all the replicas of the offsets topic receive the offsets. In case the offsets fail to replicate within a configurable timeout, the offset commit will fail and the consumer may retry the commit after backing off. (This is done automatically by the high-level consumer.) The brokers periodically compact the offsets topic since it only needs to maintain the most recent offset commit per partition. The offset manager also caches the offsets in an in-memory table in order to serve offset fetches quickly.
</p>
<p>
When the offset manager receives an offset fetch request, it simply returns the last committed offset vector from the offsets cache. In case the offset manager was just started or if it just became the offset manager for a new set of consumer groups (by becoming a leader for a partition of the offsets topic), it may need to load the offsets topic partition into the cache. In this case, the offset fetch will fail with an OffsetsLoadInProgress exception and the consumer may retry the OffsetFetchRequest after backing off. (This is done automatically by the high-level consumer.)
</p>
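<p>
The retry-after-backoff behaviour looks roughly like the loop below. The request and response types and the error check are simplified placeholders, not the real client classes, and interruption handling is omitted:
</p>
<pre>
long backoffMs = 100;
OffsetFetchResponse response = sendOffsetFetchRequest(offsetManager, request);
while (response.offsetsLoadInProgress()) {          // manager still loading the offsets topic
    Thread.sleep(backoffMs);                        // back off before retrying
    backoffMs = Math.min(backoffMs * 2, 5000);      // exponential backoff with a cap
    response = sendOffsetFetchRequest(offsetManager, request);
}
// response now carries the last committed offset vector
</pre>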
<h3><a id="networklayer" href="#networklayer">5.2 Network Layer</a></h3>
<p>
The network layer is a fairly straight-forward NIO server, and will not be described in great detail. The sendfile implementation is done by giving the <code>MessageSet</code> interface a <code>writeTo</code> method. This allows the file-backed message set to use the more efficient <code>transferTo</code> implementation instead of an in-process buffered write. The threading model is a single acceptor thread and <i>N</i> processor threads which handle a fixed number of connections each. This design has been pretty thoroughly tested <a href="http://sna-projects.com/blog/2009/08/introducing-the-nio-socketserver-implementation">elsewhere</a> and found to be simple to implement and fast. The protocol is kept quite simple to allow for future implementation of clients in other languages.
</p>
<h3><a id="messages" href="#messages">5.3 Messages</a></h3>
<p>
Messages consist of a fixed-size header, a variable length opaque key byte array and a variable length opaque value byte array. The header contains the following fields:
<ul>
<li> A CRC32 checksum to detect corruption or truncation. <li/>
<li> A format version. </li>
<li> An attributes identifier </li>
<li> A timestamp </li>
</ul>
Leaving the key and value opaque is the right decision: there is a great deal of progress being made on serialization libraries right now, and any particular choice is unlikely to be right for all uses. Needless to say a particular application using Kafka would likely mandate a particular serialization type as part of its usage. The <code>MessageSet</code> interface is simply an iterator over messages with specialized methods for bulk reading and writing to an NIO <code>Channel</code>.
<h5><a id="offsetmigration" href="#offsetmigration">Migrating offsets from ZooKeeper to Kafka</a></h5>
<p>
Kafka consumers in earlier releases store their offsets by default in ZooKeeper. It is possible to migrate these consumers to commit offsets into Kafka by following these steps:
<ol>
<li>Set <code>offsets.storage=kafka</code> and <code>dual.commit.enabled=true</code> in your consumer config.
</li>
<li>Do a rolling bounce of your consumers and then verify that your consumers are healthy.
</li>
<li>Set <code>dual.commit.enabled=false</code> in your consumer config.
</li>
<li>Do a rolling bounce of your consumers and then verify that your consumers are healthy.
</li>
</ol>
A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be performed using the above steps if you set <code>offsets.storage=zookeeper</code>.
</p>
<h3><a id="messageformat" href="#messageformat">5.4 Message Format</a></h3>
<h4><a id="impl_zookeeper" href="#impl_zookeeper">ZooKeeper Directories</a></h4>
<p>
The following gives the ZooKeeper structures and algorithms used for co-ordination between consumers and brokers.
</p>
<h4><a id="impl_zknotation" href="#impl_zknotation">Notation</a></h4>
<p>
When an element in a path is denoted [xyz], that means that the value of xyz is not fixed and there is in fact a ZooKeeper znode for each possible value of xyz. For example /topics/[topic] would be a directory named /topics containing a sub-directory for each topic name. Numerical ranges are also given such as [0...5] to indicate the subdirectories 0, 1, 2, 3, 4. An arrow -> is used to indicate the contents of a znode. For example /hello -> world would indicate a znode /hello containing the value "world".
</p>
<h4><a id="impl_zkbroker" href="#impl_zkbroker">Broker Node Registry</a></h4>
<pre>
/brokers/ids/[0...N] --> {"jmx_port":...,"timestamp":...,"endpoints":[...],"host":...,"version":...,"port":...} (ephemeral node)
</pre>
<p>
This is a list of all present broker nodes, each of which provides a unique logical broker id which identifies it to consumers (which must be given as part of its configuration). On startup, a broker node registers itself by creating a znode with the logical broker id under /brokers/ids. The purpose of the logical broker id is to allow a broker to be moved to a different physical machine without affecting consumers. An attempt to register a broker id that is already in use (say because two servers are configured with the same broker id) results in an error.
</p>
<p>
Since the broker registers itself in ZooKeeper using ephemeral znodes, this registration is dynamic and will disappear if the broker is shutdown or dies (thus notifying consumers it is no longer available).
</p>
<h4><a id="impl_zktopic" href="#impl_zktopic">Broker Topic Registry</a></h4>
<pre>
/brokers/topics/[topic]/partitions/[0...N]/state --> {"controller_epoch":...,"leader":...,"version":...,"leader_epoch":...,"isr":[...]} (ephemeral node)
</pre>
<p>
Each broker registers itself under the topics it maintains and stores the number of partitions for that topic.
</p>
<h4><a id="impl_zkconsumers" href="#impl_zkconsumers">Consumers and Consumer Groups</a></h4>
<p>
Consumers of topics also register themselves in ZooKeeper, in order to coordinate with each other and balance the consumption of data. Consumers can also store their offsets in ZooKeeper by setting <code>offsets.storage=zookeeper</code>. However, this offset storage mechanism will be deprecated in a future release. Therefore, it is recommended to <a href="#offsetmigration">migrate offsets storage to Kafka</a>.
</p>
<p>
Multiple consumers can form a group and jointly consume a single topic. Each consumer in the same group is given a shared group_id.
For example if one consumer is your foobar process, which is run across three machines, then you might assign this group of consumers the id "foobar". This group id is provided in the configuration of the consumer, and is your way to tell the consumer which group it belongs to.
</p>
<p>
The consumers in a group divide up the partitions as fairly as possible; each partition is consumed by exactly one consumer in a consumer group.
</p>
<h4><a id="impl_zkconsumerid" href="#impl_zkconsumerid">Consumer Id Registry</a></h4>
<p>
In addition to the group_id which is shared by all consumers in a group, each consumer is given a transient, unique consumer_id (of the form hostname:uuid) for identification purposes. Consumer ids are registered in the following directory.
<pre>
/consumers/[group_id]/ids/[consumer_id] --> {"version":...,"subscription":{...:...},"pattern":...,"timestamp":...} (ephemeral node)
</pre>
Each of the consumers in the group registers under its group and creates a znode with its consumer_id. The value of the znode contains a map of &lt;topic, #streams&gt;. This id is simply used to identify each of the consumers which is currently active within a group. This is an ephemeral node so it will disappear if the consumer process dies.
</p>
<h3><a id="distributionimpl" href="#distributionimpl">5.6 Distribution</a></h3>
<h4><a id="impl_offsettracking" href="#impl_offsettracking">Consumer Offset Tracking</a></h4>
<p>
The high-level consumer tracks the maximum offset it has consumed in each partition and periodically commits its offset vector so that it can resume from those offsets in the event of a restart. Kafka provides the option to store all the offsets for a given consumer group in a designated broker (for that group) called the <i>offset manager</i>. i.e., any consumer instance in that consumer group should send its offset commits and fetches to that offset manager (broker). The high-level consumer handles this automatically. If you use the simple consumer you will need to manage offsets manually. This is currently unsupported in the Java simple consumer which can only commit or fetch offsets in ZooKeeper. If you use the Scala simple consumer you can discover the offset manager and explicitly commit or fetch offsets to the offset manager. A consumer can look up its offset manager by issuing a GroupCoordinatorRequest to any Kafka broker and reading the GroupCoordinatorResponse which will contain the offset manager. The consumer can then proceed to commit or fetch offsets from the offsets manager broker. In case the offset manager moves, the consumer will need to rediscover the offset manager. If you wish to manage your offsets manually, you can take a look at these <a href="https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka">code samples that explain how to issue OffsetCommitRequest and OffsetFetchRequest</a>.
</p>
<h4><a id="impl_zkconsumeroffsets" href="#impl_zkconsumeroffsets">Consumer Offsets</a></h4>
<p>
Consumers track the maximum offset they have consumed in each partition. This value is stored in a ZooKeeper directory if <code>offsets.storage=zookeeper</code>.
</p>
<pre>
/consumers/[group_id]/offsets/[topic]/[partition_id] --> offset_counter_value (persistent node)
</pre>
<h4><a id="impl_zkowner" href="#impl_zkowner">Partition Owner registry</a></h4>
<p>
Each broker partition is consumed by a single consumer within a given consumer group. The consumer must establish its ownership of a given partition before any consumption can begin. To establish its ownership, a consumer writes its own id in an ephemeral node under the particular broker partition it is claiming.
</p>
<h5><a id="offsetmigration" href="#offsetmigration">Migrating offsets from ZooKeeper to Kafka</a></h5>
<p>
Kafka consumers in earlier releases store their offsets by default in ZooKeeper. It is possible to migrate these consumers to commit offsets into Kafka by following these steps:
<ol>
<li>Set <code>offsets.storage=kafka</code> and <code>dual.commit.enabled=true</code> in your consumer config.
</li>
<li>Do a rolling bounce of your consumers and then verify that your consumers are healthy.
</li>
<li>Set <code>dual.commit.enabled=false</code> in your consumer config.
</li>
<li>Do a rolling bounce of your consumers and then verify that your consumers are healthy.
</li>
</ol>
A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be performed using the above steps if you set <code>offsets.storage=zookeeper</code>.
</p>
<pre>
/consumers/[group_id]/owners/[topic]/[partition_id] --> consumer_node_id (ephemeral node)
</pre>
<h4><a id="impl_zookeeper" href="#impl_zookeeper">ZooKeeper Directories</a></h4>
<p>
The following gives the ZooKeeper structures and algorithms used for co-ordination between consumers and brokers.
</p>
<h4><a id="impl_clusterid" href="#impl_clusterid">Cluster Id</a></h4>
<h4><a id="impl_zknotation" href="#impl_zknotation">Notation</a></h4>
<p>
When an element in a path is denoted [xyz], that means that the value of xyz is not fixed and there is in fact a ZooKeeper znode for each possible value of xyz. For example /topics/[topic] would be a directory named /topics containing a sub-directory for each topic name. Numerical ranges are also given such as [0...5] to indicate the subdirectories 0, 1, 2, 3, 4. An arrow -> is used to indicate the contents of a znode. For example /hello -> world would indicate a znode /hello containing the value "world".
</p>
<p>
The cluster id is a unique and immutable identifier assigned to a Kafka cluster. The cluster id can have a maximum of 22 characters and the allowed characters are defined by the regular expression [a-zA-Z0-9_\-]+, which corresponds to the characters used by the URL-safe Base64 variant with no padding. Conceptually, it is auto-generated when a cluster is started for the first time.
</p>
<p>
Implementation-wise, it is generated when a broker with version 0.10.1 or later is successfully started for the first time. The broker tries to get the cluster id from the <code>/cluster/id</code> znode during startup. If the znode does not exist, the broker generates a new cluster id and creates the znode with this cluster id.
</p>
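<p>
For reference, an id with exactly these properties can be produced by encoding 16 random bytes as URL-safe Base64 without padding, which yields 22 characters from [a-zA-Z0-9_-]. This is only an illustration of the format, not necessarily how the broker generates it:
</p>
<pre>
byte[] randomBytes = new byte[16];
new java.security.SecureRandom().nextBytes(randomBytes);
String clusterId = java.util.Base64.getUrlEncoder().withoutPadding().encodeToString(randomBytes);
// clusterId is 22 characters long and matches [a-zA-Z0-9_\-]+
</pre>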
<h4><a id="impl_zkbroker" href="#impl_zkbroker">Broker Node Registry</a></h4>
<pre>
/brokers/ids/[0...N] --> {"jmx_port":...,"timestamp":...,"endpoints":[...],"host":...,"version":...,"port":...} (ephemeral node)
</pre>
<p>
This is a list of all present broker nodes, each of which provides a unique logical broker id which identifies it to consumers (which must be given as part of its configuration). On startup, a broker node registers itself by creating a znode with the logical broker id under /brokers/ids. The purpose of the logical broker id is to allow a broker to be moved to a different physical machine without affecting consumers. An attempt to register a broker id that is already in use (say because two servers are configured with the same broker id) results in an error.
</p>
<p>
Since the broker registers itself in ZooKeeper using ephemeral znodes, this registration is dynamic and will disappear if the broker is shutdown or dies (thus notifying consumers it is no longer available).
</p>
<h4><a id="impl_zktopic" href="#impl_zktopic">Broker Topic Registry</a></h4>
<pre>
/brokers/topics/[topic]/partitions/[0...N]/state --> {"controller_epoch":...,"leader":...,"version":...,"leader_epoch":...,"isr":[...]} (ephemeral node)
</pre>
<h4><a id="impl_brokerregistration" href="#impl_brokerregistration">Broker node registration</a></h4>
<p>
The broker nodes are basically independent, so they only publish information about what they have. When a broker joins, it registers itself under the broker node registry directory and writes information about its host name and port. The broker also registers the list of existing topics and their logical partitions in the broker topic registry. New topics are registered dynamically when they are created on the broker.
</p>
<h4><a id="impl_zkconsumers" href="#impl_zkconsumers">Consumers and Consumer Groups</a></h4>
<p>
Consumers of topics also register themselves in ZooKeeper, in order to coordinate with each other and balance the consumption of data. Consumers can also store their offsets in ZooKeeper by setting <code>offsets.storage=zookeeper</code>. However, this offset storage mechanism will be deprecated in a future release. Therefore, it is recommended to <a href="#offsetmigration">migrate offsets storage to Kafka</a>.
</p>
<h4><a id="impl_consumerregistration" href="#impl_consumerregistration">Consumer registration algorithm</a></h4>
<p>
When a consumer starts, it does the following:
<ol>
<li> Register itself in the consumer id registry under its group.
</li>
<li> Register a watch on changes (new consumers joining or any existing consumers leaving) under the consumer id registry. (Each change triggers rebalancing among all consumers within the group to which the changed consumer belongs.)
</li>
<li> Register a watch on changes (new brokers joining or any existing brokers leaving) under the broker id registry. (Each change triggers rebalancing among all consumers in all consumer groups.) </li>
<li> If the consumer creates a message stream using a topic filter, it also registers a watch on changes (new topics being added) under the broker topic registry. (Each change will trigger re-evaluation of the available topics to determine which topics are allowed by the topic filter. A new allowed topic will trigger rebalancing among all consumers within the consumer group.)</li>
<li> Force itself to rebalance within its consumer group.
</li>
</ol>
</p>
<h4><a id="impl_consumerrebalance" href="#impl_consumerrebalance">Consumer rebalancing algorithm</a></h4>
<p>
The consumer rebalancing algorithm allows all the consumers in a group to come into consensus on which consumer is consuming which partitions. Consumer rebalancing is triggered on each addition or removal of both broker nodes and other consumers within the same group. For a given topic and a given consumer group, broker partitions are divided evenly among consumers within the group. A partition is always consumed by a single consumer. This design simplifies the implementation. Had we allowed a partition to be concurrently consumed by multiple consumers, there would be contention on the partition and some kind of locking would be required. If there are more consumers than partitions, some consumers won't get any data at all. During rebalancing, we try to assign partitions to consumers in such a way that reduces the number of broker nodes each consumer has to connect to.
</p>
<p>
Each consumer does the following during rebalancing:
</p>
<pre>
1. For each topic T that C<sub>i</sub> subscribes to
2. let P<sub>T</sub> be all partitions producing topic T
3. let C<sub>G</sub> be all consumers in the same group as C<sub>i</sub> that consume topic T
4. sort P<sub>T</sub> (so partitions on the same broker are clustered together)
5. sort C<sub>G</sub>
6. let i be the index position of C<sub>i</sub> in C<sub>G</sub> and let N = size(P<sub>T</sub>)/size(C<sub>G</sub>)
7. assign partitions from i*N to (i+1)*N - 1 to consumer C<sub>i</sub>
8. remove current entries owned by C<sub>i</sub> from the partition owner registry
9. add newly assigned partitions to the partition owner registry
(we may need to re-try this until the original partition owner releases its ownership)
</pre>
<p>
When rebalancing is triggered at one consumer, rebalancing should be triggered in other consumers within the same group about the same time.
</p>
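<p>
Steps 4-7 of the algorithm translate almost directly into code. The sketch below ignores the registry updates in steps 8-9 and any remainder when the partition count does not divide evenly; the variable names are hypothetical:
</p>
<pre>
Collections.sort(partitions);                  // P_T, so partitions on the same broker are adjacent
Collections.sort(consumers);                   // C_G
int i = consumers.indexOf(myConsumerId);       // index position of this consumer
int n = partitions.size() / consumers.size();  // N
List&lt;String&gt; assigned = partitions.subList(i * n, Math.min((i + 1) * n, partitions.size()));
</pre>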
<h4><a id="impl_zkconsumerid" href="#impl_zkconsumerid">Consumer Id Registry</a></h4>
<p>
In addition to the group_id which is shared by all consumers in a group, each consumer is given a transient, unique consumer_id (of the form hostname:uuid) for identification purposes. Consumer ids are registered in the following directory.
<pre>
/consumers/[group_id]/ids/[consumer_id] --> {"version":...,"subscription":{...:...},"pattern":...,"timestamp":...} (ephemeral node)
</pre>
Each of the consumers in the group registers under its group and creates a znode with its consumer_id. The value of the znode contains a map of &lt;topic, #streams&gt;. This id is simply used to identify each of the consumers which is currently active within a group. This is an ephemeral node so it will disappear if the consumer process dies.
</p>
<h4><a id="impl_zkconsumeroffsets" href="#impl_zkconsumeroffsets">Consumer Offsets</a></h4>
<p>
Consumers track the maximum offset they have consumed in each partition. This value is stored in a ZooKeeper directory if <code>offsets.storage=zookeeper</code>.
</p>
<pre>
/consumers/[group_id]/offsets/[topic]/[partition_id] --> offset_counter_value (persistent node)
</pre>
<h4><a id="impl_zkowner" href="#impl_zkowner">Partition Owner registry</a></h4>
<p>
Each broker partition is consumed by a single consumer within a given consumer group. The consumer must establish its ownership of a given partition before any consumption can begin. To establish its ownership, a consumer writes its own id in an ephemeral node under the particular broker partition it is claiming.
</p>
<pre>
/consumers/[group_id]/owners/[topic]/[partition_id] --> consumer_node_id (ephemeral node)
</pre>
<h4><a id="impl_clusterid" href="#impl_clusterid">Cluster Id</a></h4>
<p>
The cluster id is a unique and immutable identifier assigned to a Kafka cluster. The cluster id can have a maximum of 22 characters and the allowed characters are defined by the regular expression [a-zA-Z0-9_\-]+, which corresponds to the characters used by the URL-safe Base64 variant with no padding. Conceptually, it is auto-generated when a cluster is started for the first time.
</p>
<p>
Implementation-wise, it is generated when a broker with version 0.10.1 or later is successfully started for the first time. The broker tries to get the cluster id from the <code>/cluster/id</code> znode during startup. If the znode does not exist, the broker generates a new cluster id and creates the znode with this cluster id.
</p>
<h4><a id="impl_brokerregistration" href="#impl_brokerregistration">Broker node registration</a></h4>
<p>
The broker nodes are basically independent, so they only publish information about what they have. When a broker joins, it registers itself under the broker node registry directory and writes information about its host name and port. The broker also register the list of existing topics and their logical partitions in the broker topic registry. New topics are registered dynamically when they are created on the broker.
</p>
<h4><a id="impl_consumerregistration" href="#impl_consumerregistration">Consumer registration algorithm</a></h4>
<p>
When a consumer starts, it does the following:
<ol>
<li> Register itself in the consumer id registry under its group.
</li>
<li> Register a watch on changes (new consumers joining or any existing consumers leaving) under the consumer id registry. (Each change triggers rebalancing among all consumers within the group to which the changed consumer belongs.)
</li>
<li> Register a watch on changes (new brokers joining or any existing brokers leaving) under the broker id registry. (Each change triggers rebalancing among all consumers in all consumer groups.) </li>
<li> If the consumer creates a message stream using a topic filter, it also registers a watch on changes (new topics being added) under the broker topic registry. (Each change will trigger re-evaluation of the available topics to determine which topics are allowed by the topic filter. A new allowed topic will trigger rebalancing among all consumers within the consumer group.)</li>
<li> Force itself to rebalance within in its consumer group.
</li>
</ol>
</p>
<h4><a id="impl_consumerrebalance" href="#impl_consumerrebalance">Consumer rebalancing algorithm</a></h4>
<p>
The consumer rebalancing algorithms allows all the consumers in a group to come into consensus on which consumer is consuming which partitions. Consumer rebalancing is triggered on each addition or removal of both broker nodes and other consumers within the same group. For a given topic and a given consumer group, broker partitions are divided evenly among consumers within the group. A partition is always consumed by a single consumer. This design simplifies the implementation. Had we allowed a partition to be concurrently consumed by multiple consumers, there would be contention on the partition and some kind of locking would be required. If there are more consumers than partitions, some consumers won't get any data at all. During rebalancing, we try to assign partitions to consumers in such a way that reduces the number of broker nodes each consumer has to connect to.
</p>
<p>
Each consumer does the following during rebalancing:
</p>
<pre>
1. For each topic T that C<sub>i</sub> subscribes to
2. let P<sub>T</sub> be all partitions producing topic T
3. let C<sub>G</sub> be all consumers in the same group as C<sub>i</sub> that consume topic T
4. sort P<sub>T</sub> (so partitions on the same broker are clustered together)
5. sort C<sub>G</sub>
6. let i be the index position of C<sub>i</sub> in C<sub>G</sub> and let N = size(P<sub>T</sub>)/size(C<sub>G</sub>)
7. assign partitions from i*N to (i+1)*N - 1 to consumer C<sub>i</sub>
8. remove current entries owned by C<sub>i</sub> from the partition owner registry
9. add newly assigned partitions to the partition owner registry
(we may need to re-try this until the original partition owner releases its ownership)
</pre>
<p>
When rebalancing is triggered at one consumer, rebalancing should be triggered in other consumers within the same group about the same time.
</p>
</script>
<div class="p-implementation"></div>

<h3> Kafka is <i>a distributed streaming platform</i>. What exactly does that mean?</h3>
<p>We think of a streaming platform as having three key capabilities:</p>
<ol>
<li>It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
<li>It lets you store streams of records in a fault-tolerant way.
<li>It lets you process streams of records as they occur.
</ol>
<p>What is Kafka good for?</p>
<p>It gets used for two broad classes of application:</p>
<ol>
<li>Building real-time streaming data pipelines that reliably get data between systems or applications
<li>Building real-time streaming applications that transform or react to the streams of data
</ol>
<p>To understand how Kafka does these things, let's dive in and explore Kafka's capabilities from the bottom up.</p>
<p>First a few concepts:</p>
<ul>
<li>Kafka is run as a cluster on one or more servers.
<li>The Kafka cluster stores streams of <i>records</i> in categories called <i>topics</i>.
<li>Each record consists of a key, a value, and a timestamp.
</ul>
<p>Kafka has four core APIs:</p>
<div style="overflow: hidden;">
<ul style="float: left; width: 40%;">
<li>The <a href="/documentation.html#producerapi">Producer API</a> allows an application to publish a stream records to one or more Kafka topics.
<li>The <a href="/documentation.html#consumerapi">Consumer API</a> allows an application to subscribe to one or more topics and process the stream of records produced to them.
<li>The <a href="/documentation.html#streams">Streams API</a> allows an application to act as a <i>stream processor</i>, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
<li>The <a href="/documentation.html#connect">Connector API</a> allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
</ul>
<img src="images/kafka-apis.png" style="float: right; width: 50%;">
</div>
<p>
In Kafka the communication between the clients and the servers is done with a simple, high-performance, language agnostic <a href="https://kafka.apache.org/protocol.html">TCP protocol</a>. This protocol is versioned and maintains backwards compatibility with older versions. We provide a Java client for Kafka, but clients are available in <a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">many languages</a>.</p>
<h4><a id="intro_topics" href="#intro_topics">Topics and Logs</a></h4>
<p>Let's first dive into the core abstraction Kafka provides for a stream of records&mdash;the topic.</p>
<p>A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.</p>
<p> For each topic, the Kafka cluster maintains a partitioned log that looks like this: </p>
<img class="centered" src="images/log_anatomy.png">
<script><!--#include virtual="js/templateData.js" --></script>
<p> Each partition is an ordered, immutable sequence of records that is continually appended to&mdash;a structured commit log. The records in the partitions are each assigned a sequential id number called the <i>offset</i> that uniquely identifies each record within the partition.
</p>
<p>
The Kafka cluster retains all published records&mdash;whether or not they have been consumed&mdash;using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
</p>
<img class="centered" src="images/log_consumer.png" style="width:400px">
<p>
In fact, the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from "now".
</p>
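<p>
As a sketch of what that explicit offset control looks like with the Java consumer (the topic name is illustrative, and the consumer is assumed to have been assigned the partition):
</p>
<pre>
TopicPartition partition = new TopicPartition("page-views", 0);
consumer.assign(Collections.singletonList(partition));

consumer.seekToBeginning(Collections.singletonList(partition));  // reprocess everything from the start
consumer.seek(partition, 42L);                                   // or jump to a specific offset
consumer.seekToEnd(Collections.singletonList(partition));        // or skip ahead and consume from "now"
</pre>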
<p>
This combination of features means that Kafka consumers are very cheap&mdash;they can come and go without much impact on the cluster or on other consumers. For example, you can use our command line tools to "tail" the contents of any topic without changing what is consumed by any existing consumers.
</p>
<p>
The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism&mdash;more on that in a bit.
</p>
<script id="introduction-template" type="text/x-handlebars-template">
<h3> Kafka is <i>a distributed streaming platform</i>. What exactly does that mean?</h3>
<p>We think of a streaming platform as having three key capabilities:</p>
<ol>
<li>It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
<li>It lets you store streams of records in a fault-tolerant way.
<li>It lets you process streams of records as they occur.
</ol>
<p>What is Kafka good for?</p>
<p>It gets used for two broad classes of application:</p>
<ol>
<li>Building real-time streaming data pipelines that reliably get data between systems or applications
<li>Building real-time streaming applications that transform or react to the streams of data
</ol>
<p>To understand how Kafka does these things, let's dive in and explore Kafka's capabilities from the bottom up.</p>
<p>First a few concepts:</p>
<ul>
<li>Kafka is run as a cluster on one or more servers.
<li>The Kafka cluster stores streams of <i>records</i> in categories called <i>topics</i>.
<li>Each record consists of a key, a value, and a timestamp.
</ul>
<p>Kafka has four core APIs:</p>
<div style="overflow: hidden;">
<ul style="float: left; width: 40%;">
<li>The <a href="/documentation.html#producerapi">Producer API</a> allows an application to publish a stream records to one or more Kafka topics.
<li>The <a href="/documentation.html#consumerapi">Consumer API</a> allows an application to subscribe to one or more topics and process the stream of records produced to them.
<li>The <a href="/documentation.html#streams">Streams API</a> allows an application to act as a <i>stream processor</i>, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
<li>The <a href="/documentation.html#connect">Connector API</a> allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
</ul>
<img src="/{{version}}/images/kafka-apis.png" style="float: right; width: 50%;">
</div>
<p>
In Kafka the communication between the clients and the servers is done with a simple, high-performance, language agnostic <a href="https://kafka.apache.org/protocol.html">TCP protocol</a>. This protocol is versioned and maintains backwards compatibility with older versions. We provide a Java client for Kafka, but clients are available in <a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">many languages</a>.</p>
<h4><a id="intro_distribution" href="#intro_distribution">Distribution</a></h4>
<h4><a id="intro_topics" href="#intro_topics">Topics and Logs</a></h4>
<p>Let's first dive into the core abstraction Kafka provides for a stream of records&mdash;the topic.</p>
<p>A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.</p>
<p> For each topic, the Kafka cluster maintains a partitioned log that looks like this: </p>
<img class="centered" src="/{{version}}/images/log_anatomy.png">
<p>
The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
</p>
<p>
Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
</p>
<p> Each partition is an ordered, immutable sequence of records that is continually appended to&mdash;a structured commit log. The records in the partitions are each assigned a sequential id number called the <i>offset</i> that uniquely identifies each record within the partition.
</p>
<p>
The Kafka cluster retains all published records&mdash;whether or not they have been consumed&mdash;using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
</p>
<img class="centered" src="/{{version}}/images/log_consumer.png" style="width:400px">
<p>
In fact, the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from "now".
</p>
<p>
This combination of features means that Kafka consumers are very cheap&mdash;they can come and go without much impact on the cluster or on other consumers. For example, you can use our command line tools to "tail" the contents of any topic without changing what is consumed by any existing consumers.
</p>
<p>
The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism&mdash;more on that in a bit.
</p>
<h4><a id="intro_producers" href="#intro_producers">Producers</a></h4>
<p>
Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the record). More on the use of partitioning in a second!
</p>
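<p>
For example, with the Java producer the record key is what the default partitioner uses to choose a partition, so all records with the same key land in the same partition (a minimal sketch; the topic and key are illustrative):
</p>
<pre>
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
// Records keyed by user id: the same user always goes to the same partition, preserving per-user order.
producer.send(new ProducerRecord<>("page-views", "user-42", "/index.html"));
producer.close();
</pre>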
<h4><a id="intro_distribution" href="#intro_distribution">Distribution</a></h4>
<h4><a id="intro_consumers" href="#intro_consumers">Consumers</a></h4>
<p>
The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
</p>
<p>
Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
</p>
<p>
Consumers label themselves with a <i>consumer group</i> name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
</p>
<p>
If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.</p>
<p>
If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
</p>
<img class="centered" src="images/consumer-groups.png">
<p>
A two server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumer instances and group B has four.
</p>
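<p>
A minimal sketch of joining a consumer group with the Java consumer (group and topic names are illustrative); starting a second copy of this program with the same <code>group.id</code> simply splits the topic's partitions between the two instances:
</p>
<pre>
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "logical-subscriber-a");   // instances sharing this id share the topic's partitions
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("page-views"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("partition=%d offset=%d value=%s%n",
                          record.partition(), record.offset(), record.value());
}
</pre>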
<h4><a id="intro_producers" href="#intro_producers">Producers</a></h4>
<p>
Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the record). More on the use of partitioning in a second!
</p>
<p>
More commonly, however, we have found that topics have a small number of consumer groups, one for each "logical subscriber". Each group is composed of many consumer instances for scalability and fault tolerance. This is nothing more than publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process.
</p>
<p>
The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. This process of maintaining membership in the group is handled by the Kafka protocol dynamically. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
</p>
<p>
Kafka only provides a total order over records <i>within</i> a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over records this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
</p>
<h4><a id="intro_guarantees" href="#intro_guarantees">Guarantees</a></h4>
<p>
At a high-level Kafka gives the following guarantees:
</p>
<ul>
<li>Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
<li>A consumer instance sees records in the order they are stored in the log.
<li>For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.
</ul>
<p>
More details on these guarantees are given in the design section of the documentation.
</p>
<h4><a id="kafka_mq" href="#kafka_mq">Kafka as a Messaging System</a></h4>
<p>
How does Kafka's notion of streams compare to a traditional enterprise messaging system?
</p>
<p>
Messaging traditionally has two models: <a href="http://en.wikipedia.org/wiki/Message_queue">queuing</a> and <a href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">publish-subscribe</a>. In a queue, a pool of consumers may read from a server and each record goes to one of them; in publish-subscribe the record is broadcast to all consumers. Each of these two models has a strength and a weakness. The strength of queuing is that it allows you to divide up the processing of data over multiple consumer instances, which lets you scale your processing. Unfortunately, queues aren't multi-subscriber&mdash;once one process reads the data it's gone. Publish-subscribe allows you to broadcast data to multiple processes, but has no way of scaling processing since every message goes to every subscriber.
</p>
<p>
The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer group allows you to divide up processing over a collection of processes (the members of the consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple consumer groups.
</p>
<p>
The advantage of Kafka's model is that every topic has both these properties&mdash;it can scale processing and is also multi-subscriber&mdash;there is no need to choose one or the other.
</p>
<p>
Kafka has stronger ordering guarantees than a traditional messaging system, too.
</p>
<p>
A traditional queue retains records in-order on the server, and if multiple consumers consume from the queue then the server hands out records in the order they are stored. However, although the server hands out records in order, the records are delivered asynchronously to consumers, so they may arrive out of order on different consumers. This effectively means the ordering of the records is lost in the presence of parallel consumption. Messaging systems often work around this by having a notion of "exclusive consumer" that allows only one process to consume from a queue, but of course this means that there is no parallelism in processing.
</p>
<p>
Kafka does it better. By having a notion of parallelism&mdash;the partition&mdash;within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions.
</p>
<h4><a id="intro_consumers" href="#intro_consumers">Consumers</a></h4>
<h4>Kafka as a Storage System</h4>
<p>
Consumers label themselves with a <i>consumer group</i> name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
</p>
<p>
If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.</p>
<p>
If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
</p>
<img class="centered" src="/{{version}}/images/consumer-groups.png">
<p>
A two server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumer instances and group B has four.
</p>
<p>
Any message queue that allows publishing messages decoupled from consuming them is effectively acting as a storage system for the in-flight messages. What is different about Kafka is that it is a very good storage system.
</p>
<p>
Data written to Kafka is written to disk and replicated for fault-tolerance. Kafka allows producers to wait on acknowledgement so that a write isn't considered complete until it is fully replicated and guaranteed to persist even if the server written to fails.
</p>
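<p>
A sketch of how a producer opts into waiting for that acknowledgement, using the standard producer configs:
</p>
<pre>
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");   // the write is only acknowledged once it is fully replicated
props.put("retries", 3);    // retry transient send failures instead of dropping the record
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
</pre>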
<p>
The disk structures Kafka uses scale well&mdash;Kafka will perform the same whether you have 50 KB or 50 TB of persistent data on the server.
</p>
<p>
As a result of taking storage seriously and allowing the clients to control their read position, you can think of Kafka as a kind of special purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation.
</p>
<h4>Kafka for Stream Processing</h4>
<p>
It isn't enough to just read, write, and store streams of data; the purpose is to enable real-time processing of streams.
</p>
<p>
In Kafka a stream processor is anything that takes continual streams of data from input topics, performs some processing on this input, and produces continual streams of data to output topics.
</p>
<p>
For example, a retail application might take in input streams of sales and shipments, and output a stream of reorders and price adjustments computed off this data.
</p>
<p>
It is possible to do simple processing directly using the producer and consumer APIs. However for more complex transformations Kafka provides a fully integrated <a href="/documentation.html#streams">Streams API</a>. This allows building applications that do non-trivial processing that compute aggregations off of streams or join streams together.
</p>
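<p>
As a flavor of what that looks like, here is a minimal sketch against the Streams DSL of this release (topic names and the application id are illustrative):
</p>
<pre>
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-filter");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> views = builder.stream("page-views");
views.filter((user, page) -> page.startsWith("/product"))   // keep only product page views
     .mapValues(String::toUpperCase)                         // a trivial per-record transformation
     .to("product-views");                                   // write the transformed stream back to Kafka

KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
</pre>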
<p>
This facility helps solve the hard problems this type of application faces: handling out-of-order data, reprocessing input as code changes, performing stateful computations, etc.
</p>
<p>
The streams API builds on the core primitives Kafka provides: it uses the producer and consumer APIs for input, uses Kafka for stateful storage, and uses the same group mechanism for fault tolerance among the stream processor instances.
</p>
<h4>Putting the Pieces Together</h4>
<p>
This combination of messaging, storage, and stream processing may seem unusual but it is essential to Kafka's role as a streaming platform.
</p>
<p>
A distributed file system like HDFS allows storing static files for batch processing. Effectively a system like this allows storing and processing <i>historical</i> data from the past.
</p>
<p>
A traditional enterprise messaging system allows processing future messages that will arrive after you subscribe. Applications built in this way process future data as it arrives.
</p>
<p>
Kafka combines both of these capabilities, and the combination is critical both for Kafka usage as a platform for streaming applications as well as for streaming data pipelines.
</p>
<p>
By combining storage and low-latency subscriptions, streaming applications can treat both past and future data the same way. That is, a single application can process historical, stored data, but rather than ending when it reaches the last record, it can keep processing as future data arrives. This is a generalized notion of stream processing that subsumes batch processing as well as message-driven applications.
</p>
<p>
Likewise for streaming data pipelines, the combination of subscription to real-time events makes it possible to use Kafka for very low-latency pipelines; but the ability to store data reliably makes it possible to use it for critical data where the delivery of data must be guaranteed, or for integration with offline systems that load data only periodically or may go down for extended periods of time for maintenance. The stream processing facilities make it possible to transform data as it arrives.
</p>
<p>
For more information on the guarantees, APIs, and capabilities Kafka provides, see the rest of the <a href="/documentation.html">documentation</a>.
</p>
<p>
More commonly, however, we have found that topics have a small number of consumer groups, one for each "logical subscriber". Each group is composed of many consumer instances for scalability and fault tolerance. This is nothing more than publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process.
</p>
<p>
The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. This process of maintaining membership in the group is handled by the Kafka protocol dynamically. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
</p>
<p>
Kafka only provides a total order over records <i>within</i> a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over records this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
</p>
<h4><a id="intro_guarantees" href="#intro_guarantees">Guarantees</a></h4>
<p>
At a high-level Kafka gives the following guarantees:
</p>
<ul>
<li>Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
<li>A consumer instance sees records in the order they are stored in the log.
<li>For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.
</ul>
<p>
More details on these guarantees are given in the design section of the documentation.
</p>
<h4><a id="kafka_mq" href="#kafka_mq">Kafka as a Messaging System</a></h4>
<p>
How does Kafka's notion of streams compare to a traditional enterprise messaging system?
</p>
<p>
Messaging traditionally has two models: <a href="http://en.wikipedia.org/wiki/Message_queue">queuing</a> and <a href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">publish-subscribe</a>. In a queue, a pool of consumers may read from a server and each record goes to one of them; in publish-subscribe the record is broadcast to all consumers. Each of these two models has a strength and a weakness. The strength of queuing is that it allows you to divide up the processing of data over multiple consumer instances, which lets you scale your processing. Unfortunately, queues aren't multi-subscriber&mdash;once one process reads the data it's gone. Publish-subscribe allows you to broadcast data to multiple processes, but has no way of scaling processing since every message goes to every subscriber.
</p>
<p>
The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer group allows you to divide up processing over a collection of processes (the members of the consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple consumer groups.
</p>
<p>
The advantage of Kafka's model is that every topic has both these properties&mdash;it can scale processing and is also multi-subscriber&mdash;there is no need to choose one or the other.
</p>
<p>
Kafka has stronger ordering guarantees than a traditional messaging system, too.
</p>
<p>
A traditional queue retains records in-order on the server, and if multiple consumers consume from the queue then the server hands out records in the order they are stored. However, although the server hands out records in order, the records are delivered asynchronously to consumers, so they may arrive out of order on different consumers. This effectively means the ordering of the records is lost in the presence of parallel consumption. Messaging systems often work around this by having a notion of "exclusive consumer" that allows only one process to consume from a queue, but of course this means that there is no parallelism in processing.
</p>
<p>
Kafka does it better. By having a notion of parallelism&mdash;the partition&mdash;within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions.
</p>
<h4>Kafka as a Storage System</h4>
<p>
Any message queue that allows publishing messages decoupled from consuming them is effectively acting as a storage system for the in-flight messages. What is different about Kafka is that it is a very good storage system.
</p>
<p>
Data written to Kafka is written to disk and replicated for fault-tolerance. Kafka allows producers to wait on acknowledgement so that a write isn't considered complete until it is fully replicated and guaranteed to persist even if the server written to fails.
</p>
<p>
The disk structures Kafka uses scale well&mdash;Kafka will perform the same whether you have 50 KB or 50 TB of persistent data on the server.
</p>
<p>
As a result of taking storage seriously and allowing the clients to control their read position, you can think of Kafka as a kind of special purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation.
</p>
<h4>Kafka for Stream Processing</h4>
<p>
It isn't enough to just read, write, and store streams of data; the purpose is to enable real-time processing of streams.
</p>
<p>
In Kafka a stream processor is anything that takes continual streams of data from input topics, performs some processing on this input, and produces continual streams of data to output topics.
</p>
<p>
For example, a retail application might take in input streams of sales and shipments, and output a stream of reorders and price adjustments computed off this data.
</p>
<p>
It is possible to do simple processing directly using the producer and consumer APIs. However for more complex transformations Kafka provides a fully integrated <a href="/documentation.html#streams">Streams API</a>. This allows building applications that do non-trivial processing that compute aggregations off of streams or join streams together.
</p>
<p>
This facility helps solve the hard problems this type of application faces: handling out-of-order data, reprocessing input as code changes, performing stateful computations, etc.
</p>
<p>
The streams API builds on the core primitives Kafka provides: it uses the producer and consumer APIs for input, uses Kafka for stateful storage, and uses the same group mechanism for fault tolerance among the stream processor instances.
</p>
<h4>Putting the Pieces Together</h4>
<p>
This combination of messaging, storage, and stream processing may seem unusual but it is essential to Kafka's role as a streaming platform.
</p>
<p>
A distributed file system like HDFS allows storing static files for batch processing. Effectively a system like this allows storing and processing <i>historical</i> data from the past.
</p>
<p>
A traditional enterprise messaging system allows processing future messages that will arrive after you subscribe. Applications built in this way process future data as it arrives.
</p>
<p>
Kafka combines both of these capabilities, and the combination is critical both for Kafka usage as a platform for streaming applications as well as for streaming data pipelines.
</p>
<p>
By combining storage and low-latency subscriptions, streaming applications can treat both past and future data the same way. That is, a single application can process historical, stored data, but rather than ending when it reaches the last record, it can keep processing as future data arrives. This is a generalized notion of stream processing that subsumes batch processing as well as message-driven applications.
</p>
<p>
Likewise for streaming data pipelines, the combination of subscription to real-time events makes it possible to use Kafka for very low-latency pipelines; but the ability to store data reliably makes it possible to use it for critical data where the delivery of data must be guaranteed, or for integration with offline systems that load data only periodically or may go down for extended periods of time for maintenance. The stream processing facilities make it possible to transform data as it arrives.
</p>
<p>
For more information on the guarantees, APIs, and capabilities Kafka provides, see the rest of the <a href="/documentation.html">documentation</a>.
</p>
</script>
<!--#include virtual="../includes/_header.htm" -->
<!--#include virtual="../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../includes/_nav.htm" -->
<div class="right">
<div class="p-introduction"></div>
</div>
</div>
<!--#include virtual="../includes/_footer.htm" -->
<script>
// Show selected style on nav item
$(function() { $('.b-nav__intro').addClass('selected'); });
</script>

docs/js/templateData.js Normal file
@ -0,0 +1,21 @@
/*
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Define variables for doc templates
var context={
"version": "0101"
};

File diff suppressed because it is too large
File diff suppressed because it is too large
@ -15,367 +15,410 @@
~ limitations under the License.
~-->
<h3><a id="streams_overview" href="#streams_overview">9.1 Overview</a></h3>
<script><!--#include virtual="js/templateData.js" --></script>
<p>
Kafka Streams is a client library for processing and analyzing data stored in Kafka and either writing the resulting data back to Kafka or sending the final output to an external system. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state.
Kafka Streams has a <b>low barrier to entry</b>: You can quickly write and run a small-scale proof-of-concept on a single machine; and you only need to run additional instances of your application on multiple machines to scale up to high-volume production workloads. Kafka Streams transparently handles the load balancing of multiple instances of the same application by leveraging Kafka's parallelism model.
</p>
<p>
Some highlights of Kafka Streams:
</p>
<script id="streams-template" type="text/x-handlebars-template">
<h1>Streams</h1>
<ul>
<li>Designed as a <b>simple and lightweight client library</b>, which can be easily embedded in any Java application and integrated with any existing packaging, deployment and operational tools that users have for their streaming applications.</li>
<li>Has <b>no external dependencies on systems other than Apache Kafka itself</b> as the internal messaging layer; notably, it uses Kafka's partitioning model to horizontally scale processing while maintaining strong ordering guarantees.</li>
<li>Supports <b>fault-tolerant local state</b>, which enables very fast and efficient stateful operations like joins and windowed aggregations.</li>
<li>Employs <b>one-record-at-a-time processing</b> to achieve low processing latency, and supports <b>event-time based windowing operations</b>.</li>
<li>Offers necessary stream processing primitives, along with a <b>high-level Streams DSL</b> and a <b>low-level Processor API</b>.</li>
<ol class="toc">
<li>
<a href="#streams_overview">Overview</a>
</li>
<li>
<a href="#streams_developer">Developer guide</a>
<ul>
<li><a href="#streams_concepts">Core concepts</a>
<li><a href="#streams_processor">Low-level processor API</a>
<li><a href="#streams_dsl">High-level streams DSL</a>
</ul>
</li>
</ol>
</ul>
<h2><a id="streams_overview" href="#overview">Overview</a></h2>
<h3><a id="streams_developer" href="#streams_developer">9.2 Developer Guide</a></h3>
<p>
Kafka Streams is a client library for processing and analyzing data stored in Kafka and either writing the resulting data back to Kafka or sending the final output to an external system. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state.
Kafka Streams has a <b>low barrier to entry</b>: You can quickly write and run a small-scale proof-of-concept on a single machine; and you only need to run additional instances of your application on multiple machines to scale up to high-volume production workloads. Kafka Streams transparently handles the load balancing of multiple instances of the same application by leveraging Kafka's parallelism model.
</p>
<p>
Some highlights of Kafka Streams:
</p>
<p>
There is a <a href="#quickstart_kafkastreams">quickstart</a> example that shows how to run a stream processing program coded in the Kafka Streams library.
This section focuses on how to write, configure, and execute a Kafka Streams application.
</p>
<ul>
<li>Designed as a <b>simple and lightweight client library</b>, which can be easily embedded in any Java application and integrated with any existing packaging, deployment and operational tools that users have for their streaming applications.</li>
<li>Has <b>no external dependencies on systems other than Apache Kafka itself</b> as the internal messaging layer; notably, it uses Kafka's partitioning model to horizontally scale processing while maintaining strong ordering guarantees.</li>
<li>Supports <b>fault-tolerant local state</b>, which enables very fast and efficient stateful operations like joins and windowed aggregations.</li>
<li>Employs <b>one-record-at-a-time processing</b> to achieve low processing latency, and supports <b>event-time based windowing operations</b>.</li>
<li>Offers necessary stream processing primitives, along with a <b>high-level Streams DSL</b> and a <b>low-level Processor API</b>.</li>
<h4><a id="streams_concepts" href="#streams_concepts">Core Concepts</a></h4>
</ul>
<p>
We first summarize the key concepts of Kafka Streams.
</p>
<h2><a id="streams_developer" href="#streams_developer">Developer Guide</a></h2>
<h5><a id="streams_topology" href="#streams_topology">Stream Processing Topology</a></h5>
<p>
There is a <a href="#quickstart_kafkastreams">quickstart</a> example that shows how to run a stream processing program coded in the Kafka Streams library.
This section focuses on how to write, configure, and execute a Kafka Streams application.
</p>
<ul>
<li>A <b>stream</b> is the most important abstraction provided by Kafka Streams: it represents an unbounded, continuously updating data set. A stream is an ordered, replayable, and fault-tolerant sequence of immutable data records, where a <b>data record</b> is defined as a key-value pair.</li>
<li>A <b>stream processing application</b> written in Kafka Streams defines its computational logic through one or more <b>processor topologies</b>, where a processor topology is a graph of stream processors (nodes) that are connected by streams (edges).</li>
<li>A <b>stream processor</b> is a node in the processor topology; it represents a processing step to transform data in streams by receiving one input record at a time from its upstream processors in the topology, applying its operation to it, and possibly producing one or more output records to its downstream processors.</li>
</ul>
<h4><a id="streams_concepts" href="#streams_concepts">Core Concepts</a></h4>
<p>
Kafka Streams offers two ways to define the stream processing topology: the <a href="#streams_dsl"><b>Kafka Streams DSL</b></a> provides
the most common data transformation operations such as <code>map</code> and <code>filter</code>; the lower-level <a href="#streams_processor"><b>Processor API</b></a> allows
developers to define and connect custom processors as well as to interact with <a href="#streams_state">state stores</a>.
</p>
<p>
We first summarize the key concepts of Kafka Streams.
</p>
<h5><a id="streams_time" href="#streams_time">Time</a></h5>
<h5><a id="streams_topology" href="#streams_topology">Stream Processing Topology</a></h5>
<p>
A critical aspect in stream processing is the notion of <b>time</b>, and how it is modeled and integrated.
For example, some operations such as <b>windowing</b> are defined based on time boundaries.
</p>
<p>
Common notions of time in streams are:
</p>
<ul>
<li>A <b>stream</b> is the most important abstraction provided by Kafka Streams: it represents an unbounded, continuously updating data set. A stream is an ordered, replayable, and fault-tolerant sequence of immutable data records, where a <b>data record</b> is defined as a key-value pair.</li>
<li>A <b>stream processing application</b> written in Kafka Streams defines its computational logic through one or more <b>processor topologies</b>, where a processor topology is a graph of stream processors (nodes) that are connected by streams (edges).</li>
<li>A <b>stream processor</b> is a node in the processor topology; it represents a processing step to transform data in streams by receiving one input record at a time from its upstream processors in the topology, applying its operation to it, and possibly producing one or more output records to its downstream processors.</li>
</ul>
<ul>
<li><b>Event time</b> - The point in time when an event or data record occurred, i.e. was originally created "at the source".</li>
<li><b>Processing time</b> - The point in time when the event or data record happens to be processed by the stream processing application, i.e. when the record is being consumed. The processing time may be milliseconds, hours, or days etc. later than the original event time.</li>
<li><b>Ingestion time</b> - The point in time when an event or data record is stored in a topic partition by a Kafka broker. The difference from event time is that this ingestion timestamp is generated when the record is appended to the target topic by the Kafka broker, not when the record is created "at the source". The difference from processing time is that processing time is when the stream processing application processes the record. For example, if a record is never processed, there is no notion of processing time for it, but it still has an ingestion time.</li>
</ul>
<p>
The choice between event-time and ingestion-time is actually done through the configuration of Kafka (not Kafka Streams): From Kafka 0.10.x onwards, timestamps are automatically embedded into Kafka messages. Depending on Kafka's configuration these timestamps represent event-time or ingestion-time. The respective Kafka configuration setting can be specified on the broker level or per topic. The default timestamp extractor in Kafka Streams will retrieve these embedded timestamps as-is. Hence, the effective time semantics of your application depend on the effective Kafka configuration for these embedded timestamps.
</p>
<p>
Kafka Streams assigns a <b>timestamp</b> to every data record
via the <code>TimestampExtractor</code> interface.
Concrete implementations of this interface may retrieve or compute timestamps based on the actual contents of data records such as an embedded timestamp field
to provide event-time semantics, or use any other approach such as returning the current wall-clock time at the time of processing,
thereby yielding processing-time semantics to stream processing applications.
Developers can thus enforce different notions of time depending on their business needs. For example,
per-record timestamps describe the progress of a stream with regards to time (although records may be out-of-order within the stream) and
are leveraged by time-dependent operations such as joins.
</p>
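<p>
As a sketch, a custom extractor that prefers an event-time field embedded in the record value and otherwise falls back to the record's embedded timestamp might look like this (the <code>Order</code> type and its accessor are hypothetical; the single-argument <code>extract</code> signature is the one in this release). It is plugged in via the <code>timestamp.extractor</code> configuration (<code>StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG</code>):
</p>
<pre>
public class OrderTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        if (record.value() instanceof Order)                       // hypothetical payload type
            return ((Order) record.value()).getCreatedAtMillis();  // event time carried in the payload
        return record.timestamp();                                  // fall back to the embedded record timestamp
    }
}
</pre>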
<p>
Kafka Streams offers two ways to define the stream processing topology: the <a href="#streams_dsl"><b>Kafka Streams DSL</b></a> provides
the most common data transformation operations such as <code>map</code> and <code>filter</code>; the lower-level <a href="#streams_processor"><b>Processor API</b></a> allows
developers to define and connect custom processors as well as to interact with <a href="#streams_state">state stores</a>.
</p>
<p>
Finally, whenever a Kafka Streams application writes records to Kafka, then it will also assign timestamps to these new records. The way the timestamps are assigned depends on the context:
<ul>
<li> When new output records are generated via processing some input record, for example, <code>context.forward()</code> triggered in the <code>process()</code> function call, output record timestamps are inherited from input record timestamps directly.</li>
<li> When new output records are generated via periodic functions such as <code>punctuate()</code>, the output record timestamp is defined as the current internal time (obtained through <code>context.timestamp()</code>) of the stream task.</li>
<li> For aggregations, the timestamp of a resulting aggregate update record will be that of the latest arrived input record that triggered the update.</li>
</ul>
</p>
<h5><a id="streams_time" href="#streams_time">Time</a></h5>
<h5><a id="streams_state" href="#streams_state">States</a></h5>
<p>
A critical aspect in stream processing is the notion of <b>time</b>, and how it is modeled and integrated.
For example, some operations such as <b>windowing</b> are defined based on time boundaries.
</p>
<p>
Common notions of time in streams are:
</p>
<p>
Some stream processing applications don't require state, which means the processing of a message is independent from
the processing of all other messages.
However, being able to maintain state opens up many possibilities for sophisticated stream processing applications: you
can join input streams, or group and aggregate data records. Many such stateful operators are provided by the <a href="#streams_dsl"><b>Kafka Streams DSL</b></a>.
</p>
<p>
Kafka Streams provides so-called <b>state stores</b>, which can be used by stream processing applications to store and query data.
This is an important capability when implementing stateful operations.
Every task in Kafka Streams embeds one or more state stores that can be accessed via APIs to store and query data required for processing.
These state stores can either be a persistent key-value store, an in-memory hashmap, or another convenient data structure.
Kafka Streams offers fault-tolerance and automatic recovery for local state stores.
</p>
<p>
Kafka Streams allows direct read-only queries of the state stores by methods, threads, processes or applications external to the stream processing application that created the state stores. This is provided through a feature called <b>Interactive Queries</b>. All stores are named and Interactive Queries exposes only the read operations of the underlying implementation.
</p>
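<p>
A sketch of such a read-only query, assuming a running <code>KafkaStreams</code> instance named <code>streams</code> and a key-value store registered under the illustrative name "Counts":
</p>
<pre>
ReadOnlyKeyValueStore<String, Long> counts =
    streams.store("Counts", QueryableStoreTypes.<String, Long>keyValueStore());

Long count = counts.get("kafka");                     // point lookup of a single key's current value
KeyValueIterator<String, Long> all = counts.all();    // or iterate over the whole store (close it when done)
</pre>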
<br>
<p>
As we have mentioned above, the computational logic of a Kafka Streams application is defined as a <a href="#streams_topology">processor topology</a>.
Currently Kafka Streams provides two sets of APIs to define the processor topology, which will be described in the subsequent sections.
</p>
<ul>
<li><b>Event time</b> - The point in time when an event or data record occurred, i.e. was originally created "at the source".</li>
<li><b>Processing time</b> - The point in time when the event or data record happens to be processed by the stream processing application, i.e. when the record is being consumed. The processing time may be milliseconds, hours, or days etc. later than the original event time.</li>
<li><b>Ingestion time</b> - The point in time when an event or data record is stored in a topic partition by a Kafka broker. The difference from event time is that this ingestion timestamp is generated when the record is appended to the target topic by the Kafka broker, not when the record is created "at the source". The difference from processing time is that processing time is when the stream processing application processes the record. For example, if a record is never processed, there is no notion of processing time for it, but it still has an ingestion time.</li>
</ul>
<p>
The choice between event-time and ingestion-time is actually done through the configuration of Kafka (not Kafka Streams): From Kafka 0.10.x onwards, timestamps are automatically embedded into Kafka messages. Depending on Kafka's configuration these timestamps represent event-time or ingestion-time. The respective Kafka configuration setting can be specified on the broker level or per topic. The default timestamp extractor in Kafka Streams will retrieve these embedded timestamps as-is. Hence, the effective time semantics of your application depend on the effective Kafka configuration for these embedded timestamps.
</p>
<p>
Kafka Streams assigns a <b>timestamp</b> to every data record
via the <code>TimestampExtractor</code> interface.
Concrete implementations of this interface may retrieve or compute timestamps based on the actual contents of data records such as an embedded timestamp field
to provide event-time semantics, or use any other approach such as returning the current wall-clock time at the time of processing,
thereby yielding processing-time semantics to stream processing applications.
Developers can thus enforce different notions of time depending on their business needs. For example,
per-record timestamps describe the progress of a stream with regards to time (although records may be out-of-order within the stream) and
are leveraged by time-dependent operations such as joins.
</p>
<h4><a id="streams_processor" href="#streams_processor">Low-Level Processor API</a></h4>
<p>
Finally, whenever a Kafka Streams application writes records to Kafka, then it will also assign timestamps to these new records. The way the timestamps are assigned depends on the context:
<ul>
<li> When new output records are generated via processing some input record, for example, <code>context.forward()</code> triggered in the <code>process()</code> function call, output record timestamps are inherited from input record timestamps directly.</li>
<li> When new output records are generated via periodic functions such as <code>punctuate()</code>, the output record timestamp is defined as the current internal time (obtained through <code>context.timestamp()</code>) of the stream task.</li>
<li> For aggregations, the timestamp of a resulting aggregate update record will be that of the latest arrived input record that triggered the update.</li>
</ul>
</p>
<h5><a id="streams_processor_process" href="#streams_processor_process">Processor</a></h5>
<h5><a id="streams_state" href="#streams_state">States</a></h5>
<p>
Developers can define their customized processing logic by implementing the <code>Processor</code> interface, which
provides the <code>process</code> and <code>punctuate</code> methods. The <code>process</code> method is invoked on each
received record, and the <code>punctuate</code> method is invoked periodically based on elapsed time.
In addition, the processor can maintain the current <code>ProcessorContext</code> instance variable initialized in the
<code>init</code> method, and use the context to schedule the punctuation period (<code>context().schedule</code>), to
forward the modified / new key-value pair to downstream processors (<code>context().forward</code>), to commit the current
processing progress (<code>context().commit</code>), etc.
</p>
<p>
Some stream processing applications don't require state, which means the processing of a message is independent from
the processing of all other messages.
However, being able to maintain state opens up many possibilities for sophisticated stream processing applications: you
can join input streams, or group and aggregate data records. Many such stateful operators are provided by the <a href="#streams_dsl"><b>Kafka Streams DSL</b></a>.
</p>
<p>
Kafka Streams provides so-called <b>state stores</b>, which can be used by stream processing applications to store and query data.
This is an important capability when implementing stateful operations.
Every task in Kafka Streams embeds one or more state stores that can be accessed via APIs to store and query data required for processing.
These state stores can either be a persistent key-value store, an in-memory hashmap, or another convenient data structure.
Kafka Streams offers fault-tolerance and automatic recovery for local state stores.
</p>
<p>
Kafka Streams allows direct read-only queries of the state stores by methods, threads, processes or applications external to the stream processing application that created the state stores. This is provided through a feature called <b>Interactive Queries</b>. All stores are named and Interactive Queries exposes only the read operations of the underlying implementation.
</p>
<br>
<p>
As we have mentioned above, the computational logic of a Kafka Streams application is defined as a <a href="#streams_topology">processor topology</a>.
Currently Kafka Streams provides two sets of APIs to define the processor topology, which will be described in the subsequent sections.
</p>
<pre>
public class MyProcessor implements Processor<String, String> {
private ProcessorContext context;
private KeyValueStore<String, Integer> kvStore;
<h4><a id="streams_processor" href="#streams_processor">Low-Level Processor API</a></h4>
@Override
@SuppressWarnings("unchecked")
public void init(ProcessorContext context) {
this.context = context;
this.context.schedule(1000);
this.kvStore = (KeyValueStore<String, Integer>) context.getStateStore("Counts");
}
<h5><a id="streams_processor_process" href="#streams_processor_process">Processor</a></h5>
@Override
public void process(String dummy, String line) {
String[] words = line.toLowerCase().split(" ");
<p>
Developers can define their customized processing logic by implementing the <code>Processor</code> interface, which
provides the <code>process</code> and <code>punctuate</code> methods. The <code>process</code> method is invoked on each
received record, and the <code>punctuate</code> method is invoked periodically based on elapsed time.
In addition, the processor can maintain the current <code>ProcessorContext</code> instance variable initialized in the
<code>init</code> method, and use the context to schedule the punctuation period (<code>context().schedule</code>), to
forward the modified / new key-value pair to downstream processors (<code>context().forward</code>), to commit the current
processing progress (<code>context().commit</code>), etc.
</p>
for (String word : words) {
Integer oldValue = this.kvStore.get(word);
<pre>
public class MyProcessor implements Processor<String, String> {
private ProcessorContext context;
private KeyValueStore<String, Integer> kvStore;
if (oldValue == null) {
this.kvStore.put(word, 1);
} else {
this.kvStore.put(word, oldValue + 1);
@Override
@SuppressWarnings("unchecked")
public void init(ProcessorContext context) {
this.context = context;
this.context.schedule(1000);
this.kvStore = (KeyValueStore<String, Integer>) context.getStateStore("Counts");
}
}
}
@Override
public void punctuate(long timestamp) {
KeyValueIterator<String, Integer> iter = this.kvStore.all();
@Override
public void process(String dummy, String line) {
String[] words = line.toLowerCase().split(" ");
while (iter.hasNext()) {
KeyValue<String, Integer> entry = iter.next();
context.forward(entry.key, entry.value.toString());
}
for (String word : words) {
Integer oldValue = this.kvStore.get(word);
iter.close();
context.commit();
}
if (oldValue == null) {
this.kvStore.put(word, 1);
} else {
this.kvStore.put(word, oldValue + 1);
}
}
}
@Override
public void close() {
this.kvStore.close();
}
};
</pre>
@Override
public void punctuate(long timestamp) {
KeyValueIterator<String, Integer> iter = this.kvStore.all();
<p>
In the above implementation, the following actions are performed:
while (iter.hasNext()) {
KeyValue<String, Integer> entry = iter.next();
context.forward(entry.key, entry.value.toString());
}
<ul>
<li>In the <code>init</code> method, schedule the punctuation every 1 second and retrieve the local state store by its name "Counts".</li>
<li>In the <code>process</code> method, upon each received record, split the value string into words, and update their counts into the state store (we will talk about this feature later in the section).</li>
<li>In the <code>punctuate</code> method, iterate the local state store and send the aggregated counts to the downstream processor, and commit the current stream state.</li>
</ul>
</p>
iter.close();
context.commit();
}
<h5><a id="streams_processor_topology" href="#streams_processor_topology">Processor Topology</a></h5>
    <p>
    With the customized processors defined in the Processor API, developers can use the <code>TopologyBuilder</code> to build a processor topology
    by connecting these processors together:

    <pre>
    TopologyBuilder builder = new TopologyBuilder();

    builder.addSource("SOURCE", "src-topic")

        .addProcessor("PROCESS1", MyProcessor1::new /* the ProcessorSupplier that can generate MyProcessor1 */, "SOURCE")
        .addProcessor("PROCESS2", MyProcessor2::new /* the ProcessorSupplier that can generate MyProcessor2 */, "PROCESS1")
        .addProcessor("PROCESS3", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")

        .addSink("SINK1", "sink-topic1", "PROCESS1")
        .addSink("SINK2", "sink-topic2", "PROCESS2")
        .addSink("SINK3", "sink-topic3", "PROCESS3");
    </pre>
    There are several steps in the above code to build the topology, and here is a quick walk-through:

    <ul>
    <li>First of all a source node named "SOURCE" is added to the topology using the <code>addSource</code> method, with one Kafka topic "src-topic" fed to it.</li>
    <li>Three processor nodes are then added using the <code>addProcessor</code> method; here the first processor is a child of the "SOURCE" node, but is the parent of the other two processors.</li>
    <li>Finally three sink nodes are added to complete the topology using the <code>addSink</code> method, each piping from a different parent processor node and writing to a separate topic.</li>
    </ul>
    </p>

    <h5><a id="streams_processor_statestore" href="#streams_processor_statestore">Local State Store</a></h5>
.addSink("SINK1", "sink-topic1", "PROCESS1")
.addSink("SINK2", "sink-topic2", "PROCESS2")
.addSink("SINK3", "sink-topic3", "PROCESS3");
</pre>
<p>
Note that the Processor API is not limited to only accessing the current records as they arrive, but can also maintain local state stores
that keep recently arrived records to use in stateful processing operations such as aggregation or windowed joins.
To take advantage of this local states, developers can use the <code>TopologyBuilder.addStateStore</code> method when building the
processor topology to create the local state and associate it with the processor nodes that needs to access it; or they can connect a created
local state store with the existing processor nodes through <code>TopologyBuilder.connectProcessorAndStateStores</code>.
There are several steps in the above code to build the topology, and here is a quick walk through:
<pre>
TopologyBuilder builder = new TopologyBuilder();
<ul>
<li>First of all a source node named "SOURCE" is added to the topology using the <code>addSource</code> method, with one Kafka topic "src-topic" fed to it.</li>
<li>Three processor nodes are then added using the <code>addProcessor</code> method; here the first processor is a child of the "SOURCE" node, but is the parent of the other two processors.</li>
<li>Finally three sink nodes are added to complete the topology using the <code>addSink</code> method, each piping from a different parent processor node and writing to a separate topic.</li>
</ul>
</p>
builder.addSource("SOURCE", "src-topic")
<h5><a id="streams_processor_statestore" href="#streams_processor_statestore">Local State Store</a></h5>
.addProcessor("PROCESS1", MyProcessor1::new, "SOURCE")
// create the in-memory state store "COUNTS" associated with processor "PROCESS1"
.addStateStore(Stores.create("COUNTS").withStringKeys().withStringValues().inMemory().build(), "PROCESS1")
.addProcessor("PROCESS2", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
.addProcessor("PROCESS3", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
<p>
Note that the Processor API is not limited to only accessing the current records as they arrive, but can also maintain local state stores
that keep recently arrived records to use in stateful processing operations such as aggregation or windowed joins.
To take advantage of this local states, developers can use the <code>TopologyBuilder.addStateStore</code> method when building the
processor topology to create the local state and associate it with the processor nodes that needs to access it; or they can connect a created
local state store with the existing processor nodes through <code>TopologyBuilder.connectProcessorAndStateStores</code>.
// connect the state store "COUNTS" with processor "PROCESS2"
.connectProcessorAndStateStores("PROCESS2", "COUNTS");
<pre>
TopologyBuilder builder = new TopologyBuilder();
.addSink("SINK1", "sink-topic1", "PROCESS1")
.addSink("SINK2", "sink-topic2", "PROCESS2")
.addSink("SINK3", "sink-topic3", "PROCESS3");
</pre>
builder.addSource("SOURCE", "src-topic")
</p>
.addProcessor("PROCESS1", MyProcessor1::new, "SOURCE")
// create the in-memory state store "COUNTS" associated with processor "PROCESS1"
.addStateStore(Stores.create("COUNTS").withStringKeys().withStringValues().inMemory().build(), "PROCESS1")
.addProcessor("PROCESS2", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
.addProcessor("PROCESS3", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
In the next section we present another way to build the processor topology: the Kafka Streams DSL.
// connect the state store "COUNTS" with processor "PROCESS2"
.connectProcessorAndStateStores("PROCESS2", "COUNTS");
<h4><a id="streams_dsl" href="#streams_dsl">High-Level Streams DSL</a></h4>
.addSink("SINK1", "sink-topic1", "PROCESS1")
.addSink("SINK2", "sink-topic2", "PROCESS2")
.addSink("SINK3", "sink-topic3", "PROCESS3");
</pre>
    In the next section we present another way to build the processor topology: the Kafka Streams DSL.

    <h4><a id="streams_dsl" href="#streams_dsl">High-Level Streams DSL</a></h4>

    To build a processor topology using the Streams DSL, developers can apply the <code>KStreamBuilder</code> class, which extends the <code>TopologyBuilder</code>.
    A simple example is included with the source code for Kafka in the <code>streams/examples</code> package. The rest of this section will walk
    through some code to demonstrate the key steps in creating a topology using the Streams DSL, but we recommend developers read the full example source
    code for details.

    <h5><a id="streams_kstream_ktable" href="#streams_kstream_ktable">KStream and KTable</a></h5>
    The DSL uses two main abstractions. A <b>KStream</b> is an abstraction of a record stream, where each data record represents a self-contained datum in the unbounded data set. A <b>KTable</b> is an abstraction of a changelog stream, where each data record represents an update. More precisely, the value in a data record is considered to be an update of the last value for the same record key, if any (if a corresponding key doesn't exist yet, the update is considered a create). To illustrate the difference between a KStream and a KTable, let's imagine the following two data records being sent to the stream: <code>("alice", 1) --> ("alice", 3)</code>. If these records were a KStream and the stream processing application were to sum the values, it would return <code>4</code>. If these records were a KTable, the return would be <code>3</code>, since the last record would be considered an update.
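    <p>
    To make this concrete, here is a minimal sketch (our own illustration, not taken from the Kafka examples) that reads the same hypothetical topic "user-clicks" both ways;
    it assumes records are keyed by user name with numeric values, that matching default serdes are configured, and the topic and store names are placeholders:
    </p>

    <pre>
    KStreamBuilder builder = new KStreamBuilder();

    // read the topic as a record stream: every record contributes to the sum,
    // so ("alice", 1) followed by ("alice", 3) yields 4 for "alice"
    KStream&lt;String, Long&gt; clickStream = builder.stream("user-clicks");
    KTable&lt;String, Long&gt; summed = clickStream
        .groupByKey()
        .reduce((v1, v2) -> v1 + v2, "summed-clicks");

    // read the topic as a changelog stream: a later value overwrites an earlier one,
    // so the current value for "alice" is simply 3
    KTable&lt;String, Long&gt; latest = builder.table("user-clicks", "latest-clicks");
    </pre>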
<h5><a id="streams_dsl_source" href="#streams_dsl_source">Create Source Streams from Kafka</a></h5>
<h5><a id="streams_dsl_source" href="#streams_dsl_source">Create Source Streams from Kafka</a></h5>
<p>
Either a <b>record stream</b> (defined as <code>KStream</code>) or a <b>changelog stream</b> (defined as <code>KTable</code>)
can be created as a source stream from one or more Kafka topics (for <code>KTable</code> you can only create the source stream
from a single topic).
</p>
<p>
Either a <b>record stream</b> (defined as <code>KStream</code>) or a <b>changelog stream</b> (defined as <code>KTable</code>)
can be created as a source stream from one or more Kafka topics (for <code>KTable</code> you can only create the source stream
from a single topic).
</p>
<pre>
KStreamBuilder builder = new KStreamBuilder();
<pre>
KStreamBuilder builder = new KStreamBuilder();
KStream&lt;String, GenericRecord&gt; source1 = builder.stream("topic1", "topic2");
KTable&lt;String, GenericRecord&gt; source2 = builder.table("topic3", "stateStoreName");
</pre>
KStream<String, GenericRecord> source1 = builder.stream("topic1", "topic2");
KTable<String, GenericRecord> source2 = builder.table("topic3", "stateStoreName");
</pre>
<h5><a id="streams_dsl_windowing" href="#streams_dsl_windowing">Windowing a stream</a></h5>
A stream processor may need to divide data records into time buckets, i.e. to <b>window</b> the stream by time. This is usually needed for join and aggregation operations, etc. Kafka Streams currently defines the following types of windows:
<ul>
<li><b>Hopping time windows</b> are windows based on time intervals. They model fixed-sized, (possibly) overlapping windows. A hopping window is defined by two properties: the window's size and its advance interval (aka "hop"). The advance interval specifies by how much a window moves forward relative to the previous one. For example, you can configure a hopping window with a size 5 minutes and an advance interval of 1 minute. Since hopping windows can overlap a data record may belong to more than one such windows.</li>
<li><b>Tumbling time windows</b> are a special case of hopping time windows and, like the latter, are windows based on time intervals. They model fixed-size, non-overlapping, gap-less windows. A tumbling window is defined by a single property: the window's size. A tumbling window is a hopping window whose window size is equal to its advance interval. Since tumbling windows never overlap, a data record will belong to one and only one window.</li>
<li><b>Sliding windows</b> model a fixed-size window that slides continuously over the time axis; here, two data records are said to be included in the same window if the difference of their timestamps is within the window size. Thus, sliding windows are not aligned to the epoch, but on the data record timestamps. In Kafka Streams, sliding windows are used only for join operations, and can be specified through the <code>JoinWindows</code> class.</li>
</ul>
<h5><a id="streams_dsl_windowing" href="#streams_dsl_windowing">Windowing a stream</a></h5>
A stream processor may need to divide data records into time buckets, i.e. to <b>window</b> the stream by time. This is usually needed for join and aggregation operations, etc. Kafka Streams currently defines the following types of windows:
<ul>
<li><b>Hopping time windows</b> are windows based on time intervals. They model fixed-sized, (possibly) overlapping windows. A hopping window is defined by two properties: the window's size and its advance interval (aka "hop"). The advance interval specifies by how much a window moves forward relative to the previous one. For example, you can configure a hopping window with a size 5 minutes and an advance interval of 1 minute. Since hopping windows can overlap a data record may belong to more than one such windows.</li>
<li><b>Tumbling time windows</b> are a special case of hopping time windows and, like the latter, are windows based on time intervals. They model fixed-size, non-overlapping, gap-less windows. A tumbling window is defined by a single property: the window's size. A tumbling window is a hopping window whose window size is equal to its advance interval. Since tumbling windows never overlap, a data record will belong to one and only one window.</li>
<li><b>Sliding windows</b> model a fixed-size window that slides continuously over the time axis; here, two data records are said to be included in the same window if the difference of their timestamps is within the window size. Thus, sliding windows are not aligned to the epoch, but on the data record timestamps. In Kafka Streams, sliding windows are used only for join operations, and can be specified through the <code>JoinWindows</code> class.</li>
</ul>
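    <p>
    As a small illustration (our own sketch, reusing the <code>TimeWindows</code> factory that also appears in the aggregation example further below; the window names and sizes are arbitrary), hopping and tumbling windows could be specified like this:
    </p>

    <pre>
    // a hopping window of size 5 minutes advancing by 1 minute;
    // because the hop is smaller than the size, consecutive windows overlap
    TimeWindows hopping = TimeWindows.of("clicks-hopping", 5 * 60 * 1000L).advanceBy(60 * 1000L);

    // a tumbling window is a hopping window whose advance interval equals its size,
    // so omitting advanceBy() yields non-overlapping, gap-less windows
    TimeWindows tumbling = TimeWindows.of("clicks-tumbling", 5 * 60 * 1000L);

    // sliding windows are used only in joins and are described by the JoinWindows class
    </pre>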
<h5><a id="streams_dsl_joins" href="#streams_dsl_joins">Joins</a></h5>
A <b>join</b> operation merges two streams based on the keys of their data records, and yields a new stream. A join over record streams usually needs to be performed on a windowing basis because otherwise the number of records that must be maintained for performing the join may grow indefinitely. In Kafka Streams, you may perform the following join operations:
<ul>
<li><b>KStream-to-KStream Joins</b> are always windowed joins, since otherwise the memory and state required to compute the join would grow infinitely in size. Here, a newly received record from one of the streams is joined with the other stream's records within the specified window interval to produce one result for each matching pair based on user-provided <code>ValueJoiner</code>. A new <code>KStream</code> instance representing the result stream of the join is returned from this operator.</li>
<li><b>KTable-to-KTable Joins</b> are join operations designed to be consistent with the ones in relational databases. Here, both changelog streams are materialized into local state stores first. When a new record is received from one of the streams, it is joined with the other stream's materialized state stores to produce one result for each matching pair based on user-provided ValueJoiner. A new <code>KTable</code> instance representing the result stream of the join, which is also a changelog stream of the represented table, is returned from this operator.</li>
<li><b>KStream-to-KTable Joins</b> allow you to perform table lookups against a changelog stream (<code>KTable</code>) upon receiving a new record from another record stream (KStream). An example use case would be to enrich a stream of user activities (<code>KStream</code>) with the latest user profile information (<code>KTable</code>). Only records received from the record stream will trigger the join and produce results via <code>ValueJoiner</code>, not vice versa (i.e., records received from the changelog stream will be used only to update the materialized state store). A new <code>KStream</code> instance representing the result stream of the join is returned from this operator.</li>
</ul>
<h5><a id="streams_dsl_joins" href="#streams_dsl_joins">Joins</a></h5>
A <b>join</b> operation merges two streams based on the keys of their data records, and yields a new stream. A join over record streams usually needs to be performed on a windowing basis because otherwise the number of records that must be maintained for performing the join may grow indefinitely. In Kafka Streams, you may perform the following join operations:
<ul>
<li><b>KStream-to-KStream Joins</b> are always windowed joins, since otherwise the memory and state required to compute the join would grow infinitely in size. Here, a newly received record from one of the streams is joined with the other stream's records within the specified window interval to produce one result for each matching pair based on user-provided <code>ValueJoiner</code>. A new <code>KStream</code> instance representing the result stream of the join is returned from this operator.</li>
<li><b>KTable-to-KTable Joins</b> are join operations designed to be consistent with the ones in relational databases. Here, both changelog streams are materialized into local state stores first. When a new record is received from one of the streams, it is joined with the other stream's materialized state stores to produce one result for each matching pair based on user-provided ValueJoiner. A new <code>KTable</code> instance representing the result stream of the join, which is also a changelog stream of the represented table, is returned from this operator.</li>
<li><b>KStream-to-KTable Joins</b> allow you to perform table lookups against a changelog stream (<code>KTable</code>) upon receiving a new record from another record stream (KStream). An example use case would be to enrich a stream of user activities (<code>KStream</code>) with the latest user profile information (<code>KTable</code>). Only records received from the record stream will trigger the join and produce results via <code>ValueJoiner</code>, not vice versa (i.e., records received from the changelog stream will be used only to update the materialized state store). A new <code>KStream</code> instance representing the result stream of the join is returned from this operator.</li>
</ul>
Depending on the operands the following join operations are supported: <b>inner joins</b>, <b>outer joins</b> and <b>left joins</b>. Their semantics are similar to the corresponding operators in relational databases.
Depending on the operands the following join operations are supported: <b>inner joins</b>, <b>outer joins</b> and <b>left joins</b>. Their semantics are similar to the corresponding operators in relational databases.
a
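    <p>
    For example, the three variants are available directly on <code>KTable</code>; the sketch below is our own illustration with hypothetical tables <code>userProfiles</code> and <code>userSettings</code> (both keyed by user id with string values) and placeholder <code>ValueJoiner</code> lambdas:
    </p>

    <pre>
    // inner join: a result is produced only for keys present in both tables
    KTable&lt;String, String&gt; inner = userProfiles.join(userSettings,
        (profile, settings) -> profile + "/" + settings);

    // left join: a result is produced for every key in userProfiles; settings may be null
    KTable&lt;String, String&gt; left = userProfiles.leftJoin(userSettings,
        (profile, settings) -> profile + "/" + settings);

    // outer join: a result is produced for keys in either table; either side may be null
    KTable&lt;String, String&gt; outer = userProfiles.outerJoin(userSettings,
        (profile, settings) -> profile + "/" + settings);
    </pre>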
<h5><a id="streams_dsl_transform" href="#streams_dsl_transform">Transform a stream</a></h5>
<h5><a id="streams_dsl_transform" href="#streams_dsl_transform">Transform a stream</a></h5>
<p>
There is a list of transformation operations provided for <code>KStream</code> and <code>KTable</code> respectively.
Each of these operations may generate either one or more <code>KStream</code> and <code>KTable</code> objects and
can be translated into one or more connected processors into the underlying processor topology.
All these transformation methods can be chained together to compose a complex processor topology.
Since <code>KStream</code> and <code>KTable</code> are strongly typed, all these transformation operations are defined as
generics functions where users could specify the input and output data types.
</p>
<p>
There is a list of transformation operations provided for <code>KStream</code> and <code>KTable</code> respectively.
Each of these operations may generate either one or more <code>KStream</code> and <code>KTable</code> objects and
can be translated into one or more connected processors into the underlying processor topology.
All these transformation methods can be chained together to compose a complex processor topology.
Since <code>KStream</code> and <code>KTable</code> are strongly typed, all these transformation operations are defined as
generics functions where users could specify the input and output data types.
</p>
<p>
Among these transformations, <code>filter</code>, <code>map</code>, <code>mapValues</code>, etc, are stateless
transformation operations and can be applied to both <code>KStream</code> and <code>KTable</code>,
where users can usually pass a customized function to these functions as a parameter, such as <code>Predicate</code> for <code>filter</code>,
<code>KeyValueMapper</code> for <code>map</code>, etc:
<p>
Among these transformations, <code>filter</code>, <code>map</code>, <code>mapValues</code>, etc, are stateless
transformation operations and can be applied to both <code>KStream</code> and <code>KTable</code>,
where users can usually pass a customized function to these functions as a parameter, such as <code>Predicate</code> for <code>filter</code>,
<code>KeyValueMapper</code> for <code>map</code>, etc:
</p>
</p>
    <pre>
    // written in Java 8+, using lambda expressions
    KStream&lt;String, GenericRecord&gt; mapped = source1.mapValues(record -> record.get("category"));
    </pre>
    <p>
    Stateless transformations, by definition, do not depend on any state for processing, and hence implementation-wise
    they do not require a state store associated with the stream processor; stateful transformations, on the other hand,
    require accessing an associated state for processing and producing outputs.
    For example, in <code>join</code> and <code>aggregate</code> operations, a windowing state is usually used to store all the records received so far
    within the defined window boundary. The operators can then access these accumulated records in the store and compute
    based on them.
    </p>
    <pre>
    // written in Java 8+, using lambda expressions
    KTable&lt;Windowed&lt;String&gt;, Long&gt; counts = source1.groupByKey().aggregate(
        () -> 0L,  // initial value
        (aggKey, value, aggregate) -> aggregate + 1L,   // aggregating value
        TimeWindows.of("counts", 5000L).advanceBy(1000L), // intervals in milliseconds
        Serdes.Long() // serde for aggregated value
    );

    KStream&lt;String, String&gt; joined = source1.leftJoin(source2,
        (record1, record2) -> record1.get("user") + "-" + record2.get("region")
    );
    </pre>
<h5><a id="streams_dsl_sink" href="#streams_dsl_sink">Write streams back to Kafka</a></h5>
<h5><a id="streams_dsl_sink" href="#streams_dsl_sink">Write streams back to Kafka</a></h5>
<p>
At the end of the processing, users can choose to (continuously) write the final resulted streams back to a Kafka topic through
<code>KStream.to</code> and <code>KTable.to</code>.
</p>
<p>
At the end of the processing, users can choose to (continuously) write the final resulted streams back to a Kafka topic through
<code>KStream.to</code> and <code>KTable.to</code>.
</p>
<pre>
joined.to("topic4");
</pre>
<pre>
joined.to("topic4");
</pre>
    If your application needs to continue reading and processing the records after they have been materialized
    to a topic via <code>to</code> above, one option is to construct a new stream that reads from the output topic;
    Kafka Streams provides a convenience method called <code>through</code>:

    <pre>
    // equivalent to
    //
    // joined.to("topic4");
    // materialized = builder.stream("topic4");
    KStream&lt;String, String&gt; materialized = joined.through("topic4");
    </pre>
    <br>

    <p>
    Besides defining the topology, developers will also need to configure their application
    in <code>StreamsConfig</code> before running it. A complete list of
    Kafka Streams configs can be found <a href="#streamsconfigs"><b>here</b></a>.
    </p>
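    <p>
    As a minimal sketch of what that configuration might look like (the application id and bootstrap servers below are placeholders,
    and <code>builder</code> is the <code>KStreamBuilder</code> or <code>TopologyBuilder</code> constructed above):
    </p>

    <pre>
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");     // placeholder application id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker list
    // default serdes, state directory, security settings, etc. can also be set here

    KafkaStreams streams = new KafkaStreams(builder, props);
    streams.start();

    // ... and when shutting the application down:
    streams.close();
    </pre>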
</script>
<!--#include virtual="../includes/_header.htm" -->
<!--#include virtual="../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
</ul>
<div class="p-streams"></div>
</div>
</div>
<!--#include virtual="../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>