MINOR: Improve Streams Dev Guide content on web docs

This PR migrates content from the Confluent Platform (CP) Streams Developer Guide.

Here is the top-level page:
![image](https://user-images.githubusercontent.com/11722533/33904945-df9cf804-df31-11e7-93aa-52385961522c.png)

Here is a child page:
![image](https://user-images.githubusercontent.com/11722533/33904976-f2eafabe-df31-11e7-918c-fbf95db0f76b.png)

See related: https://github.com/apache/kafka-site/pull/112

Author: Joel Hamill <joel-hamill@users.noreply.github.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #4252 from joel-hamill/20171122-migrate-cp-dev-guide
Joel Hamill 2017-12-21 11:15:54 -08:00 committed by Guozhang Wang
parent 7d6f6f7320
commit 3e2fe17c08
38 changed files with 6553 additions and 3038 deletions

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/app-reset-tool.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/config-streams.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/datatypes.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/dsl-api.html" -->

View File

@@ -16,4 +16,4 @@
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../streams/developer-guide.html" -->
<!--#include virtual="../../../streams/developer-guide/index.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/interactive-queries.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/manage-topics.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/memory-mgmt.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/processor-api.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/running-app.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/security.html" -->

View File

@@ -0,0 +1,19 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- should always link the latest release's documentation -->
<!--#include virtual="../../../streams/developer-guide/write-streams.html" -->

Binary file not shown (new image, 87 KiB).

Binary file not shown (new image, 89 KiB).

Binary file not shown (new image, 86 KiB).

Binary file not shown (new image, 48 KiB).

Binary file not shown (new image, 55 KiB).

Binary file not shown (new image, 108 KiB).

Binary file not shown (new image, 62 KiB).

View File

@@ -110,7 +110,7 @@
<p>
Kafka Streams builds on fault-tolerance capabilities integrated natively within Kafka. Kafka partitions are highly available and replicated; so when stream data is persisted to Kafka it is available
even if the application fails and needs to re-process it. Tasks in Kafka Streams leverage the fault-tolerance capability
-offered by the <a href="https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client/">Kafka consumer client</a> to handle failures.
+offered by the Kafka consumer client to handle failures.
If a task runs on a machine that fails, Kafka Streams automatically restarts the task in one of the remaining running instances of the application.
</p>
@@ -143,7 +143,7 @@
<!--#include virtual="../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams API</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>

View File

@@ -23,7 +23,7 @@
<div class="sticky-top">
<div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
@@ -180,7 +180,7 @@
<!--#include virtual="../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams API</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>

File diff suppressed because it is too large.

View File

@@ -0,0 +1,173 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="application-reset-tool">
<span id="streams-developer-guide-app-reset"></span><h1>Application Reset Tool<a class="headerlink" href="#application-reset-tool" title="Permalink to this headline"></a></h1>
<p>You can reset an application and force it to reprocess its data from scratch by using the application reset tool.
This can be useful for development and testing, or when fixing bugs.</p>
<p>The application reset tool handles the Kafka Streams <a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-user"><span class="std std-ref">user topics</span></a> (input,
output, and intermediate topics) and <a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a> differently
when resetting the application.</p>
<p>Here&#8217;s what the application reset tool does for each topic type:</p>
<ul class="simple">
<li>Input topics: Reset to the beginning of the topic. This means that it sets the application&#8217;s committed consumer offsets for all partitions to each partition&#8217;s <code class="docutils literal"><span class="pre">earliest</span></code> offset (for consumer group <code class="docutils literal"><span class="pre">application.id</span></code>).</li>
<li>Intermediate topics: Skip to the end of the topic, i.e., set the application&#8217;s committed consumer offsets for all partitions to each partition&#8217;s <code class="docutils literal"><span class="pre">logSize</span></code> (for consumer group <code class="docutils literal"><span class="pre">application.id</span></code>).</li>
<li>Internal topics: Delete the internal topic (this automatically deletes any committed offsets).</li>
</ul>
<p>The application reset tool does not:</p>
<ul class="simple">
<li>Reset output topics of an application. If any output (or intermediate) topics are consumed by downstream
applications, it is your responsibility to adjust those downstream applications as appropriate when you reset the
upstream application.</li>
<li>Reset the local environment of your application instances. It is your responsibility to delete the local
state on any machine on which an application instance was run. See the instructions in section
<a class="reference internal" href="#streams-developer-guide-reset-local-environment"><span class="std std-ref">Step 2: Reset the local environments of your application instances</span></a> on how to do this.</li>
</ul>
<dl class="docutils">
<dt>Prerequisites</dt>
<dd><ul class="first last">
<li><p class="first">All instances of your application must be stopped. Otherwise, the application may enter an invalid state, crash, or produce incorrect results. You can verify whether the consumer group with ID <code class="docutils literal"><span class="pre">application.id</span></code> is still active by using <code class="docutils literal"><span class="pre">bin/kafka-consumer-groups</span></code>.</p>
</li>
<li><p class="first">Use this tool with care and double-check its parameters: If you provide wrong parameter values (e.g., typos in <code class="docutils literal"><span class="pre">application.id</span></code>) or specify parameters inconsistently (e.g., specify the wrong input topics for the application), this tool might invalidate the application&#8217;s state or even impact other applications, consumer groups, or your Kafka topics.</p>
</li>
<li><p class="first">You should manually delete and re-create any intermediate topics before running the application reset tool. This will free up disk space in Kafka brokers.</p>
</li>
<li><p class="first">You should delete and recreate intermediate topics before running the application reset tool, unless the following applies:</p>
<blockquote>
<div><ul class="simple">
<li>You have external downstream consumers for the application&#8217;s intermediate topics.</li>
<li>You are in a development environment where manually deleting and re-creating intermediate topics is unnecessary.</li>
</ul>
</div></blockquote>
</li>
</ul>
</dd>
</dl>
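<p>As a minimal sketch (the application ID <code>my-streams-app</code> and broker address are placeholders), you can check whether the application&#8217;s consumer group still has active members before resetting:</p>
<pre class="brush: bash;">
bin/kafka-consumer-groups --bootstrap-server localhost:9092 --list
bin/kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-streams-app
</pre>
<p>If the group still shows active members, stop all application instances before running the reset tool.</p>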
<div class="section" id="step-1-run-the-application-reset-tool">
<h2>Step 1: Run the application reset tool<a class="headerlink" href="#step-1-run-the-application-reset-tool" title="Permalink to this headline"></a></h2>
<p>Invoke the application reset tool from the command line:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>&lt;path-to-kafka&gt;/bin/kafka-streams-application-reset
</pre></div>
</div>
<p>The tool accepts the following parameters:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>Option <span class="o">(</span>* <span class="o">=</span> required<span class="o">)</span> Description
--------------------- -----------
* --application-id &lt;String: id&gt; The Kafka Streams application ID
<span class="o">(</span>application.id<span class="o">)</span>.
--bootstrap-servers &lt;String: urls&gt; Comma-separated list of broker urls with
format: HOST1:PORT1,HOST2:PORT2
<span class="o">(</span>default: localhost:9092<span class="o">)</span>
--config-file &lt;String: file name&gt; Property file containing configs to be
passed to admin clients and embedded
consumer.
--dry-run Display the actions that would be
performed without executing the reset
commands.
--input-topics &lt;String: list&gt; Comma-separated list of user input
topics. For these topics, the tool will
reset the offset to the earliest
available offset.
--intermediate-topics &lt;String: list&gt; Comma-separated list of intermediate user
topics <span class="o">(</span>topics used in the through<span class="o">()</span>
method<span class="o">)</span>. For these topics, the tool
will skip to the end.
--zookeeper Zookeeper option is deprecated by
bootstrap.servers, as the reset tool
would no longer access Zookeeper
directly.
</pre></div>
</div>
<p>Parameters can be combined as needed. For example, if you want to restart an application from an
empty internal state, but not reprocess previous data, simply omit the parameters <code class="docutils literal"><span class="pre">--input-topics</span></code> and
<code class="docutils literal"><span class="pre">--intermediate-topics</span></code>.</p>
</div>
<div class="section" id="step-2-reset-the-local-environments-of-your-application-instances">
<span id="streams-developer-guide-reset-local-environment"></span><h2>Step 2: Reset the local environments of your application instances<a class="headerlink" href="#step-2-reset-the-local-environments-of-your-application-instances" title="Permalink to this headline"></a></h2>
<p>For a complete application reset, you must delete the application&#8217;s local state directory on any machines where the
application instance was run. You must do this before restarting an application instance on the same machine. You can
use either of these methods:</p>
<ul class="simple">
<li>The API method <code class="docutils literal"><span class="pre">KafkaStreams#cleanUp()</span></code> in your application code (see the sketch after this list).</li>
<li>Manually delete the corresponding local state directory (default location: <code class="docutils literal"><span class="pre">/var/lib/kafka-streams/&lt;application.id&gt;</span></code>). For more information, see the <a class="reference internal" href="../javadocs.html#streams-javadocs"><span class="std std-ref">state.dir</span></a> setting in the StreamsConfig class.</li>
</ul>
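<p>A minimal sketch of the first approach, assuming <code>builder</code> and <code>config</code> are your topology builder and Streams configuration:</p>
<pre class="brush: java;">
import org.apache.kafka.streams.KafkaStreams;

KafkaStreams streams = new KafkaStreams(builder.build(), config);
// Wipe this instance's local state directory; only call before start() or after close()
streams.cleanUp();
streams.start();
</pre>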
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/security" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@@ -0,0 +1,717 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="configuring-a-streams-application">
<span id="streams-developer-guide-configuration"></span><h1>Configuring a Streams Application<a class="headerlink" href="#configuring-a-streams-application" title="Permalink to this headline"></a></h1>
<p>Kafka and Kafka Streams configuration parameters must be set before you run a Streams application. You configure Kafka Streams by specifying parameters in a <code class="docutils literal"><span class="pre">StreamsConfig</span></code> instance.</p>
<ol class="arabic">
<li><p class="first">Create a <code class="docutils literal"><span class="pre">java.util.Properties</span></code> instance.</p>
</li>
<li><p class="first">Set the <a class="reference internal" href="#streams-developer-guide-required-configs"><span class="std std-ref">parameters</span></a>.</p>
</li>
<li><p class="first">Construct a <code class="docutils literal"><span class="pre">StreamsConfig</span></code> instance from the <code class="docutils literal"><span class="pre">Properties</span></code> instance. For example:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Properties</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Set a few key parameters</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_ID_CONFIG</span><span class="o">,</span> <span class="s">&quot;my-first-streams-application&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka-broker1:9092&quot;</span><span class="o">);</span>
<span class="c1">// Any further settings</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(...</span> <span class="o">,</span> <span class="o">...);</span>
<span class="c1">// Create an instance of StreamsConfig from the Properties instance</span>
<span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsConfig</span><span class="o">(</span><span class="n">settings</span><span class="o">);</span>
</pre></div>
</div>
</li>
</ol>
<div class="section" id="configuration-parameter-reference">
<span id="streams-developer-guide-required-configs"></span><h2>Configuration parameter reference<a class="headerlink" href="#configuration-parameter-reference" title="Permalink to this headline"></a></h2>
<p>This section contains the most common Streams configuration parameters. For a full reference, see the <a class="reference external" href="/current/streams/javadocs/index.html">Streams</a> and <a class="reference external" href="/current/clients/javadocs/index.html">Client</a> Javadocs.</p>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference internal" href="#required-configuration-parameters" id="id3">Required configuration parameters</a><ul>
<li><a class="reference internal" href="#application-id" id="id4">application.id</a></li>
<li><a class="reference internal" href="#bootstrap-servers" id="id5">bootstrap.servers</a></li>
</ul>
</li>
<li><a class="reference internal" href="#optional-configuration-parameters" id="id6">Optional configuration parameters</a><ul>
<li><a class="reference internal" href="#default-deserialization-exception-handler" id="id7">default.deserialization.exception.handler</a></li>
<li><a class="reference internal" href="#default-key-serde" id="id8">default.key.serde</a></li>
<li><a class="reference internal" href="#default-value-serde" id="id9">default.value.serde</a></li>
<li><a class="reference internal" href="#num-standby-replicas" id="id10">num.standby.replicas</a></li>
<li><a class="reference internal" href="#num-stream-threads" id="id11">num.stream.threads</a></li>
<li><a class="reference internal" href="#partition-grouper" id="id12">partition.grouper</a></li>
<li><a class="reference internal" href="#replication-factor" id="id13">replication.factor</a></li>
<li><a class="reference internal" href="#state-dir" id="id14">state.dir</a></li>
<li><a class="reference internal" href="#timestamp-extractor" id="id15">timestamp.extractor</a></li>
</ul>
</li>
<li><a class="reference internal" href="#kafka-consumers-and-producer-configuration-parameters" id="id16">Kafka consumers and producer configuration parameters</a><ul>
<li><a class="reference internal" href="#naming" id="id17">Naming</a></li>
<li><a class="reference internal" href="#default-values" id="id18">Default Values</a></li>
<li><a class="reference internal" href="#enable-auto-commit" id="id19">enable.auto.commit</a></li>
<li><a class="reference internal" href="#rocksdb-config-setter" id="id20">rocksdb.config.setter</a></li>
</ul>
</li>
<li><a class="reference internal" href="#recommended-configuration-parameters-for-resiliency" id="id21">Recommended configuration parameters for resiliency</a><ul>
<li><a class="reference internal" href="#acks" id="id22">acks</a></li>
<li><a class="reference internal" href="#id2" id="id23">replication.factor</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="required-configuration-parameters">
<h3><a class="toc-backref" href="#id3">Required configuration parameters</a><a class="headerlink" href="#required-configuration-parameters" title="Permalink to this headline"></a></h3>
<p>Here are the required Streams configuration parameters.</p>
<table border="1" class="non-scrolling-table docutils">
<colgroup>
<col width="20%" />
<col width="5%" />
<col width="7%" />
<col width="38%" />
<col width="31%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Parameter Name</th>
<th class="head">Importance</th>
<th class="head" colspan="2">Description</th>
<th class="head">Default Value</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>application.id</td>
<td>Required</td>
<td colspan="2">An identifier for the stream processing application. Must be unique within the Kafka cluster.</td>
<td>None</td>
</tr>
<tr class="row-odd"><td>bootstrap.servers</td>
<td>Required</td>
<td colspan="2">A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.</td>
<td>None</td>
</tr>
</tbody>
</table>
<div class="section" id="application-id">
<h4><a class="toc-backref" href="#id4">application.id</a><a class="headerlink" href="#application-id" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>(Required) The application ID. Each stream processing application must have a unique ID. The same ID must be given to
all instances of the application. It is recommended to use only alphanumeric characters, <code class="docutils literal"><span class="pre">.</span></code> (dot), <code class="docutils literal"><span class="pre">-</span></code> (hyphen), and <code class="docutils literal"><span class="pre">_</span></code> (underscore). Examples: <code class="docutils literal"><span class="pre">&quot;hello_world&quot;</span></code>, <code class="docutils literal"><span class="pre">&quot;hello_world-v1.0.0&quot;</span></code></p>
<p>This ID is used in the following places to isolate resources used by the application from others:</p>
<ul class="simple">
<li>As the default Kafka consumer and producer <code class="docutils literal"><span class="pre">client.id</span></code> prefix</li>
<li>As the Kafka consumer <code class="docutils literal"><span class="pre">group.id</span></code> for coordination</li>
<li>As the name of the subdirectory in the state directory (cf. <code class="docutils literal"><span class="pre">state.dir</span></code>)</li>
<li>As the prefix of internal Kafka topic names</li>
</ul>
<dl class="docutils">
<dt>Tip:</dt>
<dd>When an application is updated, the <code class="docutils literal"><span class="pre">application.id</span></code> should be changed unless you want to reuse the existing data in internal topics and state stores.
For example, you could embed the version information within <code class="docutils literal"><span class="pre">application.id</span></code>, as <code class="docutils literal"><span class="pre">my-app-v1.0.0</span></code> and <code class="docutils literal"><span class="pre">my-app-v1.0.2</span></code>.</dd>
</dl>
</div></blockquote>
</div>
<div class="section" id="bootstrap-servers">
<h4><a class="toc-backref" href="#id5">bootstrap.servers</a><a class="headerlink" href="#bootstrap-servers" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>(Required) The Kafka bootstrap servers. This is the same <a class="reference external" href="http://kafka.apache.org/documentation.html#producerconfigs">setting</a> that is used by the underlying producer and consumer clients to connect to the Kafka cluster.
Example: <code class="docutils literal"><span class="pre">&quot;kafka-broker1:9092,kafka-broker2:9092&quot;</span></code>.</p>
<dl class="docutils">
<dt>Tip:</dt>
<dd>Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value.
Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input
streams and writing output streams.</dd>
</dl>
</div></blockquote>
</div>
</div>
<div class="section" id="optional-configuration-parameters">
<span id="streams-developer-guide-optional-configs"></span><h3><a class="toc-backref" href="#id6">Optional configuration parameters</a><a class="headerlink" href="#optional-configuration-parameters" title="Permalink to this headline"></a></h3>
<p>Here are the optional <a class="reference internal" href="../javadocs.html#streams-javadocs"><span class="std std-ref">Streams configuration parameters</span></a>, sorted by level of importance:</p>
<blockquote>
<div><ul class="simple">
<li>High: These parameters can have a significant impact on performance. Take care when deciding the values of these parameters.</li>
<li>Medium: These parameters can have some impact on performance. Your specific environment will determine how much tuning effort should be focused on these parameters.</li>
<li>Low: These parameters have a less general or less significant impact on performance.</li>
</ul>
</div></blockquote>
<table border="1" class="non-scrolling-table docutils">
<colgroup>
<col width="20%" />
<col width="5%" />
<col width="7%" />
<col width="38%" />
<col width="31%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Parameter Name</th>
<th class="head">Importance</th>
<th class="head" colspan="2">Description</th>
<th class="head">Default Value</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>application.server</td>
<td>Low</td>
<td colspan="2">A host:port pair pointing to an embedded user defined endpoint that can be used for discovering the locations of
state stores within a single Kafka Streams application. The value of this must be different for each instance
of the application.</td>
<td>the empty string</td>
</tr>
<tr class="row-odd"><td>buffered.records.per.partition</td>
<td>Low</td>
<td colspan="2">The maximum number of records to buffer per partition.</td>
<td>1000</td>
</tr>
<tr class="row-even"><td>cache.max.bytes.buffering</td>
<td>Medium</td>
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads.</td>
<td>10485760 bytes</td>
</tr>
<tr class="row-odd"><td>client.id</td>
<td>Medium</td>
<td colspan="2">An ID string to pass to the server when making requests.
(This setting is passed to the consumer/producer clients used internally by Kafka Streams.)</td>
<td>the empty string</td>
</tr>
<tr class="row-even"><td>commit.interval.ms</td>
<td>Low</td>
<td colspan="2">The frequency with which to save the position (offsets in source topics) of tasks.</td>
<td>30000 milliseconds</td>
</tr>
<tr class="row-odd"><td>default.deserialization.exception.handler</td>
<td>Medium</td>
<td colspan="2">Exception handling class that implements the <code class="docutils literal"><span class="pre">DeserializationExceptionHandler</span></code> interface.</td>
<td><code class="docutils literal"><span class="pre">LogAndFailExceptionHandler</span></code></td>
</tr>
<tr class="row-even"><td>key.serde</td>
<td>Medium</td>
<td colspan="2">Default serializer/deserializer class for record keys, implements the <code class="docutils literal"><span class="pre">Serde</span></code> interface (see also value.serde).</td>
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray().getClass().getName()</span></code></td>
</tr>
<tr class="row-odd"><td>metric.reporters</td>
<td>Low</td>
<td colspan="2">A list of classes to use as metrics reporters.</td>
<td>the empty list</td>
</tr>
<tr class="row-even"><td>metrics.num.samples</td>
<td>Low</td>
<td colspan="2">The number of samples maintained to compute metrics.</td>
<td>2</td>
</tr>
<tr class="row-odd"><td>metrics.recording.level</td>
<td>Low</td>
<td colspan="2">The highest recording level for metrics.</td>
<td><code class="docutils literal"><span class="pre">INFO</span></code></td>
</tr>
<tr class="row-even"><td>metrics.sample.window.ms</td>
<td>Low</td>
<td colspan="2">The window of time a metrics sample is computed over.</td>
<td>30000 milliseconds</td>
</tr>
<tr class="row-odd"><td>num.standby.replicas</td>
<td>Medium</td>
<td colspan="2">The number of standby replicas for each task.</td>
<td>0</td>
</tr>
<tr class="row-even"><td>num.stream.threads</td>
<td>Medium</td>
<td colspan="2">The number of threads to execute stream processing.</td>
<td>1</td>
</tr>
<tr class="row-odd"><td>partition.grouper</td>
<td>Low</td>
<td colspan="2">Partition grouper class that implements the <code class="docutils literal"><span class="pre">PartitionGrouper</span></code> interface.</td>
<td>See <a class="reference internal" href="#streams-developer-guide-partition-grouper"><span class="std std-ref">Partition Grouper</span></a></td>
</tr>
<tr class="row-even"><td>poll.ms</td>
<td>Low</td>
<td colspan="2">The amount of time in milliseconds to block waiting for input.</td>
<td>100 milliseconds</td>
</tr>
<tr class="row-odd"><td>replication.factor</td>
<td>High</td>
<td colspan="2">The replication factor for changelog topics and repartition topics created by the application.</td>
<td>1</td>
</tr>
<tr class="row-even"><td>state.cleanup.delay.ms</td>
<td>Low</td>
<td colspan="2">The amount of time in milliseconds to wait before deleting state when a partition has migrated.</td>
<td>600000 milliseconds (10 minutes)</td>
</tr>
<tr class="row-odd"><td>state.dir</td>
<td>High</td>
<td colspan="2">Directory location for state stores.</td>
<td><code class="docutils literal"><span class="pre">/var/lib/kafka-streams</span></code></td>
</tr>
<tr class="row-even"><td>timestamp.extractor</td>
<td>Medium</td>
<td colspan="2">Timestamp extractor class that implements the <code class="docutils literal"><span class="pre">TimestampExtractor</span></code> interface.</td>
<td>See <a class="reference internal" href="#streams-developer-guide-timestamp-extractor"><span class="std std-ref">Timestamp Extractor</span></a></td>
</tr>
<tr class="row-odd"><td>value.serde</td>
<td>Medium</td>
<td colspan="2">Default serializer/deserializer class for record values, implements the <code class="docutils literal"><span class="pre">Serde</span></code> interface (see also key.serde).</td>
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray().getClass().getName()</span></code></td>
</tr>
<tr class="row-even"><td>windowstore.changelog.additional.retention.ms</td>
<td>Low</td>
<td colspan="2">Added to a windows maintainMs to ensure data is not deleted from the log prematurely. Allows for clock drift.</td>
<td>86400000 milliseconds = 1 day</td>
</tr>
</tbody>
</table>
<div class="section" id="default-deserialization-exception-handler">
<span id="streams-developer-guide-deh"></span><h4><a class="toc-backref" href="#id7">default.deserialization.exception.handler</a><a class="headerlink" href="#default-deserialization-exception-handler" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The default deserialization exception handler allows you to manage records that fail to deserialize. Such
failures can be caused by corrupt data, incorrect serialization logic, or unhandled record types. These exception handlers
are available:</p>
<ul class="simple">
<li><a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/errors/LogAndContinueExceptionHandler.html">LogAndContinueExceptionHandler</a>:
This handler logs the deserialization exception and then signals the processing pipeline to continue processing more records.
This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records that fail
to deserialize.</li>
<li><a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/errors/LogAndFailExceptionHandler.html">LogAndFailExceptionHandler</a>.
This handler logs the deserialization exception and then signals the processing pipeline to stop processing more records.</li>
</ul>
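<p>A minimal sketch of setting this parameter, assuming you want the log-and-skip behavior described above:</p>
<pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

Properties settings = new Properties();
// Log records that fail to deserialize, then continue processing instead of failing
settings.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
             LogAndContinueExceptionHandler.class.getName());
</pre>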
</div></blockquote>
</div>
<div class="section" id="default-key-serde">
<h4><a class="toc-backref" href="#id8">default.key.serde</a><a class="headerlink" href="#default-key-serde" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The default Serializer/Deserializer class for record keys. Serialization and deserialization in Kafka Streams happens
whenever data needs to be materialized, for example:</p>
<blockquote>
<div><ul class="simple">
<li>Whenever data is read from or written to a <em>Kafka topic</em> (e.g., via the <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code> and <code class="docutils literal"><span class="pre">KStream#to()</span></code> methods).</li>
<li>Whenever data is read from or written to a <em>state store</em>.</li>
</ul>
<p>This is discussed in more detail in <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data types and serialization</span></a>.</p>
</div></blockquote>
</div></blockquote>
</div>
<div class="section" id="default-value-serde">
<h4><a class="toc-backref" href="#id9">default.value.serde</a><a class="headerlink" href="#default-value-serde" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The default Serializer/Deserializer class for record values. Serialization and deserialization in Kafka Streams
happens whenever data needs to be materialized, for example:</p>
<ul class="simple">
<li>Whenever data is read from or written to a <em>Kafka topic</em> (e.g., via the <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code> and <code class="docutils literal"><span class="pre">KStream#to()</span></code> methods).</li>
<li>Whenever data is read from or written to a <em>state store</em>.</li>
</ul>
<p>This is discussed in more detail in <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data types and serialization</span></a>.</p>
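<p>For example, assuming string keys and long values are sensible defaults for your topology, you could set both default serdes as follows:</p>
<pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

Properties settings = new Properties();
// Used wherever the topology does not specify explicit serdes
settings.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
settings.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass().getName());
</pre>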
</div></blockquote>
</div>
<div class="section" id="num-standby-replicas">
<span id="streams-developer-guide-standby-replicas"></span><h4><a class="toc-backref" href="#id10">num.standby.replicas</a><a class="headerlink" href="#num-standby-replicas" title="Permalink to this headline"></a></h4>
<blockquote>
<div>The number of standby replicas. Standby replicas are shadow copies of local state stores. Kafka Streams attempts to create the
specified number of replicas and keep them up to date as long as there are enough instances running.
Standby replicas are used to minimize the latency of task failover. A task that was previously running on a failed instance is
preferentially restarted on an instance that hosts its standby replicas, so that the work of restoring the local state store from its
changelog is minimized. Details about how Kafka Streams uses the standby replicas to minimize the cost of
resuming tasks on failover can be found in the <a class="reference internal" href="../architecture.html#streams-architecture-state"><span class="std std-ref">State</span></a> section.</div></blockquote>
</div>
<div class="section" id="num-stream-threads">
<h4><a class="toc-backref" href="#id11">num.stream.threads</a><a class="headerlink" href="#num-stream-threads" title="Permalink to this headline"></a></h4>
<blockquote>
<div>This specifies the number of stream threads in an instance of the Kafka Streams application. The stream processing code runs in these threads.
For more information about the Kafka Streams threading model, see <a class="reference internal" href="../architecture.html#streams-architecture-threads"><span class="std std-ref">Threading Model</span></a>.</div></blockquote>
</div>
<div class="section" id="partition-grouper">
<span id="streams-developer-guide-partition-grouper"></span><h4><a class="toc-backref" href="#id12">partition.grouper</a><a class="headerlink" href="#partition-grouper" title="Permalink to this headline"></a></h4>
<blockquote>
<div>A partition grouper creates a list of stream tasks from the partitions of source topics, where each created task is assigned a group of source topic partitions.
The default implementation provided by Kafka Streams is <a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/processor/DefaultPartitionGrouper.html">DefaultPartitionGrouper</a>.
It assigns each task one partition from each of the source topics. The generated number of tasks equals the largest
number of partitions among the input topics. Usually an application does not need to customize the partition grouper.</div></blockquote>
</div>
<div class="section" id="replication-factor">
<span id="replication-factor-parm"></span><h4><a class="toc-backref" href="#id13">replication.factor</a><a class="headerlink" href="#replication-factor" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>This specifies the replication factor of internal topics that Kafka Streams creates when local states are used or a stream is
repartitioned for aggregation. Replication is important for fault tolerance. Without replication even a single broker failure
may prevent progress of the stream processing application. It is recommended to use a replication factor similar to that of the source topics.</p>
<dl class="docutils">
<dt>Recommendation:</dt>
<dd>Increase the replication factor to 3 to ensure that the internal Kafka Streams topic can tolerate up to 2 broker failures.
Note that you will require more storage space as well (3 times more with the replication factor of 3).</dd>
</dl>
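<p>A sketch of applying this recommendation:</p>
<pre class="brush: java;">
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties settings = new Properties();
// Replicate Streams-internal topics 3x so they tolerate up to 2 broker failures
settings.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
</pre>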
</div></blockquote>
</div>
<div class="section" id="state-dir">
<h4><a class="toc-backref" href="#id14">state.dir</a><a class="headerlink" href="#state-dir" title="Permalink to this headline"></a></h4>
<blockquote>
<div>The state directory. Kafka Streams persists local states under the state directory. Each application has a subdirectory on its hosting
machine that is located under the state directory. The name of the subdirectory is the application ID. The state stores associated
with the application are created under this subdirectory.</div></blockquote>
</div>
<div class="section" id="timestamp-extractor">
<span id="streams-developer-guide-timestamp-extractor"></span><h4><a class="toc-backref" href="#id15">timestamp.extractor</a><a class="headerlink" href="#timestamp-extractor" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>A timestamp extractor pulls a timestamp from an instance of <a class="reference external" href="/4.0.0/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecord.html">ConsumerRecord</a>.
Timestamps are used to control the progress of streams.</p>
<p>The default extractor is
<a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/processor/FailOnInvalidTimestamp.html">FailOnInvalidTimestamp</a>.
This extractor retrieves built-in timestamps that are automatically embedded into Kafka messages by the Kafka producer
client since
<a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message">Kafka version 0.10</a>.
Depending on the setting of Kafka&#8217;s server-side <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> broker and <code class="docutils literal"><span class="pre">message.timestamp.type</span></code> topic parameters,
this extractor provides you with:</p>
<ul class="simple">
<li><strong>event-time</strong> processing semantics if <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> is set to <code class="docutils literal"><span class="pre">CreateTime</span></code> aka &#8220;producer time&#8221;
(which is the default). This represents the time when a Kafka producer sent the original message. If you use Kafka&#8217;s
official producer client, the timestamp represents milliseconds since the epoch.</li>
<li><strong>ingestion-time</strong> processing semantics if <code class="docutils literal"><span class="pre">log.message.timestamp.type</span></code> is set to <code class="docutils literal"><span class="pre">LogAppendTime</span></code> aka &#8220;broker
time&#8221;. This represents the time when the Kafka broker received the original message, in milliseconds since the epoch.</li>
</ul>
<p>The <code class="docutils literal"><span class="pre">FailOnInvalidTimestamp</span></code> extractor throws an exception if a record contains an invalid (i.e. negative) built-in
timestamp, because Kafka Streams would not process this record but silently drop it. Invalid built-in timestamps can
occur for various reasons: if for example, you consume a topic that is written to by pre-0.10 Kafka producer clients
or by third-party producer clients that don&#8217;t support the new Kafka 0.10 message format yet; another situation where
this may happen is after upgrading your Kafka cluster from <code class="docutils literal"><span class="pre">0.9</span></code> to <code class="docutils literal"><span class="pre">0.10</span></code>, where all the data that was generated
with <code class="docutils literal"><span class="pre">0.9</span></code> does not include the <code class="docutils literal"><span class="pre">0.10</span></code> message timestamps.</p>
<p>If you have data with invalid timestamps and want to process it, then there are two alternative extractors available.
Both work on built-in timestamps, but handle invalid timestamps differently.</p>
<ul class="simple">
<li><a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/processor/LogAndSkipOnInvalidTimestamp.html">LogAndSkipOnInvalidTimestamp</a>:
This extractor logs a warn message and returns the invalid timestamp to Kafka Streams, which will not process but
silently drop the record.
This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records with an
invalid built-in timestamp in your input data.</li>
<li><a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/processor/UsePreviousTimeOnInvalidTimestamp.html">UsePreviousTimeOnInvalidTimestamp</a>.
This extractor returns the record&#8217;s built-in timestamp if it is valid (i.e. not negative). If the record does not
have a valid built-in timestamps, the extractor returns the previously extracted valid timestamp from a record of the
same topic partition as the current record as a timestamp estimation. In case that no timestamp can be estimated, it
throws an exception.</li>
</ul>
<p>Another built-in extractor is
<a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/processor/WallclockTimestampExtractor.html">WallclockTimestampExtractor</a>.
This extractor does not actually &#8220;extract&#8221; a timestamp from the consumed record but rather returns the current time in
milliseconds from the system clock (think: <code class="docutils literal"><span class="pre">System.currentTimeMillis()</span></code>), which effectively means Streams will operate
on the basis of the so-called <strong>processing-time</strong> of events.</p>
<p>You can also provide your own timestamp extractors, for instance to retrieve timestamps embedded in the payload of
messages. If you cannot extract a valid timestamp, you can either throw an exception, return a negative timestamp, or
estimate a timestamp. Returning a negative timestamp will result in data loss &#8211; the corresponding record will not be
processed but silently dropped. If you want to estimate a new timestamp, you can use the value provided via
<code class="docutils literal"><span class="pre">previousTimestamp</span></code> (i.e., a Kafka Streams timestamp estimation). Here is an example of a custom
<code class="docutils literal"><span class="pre">TimestampExtractor</span></code> implementation:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.TimestampExtractor</span><span class="o">;</span>
<span class="c1">// Extracts the embedded timestamp of a record (giving you &quot;event-time&quot; semantics).</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyEventTimeExtractor</span> <span class="kd">implements</span> <span class="n">TimestampExtractor</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">long</span> <span class="nf">extract</span><span class="o">(</span><span class="kd">final</span> <span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">record</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">long</span> <span class="n">previousTimestamp</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// `Foo` is your own custom class, which we assume has a method that returns</span>
<span class="c1">// the embedded timestamp (milliseconds since midnight, January 1, 1970 UTC).</span>
<span class="kt">long</span> <span class="n">timestamp</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="o">;</span>
<span class="kd">final</span> <span class="n">Foo</span> <span class="n">myPojo</span> <span class="o">=</span> <span class="o">(</span><span class="n">Foo</span><span class="o">)</span> <span class="n">record</span><span class="o">.</span><span class="na">value</span><span class="o">();</span>
<span class="k">if</span> <span class="o">(</span><span class="n">myPojo</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">timestamp</span> <span class="o">=</span> <span class="n">myPojo</span><span class="o">.</span><span class="na">getTimestampInMillis</span><span class="o">();</span>
<span class="o">}</span>
<span class="k">if</span> <span class="o">(</span><span class="n">timestamp</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Invalid timestamp! Attempt to estimate a new timestamp,</span>
<span class="c1">// otherwise fall back to wall-clock time (processing-time).</span>
<span class="k">if</span> <span class="o">(</span><span class="n">previousTimestamp</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">previousTimestamp</span><span class="o">;</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<p>You would then define the custom timestamp extractor in your Streams configuration as follows:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Properties</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="n">Properties</span> <span class="n">streamsConfiguration</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsConfiguration</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">TIMESTAMP_EXTRACTOR_CLASS_CONFIG</span><span class="o">,</span> <span class="n">MyEventTimeExtractor</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
</pre></div>
</div>
</div></blockquote>
</div>
</div>
<div class="section" id="kafka-consumers-and-producer-configuration-parameters">
<h3><a class="toc-backref" href="#id16">Kafka consumers and producer configuration parameters</a><a class="headerlink" href="#kafka-consumers-and-producer-configuration-parameters" title="Permalink to this headline"></a></h3>
<p>You can specify parameters for the Kafka <a class="reference external" href="/4.0.0/clients/javadocs/org/apache/kafka/clients/consumer/package-summary.html">consumers</a> and <a class="reference external" href="/4.0.0/clients/javadocs/org/apache/kafka/clients/producer/package-summary.html">producers</a> that are used internally. The consumer and producer settings
are defined by specifying parameters in a <code class="docutils literal"><span class="pre">StreamsConfig</span></code> instance.</p>
<p>In this example, the Kafka <a class="reference external" href="/4.0.0/clients/javadocs/org/apache/kafka/clients/consumer/ConsumerConfig.html#SESSION_TIMEOUT_MS_CONFIG">consumer session timeout</a> is configured to be 60000 milliseconds in the Streams settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Example of a &quot;normal&quot; setting for Kafka Streams</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka-broker-01:9092&quot;</span><span class="o">);</span>
<span class="c1">// Customize the Kafka consumer settings of your Streams application</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">ConsumerConfig</span><span class="o">.</span><span class="na">SESSION_TIMEOUT_MS_CONFIG</span><span class="o">,</span> <span class="mi">60000</span><span class="o">);</span>
<span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsConfig</span><span class="o">(</span><span class="n">streamsSettings</span><span class="o">);</span>
</pre></div>
</div>
<div class="section" id="naming">
<h4><a class="toc-backref" href="#id17">Naming</a><a class="headerlink" href="#naming" title="Permalink to this headline"></a></h4>
<p>Some consumer and producer configuration parameters use the same parameter name. For example, <code class="docutils literal"><span class="pre">send.buffer.bytes</span></code> and
<code class="docutils literal"><span class="pre">receive.buffer.bytes</span></code> are used to configure TCP buffers; <code class="docutils literal"><span class="pre">request.timeout.ms</span></code> and <code class="docutils literal"><span class="pre">retry.backoff.ms</span></code> control retries
for client requests. You can avoid duplicate names by prefixing parameter names with <code class="docutils literal"><span class="pre">consumer.</span></code> or <code class="docutils literal"><span class="pre">producer.</span></code> (e.g., <code class="docutils literal"><span class="pre">consumer.send.buffer.bytes</span></code> and <code class="docutils literal"><span class="pre">producer.send.buffer.bytes</span></code>).</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// same value for consumer and producer</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;value&quot;</span><span class="o">);</span>
<span class="c1">// different values for consumer and producer</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;consumer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;consumer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;producer.PARAMETER_NAME&quot;</span><span class="o">,</span> <span class="s">&quot;producer-value&quot;</span><span class="o">);</span>
<span class="c1">// alternatively, you can use</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">consumerPrefix</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;consumer-value&quot;</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StremasConfig</span><span class="o">.</span><span class="na">producerConfig</span><span class="o">(</span><span class="s">&quot;PARAMETER_NAME&quot;</span><span class="o">),</span> <span class="s">&quot;producer-value&quot;</span><span class="o">);</span>
</pre></div>
</div>
</div>
<div class="section" id="default-values">
<h4><a class="toc-backref" href="#id18">Default Values</a><a class="headerlink" href="#default-values" title="Permalink to this headline"></a></h4>
<p>Kafka Streams uses different default values for some of the underlying client configs, which are summarized below. For detailed descriptions
of these configs, see <a class="reference external" href="https://kafka.apache.org/documentation/#producerconfigs">Producer Configs</a>
and <a class="reference external" href="https://kafka.apache.org/documentation/#consumerconfigs">Consumer Configs</a>.</p>
<table border="1" class="non-scrolling-table docutils">
<colgroup>
<col width="50%" />
<col width="19%" />
<col width="31%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Parameter Name</th>
<th class="head">Corresponding Client</th>
<th class="head">Streams Default</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>auto.offset.reset</td>
<td>Consumer</td>
<td>earliest</td>
</tr>
<tr class="row-odd"><td>enable.auto.commit</td>
<td>Consumer</td>
<td>false</td>
</tr>
<tr class="row-even"><td>linger.ms</td>
<td>Producer</td>
<td>100</td>
</tr>
<tr class="row-odd"><td>max.poll.interval.ms</td>
<td>Consumer</td>
<td>Integer.MAX_VALUE</td>
</tr>
<tr class="row-even"><td>max.poll.records</td>
<td>Consumer</td>
<td>1000</td>
</tr>
<tr class="row-odd"><td>retries</td>
<td>Producer</td>
<td>10</td>
</tr>
<tr class="row-even"><td>rocksdb.config.setter</td>
<td>Streams</td>
<td>&nbsp;</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="enable-auto-commit">
<span id="streams-developer-guide-consumer-auto-commit"></span><h4><a class="toc-backref" href="#id19">enable.auto.commit</a><a class="headerlink" href="#enable-auto-commit" title="Permalink to this headline"></a></h4>
<blockquote>
<div>The consumer auto-commit setting. To guarantee at-least-once processing semantics and turn off auto commits, Kafka Streams overrides this consumer config
value to <code class="docutils literal"><span class="pre">false</span></code>. Consumers commit only explicitly, via <em>commitSync</em> calls, when the Kafka Streams library or a user decides
to commit the current processing state.</div></blockquote>
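<p>Because auto commit is disabled, the frequency of these explicit commits is controlled by the Streams-level <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> setting. A minimal sketch of tuning it (the 10-second value is only an illustration):</p>
<div class="highlight-java"><div class="highlight"><pre>Properties streamsSettings = new Properties();
// Commit the current processing state every 10 seconds instead of the default 30 seconds.
streamsSettings.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10000);
</pre></div>
</div>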
</div>
<div class="section" id="rocksdb-config-setter">
<span id="streams-developer-guide-rocksdb-config"></span><h4><a class="toc-backref" href="#id20">rocksdb.config.setter</a><a class="headerlink" href="#rocksdb-config-setter" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The RocksDB configuration. Kafka Streams uses RocksDB as the default storage engine for persistent stores. To change the default
configuration for RocksDB, implement <code class="docutils literal"><span class="pre">RocksDBConfigSetter</span></code> and provide your custom class via <a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/state/RocksDBConfigSetter.html">rocksdb.config.setter</a>.</p>
<p>Here is an example that adjusts the memory size consumed by RocksDB.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span> <span class="kd">public</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">CustomRocksDBConfig</span> <span class="kd">implements</span> <span class="n">RocksDBConfigSetter</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setConfig</span><span class="o">(</span><span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Options</span> <span class="n">options</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">configs</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// See #1 below.</span>
<span class="n">BlockBasedTableConfig</span> <span class="n">tableConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">org</span><span class="o">.</span><span class="na">rocksdb</span><span class="o">.</span><span class="na">BlockBasedTableConfig</span><span class="o">();</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockCacheSize</span><span class="o">(</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="c1">// See #2 below.</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setBlockSize</span><span class="o">(</span><span class="mi">16</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="c1">// See #3 below.</span>
<span class="n">tableConfig</span><span class="o">.</span><span class="na">setCacheIndexAndFilterBlocks</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span>
<span class="n">options</span><span class="o">.</span><span class="na">setTableFormatConfig</span><span class="o">(</span><span class="n">tableConfig</span><span class="o">);</span>
<span class="c1">// See #4 below.</span>
<span class="n">options</span><span class="o">.</span><span class="na">setMaxWriteBufferNumber</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">ROCKSDB_CONFIG_SETTER_CLASS_CONFIG</span><span class="o">,</span> <span class="n">CustomRocksDBConfig</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
</pre></div>
</div>
<dl class="docutils">
<dt>Notes for example:</dt>
<dd><ol class="first last arabic simple">
<li><code class="docutils literal"><span class="pre">BlockBasedTableConfig</span> <span class="pre">tableConfig</span> <span class="pre">=</span> <span class="pre">new</span> <span class="pre">org.rocksdb.BlockBasedTableConfig();</span></code> Reduce block cache size from the default, shown <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L81">here</a>, as the total number of store RocksDB databases is partitions (40) * segments (3) = 120.</li>
<li><code class="docutils literal"><span class="pre">tableConfig.setBlockSize(16</span> <span class="pre">*</span> <span class="pre">1024L);</span></code> Modify the default <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L82">block size</a> per these instructions from the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks">RocksDB GitHub</a>.</li>
<li><code class="docutils literal"><span class="pre">tableConfig.setCacheIndexAndFilterBlocks(true);</span></code> Do not let the index and filter blocks grow unbounded. For more information, see the <a class="reference external" href="https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks">RocksDB GitHub</a>.</li>
<li><code class="docutils literal"><span class="pre">options.setMaxWriteBufferNumber(2);</span></code> See the advanced options in the <a class="reference external" href="https://github.com/facebook/rocksdb/blob/8dee8cad9ee6b70fd6e1a5989a8156650a70c04f/include/rocksdb/advanced_options.h#L103">RocksDB GitHub</a>.</li>
</ol>
</dd>
</dl>
</div></blockquote>
</div>
</div>
<div class="section" id="recommended-configuration-parameters-for-resiliency">
<h3><a class="toc-backref" href="#id21">Recommended configuration parameters for resiliency</a><a class="headerlink" href="#recommended-configuration-parameters-for-resiliency" title="Permalink to this headline"></a></h3>
<p>There are several Kafka and Kafka Streams configuration options that need to be configured explicitly for resiliency in the face of broker failures:</p>
<table border="1" class="non-scrolling-table docutils">
<colgroup>
<col width="22%" />
<col width="19%" />
<col width="10%" />
<col width="49%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Parameter Name</th>
<th class="head">Corresponding Client</th>
<th class="head">Default value</th>
<th class="head">Consider setting to</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>acks</td>
<td>Producer</td>
<td><code class="docutils literal"><span class="pre">acks=1</span></code></td>
<td><code class="docutils literal"><span class="pre">acks=all</span></code></td>
</tr>
<tr class="row-odd"><td>replication.factor</td>
<td>Streams</td>
<td><code class="docutils literal"><span class="pre">1</span></code></td>
<td><code class="docutils literal"><span class="pre">3</span></code></td>
</tr>
<tr class="row-even"><td>min.insync.replicas</td>
<td>Broker</td>
<td><code class="docutils literal"><span class="pre">1</span></code></td>
<td><code class="docutils literal"><span class="pre">2</span></code></td>
</tr>
</tbody>
</table>
<p>Increasing the replication factor to 3 ensures that the internal Kafka Streams topics can tolerate up to 2 broker failures. Changing the acks setting to &#8220;all&#8221;
guarantees that a record will not be lost as long as one replica is alive. The tradeoff of moving from the default values to the recommended ones is
that some performance is sacrificed and more storage space is used (3x with a replication factor of 3) in exchange for more resiliency.</p>
<div class="section" id="acks">
<h4><a class="toc-backref" href="#id22">acks</a><a class="headerlink" href="#acks" title="Permalink to this headline"></a></h4>
<blockquote>
<div><p>The number of acknowledgments that the leader must have received before considering a request complete. This controls
the durability of records that are sent. The possible values are:</p>
<ul class="simple">
<li><code class="docutils literal"><span class="pre">acks=0</span></code> The producer does not wait for acknowledgment from the server and the record is immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the <code class="docutils literal"><span class="pre">retries</span></code> configuration will not take effect (as the client won&#8217;t generally know of any failures). The offset returned for each record will always be set to <code class="docutils literal"><span class="pre">-1</span></code>.</li>
<li><code class="docutils literal"><span class="pre">acks=1</span></code> The leader writes the record to its local log and responds without waiting for full acknowledgement from all followers. If the leader immediately fails after acknowledging the record, but before the followers have replicated it, then the record will be lost.</li>
<li><code class="docutils literal"><span class="pre">acks=all</span></code> The leader waits for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost if there is at least one in-sync replica alive. This is the strongest available guarantee.</li>
</ul>
<p>For more information, see the <a class="reference external" href="https://kafka.apache.org/documentation/#producerconfigs">Kafka Producer documentation</a>.</p>
</div></blockquote>
</div>
<div class="section" id="id2">
<h4><a class="toc-backref" href="#id23">replication.factor</a><a class="headerlink" href="#id2" title="Permalink to this headline"></a></h4>
<blockquote>
<div>See the <a class="reference internal" href="#replication-factor-parm"><span class="std std-ref">description here</span></a>.</div></blockquote>
<p>You define these settings via <code class="docutils literal"><span class="pre">StreamsConfig</span></code>:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsSettings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">REPLICATION_FACTOR_CONFIG</span><span class="o">,</span> <span class="mi">3</span><span class="o">);</span>
<span class="n">streamsSettings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">producerPrefix</span><span class="o">(</span><span class="n">ProducerConfig</span><span class="o">.</span><span class="na">ACKS_CONFIG</span><span class="o">),</span> <span class="s">&quot;all&quot;</span><span class="o">);</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">A future version of Kafka Streams will allow developers to set their own app-specific configuration settings through
<code class="docutils literal"><span class="pre">StreamsConfig</span></code> as well, which can then be accessed through
<a class="reference external" href="/4.0.0/streams/javadocs/org/apache/kafka/streams/processor/ProcessorContext.html">ProcessorContext</a>.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/write-streams" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/dsl-api" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>


@ -0,0 +1,223 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="data-types-and-serialization">
<span id="streams-developer-guide-serdes"></span><h1>Data Types and Serialization<a class="headerlink" href="#data-types-and-serialization" title="Permalink to this headline"></a></h1>
<p>Every Kafka Streams application must provide SerDes (Serializer/Deserializer) for the data types of record keys and record values (e.g. <code class="docutils literal"><span class="pre">java.lang.String</span></code>) to materialize the data when necessary. Operations that require such SerDes information include: <code class="docutils literal"><span class="pre">stream()</span></code>, <code class="docutils literal"><span class="pre">table()</span></code>, <code class="docutils literal"><span class="pre">to()</span></code>, <code class="docutils literal"><span class="pre">through()</span></code>, <code class="docutils literal"><span class="pre">groupByKey()</span></code>, <code class="docutils literal"><span class="pre">groupBy()</span></code>.</p>
<p>You can provide SerDes by using either of these methods:</p>
<ul class="simple">
<li>By setting default SerDes via a <code class="docutils literal"><span class="pre">StreamsConfig</span></code> instance.</li>
<li>By specifying explicit SerDes when calling the appropriate API methods, thus overriding the defaults.</li>
</ul>
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#configuring-serdes" id="id1">Configuring SerDes</a></li>
<li><a class="reference internal" href="#overriding-default-serdes" id="id2">Overriding default SerDes</a></li>
<li><a class="reference internal" href="#available-serdes" id="id3">Available SerDes</a><ul>
<li><a class="reference internal" href="#primitive-and-basic-types" id="id4">Primitive and basic types</a></li>
<li><a class="reference internal" href="#avro" id="id5">Avro</a></li>
<li><a class="reference internal" href="#json" id="id6">JSON</a></li>
<li><a class="reference internal" href="#further-serdes" id="id7">Further serdes</a></li>
</ul>
<div class="section" id="configuring-serdes">
<h2>Configuring SerDes<a class="headerlink" href="#configuring-serdes" title="Permalink to this headline"></a></h2>
<p>SerDes specified in the Streams configuration via <code class="docutils literal"><span class="pre">StreamsConfig</span></code> are used as the default in your Kafka Streams application.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Default serde for keys of data records (here: built-in serde for String type)</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">KEY_SERDE_CLASS_CONFIG</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">().</span><span class="na">getClass</span><span class="o">().</span><span class="na">getName</span><span class="o">());</span>
<span class="c1">// Default serde for values of data records (here: built-in serde for Long type)</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">VALUE_SERDE_CLASS_CONFIG</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">().</span><span class="na">getClass</span><span class="o">().</span><span class="na">getName</span><span class="o">());</span>
<span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsConfig</span><span class="o">(</span><span class="n">settings</span><span class="o">);</span>
</pre></div>
</div>
</div>
<div class="section" id="overriding-default-serdes">
<h2>Overriding default SerDes<a class="headerlink" href="#overriding-default-serdes" title="Permalink to this headline"></a></h2>
<p>You can also specify SerDes explicitly by passing them to the appropriate API methods, which overrides the default serde settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serde</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">stringSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">();</span>
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">longSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">();</span>
<span class="c1">// The stream userCountByRegion has type `String` for record keys (for region)</span>
<span class="c1">// and type `Long` for record values (for user counts).</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">userCountByRegion</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">userCountByRegion</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">&quot;RegionCountsTopic&quot;</span><span class="o">,</span> <span class="n">Produced</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">longSerde</span><span class="o">));</span>
</pre></div>
</div>
<p>If you want to override serdes selectively, i.e., keep the default for either keys or values, then don&#8217;t specify the serde wherever you want to use the default settings:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serde</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.common.serialization.Serdes</span><span class="o">;</span>
<span class="c1">// Use the default serializer for record keys (here: region as String) by not specifying the key serde,</span>
<span class="c1">// but override the default serializer for record values (here: userCount as Long).</span>
<span class="kd">final</span> <span class="n">Serde</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">longSerde</span> <span class="o">=</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">();</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">userCountByRegion</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">userCountByRegion</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">&quot;RegionCountsTopic&quot;</span><span class="o">,</span> <span class="n">Produced</span><span class="o">.</span><span class="na">valueSerde</span><span class="o">(</span><span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">()));</span>
</pre></div>
</div>
</div>
<div class="section" id="available-serdes">
<span id="streams-developer-guide-serdes-available"></span><h2>Available SerDes<a class="headerlink" href="#available-serdes" title="Permalink to this headline"></a></h2>
<div class="section" id="primitive-and-basic-types">
<h3>Primitive and basic types<a class="headerlink" href="#primitive-and-basic-types" title="Permalink to this headline"></a></h3>
<p>Apache Kafka includes several built-in serde implementations for Java primitives and basic types such as <code class="docutils literal"><span class="pre">byte[]</span></code> in
its <code class="docutils literal"><span class="pre">kafka-clients</span></code> Maven artifact:</p>
<div class="highlight-xml"><div class="highlight"><pre><span></span><span class="nt">&lt;dependency&gt;</span>
<span class="nt">&lt;groupId&gt;</span>org.apache.kafka<span class="nt">&lt;/groupId&gt;</span>
<span class="nt">&lt;artifactId&gt;</span>kafka-clients<span class="nt">&lt;/artifactId&gt;</span>
<span class="nt">&lt;version&gt;</span>1.0.0-cp1<span class="nt">&lt;/version&gt;</span>
<span class="nt">&lt;/dependency&gt;</span>
</pre></div>
</div>
<p>This artifact provides the following serde implementations under the package <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/clients/src/main/java/org/apache/kafka/common/serialization">org.apache.kafka.common.serialization</a>, which you can leverage when, for example, defining default serializers in your Streams configuration.</p>
<table border="1" class="docutils">
<colgroup>
<col width="17%" />
<col width="83%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Data type</th>
<th class="head">Serde</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>byte[]</td>
<td><code class="docutils literal"><span class="pre">Serdes.ByteArray()</span></code>, <code class="docutils literal"><span class="pre">Serdes.Bytes()</span></code> (see tip below)</td>
</tr>
<tr class="row-odd"><td>ByteBuffer</td>
<td><code class="docutils literal"><span class="pre">Serdes.ByteBuffer()</span></code></td>
</tr>
<tr class="row-even"><td>Double</td>
<td><code class="docutils literal"><span class="pre">Serdes.Double()</span></code></td>
</tr>
<tr class="row-odd"><td>Integer</td>
<td><code class="docutils literal"><span class="pre">Serdes.Integer()</span></code></td>
</tr>
<tr class="row-even"><td>Long</td>
<td><code class="docutils literal"><span class="pre">Serdes.Long()</span></code></td>
</tr>
<tr class="row-odd"><td>String</td>
<td><code class="docutils literal"><span class="pre">Serdes.String()</span></code></td>
</tr>
</tbody>
</table>
<div class="admonition tip">
<p class="first admonition-title">Tip</p>
<p class="last"><a class="reference external" href="https://github.com/apache/kafka/blob/1.0/clients/src/main/java/org/apache/kafka/common/utils/Bytes.java">Bytes</a> is a wrapper for Java&#8217;s <code class="docutils literal"><span class="pre">byte[]</span></code> (byte array) that supports proper equality and ordering semantics. You may want to consider using <code class="docutils literal"><span class="pre">Bytes</span></code> instead of <code class="docutils literal"><span class="pre">byte[]</span></code> in your applications.</p>
</div>
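<p>As a small sketch, wrapping a raw byte array in <code class="docutils literal"><span class="pre">Bytes</span></code> looks like this:</p>
<div class="highlight-java"><div class="highlight"><pre>import org.apache.kafka.common.utils.Bytes;

byte[] rawKey = ...;
// Bytes adds proper equals()/hashCode()/compareTo() semantics on top of byte[],
// so wrapped keys compare by content rather than by array identity.
Bytes wrappedKey = Bytes.wrap(rawKey);
</pre></div>
</div>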
</div>
<div class="section" id="json">
<h3>JSON<a class="headerlink" href="#json" title="Permalink to this headline"></a></h3>
<p>The code examples of Kafka Streams also include a basic serde implementation for JSON:</p>
<ul class="simple">
<li><a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/examples/src/main/java/org/apache/kafka/streams/examples/pageview/JsonPOJOSerializer.java">JsonPOJOSerializer</a></li>
<li><a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/examples/src/main/java/org/apache/kafka/streams/examples/pageview/JsonPOJODeserializer.java">JsonPOJODeserializer</a></li>
</ul>
<p>You can construct a unified JSON serde from the <code class="docutils literal"><span class="pre">JsonPOJOSerializer</span></code> and <code class="docutils literal"><span class="pre">JsonPOJODeserializer</span></code> via
<code class="docutils literal"><span class="pre">Serdes.serdeFrom(&lt;serializerInstance&gt;,</span> <span class="pre">&lt;deserializerInstance&gt;)</span></code>. The
<a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/examples/src/main/java/org/apache/kafka/streams/examples/pageview/PageViewTypedDemo.java">PageViewTypedDemo</a>
example demonstrates how to use this JSON serde.</p>
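<p>As a minimal sketch, assuming you copy the <code class="docutils literal"><span class="pre">JsonPOJOSerializer</span></code> and <code class="docutils literal"><span class="pre">JsonPOJODeserializer</span></code> classes linked above into your project and have a custom POJO class (here the demo&#8217;s <code class="docutils literal"><span class="pre">PageView</span></code>), constructing the serde looks like this:</p>
<div class="highlight-java"><div class="highlight"><pre>import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;

Map&lt;String, Object&gt; serdeProps = new HashMap&lt;&gt;();
serdeProps.put("JsonPOJOClass", PageView.class);

// Configure serializer and deserializer with the target POJO class.
Serializer&lt;PageView&gt; pageViewSerializer = new JsonPOJOSerializer&lt;&gt;();
pageViewSerializer.configure(serdeProps, false);
Deserializer&lt;PageView&gt; pageViewDeserializer = new JsonPOJODeserializer&lt;&gt;();
pageViewDeserializer.configure(serdeProps, false);

// Combine both into a unified serde.
Serde&lt;PageView&gt; pageViewSerde = Serdes.serdeFrom(pageViewSerializer, pageViewDeserializer);
</pre></div>
</div>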
</div>
<div class="section" id="implementing-custom-serdes">
<span id="streams-developer-guide-serdes-custom"></span><h2>Implementing custom SerDes<a class="headerlink" href="#implementing-custom-serdes" title="Permalink to this headline"></a></h2>
<p>If you need to implement custom SerDes, your best starting point is to take a look at the source code references of
existing SerDes (see previous section). Typically, your workflow will be similar to:</p>
<ol class="arabic simple">
<li>Write a <em>serializer</em> for your data type <code class="docutils literal"><span class="pre">T</span></code> by implementing
<a class="reference external" href="https://github.com/apache/kafka/blob/1.0/clients/src/main/java/org/apache/kafka/common/serialization/Serializer.java">org.apache.kafka.common.serialization.Serializer</a>.</li>
<li>Write a <em>deserializer</em> for <code class="docutils literal"><span class="pre">T</span></code> by implementing
<a class="reference external" href="https://github.com/apache/kafka/blob/1.0/clients/src/main/java/org/apache/kafka/common/serialization/Deserializer.java">org.apache.kafka.common.serialization.Deserializer</a>.</li>
<li>Write a <em>serde</em> for <code class="docutils literal"><span class="pre">T</span></code> by implementing
<a class="reference external" href="https://github.com/apache/kafka/blob/1.0/clients/src/main/java/org/apache/kafka/common/serialization/Serde.java">org.apache.kafka.common.serialization.Serde</a>,
which you either do manually (see existing SerDes in the previous section) or by leveraging helper functions in
<a class="reference external" href="https://github.com/apache/kafka/blob/1.0/clients/src/main/java/org/apache/kafka/common/serialization/Serdes.java">Serdes</a>
such as <code class="docutils literal"><span class="pre">Serdes.serdeFrom(Serializer&lt;T&gt;,</span> <span class="pre">Deserializer&lt;T&gt;)</span></code>.</li>
</ol>
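<p>A minimal sketch that puts these steps together for a hypothetical data type <code class="docutils literal"><span class="pre">MyRecord</span></code> (assumed to hold a single String value) might look like this:</p>
<div class="highlight-java"><div class="highlight"><pre>import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;

// Step 1: serializer for MyRecord (a hypothetical class with a single String field).
class MyRecordSerializer implements Serializer&lt;MyRecord&gt; {
  @Override public void configure(Map&lt;String, ?&gt; configs, boolean isKey) {}
  @Override public byte[] serialize(String topic, MyRecord data) {
    return data == null ? null : data.getValue().getBytes(StandardCharsets.UTF_8);
  }
  @Override public void close() {}
}

// Step 2: deserializer for MyRecord.
class MyRecordDeserializer implements Deserializer&lt;MyRecord&gt; {
  @Override public void configure(Map&lt;String, ?&gt; configs, boolean isKey) {}
  @Override public MyRecord deserialize(String topic, byte[] data) {
    return data == null ? null : new MyRecord(new String(data, StandardCharsets.UTF_8));
  }
  @Override public void close() {}
}

// Step 3: combine the two into a serde via the Serdes helper.
Serde&lt;MyRecord&gt; myRecordSerde = Serdes.serdeFrom(new MyRecordSerializer(), new MyRecordDeserializer());
</pre></div>
</div>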
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/processor-api" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/interactive-queries" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

File diff suppressed because it is too large


@ -0,0 +1,102 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<h1>Developer Guide for Kafka Streams</h1>
<div class="sub-nav-sticky">
<div class="sticky-top">
<div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div>
</div>
</div>
<div class="section" id="developer-guide">
<!-- span id="streams-developer-guide"></span><h1>Developer Guide<a class="headerlink" href="#developer-guide" title="Permalink to this headline"></a></h1 -->
<p>This developer guide describes how to write, configure, and execute a Kafka Streams application.</p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="write-streams.html">Writing a Streams Application</a></li>
<li class="toctree-l1"><a class="reference internal" href="config-streams.html">Configuring a Streams Application</a></li>
<li class="toctree-l1"><a class="reference internal" href="dsl-api.html">Streams DSL</a></li>
<li class="toctree-l1"><a class="reference internal" href="processor-api.html">Processor API</a></li>
<li class="toctree-l1"><a class="reference internal" href="datatypes.html">Data Types and Serialization</a></li>
<li class="toctree-l1"><a class="reference internal" href="interactive-queries.html">Interactive Queries</a></li>
<li class="toctree-l1"><a class="reference internal" href="memory-mgmt.html">Memory Management</a></li>
<li class="toctree-l1"><a class="reference internal" href="running-app.html">Running Streams Applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="manage-topics.html">Managing Streams Application Topics</a></li>
<li class="toctree-l1"><a class="reference internal" href="security.html">Streams Security</a></li>
<li class="toctree-l1"><a class="reference internal" href="app-reset-tool.html">Application Reset Tool</a></li>
</ul>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/write-streams" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>


@ -0,0 +1,530 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="interactive-queries">
<span id="streams-developer-guide-interactive-queries"></span><h1>Interactive Queries<a class="headerlink" href="#interactive-queries" title="Permalink to this headline"></a></h1>
<p>Interactive queries allow you to leverage the state of your application from outside your application. The Kafka Streams API enables your applications to be queryable.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#querying-local-state-stores-for-an-app-instance" id="id3">Querying local state stores for an app instance</a><ul>
<li><a class="reference internal" href="#querying-local-key-value-stores" id="id4">Querying local key-value stores</a></li>
<li><a class="reference internal" href="#querying-local-window-stores" id="id5">Querying local window stores</a></li>
<li><a class="reference internal" href="#querying-local-custom-state-stores" id="id6">Querying local custom state stores</a></li>
</ul>
</li>
<li><a class="reference internal" href="#querying-remote-state-stores-for-the-entire-app" id="id7">Querying remote state stores for the entire app</a><ul>
<li><a class="reference internal" href="#adding-an-rpc-layer-to-your-application" id="id8">Adding an RPC layer to your application</a></li>
<li><a class="reference internal" href="#exposing-the-rpc-endpoints-of-your-application" id="id9">Exposing the RPC endpoints of your application</a></li>
<li><a class="reference internal" href="#discovering-and-accessing-application-instances-and-their-local-state-stores" id="id10">Discovering and accessing application instances and their local state stores</a></li>
</ul>
</li>
<li><a class="reference internal" href="#demo-applications" id="id11">Demo applications</a></li>
</ul>
</div>
<p>The full state of your application is typically <a class="reference internal" href="../architecture.html#streams-architecture-state"><span class="std std-ref">split across many distributed instances of your application</span></a>, and across many state stores that are managed locally by these application instances.</p>
<div class="figure align-center">
<a class="reference internal image-reference" href="../../../images/streams-interactive-queries-03.png"><img alt="../../../images/streams-interactive-queries-03.png" src="../../../images/streams-interactive-queries-03.png" style="width: 400pt; height: 400pt;" /></a>
</div>
<p>There are local and remote components to interactively querying the state of your application.</p>
<dl class="docutils">
<dt>Local state</dt>
<dd>An application instance can query the locally managed portion of the state and directly query its own local state stores. You can use the corresponding local data in other parts of your application code, as long as it doesn&#8217;t require calling the Kafka Streams API. Querying state stores is always read-only to guarantee that the underlying state stores will never be mutated out-of-band (e.g., you cannot add new entries). State stores should only be mutated by the corresponding processor topology and the input data it operates on. For more information, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-stores"><span class="std std-ref">Querying local state stores for an app instance</span></a>.</dd>
<dt>Remote state</dt>
<dd><p class="first">To query the full state of your application, you must connect the various fragments of the state, including:</p>
<ul class="simple">
<li>querying local state stores</li>
<li>discovering all running instances of your application in the network and their state stores</li>
<li>communicating with these instances over the network (e.g., over an RPC layer)</li>
</ul>
<p class="last">Connecting these fragments enables communication between instances of the same app and communication from other applications for interactive queries. For more information, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-discovery"><span class="std std-ref">Querying remote state stores for the entire app</span></a>.</p>
</dd>
</dl>
<p>Kafka Streams natively provides all of the required functionality for interactively querying the state of your application, except if you want to expose the full state of your application via interactive queries. To allow application instances to communicate over the network, you must add a Remote Procedure Call (RPC) layer to your application (e.g., REST API).</p>
<p>This table shows the Kafka Streams native communication support for various procedures.</p>
<table border="1" class="docutils">
<colgroup>
<col width="42%" />
<col width="27%" />
<col width="31%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Procedure</th>
<th class="head">Application instance</th>
<th class="head">Entire application</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>Query local state stores of an app instance</td>
<td>Supported</td>
<td>Supported</td>
</tr>
<tr class="row-odd"><td>Make an app instance discoverable to others</td>
<td>Supported</td>
<td>Supported</td>
</tr>
<tr class="row-even"><td>Discover all running app instances and their state stores</td>
<td>Supported</td>
<td>Supported</td>
</tr>
<tr class="row-odd"><td>Communicate with app instances over the network (RPC)</td>
<td>Supported</td>
<td>Not supported (you must configure your own RPC layer)</td>
</tr>
</tbody>
</table>
<div class="section" id="querying-local-state-stores-for-an-app-instance">
<span id="streams-developer-guide-interactive-queries-local-stores"></span><h2><a class="toc-backref" href="#id3">Querying local state stores for an app instance</a><a class="headerlink" href="#querying-local-state-stores-for-an-app-instance" title="Permalink to this headline"></a></h2>
<p>A Kafka Streams application typically runs on multiple instances. The state that is locally available on any given instance is only a subset of the <a class="reference internal" href="../architecture.html#streams-architecture-state"><span class="std std-ref">application&#8217;s entire state</span></a>. Querying the local stores on an instance will only return data locally available on that particular instance.</p>
<p>The method <code class="docutils literal"><span class="pre">KafkaStreams#store(...)</span></code> finds an application instance&#8217;s local state stores by name and type.</p>
<div class="figure align-center" id="id1">
<a class="reference internal image-reference" href="../../../images/streams-interactive-queries-api-01.png"><img alt="../../../images/streams-interactive-queries-api-01.png" src="../../../images/streams-interactive-queries-api-01.png" style="width: 500pt;" /></a>
<p class="caption"><span class="caption-text">Every application instance can directly query any of its local state stores.</span></p>
</div>
<p>The <em>name</em> of a state store is defined when you create the store. You can create the store explicitly by using the Processor API or implicitly by using stateful operations in the DSL.</p>
<p>The <em>type</em> of a state store is defined by <code class="docutils literal"><span class="pre">QueryableStoreType</span></code>. You can access the built-in types via the class <code class="docutils literal"><span class="pre">QueryableStoreTypes</span></code>.
Kafka Streams currently has two built-in types:</p>
<ul class="simple">
<li>A key-value store <code class="docutils literal"><span class="pre">QueryableStoreTypes#keyValueStore()</span></code>, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-key-value-stores"><span class="std std-ref">Querying local key-value stores</span></a>.</li>
<li>A window store <code class="docutils literal"><span class="pre">QueryableStoreTypes#windowStore()</span></code>, see <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-window-stores"><span class="std std-ref">Querying local window stores</span></a>.</li>
</ul>
<p>You can also <a class="reference internal" href="#streams-developer-guide-interactive-queries-custom-stores"><span class="std std-ref">implement your own QueryableStoreType</span></a> as described in section <a class="reference internal" href="#streams-developer-guide-interactive-queries-custom-stores"><span class="std std-ref">Querying local custom state stores</span></a>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Kafka Streams materializes one state store per stream partition. This means your application will potentially manage
many underlying state stores. The API enables you to query all of the underlying stores without having to know which
partition the data is in.</p>
</div>
<div class="section" id="querying-local-key-value-stores">
<span id="streams-developer-guide-interactive-queries-local-key-value-stores"></span><h3><a class="toc-backref" href="#id4">Querying local key-value stores</a><a class="headerlink" href="#querying-local-key-value-stores" title="Permalink to this headline"></a></h3>
<p>To query a local key-value store, you must first create a topology with a key-value store. This example creates a key-value
store named &#8220;CountsKeyValueStore&#8221;. This store will hold the latest count for any word that is found on the topic &#8220;word-count-input&#8221;.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Define the processing topology (here: WordCount)</span>
<span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Serialized</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="c1">// Create a key-value store named &quot;CountsKeyValueStore&quot; for the all-time word counts</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span><span class="na">as</span><span class="o">(</span><span class="s">&quot;CountsKeyValueStore&quot;</span><span class="o">));</span>
<span class="c1">// Start an instance of the topology</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">(),</span> <span class="n">config</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
</pre></div>
</div>
<p>After the application has started, you can get access to &#8220;CountsKeyValueStore&#8221; and then query it via the <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyKeyValueStore.java">ReadOnlyKeyValueStore</a> API:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Get the key-value store CountsKeyValueStore</span>
<span class="n">ReadOnlyKeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">keyValueStore</span> <span class="o">=</span>
<span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;CountsKeyValueStore&quot;</span><span class="o">,</span> <span class="n">QueryableStoreTypes</span><span class="o">.</span><span class="na">keyValueStore</span><span class="o">());</span>
<span class="c1">// Get value by key</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for hello:&quot;</span> <span class="o">+</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">&quot;hello&quot;</span><span class="o">));</span>
<span class="c1">// Get the values for a range of keys available in this application instance</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">range</span> <span class="o">=</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">range</span><span class="o">(</span><span class="s">&quot;all&quot;</span><span class="o">,</span> <span class="s">&quot;streams&quot;</span><span class="o">);</span>
<span class="k">while</span> <span class="o">(</span><span class="n">range</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">range</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
  <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span> <span class="o">+</span> <span class="s">&quot;: &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span>
<span class="c1">// Get the values for all of the keys available in this application instance</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">allEntries</span> <span class="o">=</span> <span class="n">keyValueStore</span><span class="o">.</span><span class="na">all</span><span class="o">();</span>
<span class="k">while</span> <span class="o">(</span><span class="n">allEntries</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
  <span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">allEntries</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
  <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;count for &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span> <span class="o">+</span> <span class="s">&quot;: &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span>
</pre></div>
</div>
<p>You can also materialize the results of stateless operators by using the overloaded methods that take a <code class="docutils literal"><span class="pre">queryableStoreName</span></code>
as shown in the example below:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">regionCounts</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// materialize the result of filtering corresponding to odd numbers</span>
<span class="c1">// the &quot;queryableStoreName&quot; can be subsequently queried.</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">oddCounts</span> <span class="o">=</span> <span class="n">regionCounts</span><span class="o">.</span><span class="na">filter</span><span class="o">((</span><span class="n">region</span><span class="o">,</span> <span class="n">count</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="n">count</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span><span class="o">),</span>
   <span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span><span class="na">as</span><span class="o">(</span><span class="s">&quot;queryableStoreName&quot;</span><span class="o">));</span>
<span class="c1">// do not materialize the result of filtering corresponding to even numbers</span>
<span class="c1">// this means that these results will not be materialized and cannot be queried.</span>
<span class="n">KTable</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">evenCounts</span> <span class="o">=</span> <span class="n">regionCounts</span><span class="o">.</span><span class="na">filter</span><span class="o">((</span><span class="n">region</span><span class="o">,</span> <span class="n">count</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">(</span><span class="n">count</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="o">));</span>
</pre></div>
</div>
</div>
<div class="section" id="querying-local-window-stores">
<span id="streams-developer-guide-interactive-queries-local-window-stores"></span><h3><a class="toc-backref" href="#id5">Querying local window stores</a><a class="headerlink" href="#querying-local-window-stores" title="Permalink to this headline"></a></h3>
<p>A window store will potentially have many results for any given key because the key can be present in multiple windows.
However, there is only one result per window for a given key.</p>
<p>To query a local window store, you must first create a topology with a window store. This example creates a window store
named &#8220;CountsWindowStore&#8221; that contains the counts for words in 1-minute windows.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Define the processing topology (here: WordCount)</span>
<span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Serialized</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="c1">// Create a window state store named &quot;CountsWindowStore&quot; that contains the word counts for every minute</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">windowedBy</span><span class="o">(</span><span class="n">TimeWindows</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="mi">60000</span><span class="o">))</span>
    <span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">WindowStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span><span class="na">as</span><span class="o">(</span><span class="s">&quot;CountsWindowStore&quot;</span><span class="o">));</span>
</pre></div>
</div>
<p>After the application has started, you can get access to &#8220;CountsWindowStore&#8221; and then query it via the <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/ReadOnlyWindowStore.java">ReadOnlyWindowStore</a> API:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Get the window store named &quot;CountsWindowStore&quot;</span>
<span class="n">ReadOnlyWindowStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">windowStore</span> <span class="o">=</span>
<span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;CountsWindowStore&quot;</span><span class="o">,</span> <span class="n">QueryableStoreTypes</span><span class="o">.</span><span class="na">windowStore</span><span class="o">());</span>
<span class="c1">// Fetch values for the key &quot;world&quot; for all of the windows available in this application instance.</span>
<span class="c1">// To get *all* available windows we fetch windows from the beginning of time until now.</span>
<span class="kt">long</span> <span class="n">timeFrom</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="c1">// beginning of time = oldest available</span>
<span class="kt">long</span> <span class="n">timeTo</span> <span class="o">=</span> <span class="n">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">();</span> <span class="c1">// now (in processing-time)</span>
<span class="n">WindowStoreIterator</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">iterator</span> <span class="o">=</span> <span class="n">windowStore</span><span class="o">.</span><span class="na">fetch</span><span class="o">(</span><span class="s">&quot;world&quot;</span><span class="o">,</span> <span class="n">timeFrom</span><span class="o">,</span> <span class="n">timeTo</span><span class="o">);</span>
<span class="k">while</span> <span class="o">(</span><span class="n">iterator</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">iterator</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="kt">long</span> <span class="n">windowTimestamp</span> <span class="o">=</span> <span class="n">next</span><span class="o">.</span><span class="na">key</span><span class="o">;</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Count of &#39;world&#39; @ time &quot;</span> <span class="o">+</span> <span class="n">windowTimestamp</span> <span class="o">+</span> <span class="s">&quot; is &quot;</span> <span class="o">+</span> <span class="n">next</span><span class="o">.</span><span class="na">value</span><span class="o">);</span>
<span class="o">}</span>
</pre></div>
</div>
</div>
<div class="section" id="querying-local-custom-state-stores">
<span id="streams-developer-guide-interactive-queries-custom-stores"></span><h3><a class="toc-backref" href="#id6">Querying local custom state stores</a><a class="headerlink" href="#querying-local-custom-state-stores" title="Permalink to this headline"></a></h3>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Only the <a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a> supports custom state stores.</p>
</div>
<p>Before you can query a custom state store, it must satisfy these requirements:</p>
<ul class="simple">
<li>Your custom state store must implement <code class="docutils literal"><span class="pre">StateStore</span></code>.</li>
<li>You must have an interface to represent the operations available on the store.</li>
<li>You must provide an implementation of <code class="docutils literal"><span class="pre">StoreBuilder</span></code> for creating instances of your store.</li>
<li>It is recommended that you provide an interface that restricts access to read-only operations. This prevents users of this API from mutating the state of your running Kafka Streams application out-of-band.</li>
</ul>
<p>The class/interface hierarchy for your custom store might look something like this:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">StateStore</span><span class="o">,</span> <span class="n">MyWriteableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="c1">// implementation of the actual store</span>
<span class="o">}</span>
<span class="c1">// Read-write interface for MyCustomStore</span>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyWriteableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">extends</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
  <span class="kt">void</span> <span class="nf">write</span><span class="o">(</span><span class="n">K</span> <span class="n">key</span><span class="o">,</span> <span class="n">V</span> <span class="n">value</span><span class="o">);</span>
<span class="o">}</span>
<span class="c1">// Read-only interface for MyCustomStore</span>
<span class="kd">public</span> <span class="kd">interface</span> <span class="nc">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="n">V</span> <span class="nf">read</span><span class="o">(</span><span class="n">K</span> <span class="n">key</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreBuilder</span> <span class="kd">implements</span> <span class="n">StoreBuilder</span> <span class="o">{</span>
<span class="c1">// implementation of the supplier for MyCustomStore</span>
<span class="o">}</span>
</pre></div>
</div>
<p>To make this store queryable you must:</p>
<ul class="simple">
<li>Provide an implementation of <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/QueryableStoreType.java">QueryableStoreType</a>.</li>
<li>Provide a wrapper class that has access to all of the underlying instances of the store and is used for querying.</li>
</ul>
<p>Here is how to implement <code class="docutils literal"><span class="pre">QueryableStoreType</span></code>:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreType</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;&gt;</span> <span class="o">{</span>
<span class="c1">// Only accept StateStores that are of type MyCustomStore</span>
<span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">accepts</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStore</span> <span class="n">stateStore</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">return</span> <span class="n">stateStore</span> <span class="k">instanceof</span> <span class="n">MyCustomStore</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="nf">create</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">storeProvider</span><span class="o">,</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="n">MyCustomStoreTypeWrapper</span><span class="o">(</span><span class="n">storeProvider</span><span class="o">,</span> <span class="n">storeName</span><span class="o">,</span> <span class="k">this</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<p>A wrapper class is required because each instance of a Kafka Streams application may run multiple stream tasks and manage
multiple local instances of a particular state store. The wrapper class hides this complexity and lets you query a &#8220;logical&#8221;
state store by name without having to know about all of the underlying local instances of that state store.</p>
<p>When implementing your wrapper class you must use the
<a class="reference external" href="https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/internals/StateStoreProvider.java">StateStoreProvider</a>
interface to get access to the underlying instances of your store.
<code class="docutils literal"><span class="pre">StateStoreProvider#stores(String</span> <span class="pre">storeName,</span> <span class="pre">QueryableStoreType&lt;T&gt;</span> <span class="pre">queryableStoreType)</span></code> returns a <code class="docutils literal"><span class="pre">List</span></code> of state
stores with the given storeName and of the type as defined by <code class="docutils literal"><span class="pre">queryableStoreType</span></code>.</p>
<p>Here is an example implementation of the wrapper (Java 8+):</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// We strongly recommend implementing a read-only interface</span>
<span class="c1">// to restrict usage of the store to safe read operations!</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyCustomStoreTypeWrapper</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="kd">implements</span> <span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span><span class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">customStoreType</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">provider</span><span class="o">;</span>
  <span class="kd">public</span> <span class="nf">MyCustomStoreTypeWrapper</span><span class="o">(</span><span class="kd">final</span> <span class="n">StateStoreProvider</span> <span class="n">provider</span><span class="o">,</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">storeName</span><span class="o">,</span>
<span class="kd">final</span> <span class="n">QueryableStoreType</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">customStoreType</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// ... assign fields ...</span>
<span class="o">}</span>
<span class="c1">// Implement a safe read method</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="n">V</span> <span class="nf">read</span><span class="o">(</span><span class="kd">final</span> <span class="n">K</span> <span class="n">key</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// Get all the stores with storeName and of customStoreType</span>
    <span class="kd">final</span> <span class="n">List</span><span class="o">&lt;</span><span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span class="n">V</span><span class="o">&gt;&gt;</span> <span class="n">stores</span> <span class="o">=</span> <span class="n">provider</span><span class="o">.</span><span class="na">stores</span><span class="o">(</span><span class="n">storeName</span><span class="o">,</span> <span class="n">customStoreType</span><span class="o">);</span>
    <span class="c1">// Try to find the value for the given key</span>
    <span class="kd">final</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">V</span><span class="o">&gt;</span> <span class="n">value</span> <span class="o">=</span> <span class="n">stores</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">map</span><span class="o">(</span><span class="n">store</span> <span class="o">-&gt;</span> <span class="n">store</span><span class="o">.</span><span class="na">read</span><span class="o">(</span><span class="n">key</span><span class="o">)).</span><span class="na">filter</span><span class="o">(</span><span class="n">v</span> <span class="o">-&gt;</span> <span class="n">v</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">).</span><span class="na">findFirst</span><span class="o">();</span>
<span class="c1">// Return the value if it exists</span>
<span class="k">return</span> <span class="n">value</span><span class="o">.</span><span class="na">orElse</span><span class="o">(</span><span class="kc">null</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<p>You can now find and query your custom store:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">ProcessorSupplier</span> <span class="n">processorSupplier</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Create the custom store builder for the store named &quot;the-custom-store&quot;</span>
<span class="n">MyCustomStoreBuilder</span> <span class="n">customStoreBuilder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MyCustomStoreBuilder</span><span class="o">(</span><span class="s">&quot;the-custom-store&quot;</span><span class="o">);</span> <span class="c1">// ...</span>
<span class="c1">// Add the source topic</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="s">&quot;input&quot;</span><span class="o">,</span> <span class="s">&quot;inputTopic&quot;</span><span class="o">);</span>
<span class="c1">// Add a custom processor that reads from the source topic</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addProcessor</span><span class="o">(</span><span class="s">&quot;the-processor&quot;</span><span class="o">,</span> <span class="n">processorSupplier</span><span class="o">,</span> <span class="s">&quot;input&quot;</span><span class="o">);</span>
<span class="c1">// Connect your custom state store to the custom processor above</span>
<span class="n">topology</span><span class="o">.</span><span class="na">addStateStore</span><span class="o">(</span><span class="n">customStoreBuilder</span><span class="o">,</span> <span class="s">&quot;the-processor&quot;</span><span class="o">);</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">config</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
<span class="c1">// Get access to the custom store</span>
<span class="n">MyReadableCustomStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">store</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">store</span><span class="o">(</span><span class="s">&quot;the-custom-store&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="n">MyCustomStoreType</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">String</span><span class="o">&gt;());</span>
<span class="c1">// Query the store</span>
<span class="n">String</span> <span class="n">value</span> <span class="o">=</span> <span class="n">store</span><span class="o">.</span><span class="na">read</span><span class="o">(</span><span class="s">&quot;key&quot;</span><span class="o">);</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="querying-remote-state-stores-for-the-entire-app">
<span id="streams-developer-guide-interactive-queries-discovery"></span><h2><a class="toc-backref" href="#id7">Querying remote state stores for the entire app</a><a class="headerlink" href="#querying-remote-state-stores-for-the-entire-app" title="Permalink to this headline"></a></h2>
<p>To query the state of the entire app remotely, you must expose the application&#8217;s full state to other applications, including
applications that are running on different machines.</p>
<p>For example, you have a Kafka Streams application that processes user events in a multi-player video game, and you want to retrieve the latest status of each user directly and display it in a mobile app. Here are the required steps to make the full state of your application queryable:</p>
<ol class="arabic simple">
<li><a class="reference internal" href="#streams-developer-guide-interactive-queries-rpc-layer"><span class="std std-ref">Add an RPC layer to your application</span></a> so that
the instances of your application can be interacted with via the network (e.g., a REST API, Thrift, a custom protocol,
and so on). The instances must respond to interactive queries. You can follow the reference examples provided to get
started.</li>
<li><a class="reference internal" href="#streams-developer-guide-interactive-queries-expose-rpc"><span class="std std-ref">Expose the RPC endpoints</span></a> of
your application&#8217;s instances via the <code class="docutils literal"><span class="pre">application.server</span></code> configuration setting of Kafka Streams. Because RPC
endpoints must be unique within a network, each instance has its own value for this configuration setting.
This makes an application instance discoverable by other instances.</li>
<li>In the RPC layer, <a class="reference internal" href="#streams-developer-guide-interactive-queries-discover-app-instances-and-stores"><span class="std std-ref">discover remote application instances</span></a> and their state stores and <a class="reference internal" href="#streams-developer-guide-interactive-queries-local-stores"><span class="std std-ref">query locally available state stores</span></a> to make the full state of your application queryable. The remote application instances can forward queries to other app instances if a particular instance lacks the local data to respond to a query. The locally available state stores can directly respond to queries.</li>
</ol>
<div class="figure align-center" id="id2">
<a class="reference internal image-reference" href="../../../images/streams-interactive-queries-api-02.png"><img alt="../../../images/streams-interactive-queries-api-02.png" src="../../../images/streams-interactive-queries-api-02.png" style="width: 500pt;" /></a>
<p class="caption"><span class="caption-text">Discover any running instances of the same application as well as the respective RPC endpoints they expose for
interactive queries</span></p>
</div>
<div class="section" id="adding-an-rpc-layer-to-your-application">
<span id="streams-developer-guide-interactive-queries-rpc-layer"></span><h3><a class="toc-backref" href="#id8">Adding an RPC layer to your application</a><a class="headerlink" href="#adding-an-rpc-layer-to-your-application" title="Permalink to this headline"></a></h3>
<p>There are many ways to add an RPC layer. The only requirements are that the RPC layer is embedded within the Kafka Streams
application and that it exposes an endpoint that other application instances and applications can connect to.</p>
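<p>For example, a minimal sketch of such an RPC layer could embed the JDK&#8217;s built-in HTTP server to answer key lookups against a local state store. The store name &#8220;word-count&#8221;, the port, and the URL layout below are illustrative assumptions, not a prescribed API:</p>
<div class="highlight-java"><div class="highlight"><pre>// A hypothetical, minimal HTTP-based RPC layer embedded in a Kafka Streams
// application. It answers GET /word-count/&lt;word&gt; with the latest local count.
// Store name, port, and URL layout are illustrative assumptions.
import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final KafkaStreams streams = ...;  // the running application instance
final HttpServer server = HttpServer.create(new InetSocketAddress(4460), 0);
server.createContext(&quot;/word-count/&quot;, exchange -&gt; {
    final String word = exchange.getRequestURI().getPath().substring(&quot;/word-count/&quot;.length());
    // Query the local state store; get() returns null if this instance does not hold the key
    final ReadOnlyKeyValueStore&lt;String, Long&gt; store =
        streams.store(&quot;word-count&quot;, QueryableStoreTypes.keyValueStore());
    final Long count = store.get(word);
    final byte[] body = String.valueOf(count).getBytes(StandardCharsets.UTF_8);
    exchange.sendResponseHeaders(count == null ? 404 : 200, body.length);
    exchange.getResponseBody().write(body);
    exchange.close();
});
server.start();
</pre></div></div>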
</div>
<div class="section" id="exposing-the-rpc-endpoints-of-your-application">
<span id="streams-developer-guide-interactive-queries-expose-rpc"></span><h3><a class="toc-backref" href="#id9">Exposing the RPC endpoints of your application</a><a class="headerlink" href="#exposing-the-rpc-endpoints-of-your-application" title="Permalink to this headline"></a></h3>
<p>To enable remote state store discovery in a distributed Kafka Streams application, you must set the <a class="reference internal" href="config-streams.html#streams-developer-guide-required-configs"><span class="std std-ref">configuration property</span></a> in <code class="docutils literal"><span class="pre">StreamsConfig</span></code>.
The <code class="docutils literal"><span class="pre">application.server</span></code> property defines a unique <code class="docutils literal"><span class="pre">host:port</span></code> pair that points to the RPC endpoint of the respective instance of a Kafka Streams application.
The value of this configuration property will vary across the instances of your application.
When this property is set, Kafka Streams will keep track of the RPC endpoint information for every instance of an application, its state stores, and assigned stream partitions through instances of <a class="reference external" href="../javadocs/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a>.</p>
<div class="admonition tip">
<p class="first admonition-title">Tip</p>
<p class="last">Consider leveraging the exposed RPC endpoints of your application for further functionality, such as
piggybacking additional inter-application communication that goes beyond interactive queries.</p>
</div>
<p>This example shows how to configure and run a Kafka Streams application that supports the discovery of its state stores.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">props</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Set the unique RPC endpoint of this application instance through which it</span>
<span class="c1">// can be interactively queried. In a real application, the value would most</span>
<span class="c1">// probably not be hardcoded but derived dynamically.</span>
<span class="n">String</span> <span class="n">rpcEndpoint</span> <span class="o">=</span> <span class="s">&quot;host1:4460&quot;</span><span class="o">;</span>
<span class="n">props</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_SERVER_CONFIG</span><span class="o">,</span> <span class="n">rpcEndpoint</span><span class="o">);</span>
<span class="c1">// ... further settings may follow here ...</span>
<span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsConfig</span><span class="o">(</span><span class="n">props</span><span class="o">);</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsBuilder</span><span class="o">();</span>
<span class="n">KStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">textLines</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">stream</span><span class="o">(</span><span class="s">&quot;word-count-input&quot;</span><span class="o">,</span> <span class="n">Consumed</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="kd">final</span> <span class="n">KGroupedStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">groupedByWord</span> <span class="o">=</span> <span class="n">textLines</span>
<span class="o">.</span><span class="na">flatMapValues</span><span class="o">(</span><span class="n">value</span> <span class="o">-&gt;</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">value</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">().</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;\\W+&quot;</span><span class="o">)))</span>
<span class="o">.</span><span class="na">groupBy</span><span class="o">((</span><span class="n">key</span><span class="o">,</span> <span class="n">word</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">word</span><span class="o">,</span> <span class="n">Serialized</span><span class="o">.</span><span class="na">with</span><span class="o">(</span><span class="n">stringSerde</span><span class="o">,</span> <span class="n">stringSerde</span><span class="o">));</span>
<span class="c1">// This call to `count()` creates a state store named &quot;word-count&quot;.</span>
<span class="c1">// The state store is discoverable and can be queried interactively.</span>
<span class="n">groupedByWord</span><span class="o">.</span><span class="na">count</span><span class="o">(</span><span class="n">Materialized</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">Bytes</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]&gt;&gt;</span><span class="na">as</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">));</span>
<span class="c1">// Start an instance of the topology</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">(),</span> <span class="n">config</span><span class="o">);</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
<span class="c1">// Then, create and start the actual RPC service for remote access to this</span>
<span class="c1">// application instance&#39;s local state stores.</span>
<span class="c1">//</span>
<span class="c1">// This service should be started on the same host and port as defined above by</span>
<span class="c1">// the property `StreamsConfig.APPLICATION_SERVER_CONFIG`. The example below is</span>
<span class="c1">// fictitious, but we provide end-to-end demo applications (such as KafkaMusicExample)</span>
<span class="c1">// that showcase how to implement such a service to get you started.</span>
<span class="n">MyRPCService</span> <span class="n">rpcService</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">rpcService</span><span class="o">.</span><span class="na">listenAt</span><span class="o">(</span><span class="n">rpcEndpoint</span><span class="o">);</span>
</pre></div>
</div>
</div>
<div class="section" id="discovering-and-accessing-application-instances-and-their-local-state-stores">
<span id="streams-developer-guide-interactive-queries-discover-app-instances-and-stores"></span><h3><a class="toc-backref" href="#id10">Discovering and accessing application instances and their local state stores</a><a class="headerlink" href="#discovering-and-accessing-application-instances-and-their-local-state-stores" title="Permalink to this headline"></a></h3>
<p>The following methods return <a class="reference external" href="../javadocs/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a> objects, which provide meta-information about application instances such as their RPC endpoint and locally available state stores.</p>
<ul class="simple">
<li><code class="docutils literal"><span class="pre">KafkaStreams#allMetadata()</span></code>: find all instances of this application</li>
<li><code class="docutils literal"><span class="pre">KafkaStreams#allMetadataForStore(String</span> <span class="pre">storeName)</span></code>: find those application instances that manage local instances of the state store &#8220;storeName&#8221;</li>
<li><code class="docutils literal"><span class="pre">KafkaStreams#metadataForKey(String</span> <span class="pre">storeName,</span> <span class="pre">K</span> <span class="pre">key,</span> <span class="pre">Serializer&lt;K&gt;</span> <span class="pre">keySerializer)</span></code>: using the default stream partitioning strategy, find the one application instance that holds the data for the given key in the given state store</li>
<li><code class="docutils literal"><span class="pre">KafkaStreams#metadataForKey(String</span> <span class="pre">storeName,</span> <span class="pre">K</span> <span class="pre">key,</span> <span class="pre">StreamPartitioner&lt;K,</span> <span class="pre">?&gt;</span> <span class="pre">partitioner)</span></code>: using <code class="docutils literal"><span class="pre">partitioner</span></code>, find the one application instance that holds the data for the given key in the given state store</li>
</ul>
<div class="admonition attention">
<p class="first admonition-title">Attention</p>
<p class="last">If <code class="docutils literal"><span class="pre">application.server</span></code> is not configured for an application instance, then the above methods will not find any <a class="reference external" href="../javadocs/org/apache/kafka/streams/state/StreamsMetadata.html">StreamsMetadata</a> for it.</p>
</div>
<p>For example, we can now find the <code class="docutils literal"><span class="pre">StreamsMetadata</span></code> for the state store named &#8220;word-count&#8221; that we defined in the
code example shown in the previous section:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Find all the locations of local instances of the state store named &quot;word-count&quot;</span>
<span class="n">Collection</span><span class="o">&lt;</span><span class="n">StreamsMetadata</span><span class="o">&gt;</span> <span class="n">wordCountHosts</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">allMetadataForStore</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">);</span>
<span class="c1">// For illustrative purposes, we assume an HTTP client is used to talk to remote app instances.</span>
<span class="n">HttpClient</span> <span class="n">http</span> <span class="o">=</span> <span class="o">...;</span>
<span class="c1">// Get the word count for word (aka key) &#39;alice&#39;: Approach 1</span>
<span class="c1">//</span>
<span class="c1">// We first find the one app instance that manages the count for &#39;alice&#39; in its local state stores.</span>
<span class="n">StreamsMetadata</span> <span class="n">metadata</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">metadataForKey</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">,</span> <span class="s">&quot;alice&quot;</span><span class="o">,</span> <span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">().</span><span class="na">serializer</span><span class="o">());</span>
<span class="c1">// Then, we query only that single app instance for the latest count of &#39;alice&#39;.</span>
<span class="c1">// Note: The RPC URL shown below is fictitious and only serves to illustrate the idea. Ultimately,</span>
<span class="c1">// the URL (or, in general, the method of communication) will depend on the RPC layer you opted to</span>
<span class="c1">// implement. Again, we provide end-to-end demo applications (such as KafkaMusicExample) that showcase</span>
<span class="c1">// how to implement such an RPC layer.</span>
<span class="n">Long</span> <span class="n">result</span> <span class="o">=</span> <span class="n">http</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="s">&quot;http://&quot;</span> <span class="o">+</span> <span class="n">metadata</span><span class="o">.</span><span class="na">host</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;:&quot;</span> <span class="o">+</span> <span class="n">metadata</span><span class="o">.</span><span class="na">port</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;/word-count/alice&quot;</span><span class="o">);</span>
<span class="c1">// Get the word count for word (aka key) &#39;alice&#39;: Approach 2</span>
<span class="c1">//</span>
<span class="c1">// Alternatively, we could also choose (say) a brute-force approach where we query every app instance</span>
<span class="c1">// until we find the one that happens to know about &#39;alice&#39;.</span>
<span class="n">Optional</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">&gt;</span> <span class="n">result2</span> <span class="o">=</span> <span class="n">streams</span><span class="o">.</span><span class="na">allMetadataForStore</span><span class="o">(</span><span class="s">&quot;word-count&quot;</span><span class="o">)</span>
<span class="o">.</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">streamsMetadata</span> <span class="o">-&gt;</span> <span class="o">{</span>
      <span class="c1">// Construct the (fictitious) full endpoint URL to query the current remote application instance</span>
<span class="n">String</span> <span class="n">url</span> <span class="o">=</span> <span class="s">&quot;http://&quot;</span> <span class="o">+</span> <span class="n">streamsMetadata</span><span class="o">.</span><span class="na">host</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;:&quot;</span> <span class="o">+</span> <span class="n">streamsMetadata</span><span class="o">.</span><span class="na">port</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;/word-count/alice&quot;</span><span class="o">;</span>
<span class="c1">// Read and return the count for &#39;alice&#39;, if any.</span>
<span class="k">return</span> <span class="n">http</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="n">url</span><span class="o">);</span>
<span class="o">})</span>
<span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="n">s</span> <span class="o">-&gt;</span> <span class="n">s</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span>
<span class="o">.</span><span class="na">findFirst</span><span class="o">();</span>
</pre></div>
</div>
<p>At this point the full state of the application is interactively queryable:</p>
<ul class="simple">
<li>You can discover the running instances of the application and the state stores they manage locally.</li>
<li>Through the RPC layer that was added to the application, you can communicate with these application instances over the
network and query them for locally available state.</li>
<li>The application instances are able to serve such queries because they can directly query their own local state stores
and respond via the RPC layer.</li>
<li>Collectively, this allows us to query the full state of the entire application.</li>
</ul>
<p>To see an end-to-end application with interactive queries, review the
<a class="reference internal" href="#streams-developer-guide-interactive-queries-demos"><span class="std std-ref">demo applications</span></a>.</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/datatypes" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/memory-mgmt" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>


@ -0,0 +1,129 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="managing-streams-application-topics">
<span id="streams-developer-guide-topics"></span><h1>Managing Streams Application Topics<a class="headerlink" href="#managing-streams-application-topics" title="Permalink to this headline"></a></h1>
<p>A Kafka Streams application continuously reads from Kafka topics, processes the read data, and then
writes the processing results back into Kafka topics. The application may also auto-create other Kafka topics in the
Kafka brokers, for example state store changelog topics. This section describes the differences between these topic types and
explains how to manage the topics and your applications.</p>
<p>Kafka Streams distinguishes between <a class="reference internal" href="#streams-developer-guide-topics-user"><span class="std std-ref">user topics</span></a> and
<a class="reference internal" href="#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a>.</p>
<div class="section" id="user-topics">
<span id="streams-developer-guide-topics-user"></span><h2>User topics<a class="headerlink" href="#user-topics" title="Permalink to this headline"></a></h2>
<p>User topics exist externally to an application and are read from or written to by the application; they include the following types (see the sketch after this list):</p>
<dl class="docutils">
<dt>Input topics</dt>
<dd>Topics that are specified via source processors in the application&#8217;s topology; e.g. via <code class="docutils literal"><span class="pre">StreamsBuilder#stream()</span></code>, <code class="docutils literal"><span class="pre">StreamsBuilder#table()</span></code> and <code class="docutils literal"><span class="pre">Topology#addSource()</span></code>.</dd>
<dt>Output topics</dt>
<dd>Topics that are specified via sink processors in the application&#8217;s topology; e.g. via
<code class="docutils literal"><span class="pre">KStream#to()</span></code>, <code class="docutils literal"><span class="pre">KTable#to()</span></code> and <code class="docutils literal"><span class="pre">Topology#addSink()</span></code>.</dd>
<dt>Intermediate topics</dt>
<dd>Topics that are both input and output topics of the application&#8217;s topology; e.g. via
<code class="docutils literal"><span class="pre">KStream#through()</span></code>.</dd>
</dl>
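<p>As a quick illustration (the topic names here are hypothetical), this sketch shows where each user topic type appears in a topology:</p>
<div class="highlight-java"><div class="highlight"><pre>StreamsBuilder builder = new StreamsBuilder();
// Input topic: read via a source processor
KStream&lt;String, String&gt; orders = builder.stream(&quot;orders-input&quot;);
// Intermediate topic: written to and then read back by the same topology
KStream&lt;String, String&gt; repartitioned = orders.through(&quot;orders-intermediate&quot;);
// Output topic: written via a sink processor
repartitioned.to(&quot;orders-output&quot;);
</pre></div></div>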
<p>User topics must be created and manually managed ahead of time (e.g., via the
<a class="reference internal" href="/{{version}}/documentation/#basic_ops_add_topic"><span class="std std-ref">topic tools</span></a>). If user topics are shared among multiple applications for reading and
writing, the application users must coordinate topic management. If user topics are centrally managed, application
users would not need to manage topics themselves but simply obtain access to them.</p>
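<p>Besides the command line tools, you can also create the required user topics programmatically with Kafka&#8217;s <code class="docutils literal"><span class="pre">AdminClient</span></code>. The following is a minimal sketch; the topic name, partition count, and replication factor are hypothetical example values:</p>
<div class="highlight-java"><div class="highlight"><pre>import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, &quot;localhost:9092&quot;);
try (AdminClient admin = AdminClient.create(adminProps)) {
    // create an input topic with 4 partitions and replication factor 3 ahead of time
    NewTopic inputTopic = new NewTopic(&quot;my-input-topic&quot;, 4, (short) 3);
    // all().get() blocks until topic creation completes (and throws on failure)
    admin.createTopics(Collections.singleton(inputTopic)).all().get();
}
</pre></div>
</div>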
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>You should not use the auto-create topic feature on the brokers to create user topics, because:</p>
<ul class="last simple">
<li>Auto-creation of topics may be disabled in your Kafka cluster (controlled by <code class="docutils literal"><span class="pre">auto.create.topics.enable</span></code> in the <a class="reference external" href="http://kafka.apache.org/0100/documentation.html#brokerconfigs">Kafka broker configuration</a>).</li>
<li>Auto-creation automatically applies the default topic settings, such as the replication factor. These default settings might not be what you want for certain output topics.</li>
</ul>
</div>
</div>
<div class="section" id="internal-topics">
<span id="streams-developer-guide-topics-internal"></span><h2>Internal topics<a class="headerlink" href="#internal-topics" title="Permalink to this headline"></a></h2>
<p>Internal topics are used internally by the Kafka Streams application while executing, for example the
changelog topics for state stores. These topics are created by the application and are only used by that stream application.</p>
<p>If security is enabled on the Kafka brokers, you must grant the underlying clients admin permissions so that they can
create internal topics. For more information, see <a class="reference internal" href="security.html#streams-developer-guide-security"><span class="std std-ref">Streams Security</span></a>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">The internal topics follow the naming convention <code class="docutils literal"><span class="pre">&lt;application.id&gt;-&lt;operatorName&gt;-&lt;suffix&gt;</span></code>, but this convention
is not guaranteed for future releases.</p>
</div>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/running-app" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/security" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>


@ -0,0 +1,241 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="memory-management">
<span id="streams-developer-guide-memory-management"></span><h1>Memory Management<a class="headerlink" href="#memory-management" title="Permalink to this headline"></a></h1>
<p>You can specify the total memory (RAM) size used for internal caching and compacting of records. This caching happens
before the records are written to state stores or forwarded downstream to other nodes.</p>
<p>The record caches are implemented slightly differently in the DSL and Processor API.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#record-caches-in-the-dsl" id="id1">Record caches in the DSL</a></li>
<li><a class="reference internal" href="#record-caches-in-the-processor-api" id="id2">Record caches in the Processor API</a></li>
<li><a class="reference internal" href="#other-memory-usage" id="id3">Other memory usage</a></li>
</ul>
</div>
<div class="section" id="record-caches-in-the-dsl">
<span id="streams-developer-guide-memory-management-record-cache"></span><h2><a class="toc-backref" href="#id1">Record caches in the DSL</a><a class="headerlink" href="#record-caches-in-the-dsl" title="Permalink to this headline"></a></h2>
<p>You can specify the total memory (RAM) size of the record cache for an instance of the processing topology. It is leveraged
by the following <code class="docutils literal"><span class="pre">KTable</span></code> instances:</p>
<ul class="simple">
<li>Source <code class="docutils literal"><span class="pre">KTable</span></code>: <code class="docutils literal"><span class="pre">KTable</span></code> instances that are created via <code class="docutils literal"><span class="pre">StreamsBuilder#table()</span></code> or <code class="docutils literal"><span class="pre">StreamsBuilder#globalTable()</span></code>.</li>
<li>Aggregation <code class="docutils literal"><span class="pre">KTable</span></code>: instances of <code class="docutils literal"><span class="pre">KTable</span></code> that are created as a result of <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-aggregating"><span class="std std-ref">aggregations</span></a>.</li>
</ul>
<p>For such <code class="docutils literal"><span class="pre">KTable</span></code> instances, the record cache is used for:</p>
<ul class="simple">
<li>Internal caching and compacting of output records before they are written by the underlying stateful
<a class="reference internal" href="../concepts.html#streams-concepts-processor"><span class="std std-ref">processor node</span></a> to its internal state stores.</li>
<li>Internal caching and compacting of output records before they are forwarded from the underlying stateful
<a class="reference internal" href="../concepts.html#streams-concepts-processor"><span class="std std-ref">processor node</span></a> to any of its downstream processor nodes.</li>
</ul>
<p>Use the following example to understand the behaviors with and without record caching. In this example, the input is a
<code class="docutils literal"><span class="pre">KStream&lt;String,</span> <span class="pre">Integer&gt;</span></code> with the records <code class="docutils literal"><span class="pre">&lt;K,V&gt;:</span> <span class="pre">&lt;A,</span> <span class="pre">1&gt;,</span> <span class="pre">&lt;D,</span> <span class="pre">5&gt;,</span> <span class="pre">&lt;A,</span> <span class="pre">20&gt;,</span> <span class="pre">&lt;A,</span> <span class="pre">300&gt;</span></code>. The focus in this example is
on the records with key == <code class="docutils literal"><span class="pre">A</span></code>.</p>
<ul>
<li><p class="first">An <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-aggregating"><span class="std std-ref">aggregation</span></a> computes the sum of record values, grouped by key, for
the input and returns a <code class="docutils literal"><span class="pre">KTable&lt;String,</span> <span class="pre">Integer&gt;</span></code>.</p>
<blockquote>
<div><ul class="simple">
<li><strong>Without caching</strong>: a sequence of output records is emitted for key <code class="docutils literal"><span class="pre">A</span></code> that represent changes in the
resulting aggregation table. The parentheses (<code class="docutils literal"><span class="pre">()</span></code>) denote changes, the left number is the new aggregate value
and the right number is the old aggregate value: <code class="docutils literal"><span class="pre">&lt;A,</span> <span class="pre">(1,</span> <span class="pre">null)&gt;,</span> <span class="pre">&lt;A,</span> <span class="pre">(21,</span> <span class="pre">1)&gt;,</span> <span class="pre">&lt;A,</span> <span class="pre">(321,</span> <span class="pre">21)&gt;</span></code>.</li>
<li><strong>With caching</strong>: a single output record is emitted for key <code class="docutils literal"><span class="pre">A</span></code> that would likely be compacted in the cache,
leading to a single output record of <code class="docutils literal"><span class="pre">&lt;A,</span> <span class="pre">(321,</span> <span class="pre">null)&gt;</span></code>. This record is written to the aggregation&#8217;s internal state
store and forwarded to any downstream operations.</li>
</ul>
</div></blockquote>
</li>
</ul>
<p>The cache size is specified through the <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> parameter, which is a global setting per
processing topology:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Enable record cache of size 10 MB.</span>
<span class="n">Properties</span> <span class="n">streamsConfiguration</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsConfiguration</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
</pre></div>
</div>
<p>This parameter controls the number of bytes allocated for caching. Specifically, for a processor topology instance with
<code class="docutils literal"><span class="pre">T</span></code> threads and <code class="docutils literal"><span class="pre">C</span></code> bytes allocated for caching, each thread will have an even split of <code class="docutils literal"><span class="pre">C/T</span></code> bytes to construct its own
cache and use as it sees fit among its tasks; for example, with <code class="docutils literal"><span class="pre">T</span> <span class="pre">=</span> <span class="pre">5</span></code> threads and <code class="docutils literal"><span class="pre">C</span></code> = 10 MB, each thread gets a 2 MB cache. This means that there are as many caches as there are threads, but no sharing of
caches across threads happens.</p>
<p>The basic API for the cache is made of <code class="docutils literal"><span class="pre">put()</span></code> and <code class="docutils literal"><span class="pre">get()</span></code> calls. Records are
evicted using a simple LRU scheme after the cache size is reached. The first time a keyed record <code class="docutils literal"><span class="pre">R1</span> <span class="pre">=</span> <span class="pre">&lt;K1,</span> <span class="pre">V1&gt;</span></code>
finishes processing at a node, it is marked as dirty in the cache. Any other keyed record <code class="docutils literal"><span class="pre">R2</span> <span class="pre">=</span> <span class="pre">&lt;K1,</span> <span class="pre">V2&gt;</span></code> with the
same key <code class="docutils literal"><span class="pre">K1</span></code> that is processed on that node during that time will overwrite <code class="docutils literal"><span class="pre">&lt;K1,</span> <span class="pre">V1&gt;</span></code>; this is referred to as
&#8220;being compacted&#8221;. This has the same effect as
<a class="reference external" href="https://kafka.apache.org/documentation.html#compaction">Kafka&#8217;s log compaction</a>, but happens earlier, while the
records are still in memory, and within your client-side application, rather than on the server-side (i.e. the Kafka
broker). After flushing, <code class="docutils literal"><span class="pre">R2</span></code> is forwarded to the next processing node and then written to the local state store.</p>
<p>The semantics of caching is that data is flushed to the state store and forwarded to the next downstream processor node
whenever the earliest of <code class="docutils literal"><span class="pre">commit.interval.ms</span></code> or <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> (cache pressure) hits. Both
<code class="docutils literal"><span class="pre">commit.interval.ms</span></code> and <code class="docutils literal"><span class="pre">cache.max.bytes.buffering</span></code> are global parameters. As such, it is not possible to specify
different parameters for individual nodes.</p>
<p>Here are example settings for both parameters based on desired scenarios.</p>
<ul>
<li><p class="first">To turn off caching the cache size can be set to zero:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Disable record cache</span>
<span class="n">Properties</span> <span class="n">streamsConfiguration</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">streamsConfiguration</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">0</span><span class="o">);</span>
</pre></div>
</div>
<p>Turning off caching might result in high write traffic for the underlying RocksDB store.
With default settings caching is enabled within Kafka Streams but RocksDB caching is disabled.
Thus, to avoid high write traffic it is recommended to enable RocksDB caching if Kafka Streams caching is turned off.</p>
<p>For example, the RocksDB block cache could be set to 100 MB and the write buffer size to 32 MB. For more information, see
the <a class="reference internal" href="config-streams.html#streams-developer-guide-rocksdb-config"><span class="std std-ref">RocksDB config</span></a>.</p>
</div></blockquote>
</li>
<li><p class="first">To enable caching but still have an upper bound on how long records will be cached, you can set the commit interval. In this example, it is set to 1000 milliseconds:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Properties</span> <span class="n">streamsConfiguration</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="c1">// Enable record cache of size 10 MB.</span>
<span class="n">streamsConfiguration</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">CACHE_MAX_BYTES_BUFFERING_CONFIG</span><span class="o">,</span> <span class="mi">10</span> <span class="o">*</span> <span class="mi">1024</span> <span class="o">*</span> <span class="mi">1024L</span><span class="o">);</span>
<span class="c1">// Set commit interval to 1 second.</span>
<span class="n">streamsConfiguration</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">COMMIT_INTERVAL_MS_CONFIG</span><span class="o">,</span> <span class="mi">1000</span><span class="o">);</span>
</pre></div>
</div>
</div></blockquote>
</li>
</ul>
<p>The effect of these two configurations is described in the figure below. The records are shown using 4 keys: blue, red, yellow, and green. Assume the cache has space for only 3 keys.</p>
<ul>
<li><p class="first">When the cache is disabled (a), all of the input records will be output.</p>
</li>
<li><p class="first">When the cache is enabled (b):</p>
<blockquote>
<div><ul class="simple">
<li>Most records are output at the end of commit intervals (e.g., at <code class="docutils literal"><span class="pre">t1</span></code> a single blue record is output, which is the final over-write of the blue key up to that time).</li>
<li>Some records are output because of cache pressure (i.e. before the end of a commit interval). For example, see the red record before <code class="docutils literal"><span class="pre">t2</span></code>. With smaller cache sizes we expect cache pressure to be the primary factor that dictates when records are output. With large cache sizes, the commit interval will be the primary factor.</li>
<li>The total number of records output has been reduced from 15 to 8.</li>
</ul>
</div></blockquote>
</li>
</ul>
<div class="figure align-center">
<a class="reference internal image-reference" href="../../../images/streams-cache-and-commit-interval.png"><img alt="../../../images/streams-cache-and-commit-interval.png" src="../../../images/streams-cache-and-commit-interval.png" style="width: 500pt; height: 400pt;" /></a>
</div>
</div>
<div class="section" id="record-caches-in-the-processor-api">
<span id="streams-developer-guide-memory-management-state-store-cache"></span><h2><a class="toc-backref" href="#id2">Record caches in the Processor API</a><a class="headerlink" href="#record-caches-in-the-processor-api" title="Permalink to this headline"></a></h2>
<p>You can specify the total memory (RAM) size of the record cache for an instance of the processing topology. It is used
for internal caching and compacting of output records before they are written from a stateful processor node to its
state stores.</p>
<p>The record cache in the Processor API does not cache or compact any output records that are being forwarded downstream.
This means that all downstream processor nodes can see all records, whereas the state stores see a reduced number of records.
This does not impact correctness of the system, but is a performance optimization for the state stores. For example, with the
Processor API you can store a record in a state store while forwarding a different value downstream.</p>
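<p>As an illustration, the following sketch (with hypothetical store and variable names) stores the complete value in a state store but forwards only a derived value. The store write benefits from caching and compaction, while downstream processors still see every record:</p>
<div class="highlight-java"><div class="highlight"><pre>@Override
public void process(String key, String value) {
    // the write to the state store goes through the record cache and may be compacted
    kvStore.put(key, value);
    // the forwarded record bypasses the cache, so downstream nodes see every record
    context.forward(key, value.length());
}
</pre></div>
</div>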
<p>Following from the example first shown in section <a class="reference internal" href="processor-api.html#streams-developer-guide-state-store"><span class="std std-ref">State Stores</span></a>, to enable caching, you can
add the <code class="docutils literal"><span class="pre">withCachingEnabled</span></code> call (note that caches are disabled by default and there is no explicit <code class="docutils literal"><span class="pre">withDisableCaching</span></code>
call).</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">StoreBuilder</span> <span class="n">countStoreBuilder</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withCachingEnabled</span><span class="o">()</span>
</pre></div>
</div>
</div>
<div class="section" id="other-memory-usage">
<h2><a class="toc-backref" href="#id3">Other memory usage</a><a class="headerlink" href="#other-memory-usage" title="Permalink to this headline"></a></h2>
<p>There are other modules inside Apache Kafka that allocate memory during runtime. They include the following:</p>
<ul class="simple">
<li>Producer buffering, managed by the producer config <code class="docutils literal"><span class="pre">buffer.memory</span></code>.</li>
<li>Consumer buffering, currently not strictly managed, but can be indirectly controlled by fetch size, i.e.,
<code class="docutils literal"><span class="pre">fetch.max.bytes</span></code> and <code class="docutils literal"><span class="pre">fetch.max.wait.ms</span></code>.</li>
<li>Both producer and consumer also have separate TCP send / receive buffers that are not counted as the buffering memory.
These are controlled by the <code class="docutils literal"><span class="pre">send.buffer.bytes</span></code> / <code class="docutils literal"><span class="pre">receive.buffer.bytes</span></code> configs.</li>
<li>Deserialized objects buffering: after <code class="docutils literal"><span class="pre">consumer.poll()</span></code> returns records, they are deserialized to extract
the timestamp and are buffered within the Streams library. Currently this is only indirectly controlled by
<code class="docutils literal"><span class="pre">buffered.records.per.partition</span></code>.</li>
<li>RocksDB&#8217;s own memory usage, both on-heap and off-heap; critical configs (for RocksDB version 4.1.0) include
<code class="docutils literal"><span class="pre">block_cache_size</span></code>, <code class="docutils literal"><span class="pre">write_buffer_size</span></code> and <code class="docutils literal"><span class="pre">max_write_buffer_number</span></code>. These can be specified through the
<code class="docutils literal"><span class="pre">rocksdb.config.setter</span></code> configuration.</li>
</ul>
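<p>As a rough sketch of tuning RocksDB&#8217;s memory usage, the following config setter applies the example sizes from the record-cache discussion above (a 100 MB block cache and 32 MB write buffers); the class name is hypothetical:</p>
<div class="highlight-java"><div class="highlight"><pre>import java.util.Map;
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class CustomRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options, final Map&lt;String, Object&gt; configs) {
        // limit each write buffer (memtable) to 32 MB
        options.setWriteBufferSize(32 * 1024 * 1024L);
        // use a 100 MB block cache for reads
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(100 * 1024 * 1024L);
        options.setTableFormatConfig(tableConfig);
    }
}

// register the config setter with the application
Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);
</pre></div>
</div>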
<div class="admonition tip">
<p class="first admonition-title">Tip</p>
<p><strong>Iterators should be closed explicitly to release resources:</strong> Store iterators (e.g., <code class="docutils literal"><span class="pre">KeyValueIterator</span></code> and <code class="docutils literal"><span class="pre">WindowStoreIterator</span></code>) must be closed explicitly when you are done with them to release resources such as open file handles and in-memory read buffers, or use a try-with-resources statement (available since JDK 7), since these iterator classes implement <code class="docutils literal"><span class="pre">Closeable</span></code>.</p>
<p class="last">Otherwise, the stream application&#8217;s memory usage keeps increasing while it runs until it hits an OOM error.</p>
</div>
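<p>For example, a try-with-resources statement ensures the iterator is closed even if processing throws:</p>
<div class="highlight-java"><div class="highlight"><pre>// KeyValueIterator implements Closeable, so it is closed automatically on exit
try (KeyValueIterator&lt;String, Long&gt; iterator = kvStore.all()) {
    while (iterator.hasNext()) {
        KeyValue&lt;String, Long&gt; entry = iterator.next();
        // process entry.key and entry.value
    }
}
</pre></div>
</div>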
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/interactive-queries" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/running-app" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>


@ -0,0 +1,437 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="processor-api">
<span id="streams-developer-guide-processor-api"></span><h1>Processor API<a class="headerlink" href="#processor-api" title="Permalink to this headline"></a></h1>
<p>The Processor API allows developers to define and connect custom processors and to interact with state stores. With the
Processor API, you can define arbitrary stream processors that process one received record at a time, and connect these
processors with their associated state stores to compose the processor topology that represents a customized processing
logic.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#overview" id="id1">Overview</a></li>
<li><a class="reference internal" href="#defining-a-stream-processor" id="id2">Defining a Stream Processor</a></li>
<li><a class="reference internal" href="#state-stores" id="id3">State Stores</a><ul>
<li><a class="reference internal" href="#defining-and-creating-a-state-store" id="id4">Defining and creating a State Store</a></li>
<li><a class="reference internal" href="#fault-tolerant-state-stores" id="id5">Fault-tolerant State Stores</a></li>
<li><a class="reference internal" href="#enable-or-disable-fault-tolerance-of-state-stores-store-changelogs" id="id6">Enable or Disable Fault Tolerance of State Stores (Store Changelogs)</a></li>
<li><a class="reference internal" href="#implementing-custom-state-stores" id="id7">Implementing Custom State Stores</a></li>
</ul>
</li>
<li><a class="reference internal" href="#connecting-processors-and-state-stores" id="id8">Connecting Processors and State Stores</a></li>
</ul>
</div>
<div class="section" id="overview">
<h2><a class="toc-backref" href="#id1">Overview</a><a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
<p>The Processor API can be used to implement both <strong>stateless</strong> as well as <strong>stateful</strong> operations, where the latter is
achieved through the use of <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a>.</p>
<div class="admonition tip">
<p class="first admonition-title">Tip</p>
<p class="last"><strong>Combining the DSL and the Processor API:</strong>
You can combine the convenience of the DSL with the power and flexibility of the Processor API as described in the
section <a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl-process"><span class="std std-ref">Applying processors and transformers (Processor API integration)</span></a>.</p>
</div>
<p>For a complete list of available API functionality, see the <a class="reference internal" href="../javadocs.html#streams-javadocs"><span class="std std-ref">Kafka Streams API docs</span></a>.</p>
</div>
<div class="section" id="defining-a-stream-processor">
<span id="streams-developer-guide-stream-processor"></span><h2><a class="toc-backref" href="#id2">Defining a Stream Processor</a><a class="headerlink" href="#defining-a-stream-processor" title="Permalink to this headline"></a></h2>
<p>A <a class="reference internal" href="../concepts.html#streams-concepts"><span class="std std-ref">stream processor</span></a> is a node in the processor topology that represents a single processing step.
With the Processor API, you can define arbitrary stream processors that process one received record at a time, and connect
these processors with their associated state stores to compose the processor topology.</p>
<p>You can define a customized stream processor by implementing the <code class="docutils literal"><span class="pre">Processor</span></code> interface, which provides the <code class="docutils literal"><span class="pre">process()</span></code> API method.
The <code class="docutils literal"><span class="pre">process()</span></code> method is called on each of the received records.</p>
<p>The <code class="docutils literal"><span class="pre">Processor</span></code> interface also has an <code class="docutils literal"><span class="pre">init()</span></code> method, which is called by the Kafka Streams library during the task construction
phase. Processor instances should perform any required initialization in this method. The <code class="docutils literal"><span class="pre">init()</span></code> method is passed a <code class="docutils literal"><span class="pre">ProcessorContext</span></code>
instance, which provides access to the metadata of the currently processed record, including its source Kafka topic and partition,
its corresponding message offset, and further such information. You can also use this context instance to schedule a punctuation
function (via <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code>), to forward a new record as a key-value pair to the downstream processors (via <code class="docutils literal"><span class="pre">ProcessorContext#forward()</span></code>),
and to commit the current processing progress (via <code class="docutils literal"><span class="pre">ProcessorContext#commit()</span></code>).</p>
<p>Specifically, <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code> accepts a user <code class="docutils literal"><span class="pre">Punctuator</span></code> callback interface, which triggers its <code class="docutils literal"><span class="pre">punctuate()</span></code>
API method periodically based on the <code class="docutils literal"><span class="pre">PunctuationType</span></code>. The <code class="docutils literal"><span class="pre">PunctuationType</span></code> determines what notion of time is used
for the punctuation scheduling: either <a class="reference internal" href="../concepts.html#streams-concepts-time"><span class="std std-ref">stream-time</span></a> or wall-clock-time (by default, stream-time
is configured to represent event-time via <code class="docutils literal"><span class="pre">TimestampExtractor</span></code>). When stream-time is used, <code class="docutils literal"><span class="pre">punctuate()</span></code> is triggered purely
by data because stream-time is determined (and advanced forward) by the timestamps derived from the input data. When there
is no new input data arriving, stream-time is not advanced and thus <code class="docutils literal"><span class="pre">punctuate()</span></code> is not called.</p>
<p>For example, if you schedule a <code class="docutils literal"><span class="pre">Punctuator</span></code> function every 10 seconds based on <code class="docutils literal"><span class="pre">PunctuationType.STREAM_TIME</span></code> and if you
process a stream of 60 records with consecutive timestamps from 1 (first record) to 60 seconds (last record),
then <code class="docutils literal"><span class="pre">punctuate()</span></code> would be called 6 times. This happens regardless of the time required to actually process
those records: whether processing these 60 records takes a second, a minute, or an hour, <code class="docutils literal"><span class="pre">punctuate()</span></code> is still called 6 times.</p>
<p>When wall-clock-time (i.e. <code class="docutils literal"><span class="pre">PunctuationType.WALL_CLOCK_TIME</span></code>) is used, <code class="docutils literal"><span class="pre">punctuate()</span></code> is triggered purely by the wall-clock time.
Reusing the example above, if the <code class="docutils literal"><span class="pre">Punctuator</span></code> function is scheduled based on <code class="docutils literal"><span class="pre">PunctuationType.WALL_CLOCK_TIME</span></code>, and if these
60 records were processed within 20 seconds, <code class="docutils literal"><span class="pre">punctuate()</span></code> is called 2 times (once every 10 seconds). If these 60 records
were processed within 5 seconds, then no <code class="docutils literal"><span class="pre">punctuate()</span></code> is called at all. Note that you can schedule multiple <code class="docutils literal"><span class="pre">Punctuator</span></code>
callbacks with different <code class="docutils literal"><span class="pre">PunctuationType</span></code> types within the same processor by calling <code class="docutils literal"><span class="pre">ProcessorContext#schedule()</span></code> multiple
times inside the <code class="docutils literal"><span class="pre">init()</span></code> method.</p>
<div class="admonition attention">
<p class="first admonition-title">Attention</p>
<p class="last">Stream-time is only advanced if all input partitions over all input topics have new data (with newer timestamps) available.
If at least one partition does not have any new data available, stream-time will not be advanced and thus <code class="docutils literal"><span class="pre">punctuate()</span></code> will not be triggered if <code class="docutils literal"><span class="pre">PunctuationType.STREAM_TIME</span></code> was specified.
This behavior is independent of the configured timestamp extractor, i.e., using <code class="docutils literal"><span class="pre">WallclockTimestampExtractor</span></code> does not enable wall-clock triggering of <code class="docutils literal"><span class="pre">punctuate()</span></code>.</p>
</div>
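<p>As an illustration, this minimal <code class="docutils literal"><span class="pre">init()</span></code> sketch registers one punctuation per notion of time; the callback bodies are placeholders:</p>
<div class="highlight-java"><div class="highlight"><pre>@Override
public void init(ProcessorContext context) {
    // fires every 1000 ms of stream-time, i.e., driven by record timestamps
    context.schedule(1000, PunctuationType.STREAM_TIME, timestamp -&gt; {
        // stream-time punctuation logic
    });
    // fires every 1000 ms of wall-clock time, independent of incoming data
    context.schedule(1000, PunctuationType.WALL_CLOCK_TIME, timestamp -&gt; {
        // wall-clock-time punctuation logic
    });
}
</pre></div>
</div>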
<p>The following example <code class="docutils literal"><span class="pre">Processor</span></code> defines a simple word-count algorithm and the following actions are performed:</p>
<ul class="simple">
<li>In the <code class="docutils literal"><span class="pre">init()</span></code> method, schedule the punctuation every 1000 time units (the time unit is normally milliseconds, which in this example would translate to punctuation every 1 second) and retrieve the local state store by its name &#8220;Counts&#8221;.</li>
<li>In the <code class="docutils literal"><span class="pre">process()</span></code> method, upon each received record, split the value string into words, and update their counts into the state store (we will talk about this later in this section).</li>
<li>In the <code class="docutils literal"><span class="pre">punctuate()</span></code> method, iterate the local state store and send the aggregated counts to the downstream processor (we will talk about downstream processors later in this section), and commit the current stream state.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">WordCountProcessor</span> <span class="kd">implements</span> <span class="n">Processor</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="o">{</span>
<span class="kd">private</span> <span class="n">ProcessorContext</span> <span class="n">context</span><span class="o">;</span>
<span class="kd">private</span> <span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">kvStore</span><span class="o">;</span>
<span class="nd">@Override</span>
<span class="nd">@SuppressWarnings</span><span class="o">(</span><span class="s">&quot;unchecked&quot;</span><span class="o">)</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">init</span><span class="o">(</span><span class="n">ProcessorContext</span> <span class="n">context</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// keep the processor context locally because we need it in punctuate() and commit()</span>
<span class="k">this</span><span class="o">.</span><span class="na">context</span> <span class="o">=</span> <span class="n">context</span><span class="o">;</span>
<span class="c1">// retrieve the key-value store named &quot;Counts&quot;</span>
<span class="n">kvStore</span> <span class="o">=</span> <span class="o">(</span><span class="n">KeyValueStore</span><span class="o">)</span> <span class="n">context</span><span class="o">.</span><span class="na">getStateStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">);</span>
<span class="c1">// schedule a punctuate() method every 1000 milliseconds based on stream-time</span>
<span class="k">this</span><span class="o">.</span><span class="na">context</span><span class="o">.</span><span class="na">schedule</span><span class="o">(</span><span class="mi">1000</span><span class="o">,</span> <span class="n">PunctuationType</span><span class="o">.</span><span class="na">STREAM_TIME</span><span class="o">,</span> <span class="o">(</span><span class="n">timestamp</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="n">KeyValueIterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">iter</span> <span class="o">=</span> <span class="k">this</span><span class="o">.</span><span class="na">kvStore</span><span class="o">.</span><span class="na">all</span><span class="o">();</span>
<span class="k">while</span> <span class="o">(</span><span class="n">iter</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span>
<span class="n">KeyValue</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">iter</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="n">context</span><span class="o">.</span><span class="na">forward</span><span class="o">(</span><span class="n">entry</span><span class="o">.</span><span class="na">key</span><span class="o">,</span> <span class="n">entry</span><span class="o">.</span><span class="na">value</span><span class="o">.</span><span class="na">toString</span><span class="o">());</span>
<span class="o">}</span>
<span class="n">iter</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="c1">// commit the current processing progress</span>
<span class="n">context</span><span class="o">.</span><span class="na">commit</span><span class="o">();</span>
<span class="o">});</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">punctuate</span><span class="o">(</span><span class="kt">long</span> <span class="n">timestamp</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// this method is deprecated and should not be used anymore</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">close</span><span class="o">()</span> <span class="o">{</span>
<span class="c1">// close the key-value store</span>
<span class="n">kvStore</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last"><strong>Stateful processing with state stores:</strong>
The <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> defined above can access the currently received record in its <code class="docutils literal"><span class="pre">process()</span></code> method, and it can
leverage <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a> to maintain processing states to, for example, remember recently
arrived records for stateful processing needs like aggregations and joins. For more information, see the <a class="reference internal" href="#streams-developer-guide-state-store"><span class="std std-ref">state stores</span></a> documentation.</p>
</div>
</div>
<div class="section" id="state-stores">
<span id="streams-developer-guide-state-store"></span><h2><a class="toc-backref" href="#id3">State Stores</a><a class="headerlink" href="#state-stores" title="Permalink to this headline"></a></h2>
<p>To implement a <strong>stateful</strong> <code class="docutils literal"><span class="pre">Processor</span></code> or <code class="docutils literal"><span class="pre">Transformer</span></code>, you must provide one or more state stores to the processor
or transformer (<em>stateless</em> processors or transformers do not need state stores). State stores can be used to remember
recently received input records, to track rolling aggregates, to de-duplicate input records, and more.
Another feature of state stores is that they can be
<a class="reference internal" href="interactive-queries.html#streams-developer-guide-interactive-queries"><span class="std std-ref">interactively queried</span></a> from other applications, such as a
Node.js-based dashboard or a microservice implemented in Scala or Go.</p>
<p>The
<a class="reference internal" href="#streams-developer-guide-state-store-defining"><span class="std std-ref">available state store types</span></a> in Kafka Streams have
<a class="reference internal" href="#streams-developer-guide-state-store-fault-tolerance"><span class="std std-ref">fault tolerance</span></a> enabled by default.</p>
<div class="section" id="defining-and-creating-a-state-store">
<span id="streams-developer-guide-state-store-defining"></span><h3><a class="toc-backref" href="#id4">Defining and creating a State Store</a><a class="headerlink" href="#defining-and-creating-a-state-store" title="Permalink to this headline"></a></h3>
<p>You can either use one of the available store types or
<a class="reference internal" href="#streams-developer-guide-state-store-custom"><span class="std std-ref">implement your own custom store type</span></a>.
It&#8217;s common practice to leverage an existing store type via the <code class="docutils literal"><span class="pre">Stores</span></code> factory.</p>
<p>Note that, when using Kafka Streams, you normally don&#8217;t create or instantiate state stores directly in your code.
Rather, you define state stores indirectly by creating a so-called <code class="docutils literal"><span class="pre">StoreBuilder</span></code>. This builder is used by
Kafka Streams as a factory to instantiate the actual state stores locally in application instances when and where
needed.</p>
<p>The following store types are available out of the box.</p>
<table border="1" class="non-scrolling-table width-100-percent docutils">
<colgroup>
<col width="19%" />
<col width="11%" />
<col width="18%" />
<col width="51%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Store Type</th>
<th class="head">Storage Engine</th>
<th class="head">Fault-tolerant?</th>
<th class="head">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>Persistent
<code class="docutils literal"><span class="pre">KeyValueStore&lt;K,</span> <span class="pre">V&gt;</span></code></td>
<td>RocksDB</td>
<td>Yes (enabled by default)</td>
<td><ul class="first simple">
<li><strong>The recommended store type for most use cases.</strong></li>
<li>Stores its data on local disk.</li>
<li>Storage capacity:
managed local state can be larger than the memory (heap space) of an
application instance, but must fit into the available local disk
space.</li>
<li>RocksDB settings can be fine-tuned, see
<a class="reference internal" href="config-streams.html#streams-developer-guide-rocksdb-config"><span class="std std-ref">RocksDB configuration</span></a>.</li>
<li>Available <a class="reference external" href="../javadocs/org/apache/kafka/streams/state/Stores.PersistentKeyValueFactory.html">store variants</a>:
time window key-value store, session window key-value store.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Creating a persistent key-value store:</span>
<span class="c1">// here, we create a `KeyValueStore&lt;String, Long&gt;` named &quot;persistent-counts&quot;.</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.StateStoreSupplier</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<span class="c1">// Note: The `Stores` factory returns a supplier for the state store,</span>
<span class="c1">// because that&#39;s what you typically need to pass as API parameter.</span>
<span class="n">StateStoreSupplier</span> <span class="n">countStoreSupplier</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="s">&quot;persistent-counts&quot;</span><span class="o">)</span>
<span class="o">.</span><span class="na">withKeys</span><span class="o">(</span><span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">())</span>
<span class="o">.</span><span class="na">withValues</span><span class="o">(</span><span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">persistent</span><span class="o">()</span>
<span class="o">.</span><span class="na">build</span><span class="o">();</span>
</pre></div>
</div>
<p class="last">See
<a class="reference external" href="../javadocs/org/apache/kafka/streams/state/Stores.PersistentKeyValueFactory.html">PersistentKeyValueFactory</a> for
detailed factory options.</p>
</td>
</tr>
<tr class="row-odd"><td>In-memory
<code class="docutils literal"><span class="pre">KeyValueStore&lt;K,</span> <span class="pre">V&gt;</span></code></td>
<td>-</td>
<td>Yes (enabled by default)</td>
<td><ul class="first simple">
<li>Stores its data in memory.</li>
<li>Storage capacity:
managed local state must fit into memory (heap space) of an
application instance.</li>
<li>Useful when application instances run in an environment where local
disk space is either not available or local disk space is wiped
in-between app instance restarts.</li>
</ul>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Creating an in-memory key-value store:</span>
<span class="c1">// here, we create a `KeyValueStore&lt;String, Long&gt;` named &quot;inmemory-counts&quot;.</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.StateStoreSupplier</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<span class="c1">// Note: The `Stores` factory returns a supplier for the state store,</span>
<span class="c1">// because that&#39;s what you typically need to pass as API parameter.</span>
<span class="n">StateStoreSupplier</span> <span class="n">countStoreSupplier</span> <span class="o">=</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="s">&quot;inmemory-counts&quot;</span><span class="o">)</span>
<span class="o">.</span><span class="na">withKeys</span><span class="o">(</span><span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">())</span>
<span class="o">.</span><span class="na">withValues</span><span class="o">(</span><span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">inMemory</span><span class="o">()</span>
<span class="o">.</span><span class="na">build</span><span class="o">();</span>
</pre></div>
</div>
<p class="last">See
<a class="reference external" href="../javadocs/org/apache/kafka/streams/state/Stores.InMemoryKeyValueFactory.html">InMemoryKeyValueFactory</a> for
detailed factory options.</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="fault-tolerant-state-stores">
<span id="streams-developer-guide-state-store-fault-tolerance"></span><h3><a class="toc-backref" href="#id5">Fault-tolerant State Stores</a><a class="headerlink" href="#fault-tolerant-state-stores" title="Permalink to this headline"></a></h3>
<p>To make state stores fault-tolerant and to allow for state store migration without data loss, a state store can be
continuously backed up to a Kafka topic behind the scenes. This makes it possible, for example, to migrate a stateful stream task from one
machine to another when <a class="reference internal" href="running-app.html#streams-developer-guide-execution-scaling"><span class="std std-ref">elastically adding or removing capacity from your application</span></a>.
This topic is sometimes referred to as the state store&#8217;s associated <em>changelog topic</em>, or its <em>changelog</em>. For example, if
you experience machine failure, the state store and the application&#8217;s state can be fully restored from its changelog. You can
<a class="reference internal" href="#streams-developer-guide-state-store-enable-disable-fault-tolerance"><span class="std std-ref">enable or disable this backup feature</span></a> for a
state store.</p>
<p>By default, persistent key-value stores are fault-tolerant. They are backed by a
<a class="reference external" href="https://kafka.apache.org/documentation.html#compaction">compacted</a> changelog topic. The purpose of compacting this
topic is to prevent the topic from growing indefinitely, to reduce the storage consumed in the associated Kafka cluster,
and to minimize recovery time if a state store needs to be restored from its changelog topic.</p>
<p>Similarly, persistent window stores are fault-tolerant. They are backed by a topic that uses both compaction and
deletion. Because of the structure of the message keys that are being sent to the changelog topics, this combination of
deletion and compaction is required for the changelog topics of window stores. For window stores, the message keys are
composite keys that include the &#8220;normal&#8221; key and window timestamps. For these types of composite keys it would not
be sufficient to only enable compaction to prevent a changelog topic from growing out of bounds. With deletion
enabled, old windows that have expired will be cleaned up by Kafka&#8217;s log cleaner as the log segments expire. The
default retention setting is <code class="docutils literal"><span class="pre">Windows#maintainMs()</span></code> + 1 day. You can override this setting by specifying
<code class="docutils literal"><span class="pre">StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG</span></code> in the <code class="docutils literal"><span class="pre">StreamsConfig</span></code>.</p>
<p>When you open an <code class="docutils literal"><span class="pre">Iterator</span></code> from a state store you must call <code class="docutils literal"><span class="pre">close()</span></code> on the iterator when you are done working with
it to reclaim resources; or you can use the iterator from within a try-with-resources statement. If you do not close an iterator,
you may encounter an OOM error.</p>
</div>
<div class="section" id="enable-or-disable-fault-tolerance-of-state-stores-store-changelogs">
<span id="streams-developer-guide-state-store-enable-disable-fault-tolerance"></span><h3><a class="toc-backref" href="#id6">Enable or Disable Fault Tolerance of State Stores (Store Changelogs)</a><a class="headerlink" href="#enable-or-disable-fault-tolerance-of-state-stores-store-changelogs" title="Permalink to this headline"></a></h3>
<p>You can enable or disable fault tolerance for a state store by enabling or disabling the change logging
of the store through <code class="docutils literal"><span class="pre">enableLogging()</span></code> and <code class="docutils literal"><span class="pre">disableLogging()</span></code>.
You can also fine-tune the associated topics configuration if needed.</p>
<p>Example for disabling fault-tolerance:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span> <span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withLoggingDisabled</span><span class="o">();</span> <span class="c1">// disable backing up the store to a changelog topic</span>
</pre></div>
</div>
<div class="admonition attention">
<p class="first admonition-title">Attention</p>
<p class="last">If the changelog is disabled then the attached state store is no longer fault tolerant and it can&#8217;t have any <a class="reference internal" href="config-streams.html#streams-developer-guide-standby-replicas"><span class="std std-ref">standby replicas</span></a>.</p>
</div>
<p>Here is an example for enabling fault tolerance, with additional changelog-topic configuration:
You can add any log config from <a class="reference external" href="https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/log/LogConfig.scala#L61">kafka.log.LogConfig</a>.
Unrecognized configs will be ignored.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.StoreBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.state.Stores</span><span class="o">;</span>
<span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">changelogConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">();</span>
<span class="c1">// override min.insync.replicas</span>
<span class="n">changelogConfig</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;min.insyc.replicas&quot;</span><span class="o">,</span> <span class="s">&quot;1&quot;</span><span class="o">)</span>
<span class="n">StoreBuilder</span><span class="o">&lt;</span><span class="n">KeyValueStore</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span> <span class="n">countStoreSupplier</span> <span class="o">=</span> <span class="n">Stores</span><span class="o">.</span><span class="na">keyValueStoreBuilder</span><span class="o">(</span>
<span class="n">Stores</span><span class="o">.</span><span class="na">persistentKeyValueStore</span><span class="o">(</span><span class="s">&quot;Counts&quot;</span><span class="o">),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">String</span><span class="o">(),</span>
<span class="n">Serdes</span><span class="o">.</span><span class="na">Long</span><span class="o">())</span>
<span class="o">.</span><span class="na">withLoggingEnabled</span><span class="o">(</span><span class="n">changlogConfig</span><span class="o">);</span> <span class="c1">// enable changelogging, with custom changelog settings</span>
</pre></div>
</div>
</div>
<div class="section" id="implementing-custom-state-stores">
<span id="streams-developer-guide-state-store-custom"></span><h3><a class="toc-backref" href="#id7">Implementing Custom State Stores</a><a class="headerlink" href="#implementing-custom-state-stores" title="Permalink to this headline"></a></h3>
<p>You can use the <a class="reference internal" href="#streams-developer-guide-state-store-defining"><span class="std std-ref">built-in state store types</span></a> or implement your own.
The primary interface to implement for the store is
<code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.StateStore</span></code>. Kafka Streams also has a few extended interfaces such
as <code class="docutils literal"><span class="pre">KeyValueStore</span></code>.</p>
<p>You also need to provide a &#8220;factory&#8221; for the store by implementing the
<code class="docutils literal"><span class="pre">org.apache.kafka.streams.processor.StateStoreSupplier</span></code> interface, which Kafka Streams uses to create instances of
your store.</p>
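<p>For illustration only, here is a skeletal sketch of the <code class="docutils literal"><span class="pre">StateStore</span></code> contract, where <code class="docutils literal"><span class="pre">MyCustomStore</span></code> is a hypothetical class name; a complete store would also implement the read and write operations of an extended interface such as <code class="docutils literal"><span class="pre">KeyValueStore</span></code>:</p>
<div class="highlight-java"><div class="highlight"><pre>import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.StateStore;

// Skeleton only: shows the lifecycle methods every state store must provide.
public class MyCustomStore implements StateStore {
    private final String name;
    private volatile boolean open = false;

    public MyCustomStore(final String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public void init(final ProcessorContext context, final StateStore root) {
        // open local storage and register the store with the context here
        open = true;
    }

    @Override
    public void flush() {
        // write any buffered records to local storage
    }

    @Override
    public void close() {
        open = false;
    }

    @Override
    public boolean persistent() {
        return false; // return true if the store's contents survive restarts
    }

    @Override
    public boolean isOpen() {
        return open;
    }
}
</pre></div>
</div>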
</div>
</div>
<div class="section" id="connecting-processors-and-state-stores">
<h2><a class="toc-backref" href="#id8">Connecting Processors and State Stores</a><a class="headerlink" href="#connecting-processors-and-state-stores" title="Permalink to this headline"></a></h2>
<p>Now that a <a class="reference internal" href="#streams-developer-guide-stream-processor"><span class="std std-ref">processor</span></a> (WordCountProcessor) and the
state stores have been defined, you can construct the processor topology by connecting these processors and state stores together
using the <code class="docutils literal"><span class="pre">Topology</span></code> instance. In addition, you can add source processors with the specified Kafka topics
to generate input data streams into the topology, and sink processors with the specified Kafka topics to generate
output data streams out of the topology.</p>
<p>Here is an example implementation:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="n">Topology</span> <span class="n">builder</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Topology</span><span class="o">();</span>
<span class="c1">// add the source processor node that takes Kafka topic &quot;source-topic&quot; as input</span>
<span class="n">builder</span><span class="o">.</span><span class="na">addSource</span><span class="o">(</span><span class="s">&quot;Source&quot;</span><span class="o">,</span> <span class="s">&quot;source-topic&quot;</span><span class="o">)</span>
<span class="c1">// add the WordCountProcessor node which takes the source processor as its upstream processor</span>
<span class="o">.</span><span class="na">addProcessor</span><span class="o">(</span><span class="s">&quot;Process&quot;</span><span class="o">,</span> <span class="o">()</span> <span class="o">-&gt;</span> <span class="k">new</span> <span class="n">WordCountProcessor</span><span class="o">(),</span> <span class="s">&quot;Source&quot;</span><span class="o">)</span>
<span class="c1">// add the count store associated with the WordCountProcessor processor</span>
<span class="o">.</span><span class="na">addStateStore</span><span class="o">(</span><span class="n">countStoreBuilder</span><span class="o">,</span> <span class="s">&quot;Process&quot;</span><span class="o">)</span>
<span class="c1">// add the sink processor node that takes Kafka topic &quot;sink-topic&quot; as output</span>
<span class="c1">// and the WordCountProcessor node as its upstream processor</span>
<span class="o">.</span><span class="na">addSink</span><span class="o">(</span><span class="s">&quot;Sink&quot;</span><span class="o">,</span> <span class="s">&quot;sink-topic&quot;</span><span class="o">,</span> <span class="s">&quot;Process&quot;</span><span class="o">);</span>
</pre></div>
</div>
<p>Here is a quick explanation of this example:</p>
<ul class="simple">
<li>A source processor node named <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> is added to the topology using the <code class="docutils literal"><span class="pre">addSource</span></code> method, with one Kafka topic
<code class="docutils literal"><span class="pre">&quot;source-topic&quot;</span></code> fed to it.</li>
<li>A processor node named <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> with the pre-defined <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> logic is then added as the downstream
processor of the <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> node using the <code class="docutils literal"><span class="pre">addProcessor</span></code> method.</li>
<li>A predefined persistent key-value state store is created and associated with the <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> node, using
<code class="docutils literal"><span class="pre">countStoreBuilder</span></code>.</li>
<li>A sink processor node is then added to complete the topology using the <code class="docutils literal"><span class="pre">addSink</span></code> method, taking the <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> node
as its upstream processor and writing to a separate <code class="docutils literal"><span class="pre">&quot;sink-topic&quot;</span></code> Kafka topic.</li>
</ul>
<p>In this topology, the <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> stream processor node is considered a downstream processor of the <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> node, and an
upstream processor of the <code class="docutils literal"><span class="pre">&quot;Sink&quot;</span></code> node. As a result, whenever the <code class="docutils literal"><span class="pre">&quot;Source&quot;</span></code> node forwards a newly fetched record from
Kafka to its downstream <code class="docutils literal"><span class="pre">&quot;Process&quot;</span></code> node, the <code class="docutils literal"><span class="pre">WordCountProcessor#process()</span></code> method is triggered to process the record and
update the associated state store. Whenever <code class="docutils literal"><span class="pre">context#forward()</span></code> is called in the
<code class="docutils literal"><span class="pre">WordCountProcessor#punctuate()</span></code> method, the aggregate key-value pair will be sent via the <code class="docutils literal"><span class="pre">&quot;Sink&quot;</span></code> processor node to
the Kafka topic <code class="docutils literal"><span class="pre">&quot;sink-topic&quot;</span></code>. Note that in the <code class="docutils literal"><span class="pre">WordCountProcessor</span></code> implementation, you must refer to the
same store name <code class="docutils literal"><span class="pre">&quot;Counts&quot;</span></code> when accessing the key-value store, otherwise an exception will be thrown at runtime,
indicating that the state store cannot be found. If the state store is not associated with the processor
in the <code class="docutils literal"><span class="pre">Topology</span></code> code, accessing it in the processor&#8217;s <code class="docutils literal"><span class="pre">init()</span></code> method will also throw an exception at
runtime, indicating the state store is not accessible from this processor.</p>
<p>Now that you have fully defined your processor topology in your application, you can proceed to
<a class="reference internal" href="running-app.html#streams-developer-guide-execution"><span class="std std-ref">running the Kafka Streams application</span></a>.</p>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/dsl-api" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/datatypes" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@ -0,0 +1,197 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="running-streams-applications">
<span id="streams-developer-guide-execution"></span><h1>Running Streams Applications<a class="headerlink" href="#running-streams-applications" title="Permalink to this headline"></a></h1>
<p>You can run Java applications that use the Kafka Streams library without any additional configuration or requirements.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#starting-a-kafka-streams-application" id="id3">Starting a Kafka Streams application</a></li>
<li><a class="reference internal" href="#elastic-scaling-of-your-application" id="id4">Elastic scaling of your application</a><ul>
<li><a class="reference internal" href="#adding-capacity-to-your-application" id="id5">Adding capacity to your application</a></li>
<li><a class="reference internal" href="#removing-capacity-from-your-application" id="id6">Removing capacity from your application</a></li>
<li><a class="reference internal" href="#state-restoration-during-workload-rebalance" id="id7">State restoration during workload rebalance</a></li>
<li><a class="reference internal" href="#determining-how-many-application-instances-to-run" id="id8">Determining how many application instances to run</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="running-streams-applications">
<span id="streams-developer-guide-execution"></span><h1>Running Streams Applications<a class="headerlink" href="#running-streams-applications" title="Permalink to this headline"></a></h1>
<p>You can run Java applications that use the Kafka Streams library without any additional configuration or requirements. Kafka Streams
also provides the ability to receive notification of the various states of the application. The ability to monitor the runtime
status is discussed in <a class="reference internal" href="../monitoring.html#streams-monitoring"><span class="std std-ref">the monitoring guide</span></a>.</p>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#starting-a-kafka-streams-application" id="id3">Starting a Kafka Streams application</a></li>
<li><a class="reference internal" href="#elastic-scaling-of-your-application" id="id4">Elastic scaling of your application</a><ul>
<li><a class="reference internal" href="#adding-capacity-to-your-application" id="id5">Adding capacity to your application</a></li>
<li><a class="reference internal" href="#removing-capacity-from-your-application" id="id6">Removing capacity from your application</a></li>
<li><a class="reference internal" href="#state-restoration-during-workload-rebalance" id="id7">State restoration during workload rebalance</a></li>
<li><a class="reference internal" href="#determining-how-many-application-instances-to-run" id="id8">Determining how many application instances to run</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="starting-a-kafka-streams-application">
<span id="streams-developer-guide-execution-starting"></span><h2><a class="toc-backref" href="#id3">Starting a Kafka Streams application</a><a class="headerlink" href="#starting-a-kafka-streams-application" title="Permalink to this headline"></a></h2>
<p>You can package your Java application as a fat JAR file and then start the application like this:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Start the application in class `com.example.MyStreamsApp`</span>
<span class="c1"># from the fat JAR named `path-to-app-fatjar.jar`.</span>
$ java -cp path-to-app-fatjar.jar com.example.MyStreamsApp
</pre></div>
</div>
<p>For more information about how you can package your application in this way, see the
<a class="reference internal" href="../code-examples.html#streams-code-examples"><span class="std std-ref">Streams code examples</span></a>.</p>
<p>When you start your application you are launching a Kafka Streams instance of your application. You can run multiple
instances of your application. A common scenario is that there are multiple instances of your application running in
parallel. For more information, see <a class="reference internal" href="../architecture.html#streams-architecture-parallelism-model"><span class="std std-ref">Parallelism Model</span></a>.</p>
<p>When the application instance starts running, the defined processor topology will be initialized as one or more stream tasks.
If the processor topology defines any state stores, these are also constructed during the initialization period. For
more information, see the <a class="reference internal" href="#streams-developer-guide-execution-scaling-state-restoration"><span class="std std-ref">State restoration during workload rebalance</span></a> section.</p>
</div>
<div class="section" id="elastic-scaling-of-your-application">
<span id="streams-developer-guide-execution-scaling"></span><h2><a class="toc-backref" href="#id4">Elastic scaling of your application</a><a class="headerlink" href="#elastic-scaling-of-your-application" title="Permalink to this headline"></a></h2>
<p>Kafka Streams makes your stream processing applications elastic and scalable. You can add and remove processing capacity
dynamically during application runtime without any downtime or data loss. This makes your applications
resilient in the face of failures and allows you to perform maintenance as needed (e.g., rolling upgrades).</p>
<p>For more information about this elasticity, see the <a class="reference internal" href="../architecture.html#streams-architecture-parallelism-model"><span class="std std-ref">Parallelism Model</span></a> section. Kafka Streams
leverages the Kafka group management functionality, which is built right into the <a class="reference external" href="https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol">Kafka wire protocol</a>. It is the foundation that enables the
elasticity of Kafka Streams applications: members of a group coordinate and collaborate jointly on the consumption and
processing of data in Kafka. Additionally, Kafka Streams provides stateful processing and allows for fault-tolerant
state in environments where application instances may come and go at any time.</p>
<div class="section" id="adding-capacity-to-your-application">
<h3><a class="toc-backref" href="#id5">Adding capacity to your application</a><a class="headerlink" href="#adding-capacity-to-your-application" title="Permalink to this headline"></a></h3>
<p>If you need more processing capacity for your stream processing application, you can simply start another instance of your stream processing application, e.g. on another machine, in order to scale out. The instances of your application will become aware of each other and automatically begin to share the processing work. More specifically, what will be handed over from the existing instances to the new instances is (some of) the stream tasks that have been run by the existing instances. Moving stream tasks from one instance to another results in moving the processing work plus any internal state of these stream tasks (the state of a stream task will be re-created in the target instance by restoring the state from its corresponding changelog topic).</p>
<p>The various instances of your application each run in their own JVM process, which means that each instance can leverage all the processing capacity that is available to their respective JVM process (minus the capacity that any non-Kafka-Streams part of your application may be using). This explains why running additional instances will grant your application additional processing capacity. The exact capacity you will be adding by running a new instance depends of course on the environment in which the new instance runs: available CPU cores, available main memory and Java heap space, local storage, network bandwidth, and so on. Similarly, if you stop any of the running instances of your application, then you are removing and freeing up the respective processing capacity.</p>
<div class="figure align-center" id="id1">
<a class="reference internal image-reference" href="../../../images/streams-elastic-scaling-1.png"><img alt="../../../images/streams-elastic-scaling-1.png" src="../../../images/streams-elastic-scaling-1.png" style="width: 500pt; height: 400pt;" /></a>
<p class="caption"><span class="caption-text">Before adding capacity: only a single instance of your Kafka Streams application is running. At this point the corresponding Kafka consumer group of your application contains only a single member (this instance). All data is being read and processed by this single instance.</span></p>
</div>
<div class="figure align-center" id="id2">
<a class="reference internal image-reference" href="../../../images/streams-elastic-scaling-2.png"><img alt="../../../images/streams-elastic-scaling-2.png" src="../../../images/streams-elastic-scaling-2.png" style="width: 500pt; height: 400pt;" /></a>
<p class="caption"><span class="caption-text">After adding capacity: now two additional instances of your Kafka Streams application are running, and they have automatically joined the application&#8217;s Kafka consumer group for a total of three current members. These three instances are automatically splitting the processing work between each other. The splitting is based on the Kafka topic partitions from which data is being read.</span></p>
</div>
</div>
<div class="section" id="removing-capacity-from-your-application">
<h3><a class="toc-backref" href="#id6">Removing capacity from your application</a><a class="headerlink" href="#removing-capacity-from-your-application" title="Permalink to this headline"></a></h3>
<p>To remove processing capacity, stop running stream processing application instances (e.g., shut down two of
the four instances). The stopped instances automatically leave the application&#8217;s consumer group, and the remaining instances of
your application automatically take over the processing work: they take over the stream tasks that
were run by the stopped instances. Moving stream tasks from one instance to another moves both the processing
work and any internal state of these stream tasks. The state of a stream task is recreated in the target instance
from its changelog topic.</p>
<div class="figure align-center">
<a class="reference internal image-reference" href="../../../images/streams-elastic-scaling-3.png"><img alt="../../../images/streams-elastic-scaling-3.png" src="../../../images/streams-elastic-scaling-3.png" style="width: 500pt; height: 400pt;" /></a>
</div>
</div>
<div class="section" id="state-restoration-during-workload-rebalance">
<span id="streams-developer-guide-execution-scaling-state-restoration"></span><h3><a class="toc-backref" href="#id7">State restoration during workload rebalance</a><a class="headerlink" href="#state-restoration-during-workload-rebalance" title="Permalink to this headline"></a></h3>
<p>When a task is migrated, the task processing state is fully restored before the application instance resumes
processing, which guarantees correct processing results. In Kafka Streams, state restoration is usually done by
replaying the corresponding changelog topic to reconstruct the state store. To minimize this changelog-based restoration
latency, you can configure standby replicas (<code class="docutils literal"><span class="pre">num.standby.replicas</span></code>), which maintain
replicated local state stores on other instances. When a stream task is initialized or re-initialized on an application
instance, its state store is restored like this:</p>
<ul class="simple">
<li>If no local state store exists, the changelog is replayed from the earliest to the current offset. This reconstructs the local state store to the most recent snapshot.</li>
<li>If a local state store exists, the changelog is replayed from the previously checkpointed offset. The changes are applied and the state is restored to the most recent snapshot. This method takes less time because it is applying a smaller portion of the changelog.</li>
</ul>
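<p>For example, here is a minimal sketch of enabling one standby replica per stream task in the application configuration:</p>
<div class="highlight-java"><div class="highlight"><pre>import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// Maintain one additional, continuously updated copy of each task's state
// on another instance, so that failover does not replay the full changelog.
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
</pre></div>
</div>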
<p>For more information, see <a class="reference internal" href="config-streams.html#streams-developer-guide-standby-replicas"><span class="std std-ref">Standby Replicas</span></a>.</p>
</div>
<div class="section" id="determining-how-many-application-instances-to-run">
<h3><a class="toc-backref" href="#id8">Determining how many application instances to run</a><a class="headerlink" href="#determining-how-many-application-instances-to-run" title="Permalink to this headline"></a></h3>
<p>The parallelism of a Kafka Streams application is primarily determined by how many partitions the input topics have. For
example, if your application reads from a single topic that has ten partitions, then you can run up to ten instances
of your application. You can run more instances, but these will be idle.</p>
<p>The number of topic partitions is the upper limit for the parallelism of your Kafka Streams application and for the
number of running instances of your application.</p>
<p>To achieve balanced workload processing across application instances and to prevent processing hotspots, you should
distribute data and processing workloads:</p>
<ul class="simple">
<li>Data should be equally distributed across topic partitions. For example, if two topic partitions each have 1 million messages, this is better than a single partition with 2 million messages and none in the other.</li>
<li>Processing workload should be equally distributed across topic partitions. For example, if the time to process messages varies widely, then it is better to spread the processing-intensive messages across partitions rather than storing these messages within the same partition.</li>
</ul>
</div>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/memory-mgmt" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/manage-topics" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@ -0,0 +1,176 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<div class="sticky-top">
<!-- div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div -->
</div>
</div>
<div class="section" id="streams-security">
<span id="streams-developer-guide-security"></span><h1>Streams Security<a class="headerlink" href="#streams-security" title="Permalink to this headline"></a></h1>
<div class="contents local topic" id="table-of-contents">
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#required-acl-setting-for-secure-kafka-clusters" id="id1">Required ACL setting for secure Kafka clusters</a></li>
<li><a class="reference internal" href="#security-example" id="id2">Security example</a></li>
</ul>
</div>
<p>Kafka Streams natively integrates with <a class="reference internal" href="../../kafka/security.html#kafka-security"><span class="std std-ref">Kafka&#8217;s security features</span></a> and supports all of the
client-side security features in Kafka. Streams leverages the <a class="reference internal" href="../../clients/index.html#kafka-clients"><span class="std std-ref">Java Producer and Consumer API</span></a>.</p>
<p>To secure your Stream processing applications, configure the security settings in the corresponding Kafka producer
and consumer clients, and then specify the corresponding configuration settings in your Kafka Streams application.</p>
<p>Kafka supports cluster encryption and authentication, including a mix of authenticated and unauthenticated,
and encrypted and non-encrypted clients. Using security is optional.</p>
<p>Here are a few of the relevant client-side security features:</p>
<dl class="docutils">
<dt>Encrypt data-in-transit between your applications and Kafka brokers</dt>
<dd>You can enable the encryption of the client-server communication between your applications and the Kafka brokers.
For example, you can configure your applications to always use encryption when reading and writing data to and from
Kafka. This is critical when reading and writing data across security domains such as internal network, public
internet, and partner networks.</dd>
<dt>Client authentication</dt>
<dd>You can enable client authentication for connections from your application to Kafka brokers. For example, you can
define that only specific applications are allowed to connect to your Kafka cluster.</dd>
<dt>Client authorization</dt>
<dd>You can enable client authorization of read and write operations by your applications. For example, you can define
that only specific applications are allowed to read from a Kafka topic. You can also restrict write access to Kafka
topics to prevent data pollution or fraudulent activities.</dd>
</dl>
<p>For more information about the security features in Apache Kafka, see <a class="reference internal" href="../../kafka/security.html#kafka-security"><span class="std std-ref">Kafka Security</span></a>.</p>
<div class="section" id="required-acl-setting-for-secure-kafka-clusters">
<span id="streams-developer-guide-security-acls"></span><h2><a class="toc-backref" href="#id1">Required ACL setting for secure Kafka clusters</a><a class="headerlink" href="#required-acl-setting-for-secure-kafka-clusters" title="Permalink to this headline"></a></h2>
<p>When applications are run against a secured Kafka cluster, the principal running the application must have the ACL
<code class="docutils literal"><span class="pre">--cluster</span> <span class="pre">--operation</span> <span class="pre">Create</span></code> set so that the application has the permissions to create
<a class="reference internal" href="manage-topics.html#streams-developer-guide-topics-internal"><span class="std std-ref">internal topics</span></a>.</p>
</div>
<div class="section" id="security-example">
<span id="streams-developer-guide-security-example"></span><h2><a class="toc-backref" href="#id2">Security example</a><a class="headerlink" href="#security-example" title="Permalink to this headline"></a></h2>
<p>This example shows how to configure a Kafka Streams application to enable client authentication and encrypt data-in-transit when
communicating with its Kafka cluster.</p>
<p>This example assumes that the Kafka brokers in the cluster already have their security configured and that the necessary SSL
certificates are available to the application in the local filesystem locations. For example, if you are using Docker,
then you must also include these SSL certificates in the correct locations within the Docker image.</p>
<p>The snippet below shows the settings to enable client authentication and SSL encryption for data-in-transit between your
Kafka Streams application and the Kafka cluster it is reading and writing from:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Essential security settings to enable client authentication and SSL encryption</span>
bootstrap.servers<span class="o">=</span>kafka.example.com:9093
security.protocol<span class="o">=</span>SSL
ssl.truststore.location<span class="o">=</span>/etc/security/tls/kafka.client.truststore.jks
ssl.truststore.password<span class="o">=</span>test1234
ssl.keystore.location<span class="o">=</span>/etc/security/tls/kafka.client.keystore.jks
ssl.keystore.password<span class="o">=</span>test1234
ssl.key.password<span class="o">=</span>test1234
</pre></div>
</div>
<p>Configure these settings in the application for your <code class="docutils literal"><span class="pre">StreamsConfig</span></code> instance. These settings will encrypt any
data-in-transit that is being read from or written to Kafka, and your application will authenticate itself against the
Kafka brokers that it is communicating with. Note that this example does not cover client authorization.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Code of your Java application that uses the Kafka Streams library</span>
<span class="n">Properties</span> <span class="n">settings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Properties</span><span class="o">();</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">APPLICATION_ID_CONFIG</span><span class="o">,</span> <span class="s">&quot;secure-kafka-streams-app&quot;</span><span class="o">);</span>
<span class="c1">// Where to find secure Kafka brokers. Here, it&#39;s on port 9093.</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">StreamsConfig</span><span class="o">.</span><span class="na">BOOTSTRAP_SERVERS_CONFIG</span><span class="o">,</span> <span class="s">&quot;kafka.example.com:9093&quot;</span><span class="o">);</span>
<span class="c1">//</span>
<span class="c1">// ...further non-security related settings may follow here...</span>
<span class="c1">//</span>
<span class="c1">// Security settings.</span>
<span class="c1">// 1. These settings must match the security settings of the secure Kafka cluster.</span>
<span class="c1">// 2. The SSL trust store and key store files must be locally accessible to the application.</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">CommonClientConfigs</span><span class="o">.</span><span class="na">SECURITY_PROTOCOL_CONFIG</span><span class="o">,</span> <span class="s">&quot;SSL&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_TRUSTSTORE_LOCATION_CONFIG</span><span class="o">,</span> <span class="s">&quot;/etc/security/tls/kafka.client.truststore.jks&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_TRUSTSTORE_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEYSTORE_LOCATION_CONFIG</span><span class="o">,</span> <span class="s">&quot;/etc/security/tls/kafka.client.keystore.jks&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEYSTORE_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
<span class="n">settings</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">SslConfigs</span><span class="o">.</span><span class="na">SSL_KEY_PASSWORD_CONFIG</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
<span class="n">StreamsConfig</span> <span class="n">streamsConfiguration</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamsConfig</span><span class="o">(</span><span class="n">settings</span><span class="o">);</span>
</pre></div>
</div>
<p>If you incorrectly configure a security setting in your application, it will fail at runtime, typically right after you
start it. For example, if you enter an incorrect password for the <code class="docutils literal"><span class="pre">ssl.keystore.password</span></code> setting, an error message
similar to this would be logged and then the application would terminate:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="c1"># Misconfigured ssl.keystore.password</span>
Exception in thread <span class="s2">&quot;main&quot;</span> org.apache.kafka.common.KafkaException: Failed to construct kafka producer
<span class="o">[</span>...snip...<span class="o">]</span>
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException:
java.io.IOException: Keystore was tampered with, or password was incorrect
<span class="o">[</span>...snip...<span class="o">]</span>
Caused by: java.security.UnrecoverableKeyException: Password verification failed
</pre></div>
</div>
<p>Monitor your Kafka Streams application log files for such error messages to spot any misconfigured applications quickly.</p>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/manage-topics" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/app-reset-tool" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@ -0,0 +1,198 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<script><!--#include virtual="../../js/templateData.js" --></script>
<script id="content-template" type="text/x-handlebars-template">
<!-- h1>Developer Guide for Kafka Streams</h1 -->
<div class="sub-nav-sticky">
<!-- div class="sticky-top">
<div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
</div>
</div -->
</div>
<div class="section" id="writing-a-streams-application">
<span id="streams-write-app"></span><h1>Writing a Streams Application<a class="headerlink" href="#writing-a-streams-application" title="Permalink to this headline"></a></h1>
<p class="topic-title first">Table of Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#libraries-and-maven-artifacts" id="id1">Libraries and Maven artifacts</a></li>
<li><a class="reference internal" href="#using-kafka-streams-within-your-application-code" id="id2">Using Kafka Streams within your application code</a></li>
</ul>
<p>Any Java application that makes use of the Kafka Streams library is considered a Kafka Streams application.
The computational logic of a Kafka Streams application is defined as a <a class="reference internal" href="../concepts.html#streams-concepts"><span class="std std-ref">processor topology</span></a>,
which is a graph of stream processors (nodes) and streams (edges).</p>
<p>You can define the processor topology with the Kafka Streams APIs:</p>
<dl class="docutils">
<dt><a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl"><span class="std std-ref">Kafka Streams DSL</span></a></dt>
<dd>A high-level API that provides the most common data transformation operations such as <code class="docutils literal"><span class="pre">map</span></code>, <code class="docutils literal"><span class="pre">filter</span></code>, <code class="docutils literal"><span class="pre">join</span></code>, and <code class="docutils literal"><span class="pre">aggregations</span></code> out of the box. The DSL is the recommended starting point for developers new to Kafka Streams, and should cover many use cases and stream processing needs.</dd>
<dt><a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a></dt>
<dd>A low-level API that lets you add and connect processors as well as interact directly with state stores. The Processor API provides you with even more flexibility than the DSL but at the expense of requiring more manual work on the side of the application developer (e.g., more lines of code).</dd>
</dl>
<div class="section" id="using-kafka-streams-within-your-application-code">
<h2>Using Kafka Streams within your application code<a class="headerlink" href="#using-kafka-streams-within-your-application-code" title="Permalink to this headline"></a></h2>
<p>You can call Kafka Streams from anywhere in your application code, but usually these calls are made within the <code class="docutils literal"><span class="pre">main()</span></code> method of
your application, or some variant thereof. The basic elements of defining a processing topology within your application
are described below.</p>
<p>First, you must create an instance of <code class="docutils literal"><span class="pre">KafkaStreams</span></code>.</p>
<ul class="simple">
<li>The first argument of the <code class="docutils literal"><span class="pre">KafkaStreams</span></code> constructor is a topology: either the result of <code class="docutils literal"><span class="pre">StreamsBuilder#build()</span></code> when using the
<a class="reference internal" href="dsl-api.html#streams-developer-guide-dsl"><span class="std std-ref">DSL</span></a>, or a manually constructed <code class="docutils literal"><span class="pre">Topology</span></code> when using the
<a class="reference internal" href="processor-api.html#streams-developer-guide-processor-api"><span class="std std-ref">Processor API</span></a>.</li>
<li>The second argument is an instance of <code class="docutils literal"><span class="pre">StreamsConfig</span></code>, which defines the configuration for this specific topology.</li>
</ul>
<p>Code example:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.kafka.streams.KafkaStreams</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.StreamsConfig</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.kstream.StreamsBuilder</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">org.apache.kafka.streams.processor.Topology</span><span class="o">;</span>
<span class="c1">// Use the builders to define the actual processing topology, e.g. to specify</span>
<span class="c1">// from which input topics to read, which stream operations (filter, map, etc.)</span>
<span class="c1">// should be called, and so on. We will cover this in detail in the subsequent</span>
<span class="c1">// sections of this Developer Guide.</span>
<span class="n">StreamsBuilder</span> <span class="n">builder</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the DSL</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="n">builder</span><span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="c1">//</span>
<span class="c1">// OR</span>
<span class="c1">//</span>
<span class="n">Topology</span> <span class="n">topology</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// when using the Processor API</span>
<span class="c1">// Use the configuration to tell your application where the Kafka cluster is,</span>
<span class="c1">// which Serializers/Deserializers to use by default, to specify security settings,</span>
<span class="c1">// and so on.</span>
<span class="n">StreamsConfig</span> <span class="n">config</span> <span class="o">=</span> <span class="o">...;</span>
<span class="n">KafkaStreams</span> <span class="n">streams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KafkaStreams</span><span class="o">(</span><span class="n">topology</span><span class="o">,</span> <span class="n">config</span><span class="o">);</span>
</pre></div>
</div>
<p>At this point, internal structures are initialized, but the processing is not started yet.
You have to explicitly start the Kafka Streams thread by calling the <code class="docutils literal"><span class="pre">KafkaStreams#start()</span></code> method:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Start the Kafka Streams threads</span>
<span class="n">streams</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
</pre></div>
</div>
<p>If there are other instances of this stream processing application running elsewhere (e.g., on another machine), Kafka
Streams transparently re-assigns tasks from the existing instances to the new instance that you just started.
For more information, see <a class="reference internal" href="../architecture.html#streams-architecture-tasks"><span class="std std-ref">Stream Partitions and Tasks</span></a> and <a class="reference internal" href="../architecture.html#streams-architecture-threads"><span class="std std-ref">Threading Model</span></a>.</p>
<p>To catch any unexpected exceptions, you can set a <code class="docutils literal"><span class="pre">java.lang.Thread.UncaughtExceptionHandler</span></code> before you start the
application. This handler is called whenever a stream thread is terminated by an unexpected exception:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Java 8+, using lambda expressions</span>
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">((</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="o">{</span>
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
<span class="o">});</span>
<span class="c1">// Java 7</span>
<span class="n">streams</span><span class="o">.</span><span class="na">setUncaughtExceptionHandler</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">.</span><span class="na">UncaughtExceptionHandler</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">uncaughtException</span><span class="o">(</span><span class="n">Thread</span> <span class="n">thread</span><span class="o">,</span> <span class="n">Throwable</span> <span class="n">throwable</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// here you should examine the throwable/exception and perform an appropriate action!</span>
<span class="o">}</span>
<span class="o">});</span>
</pre></div>
</div>
<p>To stop the application instance, call the <code class="docutils literal"><span class="pre">KafkaStreams#close()</span></code> method:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Stop the Kafka Streams threads</span>
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
</pre></div>
</div>
<p>To allow your application to shut down gracefully in response to SIGTERM, it is recommended that you add a shutdown hook
that calls <code class="docutils literal"><span class="pre">KafkaStreams#close</span></code>.</p>
<ul>
<li><p class="first">Here is a shutdown hook example in Java 8+:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
<span class="c1">// You can optionally provide a timeout to `close`.</span>
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="n">streams</span><span class="o">::</span><span class="n">close</span><span class="o">));</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p class="first">Here is a shutdown hook example in Java 7:</p>
<blockquote>
<div><div class="highlight-java"><div class="highlight"><pre><span></span><span class="c1">// Add shutdown hook to stop the Kafka Streams threads.</span>
<span class="c1">// You can optionally provide a timeout to `close`.</span>
<span class="n">Runtime</span><span class="o">.</span><span class="na">getRuntime</span><span class="o">().</span><span class="na">addShutdownHook</span><span class="o">(</span><span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="k">new</span> <span class="n">Runnable</span><span class="o">()</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">run</span><span class="o">()</span> <span class="o">{</span>
<span class="n">streams</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}));</span>
</pre></div>
</div>
</div></blockquote>
</li>
</ul>
<p>After an application is stopped, Kafka Streams will migrate any tasks that had been running in this instance to available remaining
instances.</p>
</div>
</div>
</div>
</div>
<div class="pagination">
<a href="/{{version}}/documentation/streams/developer-guide/" class="pagination__btn pagination__btn__prev">Previous</a>
<a href="/{{version}}/documentation/streams/developer-guide/config-streams" class="pagination__btn pagination__btn__next">Next</a>
</div>
</script>
<!--#include virtual="../../../includes/_header.htm" -->
<!--#include virtual="../../../includes/_top.htm" -->
<div class="content documentation documentation--current">
<!--#include virtual="../../../includes/_nav.htm" -->
<div class="right">
<!--#include virtual="../../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
<li><a href="/documentation/streams/developer-guide/">Developer Guide</a></li>
</ul>
<div class="p-content"></div>
</div>
</div>
<!--#include virtual="../../../includes/_footer.htm" -->
<script>
$(function() {
// Show selected style on nav item
$('.b-nav__streams').addClass('selected');
//sticky secondary nav
var $navbar = $(".sub-nav-sticky"),
y_pos = $navbar.offset().top,
height = $navbar.height();
$(window).scroll(function() {
var scrollTop = $(window).scrollTop();
if (scrollTop > y_pos - height) {
$navbar.addClass("navbar-fixed")
} else if (scrollTop <= y_pos) {
$navbar.removeClass("navbar-fixed")
}
});
// Display docs subnav items
$('.b-nav__docs').parent().toggleClass('nav__item__with__subs--expanded');
});
</script>

View File

@ -16,12 +16,12 @@
<!--#include virtual="../js/templateData.js" -->
</script>
<script id="streams-template" type="text/x-handlebars-template">
<h1>Kafka Streams API</h1>
<h1>Kafka Streams</h1>
<div class="sub-nav-sticky">
<div class="sticky-top">
<div style="height:35px">
<a class="active-menu-item" href="/{{version}}/documentation/streams/">Introduction</a>
<a href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
@ -300,6 +300,7 @@
<!--#include virtual="../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</li>
</ul>
<div class="p-streams"></div>

View File

@ -23,7 +23,7 @@
<div class="sticky-top">
<div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
@ -354,7 +354,7 @@ Looking beyond the scope of this concrete example, what Kafka Streams is doing h
<!--#include virtual="../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Streams</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>

View File

@ -22,7 +22,7 @@
<div class="sticky-top">
<div style="height:35px">
<a href="/{{version}}/documentation/streams/">Introduction</a>
<a href="/{{version}}/documentation/streams/developer-guide">Developer Guide</a>
<a href="/{{version}}/documentation/streams/developer-guide/">Developer Guide</a>
<a href="/{{version}}/documentation/streams/core-concepts">Concepts</a>
<a href="/{{version}}/documentation/streams/quickstart">Run Demo App</a>
<a class="active-menu-item" href="/{{version}}/documentation/streams/tutorial">Tutorial: Write App</a>
@ -631,7 +631,7 @@
<!--#include virtual="../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Streams</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>

View File

@ -373,7 +373,7 @@
<!--#include virtual="../../includes/_docs_banner.htm" -->
<ul class="breadcrumbs">
<li><a href="/documentation">Documentation</a></li>
<li><a href="/documentation/streams">Kafka Streams API</a></li>
<li><a href="/documentation/streams">Kafka Streams</a></li>
</ul>
<div class="p-content"></div>
</div>