mirror of https://github.com/apache/kafka.git
MINOR: stateful docs for aggregates
Author: Eno Thereska <eno.thereska@gmail.com> Reviewers: Damian Guy <damian.guy@gmail.com> Closes #3730 from enothereska/minor-docs-aggregates
This commit is contained in:
parent
949577ca77
commit
bd54d2e3e0
Binary file not shown.
After Width: | Height: | Size: 120 KiB |
|
@ -565,7 +565,7 @@ Note that in the <code>WordCountProcessor</code> implementation, users need to r
|
||||||
<li>You must specificy serdes explicitly if the key and/or value types of the records in the Kafka input topic do not
|
<li>You must specificy serdes explicitly if the key and/or value types of the records in the Kafka input topic do not
|
||||||
match the configured default serdes.</li>
|
match the configured default serdes.</li>
|
||||||
</ul>
|
</ul>
|
||||||
Several variants of <code>globalTable<code> exist to e.g. specify explicit serdes.
|
Several variants of <code>globalTable</code> exist to e.g. specify explicit serdes.
|
||||||
|
|
||||||
</td>
|
</td>
|
||||||
</tbody>
|
</tbody>
|
||||||
|
@ -575,7 +575,7 @@ Note that in the <code>WordCountProcessor</code> implementation, users need to r
|
||||||
<p>
|
<p>
|
||||||
<code>KStream</code> and <code>KTable</code> support a variety of transformation operations. Each of these operations
|
<code>KStream</code> and <code>KTable</code> support a variety of transformation operations. Each of these operations
|
||||||
can be translated into one or more connected processors into the underlying processor topology. Since <code>KStream</code>
|
can be translated into one or more connected processors into the underlying processor topology. Since <code>KStream</code>
|
||||||
and <code>KTable</code> are strongly typed, all these transformation operations are defined as generics functions where
|
and <code>KTable</code> are strongly typed, all these transformation operations are defined as generic functions where
|
||||||
users could specify the input and output data types.
|
users could specify the input and output data types.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
|
@ -1014,13 +1014,819 @@ Note that in the <code>WordCountProcessor</code> implementation, users need to r
|
||||||
|
|
||||||
|
|
||||||
<h5><a id="streams_dsl_transformations_stateful" href="#streams_dsl_transformations_stateful">Stateful transformations</a></h5>
|
<h5><a id="streams_dsl_transformations_stateful" href="#streams_dsl_transformations_stateful">Stateful transformations</a></h5>
|
||||||
Stateless transformations, by definition, do not depend on any state for processing, and hence implementation-wise they do not
|
<h6><a id="streams_dsl_transformations_stateful_overview" href="#streams_dsl_transformations_stateful_overview">Overview</a></h6>
|
||||||
require a state store associated with the stream processor.Stateful transformations, by definition, depend on state for processing
|
<p>
|
||||||
inputs and producing outputs, and hence implementation-wise they require a state store associated with the stream processor. For
|
Stateful transformations, by definition, depend on state for processing inputs and producing outputs, and
|
||||||
example, in aggregating operations, a windowing state store is used to store the latest aggregation results per window; in join
|
hence implementation-wise they require a state store associated with the stream processor. For example,
|
||||||
operations, a windowing state store is used to store all the records received so far within the defined window boundary.
|
in aggregating operations, a windowing state store is used to store the latest aggregation results per window;
|
||||||
|
in join operations, a windowing state store is used to store all the records received so far within the
|
||||||
|
defined window boundary.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Note, that state stores are fault-tolerant. In case of failure, Kafka Streams guarantees to fully restore
|
||||||
|
all state stores prior to resuming the processing.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Available stateful transformations in the DSL include:
|
||||||
|
<ul>
|
||||||
|
<li><a href=#streams_dsl_aggregations>Aggregating</a></li>
|
||||||
|
<li><a href="#streams_dsl_joins">Joining</a></li>
|
||||||
|
<li><a href="#streams_dsl_windowing">Windowing (as part of aggregations and joins)</a></li>
|
||||||
|
<li>Applying custom processors and transformers, which may be stateful, for Processor API integration</li>
|
||||||
|
</ul>
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The following diagram shows their relationships:
|
||||||
|
</p>
|
||||||
|
<figure>
|
||||||
|
<img class="centered" src="/{{version}}/images/streams-stateful_operations.png" style="width:500pt;">
|
||||||
|
<figcaption style="text-align: center;"><i>Stateful transformations in the DSL</i></figcaption>
|
||||||
|
</figure>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
We will discuss the various stateful transformations in detail in the subsequent sections. However, let's start
|
||||||
|
with a first example of a stateful application: the canonical WordCount algorithm.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
WordCount example in Java 8+, using lambda expressions:
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
// We assume record values represent lines of text. For the sake of this example, we ignore
|
||||||
|
// whatever may be stored in the record keys.
|
||||||
|
KStream<String, String> textLines = ...;
|
||||||
|
|
||||||
|
KStream<String, Long> wordCounts = textLines
|
||||||
|
// Split each text line, by whitespace, into words. The text lines are the record
|
||||||
|
// values, i.e. we can ignore whatever data is in the record keys and thus invoke
|
||||||
|
// `flatMapValues` instead of the more generic `flatMap`.
|
||||||
|
.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
|
||||||
|
// Group the stream by word to ensure the key of the record is the word.
|
||||||
|
.groupBy((key, word) -> word)
|
||||||
|
// Count the occurrences of each word (record key).
|
||||||
|
//
|
||||||
|
// This will change the stream type from `KGroupedStream<String, String>` to
|
||||||
|
// `KTable<String, Long>` (word -> count). We must provide a name for
|
||||||
|
// the resulting KTable, which will be used to name e.g. its associated
|
||||||
|
// state store and changelog topic.
|
||||||
|
.count("Counts")
|
||||||
|
// Convert the `KTable<String, Long>` into a `KStream<String, Long>`.
|
||||||
|
.toStream();
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
WordCount example in Java 7:
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
// Code below is equivalent to the previous Java 8+ example above.
|
||||||
|
KStream<String, String> textLines = ...;
|
||||||
|
|
||||||
|
KStream<String, Long> wordCounts = textLines
|
||||||
|
.flatMapValues(new ValueMapper<String, Iterable<String>>() {
|
||||||
|
@Override
|
||||||
|
public Iterable<String> apply(String value) {
|
||||||
|
return Arrays.asList(value.toLowerCase().split("\\W+"));
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.groupBy(new KeyValueMapper<String, String, String>>() {
|
||||||
|
@Override
|
||||||
|
public String apply(String key, String word) {
|
||||||
|
return word;
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.count("Counts")
|
||||||
|
.toStream();
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
<h6><a id="streams_dsl_aggregations" href="#streams_dsl_aggregations">Aggregate a stream</a></h6>
|
||||||
|
<p>
|
||||||
|
Once records are grouped by key via <code>groupByKey</code> or <code>groupBy</code> -- and
|
||||||
|
thus represented as either a <code>KGroupedStream</code> or a
|
||||||
|
<code>KGroupedTable</code> -- they can be aggregated via an operation such as
|
||||||
|
<code>reduce</code>. Aggregations are <i>key-based</i> operations, i.e.
|
||||||
|
they always operate over records (notably record values) <i>of the same key</i>. You may
|
||||||
|
choose to perform aggregations on
|
||||||
|
<a href="#streams_dsl_windowing">windowed</a> or non-windowed data.
|
||||||
|
</p>
|
||||||
|
<table class="data-table" border="1">
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th>Transformation</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><b>Aggregate</b>: <code>KGroupedStream → KTable</code> or <code>KGroupedTable
|
||||||
|
→ KTable</code></td>
|
||||||
|
<td>
|
||||||
|
<p>
|
||||||
|
<b>Rolling aggregation</b>. Aggregates the values of (non-windowed) records by
|
||||||
|
the grouped key. Aggregating is a generalization of <code>reduce</code> and allows, for example, the
|
||||||
|
aggregate value to have a different type than the input values.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
When aggregating a grouped stream, you must provide an initializer (think:
|
||||||
|
<code>aggValue = 0</code>) and an "adder"
|
||||||
|
aggregator (think: <code>aggValue + curValue</code>). When aggregating a <i>grouped</i>
|
||||||
|
table, you must additionally provide a "subtractor" aggregator (think: <code>aggValue - oldValue</code>).
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Several variants of <code>aggregate</code> exist, see Javadocs for details.
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
KGroupedStream<byte[], String> groupedStream = ...;
|
||||||
|
KGroupedTable<byte[], String> groupedTable = ...;
|
||||||
|
|
||||||
|
// Java 8+ examples, using lambda expressions
|
||||||
|
|
||||||
|
// Aggregating a KGroupedStream (note how the value type changes from String to Long)
|
||||||
|
KTable<byte[], Long> aggregatedStream = groupedStream.aggregate(
|
||||||
|
() -> 0L, /* initializer */
|
||||||
|
(aggKey, newValue, aggValue) -> aggValue + newValue.length(), /* adder */
|
||||||
|
Serdes.Long(), /* serde for aggregate value */
|
||||||
|
"aggregated-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Aggregating a KGroupedTable (note how the value type changes from String to Long)
|
||||||
|
KTable<byte[], Long> aggregatedTable = groupedTable.aggregate(
|
||||||
|
() -> 0L, /* initializer */
|
||||||
|
(aggKey, newValue, aggValue) -> aggValue + newValue.length(), /* adder */
|
||||||
|
(aggKey, oldValue, aggValue) -> aggValue - oldValue.length(), /* subtractor */
|
||||||
|
Serdes.Long(), /* serde for aggregate value */
|
||||||
|
"aggregated-table-store" /* state store name */);
|
||||||
|
|
||||||
|
|
||||||
|
// Java 7 examples
|
||||||
|
|
||||||
|
// Aggregating a KGroupedStream (note how the value type changes from String to Long)
|
||||||
|
KTable<byte[], Long> aggregatedStream = groupedStream.aggregate(
|
||||||
|
new Initializer<Long>() { /* initializer */
|
||||||
|
@Override
|
||||||
|
public Long apply() {
|
||||||
|
return 0L;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
new Aggregator<byte[], String, Long>() { /* adder */
|
||||||
|
@Override
|
||||||
|
public Long apply(byte[] aggKey, String newValue, Long aggValue) {
|
||||||
|
return aggValue + newValue.length();
|
||||||
|
}
|
||||||
|
},
|
||||||
|
Serdes.Long(),
|
||||||
|
"aggregated-stream-store");
|
||||||
|
|
||||||
|
// Aggregating a KGroupedTable (note how the value type changes from String to Long)
|
||||||
|
KTable<byte[], Long> aggregatedTable = groupedTable.aggregate(
|
||||||
|
new Initializer<Long>() { /* initializer */
|
||||||
|
@Override
|
||||||
|
public Long apply() {
|
||||||
|
return 0L;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
new Aggregator<byte[], String, Long>() { /* adder */
|
||||||
|
@Override
|
||||||
|
public Long apply(byte[] aggKey, String newValue, Long aggValue) {
|
||||||
|
return aggValue + newValue.length();
|
||||||
|
}
|
||||||
|
},
|
||||||
|
new Aggregator<byte[], String, Long>() { /* subtractor */
|
||||||
|
@Override
|
||||||
|
public Long apply(byte[] aggKey, String oldValue, Long aggValue) {
|
||||||
|
return aggValue - oldValue.length();
|
||||||
|
}
|
||||||
|
},
|
||||||
|
Serdes.Long(),
|
||||||
|
"aggregated-table-store");
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
Detailed behavior of <code>KGroupedStream</code>:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>Input records with <code>null</code> keys are ignored in general.</li>
|
||||||
|
<li>When a record key is received for the first time, the initializer is called
|
||||||
|
(and called before the adder).</li>
|
||||||
|
<li>Whenever a record with a non-null value is received, the adder is called.</li>
|
||||||
|
</ul>
|
||||||
|
<p>
|
||||||
|
Detailed behavior of KGroupedTable:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>Input records with null keys are ignored in general.</li>
|
||||||
|
<li>When a record key is received for the first time, the initializer is called
|
||||||
|
(and called before the adder and subtractor). Note that, in contrast to <code>KGroupedStream</code>, over
|
||||||
|
time the initializer may be called more
|
||||||
|
than once for a key as a result of having received input tombstone records
|
||||||
|
for that key (see below).</li>
|
||||||
|
<li>When the first non-<code>null</code> value is received for a key (think:
|
||||||
|
INSERT), then only the adder is called.</li>
|
||||||
|
<li>When subsequent non-<code>null</code> values are received for a key (think:
|
||||||
|
UPDATE), then (1) the subtractor is called
|
||||||
|
with the old value as stored in the table and (2) the adder is called with
|
||||||
|
the new value of the input record
|
||||||
|
that was just received. The order of execution for the subtractor and adder
|
||||||
|
is not defined.</li>
|
||||||
|
<li>When a tombstone record -- i.e. a record with a <code>null</code> value -- is
|
||||||
|
received for a key (think: DELETE), then
|
||||||
|
only the subtractor is called. Note that, whenever the subtractor returns a
|
||||||
|
<code>null</code> value itself, then the
|
||||||
|
corresponding key is removed from the resulting KTable. If that happens, any
|
||||||
|
next input record for that key will trigger the initializer again.</li>
|
||||||
|
</ul>
|
||||||
|
<p>
|
||||||
|
See the example at the bottom of this section for a visualization of the
|
||||||
|
aggregation semantics.
|
||||||
|
</p>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><b>Aggregate (windowed)</b>: <code>KGroupedStream → KTable</code></td>
|
||||||
|
<td>
|
||||||
|
<p>
|
||||||
|
<b>Windowed aggregation</b>. Aggregates the values of records, per window, by
|
||||||
|
the grouped key. Aggregating is a generalization of
|
||||||
|
<code>reduce</code> and allows, for example, the aggregate value to have a
|
||||||
|
different type than the input values.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
You must provide an initializer (think: <code>aggValue = 0</code>), "adder"
|
||||||
|
aggregator (think: <code>aggValue + curValue</code>),
|
||||||
|
and a window. When windowing based on sessions, you must additionally provide a
|
||||||
|
"session merger" aggregator (think:
|
||||||
|
<code>mergedAggValue = leftAggValue + rightAggValue</code>).
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The windowed <code>aggregate</code> turns a <code>KGroupedStream
|
||||||
|
<K , V></code> into a windowed <code>KTable<Windowed<K>, V></code>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Several variants of <code>aggregate</code> exist, see Javadocs for details.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre class="brush: java;">
|
||||||
|
import java.util.concurrent.TimeUnit;
|
||||||
|
KGroupedStream<String, Long> groupedStream = ...;
|
||||||
|
|
||||||
|
// Java 8+ examples, using lambda expressions
|
||||||
|
|
||||||
|
// Aggregating with time-based windowing (here: with 5-minute tumbling windows)
|
||||||
|
KTable<Windowed<String>, Long> timeWindowedAggregatedStream = groupedStream.aggregate(
|
||||||
|
() -> 0L, /* initializer */
|
||||||
|
(aggKey, newValue, aggValue) -> aggValue + newValue, /* adder */
|
||||||
|
TimeWindows.of(TimeUnit.MINUTES.toMillis(5)), /* time-based window */
|
||||||
|
Serdes.Long(), /* serde for aggregate value */
|
||||||
|
"time-windowed-aggregated-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Aggregating with session-based windowing (here: with an inactivity gap of 5 minutes)
|
||||||
|
KTable<Windowed<String>, Long> sessionizedAggregatedStream = groupedStream.aggregate(
|
||||||
|
() -> 0L, /* initializer */
|
||||||
|
(aggKey, newValue, aggValue) -> aggValue + newValue, /* adder */
|
||||||
|
(aggKey, leftAggValue, rightAggValue) -> leftAggValue + rightAggValue, /* session merger */
|
||||||
|
SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), /* session window */
|
||||||
|
Serdes.Long(), /* serde for aggregate value */
|
||||||
|
"sessionized-aggregated-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Java 7 examples
|
||||||
|
|
||||||
|
// Aggregating with time-based windowing (here: with 5-minute tumbling windows)
|
||||||
|
KTable<Windowed<String>, Long> timeWindowedAggregatedStream = groupedStream.aggregate(
|
||||||
|
new Initializer<Long>() { /* initializer */
|
||||||
|
@Override
|
||||||
|
public Long apply() {
|
||||||
|
return 0L;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
new Aggregator<String, Long, Long>() { /* adder */
|
||||||
|
@Override
|
||||||
|
public Long apply(String aggKey, Long newValue, Long aggValue) {
|
||||||
|
return aggValue + newValue;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
TimeWindows.of(TimeUnit.MINUTES.toMillis(5)), /* time-based window */
|
||||||
|
Serdes.Long(), /* serde for aggregate value */
|
||||||
|
"time-windowed-aggregated-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Aggregating with session-based windowing (here: with an inactivity gap of 5 minutes)
|
||||||
|
KTable<Windowed<String>, Long> sessionizedAggregatedStream = groupedStream.aggregate(
|
||||||
|
new Initializer<Long>() { /* initializer */
|
||||||
|
@Override
|
||||||
|
public Long apply() {
|
||||||
|
return 0L;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
new Aggregator<String, Long, Long>() { /* adder */
|
||||||
|
@Override
|
||||||
|
public Long apply(String aggKey, Long newValue, Long aggValue) {
|
||||||
|
return aggValue + newValue;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
new Merger<String, Long>() { /* session merger */
|
||||||
|
@Override
|
||||||
|
public Long apply(String aggKey, Long leftAggValue, Long rightAggValue) {
|
||||||
|
return rightAggValue + leftAggValue;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), /* session window */
|
||||||
|
Serdes.Long(), /* serde for aggregate value */
|
||||||
|
"sessionized-aggregated-stream-store" /* state store name */);
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Detailed behavior:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>The windowed aggregate behaves similar to the rolling aggregate described
|
||||||
|
above. The additional twist is that the behavior applies per window.</li>
|
||||||
|
<li>Input records with <code>null</code> keys are ignored in general.</li>
|
||||||
|
<li>When a record key is received for the first time for a given window, the
|
||||||
|
initializer is called (and called before the adder).</li>
|
||||||
|
<li>Whenever a record with a non-<code>null</code> value is received for a given window, the
|
||||||
|
adder is called.
|
||||||
|
(Note: As a result of a known bug in Kafka 0.11.0.0, the adder is currently
|
||||||
|
also called for <code>null</code> values. You can work around this, for example, by
|
||||||
|
manually filtering out <code>null</code> values prior to grouping the stream.)</li>
|
||||||
|
<li>When using session windows: the session merger is called whenever two
|
||||||
|
sessions are being merged.</li>
|
||||||
|
</ul>
|
||||||
|
<p>
|
||||||
|
See the example at the bottom of this section for a visualization of the aggregation semantics.
|
||||||
|
</p>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><b>Count</b>: <code>KGroupedStream → KTable or KGroupedTable → KTable</code></td>
|
||||||
|
<td>
|
||||||
|
<p>
|
||||||
|
<b>Rolling aggregation</b>. Counts the number of records by the grouped key.
|
||||||
|
Several variants of <code>count</code> exist, see Javadocs for details.
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
KGroupedStream<String, Long> groupedStream = ...;
|
||||||
|
KGroupedTable<String, Long> groupedTable = ...;
|
||||||
|
|
||||||
|
// Counting a KGroupedStream
|
||||||
|
KTable<String, Long> aggregatedStream = groupedStream.count(
|
||||||
|
"counted-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Counting a KGroupedTable
|
||||||
|
KTable<String, Long> aggregatedTable = groupedTable.count(
|
||||||
|
"counted-table-store" /* state store name */);
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
Detailed behavior for <code>KGroupedStream</code>:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>Input records with null keys or values are ignored.</li>
|
||||||
|
</ul>
|
||||||
|
<p>
|
||||||
|
Detailed behavior for <code>KGroupedTable</code>:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>Input records with <code>null</code> keys are ignored. Records with <code>null</code>
|
||||||
|
values are not ignored but interpreted as "tombstones" for the corresponding key, which
|
||||||
|
indicate the deletion of the key from the table.</li>
|
||||||
|
</ul>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><b>Count (Windowed)</b>: <code>KGroupedStream → KTable</code></td>
|
||||||
|
<td>
|
||||||
|
<p>
|
||||||
|
Windowed aggregation. Counts the number of records, per window, by the grouped key.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The windowed <code>count</code> turns a <code>KGroupedStream<<K, V></code> into a windowed <code>KTable<Windowed<K>, V></code>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Several variants of count exist, see Javadocs for details.
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
import java.util.concurrent.TimeUnit;
|
||||||
|
KGroupedStream<String, Long> groupedStream = ...;
|
||||||
|
|
||||||
|
// Counting a KGroupedStream with time-based windowing (here: with 5-minute tumbling windows)
|
||||||
|
KTable<Windowed<String>, Long> aggregatedStream = groupedStream.count(
|
||||||
|
TimeWindows.of(TimeUnit.MINUTES.toMillis(5)), /* time-based window */
|
||||||
|
"time-windowed-counted-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Counting a KGroupedStream with session-based windowing (here: with 5-minute inactivity gaps)
|
||||||
|
KTable<Windowed<String>, Long> aggregatedStream = groupedStream.count(
|
||||||
|
SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), /* session window */
|
||||||
|
"sessionized-counted-stream-store" /* state store name */);
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
Detailed behavior:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>Input records with <code>null</code> keys or values are ignored. (Note: As a result of a known bug in Kafka 0.11.0.0,
|
||||||
|
records with <code>null</code> values are not ignored yet. You can work around this, for example, by manually
|
||||||
|
filtering out <code>null</code> values prior to grouping the stream.)</li>
|
||||||
|
</ul>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><b>Reduce</b>: <code>KGroupedStream → KTable or KGroupedTable → KTable</code></td>
|
||||||
|
<td>
|
||||||
|
<p>
|
||||||
|
<b>Rolling aggregation</b>. Combines the values of (non-windowed) records by the grouped key. The current record value is
|
||||||
|
combined with the last reduced value, and a new reduced value is returned. The result value type cannot be changed,
|
||||||
|
unlike <code>aggregate</code>.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
When reducing a grouped stream, you must provide an "adder" reducer (think: <code>aggValue + curValue</code>).
|
||||||
|
When reducing a grouped table, you must additionally provide a "subtractor" reducer (think: <code>aggValue - oldValue</code>).
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Several variants of <code>reduce</code> exist, see Javadocs for details.
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
KGroupedStream<String, Long> groupedStream = ...;
|
||||||
|
KGroupedTable<String, Long> groupedTable = ...;
|
||||||
|
|
||||||
|
// Java 8+ examples, using lambda expressions
|
||||||
|
|
||||||
|
// Reducing a KGroupedStream
|
||||||
|
KTable<String, Long> aggregatedStream = groupedStream.reduce(
|
||||||
|
(aggValue, newValue) -> aggValue + newValue, /* adder */
|
||||||
|
"reduced-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Reducing a KGroupedTable
|
||||||
|
KTable<String, Long> aggregatedTable = groupedTable.reduce(
|
||||||
|
(aggValue, newValue) -> aggValue + newValue, /* adder */
|
||||||
|
(aggValue, oldValue) -> aggValue - oldValue, /* subtractor */
|
||||||
|
"reduced-table-store" /* state store name */);
|
||||||
|
|
||||||
|
|
||||||
|
// Java 7 examples
|
||||||
|
|
||||||
|
// Reducing a KGroupedStream
|
||||||
|
KTable<String, Long> aggregatedStream = groupedStream.reduce(
|
||||||
|
new Reducer<Long>() { /* adder */
|
||||||
|
@Override
|
||||||
|
public Long apply(Long aggValue, Long newValue) {
|
||||||
|
return aggValue + newValue;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"reduced-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Reducing a KGroupedTable
|
||||||
|
KTable<String, Long> aggregatedTable = groupedTable.reduce(
|
||||||
|
new Reducer<Long>() { /* adder */
|
||||||
|
@Override
|
||||||
|
public Long apply(Long aggValue, Long newValue) {
|
||||||
|
return aggValue + newValue;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
new Reducer<Long>() { /* subtractor */
|
||||||
|
@Override
|
||||||
|
public Long apply(Long aggValue, Long oldValue) {
|
||||||
|
return aggValue - oldValue;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"reduced-table-store" /* state store name */);
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
Detailed behavior for <code>KGroupedStream</code>:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>Input records with <code>null</code> keys are ignored in general.</li>
|
||||||
|
<li>When a record key is received for the first time, then the value of that
|
||||||
|
record is used as the initial aggregate value.</li>
|
||||||
|
<li>Whenever a record with a non-<code>null</code> value is received, the adder is called.</li>
|
||||||
|
</ul>
|
||||||
|
<p>
|
||||||
|
Detailed behavior for <code>KGroupedTable</code>:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>Input records with null keys are ignored in general.</li>
|
||||||
|
<li>When a record key is received for the first time, then the value of that
|
||||||
|
record is used as the initial aggregate value.
|
||||||
|
Note that, in contrast to KGroupedStream, over time this initialization step
|
||||||
|
may happen more than once for a key as a
|
||||||
|
result of having received input tombstone records for that key (see below).</li>
|
||||||
|
<li>When the first non-<code>null</code> value is received for a key (think: INSERT), then
|
||||||
|
only the adder is called.</li>
|
||||||
|
<li>When subsequent non-<code>null</code> values are received for a key (think: UPDATE), then
|
||||||
|
(1) the subtractor is called with the
|
||||||
|
old value as stored in the table and (2) the adder is called with the new
|
||||||
|
value of the input record that was just received.
|
||||||
|
The order of execution for the subtractor and adder is not defined.</li>
|
||||||
|
<li>When a tombstone record -- i.e. a record with a <code>null</code> value -- is received
|
||||||
|
for a key (think: DELETE), then only the
|
||||||
|
subtractor is called. Note that, whenever the subtractor returns a <code>null</code>
|
||||||
|
value itself, then the corresponding key
|
||||||
|
is removed from the resulting KTable. If that happens, any next input
|
||||||
|
record for that key will re-initialize its aggregate value.</li>
|
||||||
|
</ul>
|
||||||
|
<p>
|
||||||
|
See the example at the bottom of this section for a visualization of the
|
||||||
|
aggregation semantics.
|
||||||
|
<p>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><b>Reduce (windowed)</b>: <code>KGroupedStream → KTable</code></td>
|
||||||
|
<td>
|
||||||
|
<p>
|
||||||
|
Windowed aggregation. Combines the values of records, per window, by the grouped key. The current record value
|
||||||
|
is combined with the last reduced value, and a new reduced value is returned. Records with null key or value are
|
||||||
|
ignored. The result value type cannot be changed, unlike aggregate. (KGroupedStream details)
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The windowed reduce turns a <code>KGroupedStream<K, V></code> into a windowed <code>KTable<Windowed<K>, V></code>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Several variants of reduce exist, see Javadocs for details.
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
import java.util.concurrent.TimeUnit;
|
||||||
|
KGroupedStream<String, Long> groupedStream = ...;
|
||||||
|
|
||||||
|
// Java 8+ examples, using lambda expressions
|
||||||
|
|
||||||
|
// Aggregating with time-based windowing (here: with 5-minute tumbling windows)
|
||||||
|
KTable<Windowed<String>, Long> timeWindowedAggregatedStream = groupedStream.reduce(
|
||||||
|
(aggValue, newValue) -> aggValue + newValue, /* adder */
|
||||||
|
TimeWindows.of(TimeUnit.MINUTES.toMillis(5)), /* time-based window */
|
||||||
|
"time-windowed-reduced-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Aggregating with session-based windowing (here: with an inactivity gap of 5 minutes)
|
||||||
|
KTable<Windowed<String>, Long> sessionzedAggregatedStream = groupedStream.reduce(
|
||||||
|
(aggValue, newValue) -> aggValue + newValue, /* adder */
|
||||||
|
SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), /* session window */
|
||||||
|
"sessionized-reduced-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
|
||||||
|
// Java 7 examples
|
||||||
|
|
||||||
|
// Aggregating with time-based windowing (here: with 5-minute tumbling windows)
|
||||||
|
KTable<Windowed<String>, Long> timeWindowedAggregatedStream = groupedStream.reduce(
|
||||||
|
new Reducer<Long>() { /* adder */
|
||||||
|
@Override
|
||||||
|
public Long apply(Long aggValue, Long newValue) {
|
||||||
|
return aggValue + newValue;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
TimeWindows.of(TimeUnit.MINUTES.toMillis(5)), /* time-based window */
|
||||||
|
"time-windowed-reduced-stream-store" /* state store name */);
|
||||||
|
|
||||||
|
// Aggregating with session-based windowing (here: with an inactivity gap of 5 minutes)
|
||||||
|
KTable<Windowed<String>, Long> timeWindowedAggregatedStream = groupedStream.reduce(
|
||||||
|
new Reducer<Long>() { /* adder */
|
||||||
|
@Override
|
||||||
|
public Long apply(Long aggValue, Long newValue) {
|
||||||
|
return aggValue + newValue;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), /* session window */
|
||||||
|
"sessionized-reduced-stream-store" /* state store name */);
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Detailed behavior:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>The windowed reduce behaves similar to the rolling reduce described above. The additional twist is that
|
||||||
|
the behavior applies per window.</li>
|
||||||
|
<li>Input records with <code>null</code> keys are ignored in general.</li>
|
||||||
|
<li>When a record key is received for the first time for a given window, then the value of that record is
|
||||||
|
used as the initial aggregate value.</li>
|
||||||
|
<li>Whenever a record with a non-<code>null</code> value is received for a given window, the adder is called. (Note: As
|
||||||
|
a result of a known bug in Kafka 0.11.0.0, the adder is currently also called for <code>null</code> values. You can work
|
||||||
|
around this, for example, by manually filtering out <code>null</code> values prior to grouping the stream.)</li>
|
||||||
|
<li>See the example at the bottom of this section for a visualization of the aggregation semantics.</li>
|
||||||
|
</ul>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
<b>Example of semantics for stream aggregations</b>: A <code>KGroupedStream → KTable</code> example is shown below. The streams and the table are
|
||||||
|
initially empty. We use bold font in the column for "KTable <code>aggregated</code>" to highlight changed state. An entry such as <code>(hello, 1)</code>
|
||||||
|
denotes a record with key <code>hello</code> and value <code>1</code>. To improve the readability of the semantics table we assume that all records are
|
||||||
|
processed in timestamp order.
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
// Key: word, value: count
|
||||||
|
KStream<String, Integer> wordCounts = ...;
|
||||||
|
|
||||||
|
KGroupedStream<String, Integer> groupedStream = wordCounts
|
||||||
|
.groupByKey(Serdes.String(), Serdes.Integer());
|
||||||
|
|
||||||
|
KTable<String, Integer> aggregated = groupedStream.aggregate(
|
||||||
|
() -> 0, /* initializer */
|
||||||
|
(aggKey, newValue, aggValue) -> aggValue + newValue, /* adder */
|
||||||
|
Serdes.Integer(), /* serde for aggregate value */
|
||||||
|
"aggregated-stream-store" /* state store name */);
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
<b>Impact of <a href=#streams_developer-guide_memory-management_record-cache>record caches</a></b>: For illustration purposes,
|
||||||
|
the column "KTable <code>aggregated</code>" below shows the table's state changes over
|
||||||
|
time in a very granular way. In practice, you would observe state changes in such a granular way only when record caches are
|
||||||
|
disabled (default: enabled). When record caches are enabled, what might happen for example is that the output results of the
|
||||||
|
rows with timestamps 4 and 5 would be compacted, and there would only be a single state update for the key <code>kafka</code> in the KTable
|
||||||
|
(here: from <code>(kafka 1)</code> directly to <code>(kafka, 3)</code>. Typically, you should only disable record caches for testing or debugging purposes
|
||||||
|
-- under normal circumstances it is better to leave record caches enabled.
|
||||||
|
</p>
|
||||||
|
<table class="data-table" border="1">
|
||||||
|
<thead>
|
||||||
|
<col>
|
||||||
|
<colgroup span="2"></colgroup>
|
||||||
|
<colgroup span="2"></colgroup>
|
||||||
|
<col>
|
||||||
|
<tr>
|
||||||
|
<th scope="col"></th>
|
||||||
|
<th colspan="2">KStream wordCounts</th>
|
||||||
|
<th colspan="2">KGroupedStream groupedStream</th>
|
||||||
|
<th scope="col">KTable aggregated</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th scope="col">Timestamp</th>
|
||||||
|
<th scope="col">Input record</th>
|
||||||
|
<th scope="col">Grouping</th>
|
||||||
|
<th scope="col">Initializer</th>
|
||||||
|
<th scope="col">Adder</th>
|
||||||
|
<th scope="col">State</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>1</td>
|
||||||
|
<td>(hello, 1)</td>
|
||||||
|
<td>(hello, 1)</td>
|
||||||
|
<td>0 (for hello)</td>
|
||||||
|
<td>(hello, 0 + 1)</td>
|
||||||
|
<td>(hello, 1)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>2</td>
|
||||||
|
<td>(kafka, 1)</td>
|
||||||
|
<td>(kafka, 1)</td>
|
||||||
|
<td>0 (for kafka)</td>
|
||||||
|
<td>(kafka, 0 + 1)</td>
|
||||||
|
<td>(hello, 1), (kafka, 1)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>3</td>
|
||||||
|
<td>(streams, 1)</td>
|
||||||
|
<td>(streams, 1)</td>
|
||||||
|
<td>0 (for streams)</td>
|
||||||
|
<td>(streams, 0 + 1)</td>
|
||||||
|
<td>(hello, 1), (kafka, 1), (streams, 1)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>4</td>
|
||||||
|
<td>(kafka, 1)</td>
|
||||||
|
<td>(kafka, 1)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(kafka, 1 + 1)</td>
|
||||||
|
<td>(hello, 1), (kafka, 2), (streams, 1)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>5</td>
|
||||||
|
<td>(kafka, 1)</td>
|
||||||
|
<td>(kafka, 1)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(kafka, 2 + 1)</td>
|
||||||
|
<td>(hello, 1), (kafka, 3), (streams, 1)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>6</td>
|
||||||
|
<td>(streams, 1)</td>
|
||||||
|
<td>(streams, 1)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(streams, 1 + 1)</td>
|
||||||
|
<td>(hello, 1), (kafka, 3), (streams, 2)</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<p>
|
||||||
|
Example of semantics for table aggregations: A <code>KGroupedTable → KTable</code> example is shown below. The tables are initially empty.
|
||||||
|
We use bold font in the column for "KTable <code>aggregated</code>" to highlight changed state. An entry such as <code>(hello, 1)</code> denotes a
|
||||||
|
record with key <code>hello</code> and value <code>1</code>. To improve the readability of the semantics table we assume that all records are processed
|
||||||
|
in timestamp order.
|
||||||
|
</p>
|
||||||
|
<pre class="brush: java;">
|
||||||
|
// Key: username, value: user region (abbreviated to "E" for "Europe", "A" for "Asia")
|
||||||
|
KTable<String, String> userProfiles = ...;
|
||||||
|
|
||||||
|
// Re-group `userProfiles`. Don't read too much into what the grouping does:
|
||||||
|
// its prime purpose in this example is to show the *effects* of the grouping
|
||||||
|
// in the subsequent aggregation.
|
||||||
|
KGroupedTable<String, Integer> groupedTable = userProfiles
|
||||||
|
.groupBy((user, region) -> KeyValue.pair(region, user.length()), Serdes.String(), Serdes.Integer());
|
||||||
|
|
||||||
|
KTable<String, Integer> aggregated = groupedTable.aggregate(
|
||||||
|
() -> 0, /* initializer */
|
||||||
|
(aggKey, newValue, aggValue) -> aggValue + newValue, /* adder */
|
||||||
|
(aggKey, oldValue, aggValue) -> aggValue - oldValue, /* subtractor */
|
||||||
|
Serdes.Integer(), /* serde for aggregate value */
|
||||||
|
"aggregated-table-store" /* state store name */);
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
<b>Impact of <a href=#streams_developer-guide_memory-management_record-cache>record caches</a></b>:
|
||||||
|
For illustration purposes, the column "KTable <code>aggregated</code>" below shows
|
||||||
|
the table's state changes over time in a very granular way. In practice, you would observe state changes
|
||||||
|
in such a granular way only when record caches are disabled (default: enabled). When record caches are enabled,
|
||||||
|
what might happen for example is that the output results of the rows with timestamps 4 and 5 would be
|
||||||
|
compacted, and there would only be a single state update for the key <code>kafka</code> in the KTable
|
||||||
|
(here: from <code>(kafka 1)</code> directly to <code>(kafka, 3)</code>. Typically, you should only disable
|
||||||
|
record caches for testing or debugging purposes -- under normal circumstances it is better to leave record caches enabled.
|
||||||
|
</p>
|
||||||
|
<table class="data-table" border="1">
|
||||||
|
<thead>
|
||||||
|
<col>
|
||||||
|
<colgroup span="2"></colgroup>
|
||||||
|
<colgroup span="2"></colgroup>
|
||||||
|
<col>
|
||||||
|
<tr>
|
||||||
|
<th scope="col"></th>
|
||||||
|
<th colspan="3">KTable userProfiles</th>
|
||||||
|
<th colspan="3">KGroupedTable groupedTable</th>
|
||||||
|
<th scope="col">KTable aggregated</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<th scope="col">Timestamp</th>
|
||||||
|
<th scope="col">Input record</th>
|
||||||
|
<th scope="col">Interpreted as</th>
|
||||||
|
<th scope="col">Grouping</th>
|
||||||
|
<th scope="col">Initializer</th>
|
||||||
|
<th scope="col">Adder</th>
|
||||||
|
<th scope="col">Subtractor</th>
|
||||||
|
<th scope="col">State</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>1</td>
|
||||||
|
<td>(alice, E)</td>
|
||||||
|
<td>INSERT alice</td>
|
||||||
|
<td>(E, 5)</td>
|
||||||
|
<td>0 (for E)</td>
|
||||||
|
<td>(E, 0 + 5)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(E, 5)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>2</td>
|
||||||
|
<td>(bob, A)</td>
|
||||||
|
<td>INSERT bob</td>
|
||||||
|
<td>(A, 3)</td>
|
||||||
|
<td>0 (for A)</td>
|
||||||
|
<td>(A, 0 + 3)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(A, 3), (E, 5)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>3</td>
|
||||||
|
<td>(charlie, A)</td>
|
||||||
|
<td>INSERT charlie</td>
|
||||||
|
<td>(A, 7)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(A, 3 + 7)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(A, 10), (E, 5)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>4</td>
|
||||||
|
<td>(alice, A)</td>
|
||||||
|
<td>UPDATE alice</td>
|
||||||
|
<td>(A, 5)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(A, 10 + 5)</td>
|
||||||
|
<td>(E, 5 - 5)</td>
|
||||||
|
<td>(A, 15), (E, 0)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>5</td>
|
||||||
|
<td>(charlie, null)</td>
|
||||||
|
<td>DELETE charlie</td>
|
||||||
|
<td>(null, 7)</td>
|
||||||
|
<td></td>
|
||||||
|
<td></td>
|
||||||
|
<td>(A, 15 - 7)</td>
|
||||||
|
<td>(A, 8), (E, 0)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>6</td>
|
||||||
|
<td>(null, E)</td>
|
||||||
|
<td>ignored</td>
|
||||||
|
<td></td>
|
||||||
|
<td></td>
|
||||||
|
<td></td>
|
||||||
|
<td></td>
|
||||||
|
<td>(A, 8), (E, 0)</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>7</td>
|
||||||
|
<td>(bob, E)</td>
|
||||||
|
<td>UPDATE bob</td>
|
||||||
|
<td>(E, 3)</td>
|
||||||
|
<td></td>
|
||||||
|
<td>(E, 0 + 3)</td>
|
||||||
|
<td>(A, 8 - 3)</td>
|
||||||
|
<td>(A, 5), (E, 3)</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
<h6><a id="streams_dsl_windowing" href="#streams_dsl_windowing">Windowing a stream</a></h6>
|
<h6><a id="streams_dsl_windowing" href="#streams_dsl_windowing">Windowing a stream</a></h6>
|
||||||
A stream processor may need to divide data records into time buckets, i.e. to <b>window</b> the stream by time. This is usually needed for join and aggregation operations, etc. Kafka Streams currently defines the following types of windows:
|
A stream processor may need to divide data records into time buckets, i.e. to <b>window</b> the stream by time. This is usually needed for join and aggregation operations, etc. Kafka Streams currently defines the following types of windows:
|
||||||
|
@ -1066,14 +1872,7 @@ Note that in the <code>WordCountProcessor</code> implementation, users need to r
|
||||||
Depending on the operands the following join operations are supported: <b>inner joins</b>, <b>outer joins</b> and <b>left joins</b>.
|
Depending on the operands the following join operations are supported: <b>inner joins</b>, <b>outer joins</b> and <b>left joins</b>.
|
||||||
Their <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics">semantics</a> are similar to the corresponding operators in relational databases.
|
Their <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics">semantics</a> are similar to the corresponding operators in relational databases.
|
||||||
|
|
||||||
<h6><a id="streams_dsl_aggregations" href="#streams_dsl_aggregations">Aggregate a stream</a></h6>
|
|
||||||
An <b>aggregation</b> operation takes one input stream, and yields a new stream by combining multiple input records into a single output record. Examples of aggregations are computing counts or sum. An aggregation over record streams usually needs to be performed on a windowing basis because otherwise the number of records that must be maintained for performing the aggregation may grow indefinitely.
|
|
||||||
|
|
||||||
<p>
|
|
||||||
In the Kafka Streams DSL, an input stream of an aggregation can be a <code>KStream</code> or a <code>KTable</code>, but the output stream will always be a <code>KTable</code>.
|
|
||||||
This allows Kafka Streams to update an aggregate value upon the late arrival of further records after the value was produced and emitted.
|
|
||||||
When such late arrival happens, the aggregating <code>KStream</code> or <code>KTable</code> simply emits a new aggregate value. Because the output is a <code>KTable</code>, the new value is considered to overwrite the old value with the same key in subsequent processing steps.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<h4><a id="streams_dsl_sink" href="#streams_dsl_sink">Write streams back to Kafka</a></h4>
|
<h4><a id="streams_dsl_sink" href="#streams_dsl_sink">Write streams back to Kafka</a></h4>
|
||||||
|
|
||||||
|
@ -1124,7 +1923,7 @@ Note that in the <code>WordCountProcessor</code> implementation, users need to r
|
||||||
|
|
||||||
|
|
||||||
<figure>
|
<figure>
|
||||||
<img class="centerd" src="/{{version}}/images/streams-interactive-queries-01.png" style="width:600pt;">
|
<img class="centered" src="/{{version}}/images/streams-interactive-queries-01.png" style="width:600pt;">
|
||||||
<figcaption style="text-align: center;"><i>Without interactive queries: increased complexity and heavier footprint of architecture</i></figcaption>
|
<figcaption style="text-align: center;"><i>Without interactive queries: increased complexity and heavier footprint of architecture</i></figcaption>
|
||||||
</figure>
|
</figure>
|
||||||
|
|
||||||
|
@ -1222,7 +2021,7 @@ Note that in the <code>WordCountProcessor</code> implementation, users need to r
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<figure>
|
<figure>
|
||||||
<img class="centerd" src="/{{version}}/images/streams-interactive-queries-api-01.png" style="width:500pt;">
|
<img class="centered" src="/{{version}}/images/streams-interactive-queries-api-01.png" style="width:500pt;">
|
||||||
<figcaption style="text-align: center;"><i>Every application instance can directly query any of its local state stores</i></figcaption>
|
<figcaption style="text-align: center;"><i>Every application instance can directly query any of its local state stores</i></figcaption>
|
||||||
</figure>
|
</figure>
|
||||||
|
|
||||||
|
|
Loading…
Reference in New Issue