MINOR: update Kafka Streams docs with 3.3 KIP information (#16316)

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Jim Galasyn <jim.galasyn@confluent.io>
This commit is contained in:
Matthias J. Sax 2024-06-13 15:17:00 -07:00
parent 24d7e38b54
commit 05694da5d4
2 changed files with 80 additions and 3 deletions

View File

@ -2863,10 +2863,11 @@ active-process-ratio metrics which have a recording level of <code>info</code>:
<h5 class="anchor-heading"><a id="kafka_streams_node_monitoring" class="anchor-link"></a><a href="#kafka_streams_node_monitoring">Processor Node Metrics</a></h5>
The following metrics are only available on certain types of nodes, i.e., the process-* metrics are only available for
source processor nodes, the suppression-emit-* metrics are only available for suppression operation nodes, and the
record-e2e-latency-* metrics are only available for source processor nodes and terminal nodes (nodes without successor
source processor nodes, the <code>suppression-emit-*</code> metrics are only available for suppression operation nodes,
<code>emit-final-*</code> metrics are only available for windowed aggregations nodes, and the
<code>record-e2e-latency-*</code> metrics are only available for source processor nodes and terminal nodes (nodes without successor
nodes).
All of the metrics have a recording level of <code>debug</code>, except for the record-e2e-latency-* metrics which have
All of the metrics have a recording level of <code>debug</code>, except for the <code>record-e2e-latency-*</code> metrics which have
a recording level of <code>info</code>:
<table class="data-table">
<tbody>
@ -2905,6 +2906,26 @@ active-process-ratio metrics which have a recording level of <code>info</code>:
<td>The total number of records that have been emitted downstream from suppression operation nodes.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>emit-final-latency-max</td>
<td>The max latency to emit final records when a record could be emitted.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>emit-final-latency-avg</td>
<td>The avg latency to emit final records when a record could be emitted.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>emit-final-records-rate</td>
<td>The rate of records emitted when records could be emitted.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>emit-final-records-total</td>
<td>The total number of records emitted.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>record-e2e-latency-avg</td>
<td>The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.</td>

View File

@ -105,6 +105,62 @@
More details about the new config <code>StreamsConfig#TOPOLOGY_OPTIMIZATION</code> can be found in <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-295%3A+Add+Streams+Configuration+Allowing+for+Optional+Topology+Optimization">KIP-295</a>.
</p>
<h3><a id="streams_api_changes_330" href="#streams_api_changes_330">Streams API changes in 3.3.0</a></h3>
<p>
Kafka Streams does not send a "leave group" request when an instance is closed. This behavior implies
that a rebalance is delayed until <code>max.poll.interval.ms</code> passed.
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-812%3A+Introduce+another+form+of+the+%60KafkaStreams.close%28%29%60+API+that+forces+the+member+to+leave+the+consumer+group">KIP-812</a>
introduces <code>KafkaStreams.close(CloseOptions)</code> overload, which allows forcing an instance to leave the
group immediately.
Note: Due to internal limitations, <code>CloseOptions</code> only works for static consumer groups at this point
(cf. <a href="https://issues.apache.org/jira/browse/KAFKA-16514">KAFKA-16514</a> for more details and a fix in
some future release).
</p>
<p>
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-820%3A+Extend+KStream+process+with+new+Processor+API">KIP-820</a>
adapts the PAPI type-safety improvement of KIP-478 into the DSL. The existing methods <code>KStream.transform</code>,
<code>KStream.flatTransform</code>, <code>KStream.transformValues</code>, and <code>KStream.flatTransformValues</code>
as well as all overloads of <code>void KStream.process</code> are deprecated in favor of the newly added methods
<ul>
<li><code>KStream&lt;KOut,VOut&gt; KStream.process(ProcessorSupplier, ...)</code></li>
<li><code>KStream&lt;K,VOut&gt; KStream.processValues(FixedKeyProcessorSupplier, ...)</code></li>
</ul>
Both new methods have multiple overloads and return a <code>KStream</code> instead of <code>void</code> as the
deprecated <code>process()</code> methods did. In addition, <code>FixedKeyProcessor</code>, <code>FixedKeyRecord</code>,
<code>FixedKeyProcessorContext</code>, and <code>ContextualFixedKeyProcessor</code> are introduced to guard against
disallowed key modification inside <code>processValues()</code>. Furthermore, <code>ProcessingContext</code> is
added for a better interface hierarchy.
</p>
<p>
Emitting a windowed aggregation result only after a window is closed is currently supported via the
<code>suppress()</code> operator. However, <code>suppress()</code> uses an in-memory implementation and does not
support RocksDB. To close this gap,
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-825%3A+introduce+a+new+API+to+control+when+aggregated+results+are+produced">KIP-825</a>
introduces "emit strategies", which are built into the aggregation operator directly to use the already existing
RocksDB store. <code>TimeWindowedKStream.emitStrategy(EmitStrategy)</code> and
<code>SessionWindowedKStream.emitStrategy(EmitStrategy)</code> allow picking between "emit on window update" (default)
and "emit on window close" strategies. Additionally, a few new emit metrics are added, as well as a necessary
new method, <code>SessionStore.findSessions(long, long)</code>.
</p>
<p>
<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832">KIP-834</a> allows pausing
and resuming a Kafka Streams instance. Pausing implies that processing input records and executing punctuations will
be skipped; Kafka Streams will continue to poll to maintain its group membership and may commit offsets.
In addition to the new methods <code>KafkaStreams.pause()</code> and <code>KafkaStreams.resume()</code>, it is also
supported to check if an instance is paused via the <code>KafkaStreams.isPaused()</code> method.
</p>
<p>
To improve monitoring of Kafka Streams applications, <a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211886093">KIP-846</a>
adds four new metrics <code>bytes-consumed-total</code>, <code>records-consumed-total</code>,
<code>bytes-produced-total</code>, and <code>records-produced-total</code> within a new <b>topic level</b> scope.
The metrics are collected at INFO level for source and sink nodes, respectively.
</p>
<h3><a id="streams_api_changes_320" href="#streams_api_changes_320">Streams API changes in 3.2.0</a></h3>
<p>
RocksDB offers many metrics which are critical to monitor and tune its performance. Kafka Streams started to make RocksDB metrics accessible