MINOR: update Kafka Streams docs with 3.3 KIP information (#16316)

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Jim Galasyn <jim.galasyn@confluent.io>
Matthias J. Sax 2024-06-13 15:17:00 -07:00
parent 24d7e38b54
commit 05694da5d4
2 changed files with 80 additions and 3 deletions


@@ -2863,10 +2863,11 @@ active-process-ratio metrics which have a recording level of <code>info</code>:
<h5 class="anchor-heading"><a id="kafka_streams_node_monitoring" class="anchor-link"></a><a href="#kafka_streams_node_monitoring">Processor Node Metrics</a></h5>
The following metrics are only available on certain types of nodes, i.e., the <code>process-*</code> metrics are only available for
source processor nodes, the <code>suppression-emit-*</code> metrics are only available for suppression operation nodes,
the <code>emit-final-*</code> metrics are only available for windowed aggregation nodes, and the
<code>record-e2e-latency-*</code> metrics are only available for source processor nodes and terminal nodes (nodes without successor
nodes).
All of the metrics have a recording level of <code>debug</code>, except for the <code>record-e2e-latency-*</code> metrics which have
a recording level of <code>info</code>:
<table class="data-table">
<tbody>
@@ -2905,6 +2906,26 @@ active-process-ratio metrics which have a recording level of <code>info</code>:
<td>The total number of records that have been emitted downstream from suppression operation nodes.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>emit-final-latency-max</td>
<td>The max latency to emit final records when a record could be emitted.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>emit-final-latency-avg</td>
<td>The avg latency to emit final records when a record could be emitted.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>emit-final-records-rate</td>
<td>The rate of records emitted when records could be emitted.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>emit-final-records-total</td>
<td>The total number of records emitted.</td>
<td>kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)</td>
</tr>
<tr>
<td>record-e2e-latency-avg</td>
<td>The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.</td>


@@ -105,6 +105,62 @@
More details about the new config <code>StreamsConfig#TOPOLOGY_OPTIMIZATION</code> can be found in <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-295%3A+Add+Streams+Configuration+Allowing+for+Optional+Topology+Optimization">KIP-295</a>.
</p>
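<p>
    For illustration, a minimal sketch of enabling topology optimization (the application id and bootstrap
    server are made-up example values, and the newer constant name <code>TOPOLOGY_OPTIMIZATION_CONFIG</code> is used):
</p>
<pre><code class="language-java">
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // hypothetical application id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address
props.put(StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG, StreamsConfig.OPTIMIZE);

final StreamsBuilder builder = new StreamsBuilder();
// ... define the topology via the builder ...

// pass the config to build() so the optimizations can be applied to the topology
final KafkaStreams streams = new KafkaStreams(builder.build(props), props);
</code></pre>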
<h3><a id="streams_api_changes_330" href="#streams_api_changes_330">Streams API changes in 3.3.0</a></h3>
<p>
Kafka Streams does not send a "leave group" request when an instance is closed. This behavior implies
that a rebalance is delayed until <code>max.poll.interval.ms</code> has passed.
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-812%3A+Introduce+another+form+of+the+%60KafkaStreams.close%28%29%60+API+that+forces+the+member+to+leave+the+consumer+group">KIP-812</a>
introduces a <code>KafkaStreams.close(CloseOptions)</code> overload, which allows forcing an instance to leave the
group immediately.
Note: Due to internal limitations, <code>CloseOptions</code> only works for static consumer groups at this point
(cf. <a href="https://issues.apache.org/jira/browse/KAFKA-16514">KAFKA-16514</a> for more details; a fix is planned
for a future release).
</p>
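<p>
    For illustration, a minimal sketch of closing an instance with the new option (the timeout is an arbitrary
    example value, and <code>streams</code> is assumed to be an already started <code>KafkaStreams</code> instance;
    recall that leaving the group on close currently only takes effect for static members, i.e., when
    <code>group.instance.id</code> is set):
</p>
<pre><code class="language-java">
import java.time.Duration;
import org.apache.kafka.streams.KafkaStreams;

final KafkaStreams.CloseOptions closeOptions = new KafkaStreams.CloseOptions()
    .leaveGroup(true)                 // send a "leave group" request when closing
    .timeout(Duration.ofSeconds(30)); // arbitrary example timeout
final boolean allThreadsStopped = streams.close(closeOptions);
</code></pre>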
<p>
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-820%3A+Extend+KStream+process+with+new+Processor+API">KIP-820</a>
brings the PAPI type-safety improvements of KIP-478 into the DSL. The existing methods <code>KStream.transform</code>,
<code>KStream.flatTransform</code>, <code>KStream.transformValues</code>, and <code>KStream.flatTransformValues</code>,
as well as all overloads of <code>void KStream.process</code>, are deprecated in favor of the newly added methods:
<ul>
<li><code>KStream&lt;KOut,VOut&gt; KStream.process(ProcessorSupplier, ...)</code></li>
<li><code>KStream&lt;K,VOut&gt; KStream.processValues(FixedKeyProcessorSupplier, ...)</code></li>
</ul>
Both new methods have multiple overloads and return a <code>KStream</code> instead of <code>void</code>, as the
deprecated <code>process()</code> methods did. In addition, <code>FixedKeyProcessor</code>, <code>FixedKeyRecord</code>,
<code>FixedKeyProcessorContext</code>, and <code>ContextualFixedKeyProcessor</code> are introduced to guard against
disallowed key modification inside <code>processValues()</code>. Furthermore, <code>ProcessingContext</code> is
added for a better interface hierarchy.
</p>
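<p>
    As an illustrative sketch (topic names and types are made up for the example), a value-only transformation
    using the new <code>processValues()</code> method could look as follows:
</p>
<pre><code class="language-java">
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.processor.api.ContextualFixedKeyProcessor;
import org.apache.kafka.streams.processor.api.FixedKeyRecord;

final StreamsBuilder builder = new StreamsBuilder();
final KStream&lt;String, String&gt; input = builder.stream("input-topic"); // hypothetical topic

// uppercase each value; the key cannot be modified inside processValues()
final KStream&lt;String, String&gt; upperCased = input.processValues(
    () -&gt; new ContextualFixedKeyProcessor&lt;String, String, String&gt;() {
        @Override
        public void process(final FixedKeyRecord&lt;String, String&gt; record) {
            context().forward(record.withValue(record.value().toUpperCase()));
        }
    }
);
upperCased.to("output-topic"); // hypothetical topic
</code></pre>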
<p>
Emitting a windowed aggregation result only after a window is closed is currently supported via the
<code>suppress()</code> operator. However, <code>suppress()</code> uses an in-memory implementation and does not
support RocksDB. To close this gap,
<a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-825%3A+introduce+a+new+API+to+control+when+aggregated+results+are+produced">KIP-825</a>
introduces "emit strategies", which are built into the aggregation operator directly to use the already existing
RocksDB store. <code>TimeWindowedKStream.emitStrategy(EmitStrategy)</code> and
<code>SessionWindowedKStream.emitStrategy(EmitStrategy)</code> allow picking between "emit on window update" (default)
and "emit on window close" strategies. Additionally, a few new emit metrics are added, as well as a necessary
new method, <code>SessionStore.findSessions(long, long)</code>.
</p>
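<p>
    For illustration, a minimal sketch (the input stream, window size, and grace period are made-up example values)
    of switching a windowed count to the "emit on window close" strategy:
</p>
<pre><code class="language-java">
import java.time.Duration;
import org.apache.kafka.streams.kstream.EmitStrategy;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

// given a KStream&lt;String, String&gt; named `input`
final KTable&lt;Windowed&lt;String&gt;, Long&gt; counts = input
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1)))
    .emitStrategy(EmitStrategy.onWindowClose()) // default is EmitStrategy.onWindowUpdate()
    .count();
</code></pre>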
<p>
<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832">KIP-834</a> allows pausing
and resuming a Kafka Streams instance. Pausing implies that processing input records and executing punctuations will
be skipped; Kafka Streams will continue to poll to maintain its group membership and may commit offsets.
In addition to the new methods <code>KafkaStreams.pause()</code> and <code>KafkaStreams.resume()</code>, you can
check whether an instance is currently paused via the <code>KafkaStreams.isPaused()</code> method.
</p>
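<p>
    For illustration, a minimal sketch (assuming an already started <code>KafkaStreams</code> instance named
    <code>streams</code>):
</p>
<pre><code class="language-java">
// temporarily stop processing records and executing punctuations;
// polling, group membership, and (possibly) offset commits continue
streams.pause();

if (streams.isPaused()) {
    // ... e.g., wait for a downstream system to recover ...
}

// continue processing where the instance left off
streams.resume();
</code></pre>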
<p>
To improve monitoring of Kafka Streams applications, <a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211886093">KIP-846</a>
adds four new metrics, <code>bytes-consumed-total</code>, <code>records-consumed-total</code>,
<code>bytes-produced-total</code>, and <code>records-produced-total</code>, within a new <b>topic-level</b> scope.
The consumed metrics are recorded at source nodes and the produced metrics at sink nodes, all with a recording
level of <code>INFO</code>.
</p>
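<p>
    For illustration, a sketch of reading the new totals from a running instance via
    <code>KafkaStreams#metrics()</code> (only the metric names listed above are assumed; <code>streams</code> is an
    already started instance):
</p>
<pre><code class="language-java">
import java.util.Map;
import java.util.Set;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

final Set&lt;String&gt; topicLevelMetrics = Set.of(
    "bytes-consumed-total", "records-consumed-total",
    "bytes-produced-total", "records-produced-total");

for (final Map.Entry&lt;MetricName, ? extends Metric&gt; entry : streams.metrics().entrySet()) {
    final MetricName name = entry.getKey();
    if (topicLevelMetrics.contains(name.name())) {
        System.out.println(name.group() + " " + name.tags() + " "
            + name.name() + " = " + entry.getValue().metricValue());
    }
}
</code></pre>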
<h3><a id="streams_api_changes_320" href="#streams_api_changes_320">Streams API changes in 3.2.0</a></h3> <h3><a id="streams_api_changes_320" href="#streams_api_changes_320">Streams API changes in 3.2.0</a></h3>
<p> <p>
RocksDB offers many metrics which are critical to monitor and tune its performance. Kafka Streams started to make RocksDB metrics accessible RocksDB offers many metrics which are critical to monitor and tune its performance. Kafka Streams started to make RocksDB metrics accessible