mirror of https://github.com/apache/kafka.git
MINOR: replace `late` with `out-of-order` in JavaDocs and docs (#7274)
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>
parent a0470726c4
commit e85d671dee
|
|
@ -49,7 +49,7 @@
|
|||
<li>Has <b>no external dependencies on systems other than Apache Kafka itself</b> as the internal messaging layer; notably, it uses Kafka's partitioning model to horizontally scale processing while maintaining strong ordering guarantees.</li>
|
||||
<li>Supports <b>fault-tolerant local state</b>, which enables very fast and efficient stateful operations like windowed joins and aggregations.</li>
|
||||
<li>Supports <b>exactly-once</b> processing semantics to guarantee that each record will be processed once and only once even when there is a failure on either Streams clients or Kafka brokers in the middle of processing.</li>
|
||||
<li>Employs <b>one-record-at-a-time processing</b> to achieve millisecond processing latency, and supports <b>event-time based windowing operations</b> with late arrival of records.</li>
|
||||
<li>Employs <b>one-record-at-a-time processing</b> to achieve millisecond processing latency, and supports <b>event-time based windowing operations</b> with out-of-order arrival of records.</li>
|
||||
<li>Offers necessary stream processing primitives, along with a <b>high-level Streams DSL</b> and a <b>low-level Processor API</b>.</li>
|
||||
|
||||
</ul>
|
||||
|
|
@ -124,7 +124,7 @@
|
|||
<ul>
|
||||
<li> When new output records are generated via processing some input record, for example, <code>context.forward()</code> triggered in the <code>process()</code> function call, output record timestamps are inherited from input record timestamps directly.</li>
|
||||
<li> When new output records are generated via periodic functions such as <code>Punctuator#punctuate()</code>, the output record timestamp is defined as the current internal time (obtained through <code>context.timestamp()</code>) of the stream task.</li>
|
||||
<li> For aggregations, the timestamp of a resulting aggregate update record will be that of the latest arrived input record that triggered the update.</li>
|
||||
<li> For aggregations, the timestamp of a result update record will be the maximum timestamp of all input records contributing to the result.</li>
|
||||
</ul>
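The aggregation rule in the revised bullet (the result timestamp is the maximum timestamp of all contributing input records) can be illustrated with a small stand-alone sketch. It does not use the Kafka Streams API: the class `MaxTimestampAggregateSketch`, its `Result` type, and the in-memory map are hypothetical names introduced only for this illustration, and a simple count stands in for the aggregate.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a count aggregation that tracks result timestamps. */
final class MaxTimestampAggregateSketch {

    /** Aggregate value plus the timestamp attached to the emitted update record. */
    static final class Result {
        final long count;
        final long timestamp;

        Result(long count, long timestamp) {
            this.count = count;
            this.timestamp = timestamp;
        }
    }

    // Assumed in-memory store; one entry per record key.
    private final Map<String, Result> store = new HashMap<>();

    /** Folds one input record into the aggregate for its key and returns the update. */
    Result process(String key, long recordTimestamp) {
        Result old = store.get(key);
        Result updated = (old == null)
            ? new Result(1, recordTimestamp)
            // The result timestamp never moves backwards: take the maximum seen so far.
            : new Result(old.count + 1, Math.max(old.timestamp, recordTimestamp));
        store.put(key, updated);
        return updated;
    }

    public static void main(String[] args) {
        MaxTimestampAggregateSketch agg = new MaxTimestampAggregateSketch();
        agg.process("k", 10L);
        agg.process("k", 30L);
        Result r = agg.process("k", 20L); // out-of-order input record
        System.out.println(r.count + " @ " + r.timestamp); // prints "3 @ 30"
    }
}
```

Under the earlier wording ("the latest arrived input record"), the third update would have carried timestamp 20; with the maximum rule it keeps 30.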
|
||||
|
||||
<p>
|
||||
|
|
@ -137,7 +137,7 @@
|
|||
</p>
|
||||
|
||||
<p>
|
||||
In the <code>Kafka Streams DSL</code>, an input stream of an <code>aggregation</code> can be a KStream or a KTable, but the output stream will always be a KTable. This allows Kafka Streams to update an aggregate value upon the late arrival of further records after the value was produced and emitted. When such late arrival happens, the aggregating KStream or KTable emits a new aggregate value. Because the output is a KTable, the new value is considered to overwrite the old value with the same key in subsequent processing steps.
|
||||
In the <code>Kafka Streams DSL</code>, an input stream of an <code>aggregation</code> can be a KStream or a KTable, but the output stream will always be a KTable. This allows Kafka Streams to update an aggregate value upon the out-of-order arrival of further records after the value was produced and emitted. When such out-of-order arrival happens, the aggregating KStream or KTable emits a new aggregate value. Because the output is a KTable, the new value is considered to overwrite the old value with the same key in subsequent processing steps.
|
||||
</p>
|
||||
|
||||
<h3> <a id="streams_concepts_windowing" href="#streams_concepts_windowing">Windowing</a></h3>
|
||||
|
|
@ -145,10 +145,10 @@
|
|||
Windowing lets you control how to <em>group records that have the same key</em> for stateful operations such as <code>aggregations</code> or <code>joins</code> into so-called <em>windows</em>. Windows are tracked per record key.
|
||||
</p>
|
||||
<p>
|
||||
<code>Windowing operations</code> are available in the <code>Kafka Streams DSL</code>. When working with windows, you can specify a <strong>retention period</strong> for the window. This retention period controls how long Kafka Streams will wait for <strong>out-of-order</strong> or <strong>late-arriving</strong> data records for a given window. If a record arrives after the retention period of a window has passed, the record is discarded and will not be processed in that window.
|
||||
<code>Windowing operations</code> are available in the <code>Kafka Streams DSL</code>. When working with windows, you can specify a <strong>grace period</strong> for the window. This grace period controls how long Kafka Streams will wait for <strong>out-of-order</strong> data records for a given window. If a record arrives after the grace period of a window has passed, the record is discarded and will not be processed in that window. Specifically, a record is discarded if its timestamp dictates it belongs to a window, but the current stream time is greater than the end of the window plus the grace period.
|
||||
</p>
|
||||
<p>
|
||||
Late-arriving records are always possible in the real world and should be properly accounted for in your applications. It depends on the effective <code>time semantics </code> how late records are handled. In the case of processing-time, the semantics are "when the record is being processed", which means that the notion of late records is not applicable as, by definition, no record can be late. Hence, late-arriving records can only be considered as such (i.e. as arriving "late") for event-time or ingestion-time semantics. In both cases, Kafka Streams is able to properly handle late-arriving records.
|
||||
Out-of-order records are always possible in the real world and should be properly accounted for in your applications. How out-of-order records are handled depends on the effective <code>time semantics</code>. In the case of processing-time, the semantics are "when the record is being processed", which means that the notion of out-of-order records is not applicable as, by definition, no record can be out-of-order. Hence, out-of-order records can only be considered as such for event-time semantics, and in that case Kafka Streams is able to properly handle them.
|
||||
</p>
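As a rough illustration of the grace period described above, the following stand-alone sketch decides whether a record is still accepted into its window. It follows the condition stated in the `TimeWindows` Javadoc of this commit (records are accepted until `stream_time >= window_end + grace_period`). The class `GracePeriodSketch`, the tumbling-window arithmetic, and the treatment of stream time as the highest timestamp observed so far are assumptions made for this example, not the actual Kafka Streams implementation.

```java
/** Hypothetical model of a tumbling window with a grace period; not Kafka Streams code. */
final class GracePeriodSketch {

    private final long windowSize;
    private final long grace;
    // Assumption: stream time is the highest record timestamp observed so far.
    private long streamTime = Long.MIN_VALUE;

    GracePeriodSketch(long windowSize, long grace) {
        this.windowSize = windowSize;
        this.grace = grace;
    }

    /** Returns true if the record is accepted into its window, false if it is dropped. */
    boolean offer(long recordTimestamp) {
        streamTime = Math.max(streamTime, recordTimestamp);
        long windowStart = recordTimestamp - Math.floorMod(recordTimestamp, windowSize);
        long windowEnd = windowStart + windowSize;
        // The window stays open until stream_time >= window_end + grace_period.
        return streamTime < windowEnd + grace;
    }

    public static void main(String[] args) {
        // Time unit: minutes since midnight; one-hour windows with a 10-minute grace period.
        GracePeriodSketch window = new GracePeriodSketch(60, 10);
        System.out.println(window.offer(545)); // 09:05, in order: true
        System.out.println(window.offer(605)); // 10:05, advances stream time: true
        System.out.println(window.offer(550)); // 09:10, out-of-order but within grace: true
        System.out.println(window.offer(611)); // 10:11, advances stream time: true
        System.out.println(window.offer(555)); // 09:15, grace for 09:00-10:00 has passed: false
    }
}
```

Note that the decision depends on stream time, not on wall-clock time: the 09:00 to 10:00 window only closes once a record with a timestamp of 10:10 or later has been observed.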
|
||||
|
||||
<h3><a id="streams_concepts_duality" href="#streams-concepts-duality">Duality of Streams and Tables</a></h3>
|
||||
|
|
|
|||
|
|
@ -335,7 +335,7 @@
|
|||
<p>Some KStream transformations may generate one or more KStream objects, for example:
|
||||
- <code class="docutils literal"><span class="pre">filter</span></code> and <code class="docutils literal"><span class="pre">map</span></code> on a KStream will generate another KStream
|
||||
- <code class="docutils literal"><span class="pre">branch</span></code> on KStream can generate multiple KStreams</p>
|
||||
<p>Some others may generate a KTable object, for example an aggregation of a KStream also yields a KTable. This allows Kafka Streams to continuously update the computed value upon arrivals of <a class="reference internal" href="../core-concepts.html#streams_concepts_aggregations"><span class="std std-ref">late records</span></a> after it
|
||||
<p>Some others may generate a KTable object, for example an aggregation of a KStream also yields a KTable. This allows Kafka Streams to continuously update the computed value upon arrivals of <a class="reference internal" href="../core-concepts.html#streams_concepts_aggregations"><span class="std std-ref">out-of-order records</span></a> after it
|
||||
has already been produced to the downstream transformation operators.</p>
|
||||
<p>All KTable transformation operations can only generate another KTable. However, the Kafka Streams DSL does provide a special function
|
||||
that converts a KTable representation into a KStream. All of these transformation methods can be chained together to compose
|
||||
|
|
@ -2961,14 +2961,14 @@ In this diagram the time numbers represent minutes; e.g. t=5 means “at th
|
|||
of time in Kafka Streams is milliseconds, which means the time numbers would need to be multiplied with 60 * 1,000 to
|
||||
convert from minutes to milliseconds (e.g. t=5 would become t=300,000).</span></p>
|
||||
</div>
|
||||
<p>If we then receive three additional records (including two late-arriving records), what would happen is that the two
|
||||
<p>If we then receive three additional records (including two out-of-order records), what would happen is that the two
|
||||
existing sessions for the green record key will be merged into a single session starting at time 0 and ending at time 6,
|
||||
consisting of a total of three records. The existing session for the blue record key will be extended to end at time 5,
|
||||
consisting of a total of two records. And, finally, there will be a new session for the blue key starting and ending at
|
||||
time 11.</p>
|
||||
<div class="figure align-center" id="id6">
|
||||
<img class="centered" src="/{{version}}/images/streams-session-windows-02.png">
|
||||
<p class="caption"><span class="caption-text">Detected sessions after having received six input records. Note the two late-arriving data records at t=4 (green) and
|
||||
<p class="caption"><span class="caption-text">Detected sessions after having received six input records. Note the two out-of-order data records at t=4 (green) and
|
||||
t=5 (blue), which lead to a merge of sessions and an extension of a session, respectively.</span></p>
|
||||
</div>
|
||||
</div>
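The merge and extension behaviour described for the out-of-order records can be reproduced with a small stand-alone sketch. It is not the Kafka Streams session store: the class `SessionMergeSketch` is hypothetical, the inactivity gap of 5 minutes and the position of the first blue record (t=2) are assumptions chosen to match the narrative above, and each instance tracks the sessions of a single key.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-key session tracker; illustrates merging only, not Kafka Streams code. */
final class SessionMergeSketch {

    static final class Session {
        final long start;
        final long end;
        final int count;

        Session(long start, long end, int count) {
            this.start = start;
            this.end = end;
            this.count = count;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + " x" + count + "]";
        }
    }

    private final long gap; // assumed inactivity gap
    private List<Session> sessions = new ArrayList<>();

    SessionMergeSketch(long gap) {
        this.gap = gap;
    }

    /** Adds a record and merges every existing session that lies within the gap of it. */
    void add(long ts) {
        long start = ts;
        long end = ts;
        int count = 1;
        List<Session> next = new ArrayList<>();
        for (Session s : sessions) {
            if (s.end >= ts - gap && s.start <= ts + gap) {
                // The record bridges or extends this session: fold it into the merged one.
                start = Math.min(start, s.start);
                end = Math.max(end, s.end);
                count += s.count;
            } else {
                next.add(s);
            }
        }
        next.add(new Session(start, end, count));
        next.sort((a, b) -> Long.compare(a.start, b.start));
        sessions = next;
    }

    List<Session> sessions() {
        return sessions;
    }

    public static void main(String[] args) {
        SessionMergeSketch green = new SessionMergeSketch(5);
        green.add(0);
        green.add(6);  // more than 5 apart from t=0: second session
        green.add(4);  // out-of-order record bridges both sessions
        System.out.println(green.sessions()); // prints "[[0..6 x3]]"

        SessionMergeSketch blue = new SessionMergeSketch(5);
        blue.add(2);   // assumed position of the first blue record
        blue.add(5);   // extends the existing session
        blue.add(11);  // more than 5 after t=5: new session
        System.out.println(blue.sessions()); // prints "[[2..5 x2], [11..11 x1]]"
    }
}
```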
|
||||
|
|
@ -3007,7 +3007,7 @@ grouped
|
|||
<dl>
|
||||
<dt><code>grace(ofMinutes(10))</code></dt>
|
||||
<dd>This allows us to bound the lateness of events the window will accept.
|
||||
For example, the 09:00 to 10:00 window will accept late-arriving records until 10:10, at which point, the window is <strong>closed</strong>.
|
||||
For example, the 09:00 to 10:00 window will accept out-of-order records until 10:10, at which point, the window is <strong>closed</strong>.
|
||||
</dd>
|
||||
<dt><code>.suppress(Suppressed.untilWindowCloses(...))</code></dt>
|
||||
<dd>This configures the suppression operator to emit nothing for a window until it closes, and then emit the final result.
|
||||
|
|
|
|||
|
|
@ -215,12 +215,12 @@ public final class JoinWindows extends Windows<Window> {
|
|||
}
|
||||
|
||||
/**
|
||||
* Reject late events that arrive more than {@code afterWindowEnd}
|
||||
* Reject out-of-order events that are delayed more than {@code afterWindowEnd}
|
||||
* after the end of its window.
|
||||
* <p>
|
||||
* Delay is defined as (stream_time - record_timestamp).
|
||||
*
|
||||
* Lateness is defined as (stream_time - record_timestamp).
|
||||
*
|
||||
* @param afterWindowEnd The grace period to admit late-arriving events to a window.
|
||||
* @param afterWindowEnd The grace period to admit out-of-order events to a window.
|
||||
* @return this updated builder
|
||||
* @throws IllegalArgumentException if the {@code afterWindowEnd} is negative or can't be represented as {@code long milliseconds}
|
||||
*/
|
||||
|
|
|
|||
|
|
@ -134,14 +134,14 @@ public final class SessionWindows {
|
|||
}
|
||||
|
||||
/**
|
||||
* Reject late events that arrive more than {@code afterWindowEnd}
|
||||
* Reject out-of-order events that arrive more than {@code afterWindowEnd}
|
||||
* after the end of its window.
|
||||
*
|
||||
* <p>
|
||||
* Note that new events may change the boundaries of session windows, so aggressive
|
||||
* close times can lead to surprising results in which a too-late event is rejected and then
|
||||
* close times can lead to surprising results in which an out-of-order event is rejected and then
|
||||
* a subsequent event moves the window boundary forward.
|
||||
*
|
||||
* @param afterWindowEnd The grace period to admit late-arriving events to a window.
|
||||
* @param afterWindowEnd The grace period to admit out-of-order events to a window.
|
||||
* @return this updated builder
|
||||
* @throws IllegalArgumentException if the {@code afterWindowEnd} is negative or can't be represented as {@code long milliseconds}
|
||||
*/
|
||||
|
|
|
|||
|
|
@ -132,7 +132,7 @@ public interface Suppressed<K> extends NamedOperation<Suppressed<K>> {
|
|||
*
|
||||
* To accomplish this, the operator will buffer events from the window until the window close (that is,
|
||||
* until the end-time passes, and additionally until the grace period expires). Since windowed operators
|
||||
* are required to reject late events for a window whose grace period is expired, there is an additional
|
||||
* are required to reject out-of-order events for a window whose grace period is expired, there is an additional
|
||||
* guarantee that the final results emitted from this suppression will match any queriable state upstream.
|
||||
*
|
||||
* @param bufferConfig A configuration specifying how much space to use for buffering intermediate results.
|
||||
|
|
|
|||
|
|
@ -188,12 +188,12 @@ public final class TimeWindows extends Windows<TimeWindow> {
|
|||
}
|
||||
|
||||
/**
|
||||
* Reject late events that arrive more than {@code millisAfterWindowEnd}
|
||||
* Reject out-of-order events that arrive more than {@code millisAfterWindowEnd}
|
||||
* after the end of its window.
|
||||
* <p>
|
||||
* Delay is defined as (stream_time - record_timestamp).
|
||||
*
|
||||
* Lateness is defined as (stream_time - record_timestamp).
|
||||
*
|
||||
* @param afterWindowEnd The grace period to admit late-arriving events to a window.
|
||||
* @param afterWindowEnd The grace period to admit out-of-order events to a window.
|
||||
* @return this updated builder
|
||||
* @throws IllegalArgumentException if {@code afterWindowEnd} is negative or can't be represented as {@code long milliseconds}
|
||||
*/
|
||||
|
|
|
|||
|
|
@ -26,9 +26,10 @@ import static org.apache.kafka.streams.kstream.internals.WindowingDefaults.DEFAU
|
|||
|
||||
/**
|
||||
* The window specification for fixed size windows that is used to define window boundaries and grace period.
|
||||
*
|
||||
* Grace period defines how long to wait on late events, where lateness is defined as (stream_time - record_timestamp).
|
||||
*
|
||||
* <p>
|
||||
* Grace period defines how long to wait on out-of-order events. That is, windows will continue to accept new records until {@code stream_time >= window_end + grace_period}.
|
||||
* Records that arrive after the grace period passed are considered <em>late</em> and will not be processed but are dropped.
|
||||
* <p>
|
||||
* Warning: It may be unsafe to use objects of this class in set- or map-like collections,
|
||||
* since the equals and hashCode methods depend on mutable fields.
|
||||
*
|
||||
|
|
@ -118,9 +119,9 @@ public abstract class Windows<W extends Window> {
|
|||
|
||||
/**
|
||||
* Return the window grace period (the time to admit
|
||||
* late-arriving events after the end of the window.)
|
||||
* out-of-order events after the end of the window.)
|
||||
*
|
||||
* Lateness is defined as (stream_time - record_timestamp).
|
||||
* Delay is defined as (stream_time - record_timestamp).
|
||||
*/
|
||||
public abstract long gracePeriodMs();
|
||||
}
|
||||
|
|
|
|||
|
|
@ -50,8 +50,8 @@ public class UsePreviousTimeOnInvalidTimestamp extends ExtractRecordMetadataTime
|
|||
* @param record a data record
|
||||
* @param recordTimestamp the timestamp extractor from the record
|
||||
* @param partitionTime the highest extracted valid timestamp of the current record's partition (could be -1 if unknown)
|
||||
* @return the provided latest extracted valid timestamp as new timestamp for the record
|
||||
* @throws StreamsException if latest extracted valid timestamp is unknown
|
||||
* @return the provided highest extracted valid timestamp as new timestamp for the record
|
||||
* @throws StreamsException if highest extracted valid timestamp is unknown
|
||||
*/
|
||||
@Override
|
||||
public long onInvalidTimestamp(final ConsumerRecord<Object, Object> record,
|
||||
|
|
|
|||