Commit Graph

3240 Commits

Author SHA1 Message Date
Victoria Xia babfb1778b
KAFKA-14864: Close iterator in KStream windowed aggregation emit on window close (#13470)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-03 21:29:40 -07:00
Victoria Xia 63fee01366
KAFKA-14491: [19/N] Combine versioned store RocksDB instances into one (#13431)
The RocksDB-based versioned store implementation introduced in KIP-889 currently uses two physical RocksDB instances per store instance: one for the "latest value store" and another for the "segments store." This PR combines those two RocksDB instances into one by representing the latest value store as a special "reserved" segment within the segments store. This reserved segment has segment ID -1, is never expired, and is not included in the regular Segments methods for getting or creating segments, but is represented in the physical RocksDB instance the same way as any other segment.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-03 21:27:19 -07:00
Victoria Xia f503aa3ab4
KAFKA-14491: [16/N] Add recovery logic for store inconsistency due to failed write (#13364)
The RocksDB-based implementation of versioned stores introduced via KIP-889 consists of a "latest value store" and separate (logical) "segments stores." A single put operation may need to modify multiple (two) segments, or both a segment and the latest value store, which opens the possibility to store inconsistencies if the first write succeeds while the later one fails. When this happens, Streams will error out, but the store still needs to be able to recover upon restart. This PR adds the necessary repair logic into RocksDBVersionedStore to effectively undo the earlier failed write when a store inconsistency is encountered.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-03 21:23:48 -07:00
vamossagar12 c14f56b484
KAFKA-14586: Moving StreamResetter to tools (#13127)
Moves StreamResetter to tools project.

Reviewers: Federico Valeri <fedevaleri@gmail.com>, Christo Lolov <lolovc@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2023-03-28 14:43:22 +02:00
Spacrocket 71ca8ef4ec
KAFKA-14722: Make BooleanSerde public (#13382)
KAFKA-14722: Make BooleanSerde public (#13328)

Addition of boolean serde
https://cwiki.apache.org/confluence/display/KAFKA/KIP-907%3A+Add+Boolean+Serde+to+public+interface

During the task KAFKA-14491 Victoria added BooleanSerde class, It will be useful to have such class in public package.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>, Divij Vaidya <diviv@amazon.com>
2023-03-24 10:41:51 -05:00
hudeqi f79c2a6e04
MINOR:Incorrect/canonical use of constants in AdminClientConfig and StreamsConfigTest (#13427)
Co-authored-by: Deqi Hu <deqi.hu@shopee.com>

Reviewers: Ziming Deng <dengziming1993@gmail.com>, Guozhang Wang <guozhang.wang.us@gmail.com>
2023-03-23 09:36:35 -07:00
Victoria Xia 45ecae6a28
KAFKA-14491: [15/N] Add integration tests for versioned stores (#13340)
Adds integration tests for the new versioned stores introduced in KIP-889.

This PR also contains a small bugfix for the restore record converter, required to get the tests above to pass: even though versioned stores are timestamped stores, we do not want to use the record converter for prepending timestamps when restoring a versioned store.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-03-22 10:26:06 -07:00
Victoria Xia 1560c5bd7e
KAFKA-14491: [18/N] Update versioned store to check latest value on timestamped get (#13409)
Part of KIP-889.

Prior to this PR, versioned stores always returned null for get(key, timestamp) calls where the timestamp has exceeded the store's history retention, even if the latest value for the key (i.e., the one returned from get(key)) satisfies the timestamp bound. This was an oversight from the earlier implementation -- get(key, timestamp) should still return a record in this situation since the record exists in the store. This PR updates both the javadocs and the implementation accordingly.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-03-21 17:12:05 -07:00
Victoria Xia bfd15299b1
KAFKA-14491: [14/N] Set changelog topic configs for versioned stores (#13292)
Sets the correct topic configs for changelog topics for versioned stores introduced in KIP-889. Changelog topics for versioned stores differ from those for non-versioned stores only in that min.compaction.lag.ms needs to be set in order to prevent version history from being compacted prematurely.

The value for min.compaction.lag.ms is equal to the store's history retention plus some buffer to account for the broker's use of wall-clock time in performing compactions. This buffer is analogous to the windowstore.changelog.additional.retention.ms value for window store changelog topic retention time, and uses the same default of 24 hours. In the future, we can propose a KIP to expose a config such as versionedstore.changelog.additional.compaction.lag.ms to allow users to tune this value.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-03-21 17:08:10 -07:00
Victoria Xia 361095a1a7
KAFKA-14491: [17/N] Refactor segments cleanup logic
Part of KIP-899.

AbstractSegments automatically calls the helper method to clean up expired segments as part of getOrCreateSegmentIfLive(). This works fine for windowed store implementations which call getOrCreateSegmentIfLive() exactly once per put() call, but is inefficient and difficult to reason about for the new RocksDBVersionedStore implementation (cf. #13188) which makes potentially multiple calls to getOrCreateSegmentIfLive() for different segments for a single put() call. This PR addresses this by refactoring the call to clean up expired segments out of getOrCreateSegmentIfLive(), opting to have the different segments implementations specify when cleanup should occur instead. After this PR, RocksDBVersionedStore only cleans up expired segments once per call to put().

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-03-20 20:03:50 -07:00
Lucas Brutschy 6fae237638
MINOR: Use JUnit-5 extension to enforce strict stubbing (#13347)
A privious change disabled strict stubbing for the `RocksDBMetricsRecorderTest`. To re-enable the behavior in JUnit-5, we need to pull in a new dependency in the `streams` gradle project.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-03-20 13:49:35 -07:00
Victoria Xia 84351efd51
KAFKA-14491: [13/N] Add versioned store builder and materializer (#13274)
This PR introduces VersionedKeyValueStoreBuilder for building the new versioned stores introduced in KIP-889, analogous to the existing TimestampedKeyValueStoreBuilder for building timestamped stores. This PR also updates the existing KTable store materializer class to materialize versioned stores in addition to timestamped stores. As part of this change, the materializer is renamed from TimestampedKeyValueStoreMaterializer to simply KeyValueStoreMaterializer.

Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>
2023-03-06 17:13:33 -08:00
Christo Lolov 5b295293c0
MINOR: Remove unnecessary toString(); fix comment references (#13212)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>, Lucas Brutschy <lbrutschy@confluent.io>
2023-03-06 18:39:04 +01:00
littlehorse-eng a6d8988179
MINOR: Clarify docs for Streams config max.warmup.replicas. (#13082)
Documentation only—Minor clarification on how max.warmup.replicas works; specifically, that one "warmup replica" corresponds to a Task that is restoring its state. Also clarifies how max.warmup.replicas interacts with probing.rebalance.interval.ms.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2023-03-03 11:00:51 -08:00
Lucas Brutschy 47450ee064
MINOR: update RocksDBMetricsRecorder test to JUnit5 and fix memory leak (#13336)
The test was leaking memory via Mockito internals. Piggy-backing an update to JUnit5.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-03-03 10:17:08 -08:00
Victoria Xia 517b5d2b09
KAFKA-14491: [12/N] Relax requirement that KTable stores must be TimestampedKVStores (#13264)
As part of introducing versioned key-value stores in KIP-889, we want to lift the existing DSL restriction that KTable stores are always TimestampedKeyValueStores to allow for KTable stores which are VersionedKeyValueStores instead. This PR lifts this restriction by replacing raw usages of TimestampedKeyValueStore with a new KeyValueStoreWrapper which supports either TimestampedKeyValueStore or VersionedKeyValueStore.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-03-02 14:14:30 -08:00
Guozhang Wang 5842953249
MINOR: Fix flaky tests in DefaultStateUpdaterTest (#13319)
Found a few flaky tests while reviewing another PR. The root cause seems to be with changing the return behavior of when in mockito. Fixed those without using reset and also bumped a couple debug log lines to info since they could be very helpful in debugging.

Reviewers: Luke Chen <showuon@gmail.com>, Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2023-03-01 17:36:23 -08:00
Victoria Xia 400ba0aeae
KAFKA-14491: [11/N] Add metered wrapper for versioned stores (#13252)
Introduces the metered store layer for the new versioned key-value store introduced in KIP-889. This outermost, metered store layer handles all serialization/deserialization from VersionedKeyValueStore to a bytes-representation (VersionedBytesStore) so that all inner stores may operate only with bytes types.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-24 14:32:43 -08:00
Guozhang Wang 2fad165294
KAFKA-10199: Add task updater metrics, part 1 (#13228)
* Moved pausing-tasks logic out of the commit-interval loop to be on the top-level loop, similar to resuming tasks.
* Added thread-level restoration metrics.
* Related unit tests.

Reviewers: Lucas Brutschy <lucasbru@users.noreply.github.com>, Matthias J. Sax <matthias@confluent.io>
2023-02-24 10:25:11 -08:00
Lucia Cerchie 8c84d29c2e
KAFKA-14128: Kafka Streams does not handle TimeoutException (#13161)
Kafka Streams is supposed to handle TimeoutException during internal topic creation gracefully. This PR fixes the exception handling code to avoid crashing on an TimeoutException returned by the admin client.

Reviewer: Matthias J. Sax <matthias@confluent.io>, Colin Patrick McCabe <cmccabe@apache.org>, Alexandre Dupriez (@Hangleton)
2023-02-22 22:51:51 -08:00
Victoria Xia a2c9f421af
KAFKA-14491: [10/N] Add changelogging wrapper for versioned stores (#13251)
Introduces the changelogging layer for the new versioned key-value store introduced in KIP-889. The changelogging layer operate on VersionedBytesStore rather than VersionedKeyValueStore so that the outermost metered store can serialize to bytes once and then all inner stores operate only with bytes types.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-21 13:27:54 -08:00
Lucas Brutschy 0fc029c6a4
KAFKA-14299: Fix pause and resume with state updater (#13025)
* Fixes required to make the PauseResumeIntegrationTest pass. It was not enabled and it does not pass for the state updater code path.

* Make sure no progress is made on paused topologies. The state updater restored one round of polls from the restore
consumer before realizing that a newly added task was already in paused state when being added.

* Wake up state updater when tasks are being resumed. If a task is resumed, it may be necessary to wake up the state updater from waiting on the tasksAndActions condition.

* Make sure that allTasks methods also return the tasks that are currently being restored.

* Enable PauseResumeIntegrationTest and upgrade it to JUnit5.

Reviewers: Bruno Cadonna <cadonna@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2023-02-21 10:17:09 -08:00
Victoria Xia 2e3bbe63c1
KAFKA-14491: [9/N] Add versioned bytes store and supplier (#13250)
As part of introducing versioned key-value stores in KIP-889, we'd like a way to represent a versioned key-value store (VersionedKeyValueStore<Bytes, byte[]>) as a regular key-value store (KeyValueStore<Bytes, byte[]>) in order to be compatible with existing DSL methods for passing key-value stores, e.g., StreamsBuilder#table() and KTable methods, which are explicitly typed to accept Materialized<K, V, KeyValueStore<Bytes, byte[]>. This way, we do not need to introduce new versions of all relevant StreamsBuilder and KTable methods to relax the Materialized type to accept versioned stores.

This PR introduces the new VersionedBytesStore extends KeyValueStore<Bytes, byte[]> interface for this purpose, along with the corresponding supplier (VersionedBytesStoreSupplier) and implementation (RocksDbVersionedKeyValueBytesStoreSupplier). The RocksDbVersionedKeyValueBytesStoreSupplier implementation leverages an adapter (VersionedKeyValueToBytesStoreAdapter) to assist in converting from VersionedKeyValueStore to VersionedBytesStore.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-17 11:06:04 -08:00
Chia-Ping Tsai 7e149990bd
KAFKA-14717 KafkaStreams can' get running if the rebalance happens be… (#13248)
I noticed this issue when tracing #12590.

StreamThread closes the consumer before changing state to DEAD. If the partition rebalance happens quickly, the other StreamThreads can't change KafkaStream state from REBALANCING to RUNNING since there is a PENDING_SHUTDOWN StreamThread

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-02-17 08:40:34 -08:00
Philip Nee 82d5720aae
KAFKA-14253 - More informative logging (#13253)
Includes 2 requirements from the ticket:
* Include the number of members in the group (I.e., "15 members participating" and "to 15 clients as")
* Sort the member ids (to help compare the membership and assignment across rebalances)

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-02-16 16:54:50 -08:00
Christo Lolov ba0c5b0902
MINOR: Simplify JUnit assertions in tests; remove accidental unnecessary code in tests (#13219)
* assertEquals called on array
* Method is identical to its super method
* Simplifiable assertions
* Unused imports

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-02-16 16:13:31 +01:00
Victoria Xia dcaf95a35f
KAFKA-14491: [8/N] Add serdes for ValueAndTimestamp with null value (#13249)
Introduces a new Serde, that serializes a value and timestamp as a single byte array, where the value may be null (in order to represent putting a tombstone with timestamp into the versioned store).

Part of KIP-889.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-15 18:07:47 -08:00
Victoria Xia bfeef29804
KAFKA-14491: [7/N] Enforce strict grace period for versioned stores (#13243)
Changes the versioned store semantics to define an explicit "grace period" property. Grace period will always be equal to the history retention, though in the future we could introduce a new KIP to expose options to configure grace period separately.

Part of KIP-889.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-15 18:02:20 -08:00
Victoria Xia 528a777df6
KAFKA-14491: [6/N] Support restoring RocksDB versioned store from changelog (#13189)
This PR builds on the new RocksDB-based versioned store implementation (see KIP-889) by adding code for restoring from changelog. The changelog topic format is the same as for regular timestamped key-value stores: record keys, values, and timestamps are stored in the Kafka message key, value, and timestamp, respectively. The code for actually writing to this changelog will come in a follow-up PR.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-13 17:06:44 -08:00
Victoria Xia df22a9d0e6
KAFKA-14491: [5/N] Basic operations for RocksDB versioned store (#13188)
Introduces the VersionedKeyValueStore interface proposed in KIP-889, along with the RocksDB-based implementation of the interface. This PR includes fully functional put, get, get-with-timestamp, and delete operations, but does not include the ability to restore records from changelog or surrounding store layers (for metrics or writing to the changelog). Those pieces will come in follow-up PRs.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-10 17:30:09 -08:00
Guozhang Wang 083e11a22c
KAFKA-14650: Synchronize access to tasks inside task manager (#13167)
1. The major fix: synchronize access to tasks inside task manager, this is a fix of a regression introduced in #12397
2. Clarify on func names of StreamThread that maybe triggered outside the StreamThread.
3. Minor cleanups.

Reviewers: Lucas Brutschy <lucasbru@users.noreply.github.com>
2023-02-09 10:33:19 -08:00
Guozhang Wang 788793dee6
KAFKA-10575: Add onRestoreSuspsnded to StateRestoreListener (#13179)
1. Add the new API (default impl is empty) to StateRestoreListener.
2. Update related unit tests

Reviewers: Lucas Brutschy <lucasbru@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>
2023-02-07 11:33:09 -08:00
Matthias J. Sax 463bb00b11
MINOR: remove unncessary helper method (#13209)
Reviewers: Christo Lolov (@clolov), Lucas Brutschy <lbrutschy@confluent.io>, Ismael Juma <ismale@confluent.io>
2023-02-07 11:21:58 -08:00
Christo Lolov a0a9b6ffea
MINOR: Remove unnecessary code (#13210)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-02-07 17:37:45 +01:00
Victoria Xia 4a7fedd46a
KAFKA-14491: [3/N] Add logical key value segments (#13143)
Part of KIP-889

Reviewers: Matthias J. Sax <matthias@confuent.io>
2023-02-03 17:26:33 -08:00
Victoria Xia b8e606355b
KAFKA-14491: [4/N] Improvements to segment value format for RocksDB versioned store (#13186)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-02 21:48:40 -08:00
Shekhar Rajak 3cf13064cc
Replace EasyMock and PowerMock with Mockito - TimeOrderedWindowStoreTest (#12777)
Related to KAFKA-14059 and KAFKA-14132

* Replace EasyMock and PowerMock with Mockito - TimeOrderedWindowStoreTest.java
* Reset removed which was not needed

Reviewers: Divij Vaidya <diviv@amazon.com>, Guozhang Wang <wangguoz@gmail.com>
2023-02-02 16:03:47 -08:00
Victoria Xia 65bb819313
KAFKA-14491: [1/N] Add segment value format for RocksDB versioned store (#13126)
Part of KIP-889.

The KIP proposed the introduction of versioned key-value stores, as well as a RocksDB-based implementation. The RocksDB implementation will consist of a "latest value store" for storing the latest record version associated with each key, in addition to multiple "segment stores" to store older record versions. Within a segment store, multiple record versions for the same key will be combined into a single bytes array "value" associated with the key and stored to RocksDB.

This PR introduces the utility class that will be used to manage the value format of these segment stores, i.e., how multiple record versions for the same key will be combined into a single bytes array "value." Follow-up PRs will introduce the versioned store implementation itself (which calls heavily upon this utility class).

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-02-01 13:19:53 -08:00
Jorge Esteban Quilcate Otoya 7d61d4505a
KAFKA-14409: Clean ProcessorParameters from casting (#12879)
Reviewers: A. Sophie Blee-Goldman <ableegoldman@gmail.com>, John Roesler <vvcephei@apache.org>
2023-01-31 15:55:50 -06:00
Lucas Brutschy eb7f490159
chore: Fix scaladoc warnings (#13164)
Make sure no scaladoc warnings are emitted from the streams-scala project build.
We cannot fully fix all scaladoc warnings due to limitations of the scaladoc tool,
so this is a best-effort attempt at fixing as many warnings as possible. We also
disable one problematic class of scaladoc wornings (link errors) in the gradle build.

The causes of existing warnings are that we link to java members from scaladoc, which
is not possible, or we fail to disambiguate some members.

The broad rule applied in the changes is
 - For links to Java members such as [[StateStore]], we use the fully qualified name in a code tag
   to make manual link resolution via a search engine easy.
 - For some common terms that are also linked to Java members, like [[Serde]], we omit the link.
 - We disambiguate where possible.
 - In the special case of @throws declarations with Java Exceptions, we do not seem to be able
   to avoid the warning altogther.

Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2023-01-31 09:00:48 -08:00
Victoria Xia 6c98544a96
KAFKA-14491: [2/N] Refactor RocksDB store open iterator management (#13142)
This PR refactors how the list of open iterators for RocksDB stores is managed. Prior to this PR, the `openIterators` list was passed into the constructor for the iterators themselves, allowing `RocksDbIterator.close()` to remove the iterator from the `openIterators` list. After this PR, the iterators themselves will not know about lists of open iterators. Instead, a generic close callback is exposed, and it's the responsibility of the store that creates a new iterator to set the callback on the iterator, to ensure that closing an iterator removes the iterator from the list of open iterators.

This refactor is desirable because it enables more flexible iterator lifecycle management. Building on top of this, RocksDBStore is updated with an option to allow the user (i.e., the caller of methods such as `range()` and `prefixScan()` which return iterators) to pass a custom `openIterators` list for the new iterator to be stored in. This will allow for a new Segments implementation where multiple Segments can share the same RocksDBStore instance, while having each Segment manage its own open iterators.

Part of KIP-889.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-01-31 00:05:43 -08:00
Matthias J. Sax dc01199271
KAFAK-14660: Fix divide-by-zero vulnerability (#13175)
This PR adds a safe-guard for divide-by-zero. While `totalCapacity` can never be zero, an explicit error message is desirable.

Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2023-01-30 23:39:41 -08:00
Lucas Brutschy 1d0585563b
MINOR: fix flaky DefaultStateUpdaterTest (#13160)
Mockito should not make named topologies paused by default.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-01-24 15:32:04 -08:00
A. Sophie Blee-Goldman 3799708ff0
KAFKA-14533: re-enable 'false' and disable the 'true' parameter of SmokeTestDriverIntegrationTest (#13156)
I immediately saw a failure with stateUpdaterEnabled = true after disabling the false parameter, which suggests the problem actually does lie in the state updater itself and not the act of parametrization of the test. To verify this theory, and help stabilize the 3.4 release branch, let's try one more test by swapping out the true build in favor of the false one. If the listOffsets requests stop failing and causing this integration test to hit the global timeout as is currently happening at such a high rate, then we have pretty good evidence pointing at the state updater and should be able to debug things more easily from there.

After getting in a few builds to see whether the flakiness subsides, we should merge this PR to re-enable both parameters going forward: https://github.com/apache/kafka/pull/13155

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2023-01-24 00:14:20 -08:00
A. Sophie Blee-Goldman ee8e757878
temporarily disable the 'false' parameter (#13147)
Need to get a clean build for 3.4 and this test has been extremely flaky. I'm looking into the failure as well, and want to pinpoint whether it's the true build that's broken or it's the parameterization itself causing this -- thus, let's start by temporarily disabling the false parameter first.

See KAFKA-14533 for more details

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2023-01-23 15:24:35 -08:00
A. Sophie Blee-Goldman 0601fa0935
MINOR: fix flaky integrations tests by using 60s default timeout for startup (#13141)
The timeouts used for starting up Streams and waiting for the RUNNING state are all over the place across our integration tests, with some as low as 15s (which are unsurprisingly rather flaky). We use 60s as the default timeout for other APIs in the IntegrationTestUtils so we should do the same for #startApplicationAndWaitUntilRunning

I also noticed that we have several versions of that exact API in StreamsTestUtils, so I migrated everyone over to the IntegrationTestUtils#startApplicationAndWaitUntilRunning and added a few overloads for ease of use, including one for single KafkaStreams apps and one for using the default timeout

Reviewers: Matthias J. Sax <mjsax@apache.org>
2023-01-22 15:57:58 -08:00
A. Sophie Blee-Goldman 123e0e9ca9
MINOR: fix warnings in Streams javadocs (#13132)
While working on the 3.4 release I noticed we've built up an embarrassingly long list of warnings within the Streams javadocs. It's unavoidable for some links to break as the source code changes, but let's reset back to a good state before the list gets even longer

Reviewers: Matthias J. Sax <mjsax@apache.org>, Walker Carlson <wcarlson@confluent.io>
2023-01-20 14:19:11 -08:00
Christo Lolov e235e1a3fe
KAFKA-14132: Replace PowerMock and EasyMock with Mockito in streams tests (#12821)
Batch 1 of the tests detailed in https://issues.apache.org/jira/browse/KAFKA-14132 which use PowerMock/EasyMock and need to be moved to Mockito.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2023-01-19 18:44:08 +01:00
Christo Lolov 90967e81e2
Replace EasyMock with Mockito in streams tests (#12818)
Batch 6 of the tests detailed in https://issues.apache.org/jira/browse/KAFKA-14133 which use EasyMock and need to be moved to Mockito.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2023-01-19 14:55:01 +01:00
Divij Vaidya b2bc72dc79
MINOR: Include the inner exception stack trace when re-throwing an exception (#12229)
While wrapping the caught exception into a custom one, information about the caught
exception is being lost, including information about the stack trace of the exception.

When re-throwing an exception, we either include the original exception or the relevant
information is added to the exception message.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>, Matthew de Detrich <mdedetrich@gmail.com>
2023-01-15 15:03:23 -08:00
Federico Valeri 111f02cc74
KAFKA-14568: Move FetchDataInfo and related to storage module (#13085)
Part of KAFKA-14470: Move log layer to storage module.

Reviewers: Ismael Juma <ismael@juma.me.uk>

Co-authored-by: Ismael Juma <ismael@juma.me.uk>
2023-01-12 21:32:23 -08:00
Lucas Brutschy 22606a0a4d
KAFKA-14530: Check state updater more often (#13017)
In the new state restoration code, the state updater needs to be checked regularly
by the main thread to transfer ownership of tasks back to the main thread once the
state of the task is restored. The more often we check this, the faster we can
start processing the tasks.

Currently, we only check the state updater once in every loop iteration of the state
updater. And while we couldn't observe this to be strictly not often enough, we can
increase the number of checks easily by moving the check inside the inner processing
loop. This would mean that once we have iterated over `numIterations` records, we can
already start processing tasks that have finished restoration in the meantime.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2023-01-12 12:40:07 +01:00
Christo Lolov 78d4458b94
KAFKA-14003 Kafka Streams JUnit4 to JUnit5 migration part 2 (#12301)
This pull request addresses https://issues.apache.org/jira/browse/KAFKA-14003. It is the second of a series of pull requests which address the move of Kafka Streams tests from JUnit 4 to JUnit 5.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2023-01-11 09:26:48 +01:00
José Armando García Sancio 896573f9bc
KAFKA-14279: Add 3.3.x streams system tests (#13077)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-01-09 23:37:05 -08:00
A. Sophie Blee-Goldman 2060b057b0
MINOR: bump streams quickstart pom versions and add to list in gradle.properties (#13064)
The three pom files for the Streams quickstart also need to bump their versions after a branch cut. For some reason these are included (late) in the Release Process guide, but are missing from the list of what to update when bumping the version in gradle.properties. This commit adds the missing files to this list to help future RMs locate all the required version changes

Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-01-09 15:51:44 -08:00
Ismael Juma 96d9710c17
KAFKA-14478: Move LogConfig/CleanerConfig and related to storage module (#13049)
Additional notable changes to fix multiple dependency ordering issues:

* Moved `ConfigSynonym` to `server-common`
* Moved synonyms from `LogConfig` to `ServerTopicConfigSynonyms `
* Removed `LogConfigDef` `define` overrides and rely on
   `ServerTopicConfigSynonyms` instead.
* Moved `LogConfig.extractLogConfigMap` to `KafkaConfig`
* Consolidated relevant defaults from `KafkaConfig`/`LogConfig` in the latter
* Consolidate relevant config name definitions in `TopicConfig`
* Move `ThrottledReplicaListValidator` to `storage`

Reviewers: Satish Duggana <satishd@apache.org>, Mickael Maison <mickael.maison@gmail.com>
2023-01-04 02:42:52 -08:00
Satish Duggana 8d28c3d55e
MINOR Fix checkstyle failures in streams/examples module. (#13055)
MINOR Fix checkstyle failures in streams/examples module. (#13055)
2022-12-29 16:29:18 +05:30
Himani Arora 202a8cd255
MINOR: Fixed type in KTable JavaDocs(#6867)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-12-28 16:06:59 -08:00
Chia-Ping Tsai a1db11e82b
MINOR: remove unused org.apache.kafka.streams.processor.internals.RestoringTasks (#10164)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-12-28 15:50:37 -08:00
David Karlsson 4e1b6d3f28
MINOR: Update WordCountTransformerDemo comments (#12470)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-12-28 15:39:19 -08:00
Vladimir Korenev eeedde7ea9
MINOR: Add implicit for Serde[UUID] to Streams Scala API (#8335)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-12-28 14:54:22 -08:00
Josep Prat 5f1810209f
MINOR: Fix small warning on javadoc and scaladoc (#11049)
Escape the `>` character in javadoc
Escape the `$` character when part of `${}` in scaladoc as this is the way to reference a variable

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-12-28 13:41:45 -08:00
Qing 9c6c6bfa2b
KAFKA-13817 Always sync nextTimeToEmit with wall clock (#12166)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Hao Li <hli@confluent.io>
2022-12-28 12:32:54 -08:00
Greg Harris 8f0e6c6334
KAFKA-13881: Add Streams package infos (#12936)
Reviewers: Christo Lolov (@clolov), Matthias J. Sax <matthias@confluent.io>
2022-12-27 15:37:25 -08:00
Hao Li ca15735fa7
MINOR: remove onChange call in stream assignor assign method (#13034)
Reviewers: John Roesler <vvcephei@apache.org>
2022-12-21 18:32:05 -06:00
Lucas Brutschy 26daa8d610
MINOR: Fix various memory leaks in tests (#12959)
Various tests in the streams park were leaking native memory.

Most tests were fixed by closing the corresponding rocksdb resource.

I tested that the corresponding leak is gone by using a previous rocksdb
release with finalizers and checking if the finalizers would be called at some
point.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-12-21 13:38:05 +01:00
vamossagar12 409794b5ae
KAFKA-14461: Move StoreQueryIntegrationTest to junit5 and fixing logic in a couple of tests for finding active streams (#13014)
StoreQueryIntegrationTest#shouldQuerySpecificActivePartitionStores and StoreQueryIntegrationTest#shouldQueryOnlyActivePartitionStoresByDefault has a logic to find active partitions by doing a modulo with 2 and comparing the remainder. This can break when a new test is added and since Junit chooses an arbitrary order to run the tests, modulo checks can fail. This PR tries to make it deterministic.
Also, this PR uses Junit5 annotations so that the cluster and input topic can be setup/destroyed once.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-12-21 13:33:55 +01:00
Lucas Brutschy 9df069f372
KAFKA-14299: Avoid interrupted exceptions during clean shutdown (#13026)
The call to `interrupt` on the state updater thread during shutdown
could interrupt the thread while writing the checkpoint file. This
can cause a failure to write the checkpoint file and a misleading
stack trace in the logs.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-12-21 08:48:12 +01:00
Bill Bejeck ea65d74f6b
MINOR: No error with zero results state query (#13002)
This PR updates StateQueryResult.getOnlyPartitionResult() to not throw an IllegaArgumentException when there are 0 query results.

Added a test that will fail without this patch

Reviewers: John Roesler<vvcephei@apache.org>
2022-12-19 13:39:06 -05:00
vamossagar12 a46d16e7ab
Removing Multicasting partitioner for IQ (#12977)
Follow up PR for KIP-837. We don't want to allow multicasting for IQ. This PR imposes that restriction.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-12-15 15:09:41 -08:00
Hao Li 9b23d9305d
KAFKA-14395: add config to configure client supplier (#12944)
Implements KIP-884.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-12-14 09:17:27 -08:00
vamossagar12 73ea6986df
KAFKA-13602: Remove unwanted logging in RecordCollectorImpl.java (#12985)
There is unwanted logging introduced by #12803 as pointed out in this comment: #12803 (comment). This PR removes it.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2022-12-13 16:36:00 +01:00
vamossagar12 2fa1879247
KAFKA-14454: Making unique StreamsConfig for tests (#12971)
Newly added test KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions as part of KIP-837 passes when run individually but fails when is part of IT class and hence is marked as Ignored.

That seemed to have been because of the way StreamsConfig was being initialised so any new test would have used the same names. Because of which the second test never got to the desired state. With this PR, every test gets a unique app name which seems to have fixed the issue. Also, a couple of cosmetic changes

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-12-09 17:51:42 -08:00
A. Sophie Blee-Goldman d9b139220e
KAFKA-14318: KIP-878, Introduce partition autoscaling configs (#12962)
First PR for KIP-878: Internal Topic Autoscaling for Kafka Streams

Introduces two new configs related to autoscaling in Streams: a feature flag and retry timeout. This PR just adds the configs and gets them passed through to the Streams assignor where they'll ultimately be needed/used

Reviewers: Bill Bejeck <bill@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-12-09 15:02:36 -08:00
Lucas Brutschy 36a2f7bfd0
KAFKA-14432: RocksDBStore relies on finalizers to not leak memory (#12935)
RocksDBStore relied on finalizers to not leak memory (and leaked memory after the upgrade to RocksDB 7).
The problem was that every call to options.statistics creates a new wrapper object that needs to be finalized.

I simplified the logic a bit and moved the ownership of the statistics from ValueProvider to RocksDBStore.

Reviewers: Bruno Cadonna <cadonna@apache.org>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Christo Lolov <lolovc@amazon.com>
2022-12-07 18:25:58 -08:00
Lucia Cerchie 923fea583b
KAFKA-14260: add `synchronized` to `prefixScan` method (#12893)
As a result of "14260: InMemoryKeyValueStore iterator still throws ConcurrentModificationException", I'm adding synchronized to prefixScan as an alternative to going back to the ConcurrentSkipList.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-12-06 19:39:32 -08:00
Lucas Brutschy 96b1db510a
KAFKA-14415: Faster ThreadCache (#12903)
Optimization of `ThreadCache`. The original implementation showed significant slow-down when many caches were registered.

`sizeBytes` was called at least once, and potentially many times
in every `put` and was linear in the number of caches (= number of
state stores, so typically proportional to number of tasks). That
means, with every additional task, every put gets a little slower.
This was confirmed experimentally.

In this change, we modify the implementation of `ThreadCache` to
keep track of the total size in bytes. To be independent of the
concrete implementation of the underlying cache, we update the size
by subtracting the old and adding the new size of the cache before
and after every modifying operation. For this we acquire the object
monitor of the cache, but since all modifying operations on the caches
are synchronized already, this should not cause extra overhead.

This change also fixes a `ConcurrentModificationException` that could
be thrown in a race between `sizeBytes` and `getOrCreate`.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-12-06 17:22:42 -08:00
vamossagar12 77e294e7fc
KAFKA-13602: Adding ability to multicast records (#12803)
This PR implements KIP-837 which enhances StreamPartitioner to multicast records.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, YEONCHEOL JANG
2022-12-06 02:01:38 -08:00
Divij Vaidya f1568e5996
MINOR: Prevent NPE in SmokeTestDriver (fix flaky test) (#12908)
SmokeTestDriverIntegrationTest.java can be flaky because a NullPointerException prevents the retry mechanism that is added to prevent flakiness for this test. This change, prevents the NullPointerException and hence, allows the test to retry itself.

Reviewers: Luke Chen <showuon@gmail.com>, Lucas Brutschy <lbrutschy@confluent.io>
2022-12-06 10:52:58 +08:00
vamossagar12 6663acff23
KAFKA-13152: Add cache size metrics (#12778)
Adds the new DEBUG metric cache-size-bytes-total

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-11-30 17:54:37 -08:00
Joel Hamill d9946a7ffc
MINOR: Fix config documentation formatting (#12921)
Reviewers: José Armando García Sancio <jsancio@apache.org>
2022-11-30 08:54:39 -08:00
Lucas Brutschy 9ea3d0d1c8
KAFKA-12679: Handle lock exceptions in state updater (#12875)
In this change, we enable backing off when the state directory
is still locked during initialization of a task. When the state
directory is locked, the task is reinserted into the
initialization queue. We will reattempt to acquire the lock
after the next round of polling.

Tested through a new unit test.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-11-28 17:17:14 +01:00
Lucas Brutschy fea0eb4ca3
KAFKA-14299: Handle double rebalances better (#12904)
The original implementation of the state updater could not
handle double rebalances within one poll phase correctly,
because it could create tasks more than once if they hadn't
finished initialization yet.

In a55071a, we
moved initialization to the state updater to fix this. However,
with more testing, I found out that this implementation has
it's problems as well: There are problems with locking the
state directory (state updater acquired the lock to the state
directory, so the main thread wouldn't be able to clear the
state directory when closing the task), and benchmarks also
show that this can lead to useless work (tasks are being
initialized, although they will be taken from the thread soon
after in a follow-up rebalance).

In this PR, I propose to revert the original change, and fix
the original problem in a much simpler way: When we
receive an assignment, we simply clear out the
list of tasks pending initialization. This way, no double
tasks instantiations can happen.

The change was tested in benchmarks, system tests,
and the existing unit & integration tests. We also add
the state updater to the smoke integration test, which
triggered the double task instantiations before.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-11-28 13:16:44 +01:00
Christo Lolov 54efc4f109
KAFKA-14133: Replace EasyMock with Mockito in streams tests (#12505)
Batch 2 of the tests detailed in https://issues.apache.org/jira/browse/KAFKA-14133 which use EasyMock and need to be moved to Mockito.

Reviewers: Matthew de Detrich <matthew.dedetrich@aiven.io>, Dalibor Plavcic <dalibor.os@proton.me>, Bruno Cadonna <cadonna@apache.org
2022-11-21 13:12:22 +01:00
Jorge Esteban Quilcate Otoya 0de037423b
KAFKA-14325: Fix NPE on Processor Parameters toString (#12859)
Handle null processor supplier

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-11-20 18:24:04 -08:00
A. Sophie Blee-Goldman 56ab2f8034
KAFKA-14382: wait for current rebalance to complete before triggering followup (#12869)
Fix for the subtle bug described in KAFKA-14382 that was causing rebalancing loops. If we trigger a new rebalance while the current one is still ongoing, it may cause some members to fail the first rebalance if they weren't able to send the SyncGroup request in time (for example due to processing records during the rebalance). This means those consumers never receive their assignment from the original rebalance, and won't revoke any partitions they might have needed to. This can send the group into a loop as each rebalance schedules a new followup cooperative rebalance due to partitions that need to be revoked, and each followup rebalance causes some consumer(s) to miss the SyncGroup and never revoke those partitions.

Reviewers: John Roesler <vvcephei@apache.org>
2022-11-18 22:38:58 -08:00
Nick Telford 1d6430249b
KAFKA-14406: Fix double iteration of restoring records (#12842)
While restoring a batch of records, RocksDBStore was iterating the ConsumerRecords, building a list of KeyValues, and then iterating that list of KeyValues to add them to the RocksDB batch.

Simply adding the key and value directly to the RocksDB batch prevents this unnecessary second iteration, and the creation of itermediate KeyValue objects, improving the performance of state restoration, and reducing unnecessary object allocation.

This also simplifies the API of RocksDBAccessor, as prepareBatchForRestore is no longer needed.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>
2022-11-18 20:44:56 -08:00
Bill Bejeck 3012332e3d
KAFKA-14388 - Fixes the NPE when using the new Processor API with the DSL (#12861)
With the addition of the new Processor API the newly added FixedKeyProcessorNodeFactory extends the ProcessorNodeFactory class. The ProcessorNodeFactory had a private field Set<String> stateStoreNames initialized to an empty see. The FixedKeyProcessorNodeFactory also had a private field Set<String> stateStoreNames.

When executing InternalTopologyBuilder.build executing the buildProcessorNode method passed any node factory as ProcessorNodeFactory and the method references the stateStoreNames field, it's pointing to the superclass field, which is empty so the corresponding StoreBuilder(s) are never added - causing NPE in the topology.

This PR makes the field protected on the ProcessorNodeFactory class so FixedKeyProcessorNodeFactory inherits it.

The added test fails without this change.

Reviewers: Matthias J. Sax <mjsax@apache.org>,  Sophie Blee-Goldman <sophie@confluent.io>, Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>
2022-11-16 17:06:15 -05:00
Hao Li 76214bfb85
KAFKA-13785: Add JavaDocs for emit final (#12867)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-11-16 11:30:24 -08:00
Lucas Brutschy a55071a99f
KAFKA-14299: Initialize tasks in state updater (#12795)
The state updater code path puts tasks into an
"initialization queue", with created, but not initialized tasks.
These are later, during the event-loop, initialized and added
to the state updater. This might lead to losing track of those 
tasks - in particular it is possible to create
tasks twice, if we do not go once around `runLoop` to initialize
the task. This leads to `IllegalStateExceptions`. 

By handing the task to the state updater immediately and let the
state updater initialize the task, we can fulfil our promise to 
preserve the invariant "every task is owned by either the task 
registry or the state updater".

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-11-14 10:00:29 +01:00
A. Sophie Blee-Goldman e422a67d3f
KAFKA-14294: check whether a transaction is in flight before skipping a commit (#12835)
Add a new #transactionInFlight API to the StreamsProducer to expose the flag of the same name, then check whether there is an open transaction when we determine whether or not to perform a commit in TaskExecutor. This is to avoid unnecessarily dropping out of the group on transaction timeout in the case a transaction was begun outside of regular processing, eg when a punctuator forwards records but there are no newly consumer records and thus no new offsets to commit.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-11-14 09:43:46 +01:00
A. Sophie Blee-Goldman 51b7eb7937
KAFKA-14282: stop tracking Produced sensors by processor node id (#12836)
Users have been seeing a large number of these error messages being logged by the RecordCollectorImpl:

Unable to records bytes produced to topic XXX by sink node YYY as the node is not recognized.
It seems like we try to save all known sink nodes when the record collector is constructed, by we do so by going through the known sink topics which means we could miss some nodes, for example if dynamic topic routing is used. Previously we were logging an error and would skip recording the metric if we tried to send a record from a sink node it didn't recognize, but there's not really any reason to have been tracking the sensors by node in the first place -- we can just track the actual sink topics themselves.

Reviewers: John Roesler <vvcephei@apache.org>, Christo Lolov <lolovc@amazon.com>
2022-11-11 17:58:08 -08:00
Christo Lolov 876c338a60
[KAFKA-14324] Upgrade RocksDB to 7.1.2 (#12809)
Reviewers: Bruno Cadonna <cadonna@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-11-11 17:48:38 -08:00
Lucas Brutschy ce5faa222b
MINOR: Fix flaky RestoreIntegrationTest (#12841)
RestoreIntegrationTest used polling to determine if a rebalance
happens on one client, but if the rebalance would happen too quickly,
the polling would not pick it up and the check would time out.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-11-10 14:11:03 +01:00
Lucas Brutschy c034388a0a
KAFKA-14299: Avoid allocation & synchronization overhead in StreamThread loop (#12808)
The state updater code path introduced allocation and synchronization
overhead by performing relatively heavy operations in every iteration of
the StreamThread loop. This includes various allocations and acquiring
locks for handling `removedTasks` and `failedTasks`, even if the
corresponding queues are empty.

This change introduces `hasRemovedTasks` and
`hasExceptionsAndFailedTasks` in the `StateUpdater` interface that
can be used to skip over any allocation or synchronization. The new
methods do not require synchronization or memory allocation.

This change increases throughput by ~15% in one benchmark.

We extend existing unit tests to cover the slightly modified
behavior.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-11-08 17:55:37 +01:00
Lucas Brutschy 4560978ed7
KAFKA-14309: FK join upgrades not tested with DEV_VERSION (#12760)
The streams upgrade system inserted FK join code for every version of the
the StreamsUpgradeTest except for the latest. Also, the original code
never switched on the `test.run_fk_join` flag for the target version of
the upgrade.

The effect was that FK join upgrades were not tested at all, since
no FK join code was executed after the bounce in the system test.

We introduce `extra_properties` in the system tests, that can be used
to pass any property to the upgrade driver, which is supposed to be
reused by system tests for switching on and off flags (e.g. for the
state restoration code).

Reviewers: Alex Sorokoumov <asorokoumov@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-11-07 15:46:51 -08:00
Lucas Brutschy cd5f6c60b5
KAFKA-14299: Avoid busy polling in state updater (#12772)
The state updater can enter a busy polling loop if it
only updates standby tasks. We need to use the user-provided
poll-time to update always when using the state updater, since
the only other place where the state update blocks
(inside `waitIfAllChangelogsCompletelyRead`) is also
not blocking if there is at least one standby task.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@apache.org>
2022-11-07 16:46:25 +01:00
Ashmeet Lamba a971448f3f
KAFKA-14254: Format timestamps as dates in logs (#12684)
Improves logs withing Streams by replacing timestamps to date instances to improve readability.

Approach - Adds a function within common.utils.Utils to convert a given long timestamp to a date-time string with the same format used by Kafka's logger.

Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2022-11-07 13:42:39 +01:00
Lucas Brutschy 37a3645e7e
KAFKA-14299: Return emptied ChangelogReader to ACTIVE_RESTORING (#12773)
The ChangelogReader starts of in `ACTIVE_RESTORING` state, and
then goes to `STANDBY_RESTORING` when changelogs from standby
tasks are added. When the last standby changelogs are removed,
it remained in `STANDBY_RESTORING`, which means that an empty
ChangelogReader could be in either `ACTIVE_RESTORING` or
`STANDBY_RESTORING` depending on the exact sequence of
add/remove operations. This could lead the state updater into
an illegal state. Instead of changing the state updater,
I propose to stengthen the state invariant of the
`ChangelogReader` slightly: it should always be in
`ACTIVE_RESTORING` state, when empty.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@apache.org>
2022-11-07 13:35:18 +01:00
vamossagar12 7fd6a9b3e2
Kafka 12960: Follow up Commit to filter expired records from Windowed/Session Stores (#12756)
KAFKA-12960: Enforcing strict retention time for WindowStore and SessionStore

Reviewers: Luke Chen <showuon@gmail.com>, Vicky Papavasileiou
2022-11-07 11:53:34 +08:00
Lucas Brutschy e7c1e4a0a1
KAFKA-14299: Handle TaskCorruptedException during initialization (#12771)
State stores are initialized from the StreamThread even when
the state updater thread is enabled. However, we were missing
the corresponding handling of exceptions when thrown directly
during the initialization. In particular, TaskCorruptedException
would directly fall through to runLoop, and the task
would fall out of book-keeping, since the exception is thrown
when neither the StreamThread nor the StateUpdater is owning
the task.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@apache.org>
2022-11-01 18:25:19 +01:00
dengziming 56d588d55a
MINOR: Fix SubscriptionInfoData name in exception message (#12076)
Reviewers: Andrew Choi <andrewchoi5@users.noreply.github.com>, Luke Chen <showuon@gmail.com>
2022-10-31 10:48:16 +08:00
Divij Vaidya 5e399fe6f3
Move to mockito (#12465)
This PR build on top of #11017. I have added the previous author's comment in this PR for attribution. I have also addressed the pending comments from @chia7712 in this PR.

Notes to help the reviewer:

Mockito has mockStatic method which is equivalent to PowerMock's method.
When we run the tests using @RunWith(MockitoJUnitRunner.StrictStubs.class) Mockito performs a verify() for all stubs that are mentioned, hence, there is no need to explicitly verify the stubs (unless you want to verify the number of times etc.). Note that this does not work for static mocks.

Reviewers: Bruno Cadonna <cadonna@apache.org>, Walker Carlson <wcarlson@confluent.io>, Bill Bejeck <bbejeck@apache.org>
2022-10-27 14:08:44 -04:00
Lucas Brutschy 732887b210
MINOR: Get console output in quickstart examples (#12719)
Quickstart examples didn't produce any console output, since it was missing a logger implementation in the classpath.

Also some minor cleanups.

Tested by creating a test project and running the code.
2022-10-24 11:44:57 -04:00
vamossagar12 9a793897ec
KAFKA-13152: KIP-770, cache size config deprecation (#12758)
PR implementing KIP-770 (#11424) was reverted as it brought in a regression wrt pausing/resuming the consumer. That KIP also introduced a change to deprecate config CACHE_MAX_BYTES_BUFFERING_CONFIG and replace it with STATESTORE_CACHE_MAX_BYTES_CONFIG.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-10-20 17:03:50 -07:00
Lucas Brutschy 2c8f14c57e
KAFKA-14299: Never transition to UpdateStandby twice (#12762)
In two situations, the current code could transition the ChangelogReader
to UpdateStandby when already in that state, causing an IllegalStateException. 
Namely these two cases are:

1. When only standby tasks are restoring and one of them crashes.
2. When only standby tasks are restoring and one of them is paused.

This change fixes both issues by only transitioning if the paused or
failed task is an active task.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-10-19 09:29:19 +02:00
Lucas Brutschy 7a7ad9b422
KAFKA-14299: Fix busy polling with separate state restoration (#12749)
StreamThread in state PARTITIONS_ASSIGNED was running in
a busy loop until restoration is finished, stealing CPU
cycles from restoration.

Make sure the StreamThread uses poll_time when
state updater is enabled, and we are in state
PARTITIONS_ASSIGNED.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-10-18 21:33:40 +02:00
Lucas Brutschy cc582897bf
KAFKA-14299: Fix incorrect pauses in separate state restoration (#12743)
The original code path paused the main consumer for
all tasks before entering the restoration section
of the code, and then resumed all after restoration
has finished.

In the new state updater part of the code, tasks that
do not require restoration skip the restoration completely.
They remain with the TaskManger and are never transferred
to the StateUpdater, and thus are never resumed.

This change makes sure that tasks that remain with the
TaskManager are not paused.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-10-18 12:16:44 +02:00
Guozhang Wang 55a3a95b7a
Kafka Streams Threading P3: TaskManager Impl (#12754)
0. Add name to task executors.
1. DefaultTaskManager implementation, for interacting with the TaskExecutors and support add/remove/lock APIs.
2. Related unit tests.
2022-10-14 16:10:57 -07:00
Guozhang Wang dfb5929665
Kafka Streams Threading P2: Skeleton TaskExecutor Impl (#12744)
0. Address comments from P1.
1. Add the DefaultTaskExecutor implementation class.
2. Related DefaultTaskExecutorTest.

Pending in future PRs: a) exception handling, primarily to send them to polling thread, b) light-weight task flushing procedure.
2022-10-14 15:32:48 -07:00
Bruno Cadonna 484f85ff53
HOTFIX: Revert "KAFKA-12960: Enforcing strict retention time for WindowStore and Sess… (#11211)" (#12745)
This reverts commit 07c1002489 which broke trunk.

Reviewers: David Jacot <djacot@confluent.io>, Bill Bejeck <bbejeck@apache.org>
2022-10-13 13:27:19 -07:00
Chris Egerton 18e60cb000
KAFKA-12497: Skip periodic offset commits for failed source tasks (#10528)
Also moves the Streams LogCaptureAppender class into the clients module so that it can be used by both Streams and Connect.

Reviewers: Nigel Liang <nigel@nigelliang.com>, Kalpesh Patel <kpatel@confluent.io>, John Roesler <vvcephei@apache.org>, Tom Bentley <tbentley@redhat.com>
2022-10-13 10:15:42 -04:00
vamossagar12 07c1002489
KAFKA-12960: Enforcing strict retention time for WindowStore and Sess… (#11211)
WindowedStore and SessionStore do not implement a strict retention time in general. We should consider to make retention time strict: even if we still have some record in the store (due to the segmented implementation), we might want to filter expired records on-read. This might benefit PAPI users.

This PR, adds the filtering behaviour in the Metered store so that, it gets automatically applied for cases when a custom state store is implemented

Reviewer: Luke Chen <showuon@gmail.com>, A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <mjsax@apache.org>
2022-10-13 09:39:58 +08:00
Guozhang Wang 69059b5f28
Kafka Streams Threading P1: Add Interface for new TaskManager and TaskExecutor (#12737)
The interfaces (and their future impls) are added under the processor/internals/tasks package, to distinguish with the existing old classes:

1. TaskExecutor is the interface for a processor thread. It takes at most one task to process at a given time from the task manager. When being asked from the task manager to un-assign the current processing task, it will stop processing and give the task back to task manager.
2. TaskManager schedules all the active tasks to assign to TaskExecutors. Specifically: 1) when a task executor ask it for an unassigned task to process (assignNextTask), it will return the available task based on its scheduling algorithm. 2) when the task manager decides to commit (all) tasks, or when a rebalance event requires it to modify the maintained active tasks (via onAssignment), it will lock all the tasks that are going to be closed / committed, asking the TaskExecutor to give them back if they were being processed at the moment.

Reviewers: John Roesler <vvcephei@apache.org>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-10-12 16:33:13 -07:00
Bruno Cadonna daae2a189d
HOTFIX: Only update input partitions of standby tasks if they really changed (#12730)
Updating the input partitions of tasks also updates the mapping from
source nodes to input topics in the processor topology within the task.
The mapping is updated with the topics from the topology metadata.
The topology metadata does not prefix intermediate internal topics with
the application ID. Thus, if a standby task has input partitions from an
intermediate internal topic the update of the mapping in the processor
topology leads to an invalid topology exception during recycling of a
standby task to an active task when the input queues are created. This
is because the input topics in the processor topology and the input
partitions of the task do not match because the former miss the
application ID prefix.

The added verification to only update input partitions of standby tasks
if they really changed avoids the invalid topology exception if the
standby task only has input partitions from intermediate internal
topics since they should never change. If the standby task has input
partitions from intermediate internal topics and external topics
subscribed to via a regex pattern, the invalid topology exception
might still be triggered.

Reviewers: Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>
2022-10-11 18:13:58 +02:00
Walker Carlson cbdcd20ac1
MINOR: Include all hosts in metadata for topology (#12594)
When building streams metadata we want to build even if the host is empty as it is a common way to find the other host addresses

Reviewers: John Roesler <vvcephei@apache.org>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-10-06 21:33:00 -07:00
Vicky Papavasileiou 1cb7736de1
KAFKA-14209 : Integration tests 3/3 (#12676)
Tests for 21a15c6b1f
Implements KIP-862: https://cwiki.apache.org/confluence/x/WSf1D

Reviewer: John Roesler <vvcephei@apache.org>
2022-10-06 19:07:34 -05:00
Vicky Papavasileiou 21a15c6b1f
KAFKA-14209 : Rewrite self joins to use single state store 2/3 (#12644)
Implements KIP-862: https://cwiki.apache.org/confluence/x/WSf1D

Reviewers: Guozhang Wang <guozhang@apache.org>,  Austin Heyne <aheyne>, John Roesler <vvcephei@apache.org>
2022-10-05 07:36:04 -05:00
Christo Lolov 3a9efc77b2
KAFKA-14133: Replace EasyMock with Mockito in streams tests (#12527)
Batch 4 of the tests detailed in https://issues.apache.org/jira/browse/KAFKA-14133 which use EasyMock and need to be moved to Mockito.

Reviewers: Dalibor Plavcic <dalibor.os@proton.me>, Bruno Cadonna <cadonna@apache.org>
2022-09-30 11:20:51 +02:00
Guozhang Wang d62a42df2e
KAFKA-10199: Integrate Topology Pause/Resume with StateUpdater (#12659)
When a topology is paused / resumed, we also need to pause / resume its corresponding tasks inside state updater.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-09-28 16:26:01 -07:00
Bruno Cadonna 07a31599c3
KAFKA-10199: Fix switching to updating standbys if standby is removed (#12687)
When the state updater only contains standby tasks and then a
standby task is removed, an IllegalStateException is thrown
because the changelog reader does not allow to switch to standby
updating mode more than once in a row.

This commit fixes this bug by checking that the removed task is
an active one before trying to switch to standby updating mode.
If the task to remove is a standby task then either we are already
in standby updating mode and we should not switch to it again or
we are not in standby updating mode which implies that there are
still active tasks that would prevent us to switch to standby
updating mode.

Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-09-26 20:34:09 +02:00
Ahmed Sobeh b0ace18035
KAFKA-14239: Merge StateRestorationIntegrationTest into RestoreIntegrationTest (#12670)
This PR makes the following changes:

* Moves the only test in StateRestorationIntegrationTest into RestoreIntegrationTest
* Deletes StateRestorationIntegrationTest

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-09-23 17:15:25 -07:00
Vicky Papavasileiou cda5da9b65
KAFKA-14209: Change Topology optimization to accept list of rules 1/3 (#12641)
This PR is part of a series implementing the self-join rewriting. As part of it, we decided to clean up the TOPOLOGY_OPTIMIZATION_CONFIG and make it a list of optimization rules. Acceptable values are: NO_OPTIMIZATION, OPTIMIZE which applies all optimization rules or a comma separated list of specific optimizations.

Reviewers: Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>
2022-09-22 11:20:37 -05:00
Bruno Cadonna b4fa3496e1
KAFKA-10199: Adapt restoration integration tests to state updater (#12650)
Transforms the integration test that verifies restoration in a
parametrized test. The parametrized test runs once with
state updater enabled and once with state updater disabled.

Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-09-19 19:27:17 +02:00
Nikolay 51b079dca7
KAFKA-12878: Support --bootstrap-server in kafka-streams-application-reset tool (#12632)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-19 13:20:41 -04:00
Bruno Cadonna a1f3c6d160
KAFKA-10199: Register and unregister changelog topics in state updater (#12638)
Registering and unregistering the changelog topics in the
changelog reader outside of the state updater leads to
race conditions between the stream thread and the state
updater thread. Thus, this PR moves registering and
unregistering of changelog topics in the changelog
reader into the state updater if the state updater
is enabled.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Hao Li <1127478+lihaosky@users.noreply.github.com>
2022-09-16 09:05:11 +02:00
Bruno Cadonna 1ab4596ee6
KAFKA-10199: Suspend tasks in the state updater on revocation (#12600)
In the first attempt to handle revoked tasks in the state updater
we removed the revoked tasks from the state updater and added it to
the set of pending tasks to close cleanly. This is not correct since
a revoked task that is immediately reassigned to the same stream thread
would neither be re-added to the state updater nor be created again.
Also a revoked active task might be added to more than one bookkeeping
set in the tasks registry since it might still be returned from
stateUpdater.getTasks() after it was removed from the state updater.
The reason is that the removal from the state updater is done
asynchronously.

This PR solves this issue by introducing a new bookkeeping set
in the tasks registry to bookkeep revoked active tasks (actually
suspended active tasks).

Additionally this PR closes some testing holes around the modified
code.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Hao Li <1127478+lihaosky@users.noreply.github.com>
2022-09-14 09:03:43 +02:00
Divij Vaidya d4fc3186b4
MINOR: Replace usage of File.createTempFile() with TestUtils.tempFile() (#12591)
Why
Current set of integration tests leak files in the /tmp directory which makes it cumbersome if you don't restart the machine often.

Fix
Replace the usage of File.createTempFile with existing TestUtils.tempFile method across the test files. TestUtils.tempFile automatically performs a clean up of the temp files generated in /tmp/ folder.

Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <mlists@juma.me.uk>
2022-09-13 08:44:21 +08:00
Matthew de Detrich e138772ba5
MINOR: Update Scalafmt to latest version (#12475)
Reviewers: Divij Vaidya <diviv@amazon.com>, Chris Egerton <fearthecellos@gmail.com>
2022-09-12 10:05:15 -04:00
Bruno Cadonna 44b500b679
KAFKA-10199: Separate state updater from old restore (#12583)
Separates the code path for the new state updater from
the code path of the old restoration.

Ensures that with the state updater tasks are processed
before all tasks are running.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Walker Carlson <wcarlson@confluent.io
2022-09-07 14:21:36 +02:00
Christo Lolov cdfe4f98c4
KAFKA-14133: Replace EasyMock with Mockito in streams tests (#12492)
Batch 1 of the tests detailed in https://issues.apache.org/jira/browse/KAFKA-14133 which use EasyMock and need to be moved to Mockito.

Reviewers: Dalibor Plavcic, Matthew de Detrich <mdedetrich@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-09-07 10:25:31 +02:00
Guozhang Wang 8380d2edf4
KAFKA-10199: Handle exceptions from state updater (#12519)
1. In state updater, when handling task corrupted exception due to invalid restoring offset, first delete the affected partitions from the checkpoint before reporting it back to the stream thread. This is to mimic the same behavior in stream threads's StateManager#handleCorruption#closeDirtyAndRevive. It's cleaner to do so inside the restore thread, plus it enables us to optimize by only deleting those corrupted partitions, and not all.
2. In the state manager, handle the drained exceptions as follows (this is the same as handling all exceptions from handleAssignment): 1) Task-migrated, throw all the way to stream-thread as handleTaskMigrated, 2) any fatal Streams exception, throw all the way to stream-thread to trigger exception handler, 3) Task-corrupted, throw to the stream-thread as handleCorruption. Note that for 3), we would specially distinguish if the corrupted-tasks are already closed (when they are thrown from handleAssignment or not (when they are thrown from the state updater).

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-09-02 17:50:23 -07:00
A. Sophie Blee-Goldman 7ec10ce19a
HOTFIX: fix PriorityQueue iteration to assign warmups in priority order (#12585)
Based on a patch submitted to the confluentinc fork & then abandoned. Needed some updates and minor expansion but more or less just re-applied the changes proposed in confluentinc#697.

Original PR has a very detailed justification for these changes but the tl;dr of it is that apparently the PriorityQueue's iterator does not actually guarantee to return elements in priority order.

Reviewer: Luke Chen <showuon@gmail.com>
2022-09-02 18:14:34 +08:00
John Roesler 8b64a9e235
MINOR: Demystify rebalance schedule log (#12582)
Reviewers: Bruno Cadonna <cadonna@apache.org>, Bill Bejeck <bbejeck@apache.org>
2022-09-01 16:34:03 -05:00
Bruno Cadonna 2bef80e360
KAFKA-10199: Remove changelog unregister from state updater (#12573)
Changelogs are already unregistered when tasks are closed.
There is no need to also unregister them in the state
updater.

In future, when we will only have the state updater without
the old code path, we should consider registering and
unregistering the changelogs within the state updater.

Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-09-01 14:29:39 +02:00
Bruno Cadonna bc8f7d07d9
KAFKA-10199: Shutdown state updater on task manager shutdown (#12569)
When the task manager is shutdown, the state updater should also
shutdown. After the shutdown of the state updater, the tasks
in its output queues should be closed.

Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-08-31 20:45:53 +02:00
Divij Vaidya 140faf9f2b
KAFKA-13036: Replace EasyMock and PowerMock with Mockito for RocksDBMetricsRecorderTest (#12459)
Changes:
- Migrate to Mockito
- Add more assertive checks using verify
- Minor indentation fixes

Reviewers: Dalibor Plavcic <dalibor.os@proton.me>, Bruno Cadonna <cadonna@apache.org>
2022-08-30 19:25:26 +02:00
Bruno Cadonna 7b07b2676b
KAFKA-10199: Remove tasks from state updater on shutdown (#12562)
The state updater removes its updating and paused task on shutdown.
The removed tasks are added to the output queue for removed tasks.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Walker Carlson <wcarlson@confluent.io>
2022-08-29 18:29:21 +02:00
Bruno Cadonna 0e6a3fa978
KAFKA-10199: Handle restored tasks output by state updater (#12554)
Once the state updater restored an active task it puts it
into an output queue. The stream thread reads the restored
active task from the output queue and after it verified
that the task is still owned by the stream thread it transits
it to RUNNING.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Walker Carlson <wcarlson@confluent.io>
2022-08-29 18:26:02 +02:00
Mickael Maison 0507597597
KAFKA-10360: Allow disabling JMX Reporter (KIP-830) (#12046)
This implements KIP-830: https://cwiki.apache.org/confluence/display/KAFKA/KIP-830%3A+Allow+disabling+JMX+Reporter
It adds a new configuration `auto.include.jmx.reporter` that can be set to false to disable the JMX Reporter. This configuration is deprecated and will be removed in the next major version.

Reviewers: Tom Bentley <tbentley@redhat.com>, Christo Lolov <christo_lolov@yahoo.com>
2022-08-24 18:30:31 +02:00
Bruno Cadonna f191126550
KAFKA-10199: Introduce task registry (#12549)
Currently the task manager stores the tasks it manages in an
internally. We recently extracted the code to store and retrieve
tasks into its own class Tasks. However, the task manager creates
the Tasks object internally and during testing of the task
manager we do not have access to it which makes testing the task
manager quite complex.

This commit externalizes the data structure that the task manager
uses to store and rerieve tasks. It introduces the TasksRegistry
interface and lets the Tasks object implementing TasksRegistry.
The Tasks object is passed into the task manager via its
constructor. Passing the TasksRegistry dependency to the task
manager from outside faciliates simpler testing of the task
manager.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Walker Carlson <wcarlson@confluent.io>
2022-08-24 08:19:40 +02:00
Bruno Cadonna add4ca6c7f
KAFKA-10199: Remove tasks from state updater on revoked and lost partitions (#12547)
Removes tasks from the state updater when the input partitions of the tasks are revoked or partitions are lost during a rebalance.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-08-22 11:50:50 -07:00
Alex Sorokoumov 5c77c544c6
KAFKA-13769 Fix version check in SubscriptionStoreReceiveProcessorSupplier (#12535)
This patch fixes another incorrect version check in the FK code and adds unit tests that would have caught this bug.

Reviewers: John Roesler <vvcephei@apache.org>
2022-08-18 13:20:04 -05:00
Jason Gustafson 0243bb98a7
HOTFIX: Revert KAFKA-10199 which is causing compilation failures (#12532)
Compilation is failing after these two commits:
```
> Task :streams:compileJava
/Users/jgustafson/Projects/kafka/streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java:852: error: cannot find symbol
                        tasks.addPendingTaskToClose(restoringTask.id());
                             ^
  symbol:   method addPendingTaskToClose(org.apache.kafka.streams.processor.TaskId)
  location: variable tasks of type org.apache.kafka.streams.processor.internals.Tasks
1 error
```

Also here:
```

[2022-08-17T20:58:20.912Z] > Task :streams:compileTestJava

[2022-08-17T20:58:20.912Z] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-12530/streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java:822: error: method setupForRevocation(Set<Task>,Set<Task>) is already defined in class TaskManagerTest

[2022-08-17T20:58:20.912Z]     private TaskManager setupForRevocation(final Set<Task> tasksInStateUpdater,
```
 This patch reverts them.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-08-17 14:29:49 -07:00
Bruno Cadonna b47c4d8598
KAFKA-10199: Remove tasks from state updater on revocation (#12520)
Removes tasks from the state updater when the input partitions of the tasks are revoked during a rebalance.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-08-17 11:13:34 -07:00
Bruno Cadonna 9f20f89953
KAFKA-10199: Remove tasks from state updater on partition lost (#12521)
Removes tasks from the state updater when the input partitions of the tasks are lost during a rebalance.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-08-17 11:12:30 -07:00
Guozhang Wang dc72f6ec02
KAFKA-10199: Handle task closure and recycling from state updater (#12466)
1. Within the tryCompleteRestore function of the thread, try to drain the removed tasks from state updater and handle accordingly: 1) for recycle, 2) for closure, 3) for update input partitions.
2. Catch up on some unit test coverage from previous PRs.
3. Some minor cleanups around exception handling.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-08-15 19:33:46 -07:00
Bruno Cadonna 75d89931e0
MINOR: Add setting input partitions for task mocks (#12510)
We recently added a builder to create task mocks for unit
tests. This PR adds the functionality to add input partitions
to task mocks when the builder is used.

Reviewers: Walker Carlson <wcarlson@confluent.io>, A. Sophie Blee-Goldman <ableegoldman@apache.org>
2022-08-15 11:16:32 -07:00
Bruno Cadonna 4a3e92b1ab
KAFKA-10199: Expose read only task from state updater (#12497)
The state updater exposes tasks that are in restoration
to the stream thread. To ensure that the stream thread
only accesses the tasks to read from the tasks without
modifying any internal state, this PR introduces a
read-only task that throws an exception if the caller
tries to modify the internal state of a task.

This PR also returns read-only tasks from
DefaultStateUpdater#getTasks().

Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-08-12 17:03:50 +02:00
Guozhang Wang 5ceaa588ee
HOTFIX / KAFKA-14130: Reduce RackAwarenesssTest to unit Test (#12476)
While working on KAFKA-13877, I feel it's an overkill to introduce the whole test class as an integration test, since all we need is to just test the assignor itself which could be a unit test. Running this suite with 9+ instances takes long time and is still vulnerable to all kinds of timing based flakiness. A better choice is to reduce it as a unit test, similar to HighAvailabilityStreamsPartitionAssignorTest that just test the behavior of the assignor itself, rather than creating many instances hence depend on various timing bombs to not explode.

Since we mock everything, there's no flakiness anymore. Plus we greatly reduced the test runtime (on my local machine, the old integration takes about 35 secs to run the whole suite, while the new one take 20ms on average).

Reviewers: Divij Vaidya <diviv@amazon.com>, Dalibor Plavcic
2022-08-03 15:36:59 -07:00
Guozhang Wang 3202459394
KAFKA-13877: Fix flakiness in RackAwarenessIntegrationTest (#12468)
In the current test, we check for tag distribution immediately after everyone is on the running state, however due to the fact of the follow-up rebalances, "everyone is now in running state" does not mean that the cluster is now stable. In fact, a follow-up rebalance may occur, upon which the local thread metadata would return empty which would cause the distribution verifier to fail.

Reviewers: Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>
2022-08-03 09:17:38 -07:00
José Armando García Sancio 6ace67b2de
MINOR; Bump trunk to 3.4.0-SNAPSHOT (#12463)
Version bumps in trunk after the creation of the 3.3 branch.

Reviewers: David Arthur <mumrah@gmail.com>
2022-08-01 09:54:12 -07:00
Hao Li f7ac5d3d00
Minor: enable index for emit final sliding window (#12461)
Enable index for sliding window emit final case as it's faster to fetch windows for particular key

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-29 14:47:25 -07:00
Christo Lolov 54af64c33a
KAFKA-14108: Ensure both JUnit 4 and JUnit 5 tests run (#12441)
When the migration of the Streams project to JUnit 5 started with PR #12285, we discovered that the migrated tests were not run by the PR builds. This PR ensures that Streams' tests that are written in JUnit 4 and JUnit 5 are run in the PR builds.

Co-authored-by: Divij Vaidya <diviv@amazon.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Bruno Cadonna <cadonna@apache.org>
2022-07-29 17:21:25 +02:00
Bruno Cadonna 5f7c99dd77
MINOR: Remove code of removed metric (#12453)
When we removed metric skipped-records in 3.0 we missed to
remove some code related to that metric.

Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-07-29 16:53:01 +02:00
Bruno Cadonna a5d71e1550
MINOR: Fix static mock usage in ThreadMetricsTest (#12454)
Before this PR the calls to the static methods on StreamsMetricsImpl were just calls and not a verification on the mock. This miss happened during the switch from EasyMock to Mockito.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-28 13:32:46 -07:00
Bruno Cadonna 2724cc9920
KAFKA-10199: Bookkeep tasks during assignment for use with state updater (#12442)
Bookkeeps tasks to be recycled, closed, and updated during handling of the assignment. The bookkeeping is needed for integrating the state updater.

These change is hidden behind internal config STATE_UPDATER_ENABLED. If the config is false Streams should not use the state updater and behave as usual.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-28 13:28:47 -07:00
Guozhang Wang 06f47c3b51
KAFKA-10199: Further refactor task lifecycle management (#12439)
1. Consolidate the task recycle procedure into a single function within the task. The current procedure now becomes: a) task.recycleStateAndConvert, at end of it the task is in closed while its stateManager is retained, and the manager type has been converted; 2) create the new task with old task's fields and the stateManager inside the creators.
2. Move the task execution related metadata into the corresponding TaskExecutionMetadata class, including the task idle related metadata (e.g. successfully processed tasks); reduce the number of params needed for TaskExecutor as well as Tasks.
3. Move the task execution related fields (embedded producer and consumer) and task creators out of Tasks and migrated into TaskManager. Now the Tasks is only a bookkeeping place without any task mutation logic.
4. When adding tests, I realized that we should not add task to state updater right after creation, since it was not initialized yet, while state updater would validate that the task's state is already restoring / running. So I updated that logic while adding unit tests.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-07-27 17:29:05 -07:00
Alex Sorokoumov d076b7ad0e
KAFKA-13769: Add tests for ForeignJoinSubscriptionProcessorSupplier (#12437)
Reviewers: Adam Bellemare <adam.bellemare@gmail.com>, John Roesler <vvcephei@apache.org>
2022-07-27 13:58:12 -05:00
Bruno Cadonna f191e4781e
MINOR: Use builder for mock task in DefaultStateUpdaterTest (#12436)
Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-07-26 10:12:20 +02:00
Bruno Cadonna 5a52601691
KAFKA-10199: Add tasks to state updater when they are created (#12427)
This PR introduces an internal config to enable the state updater. If the state updater is enabled newly created tasks are added to the state updater. Additionally, this PR introduces a builder for mocks for tasks.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-21 12:37:17 -07:00
Christo Lolov 569a358a3f
KAFKA-14001: Migrate streams module to JUnit 5 - Part 1 (#12285)
This pull request addresses https://issues.apache.org/jira/browse/KAFKA-14001. It is the first of a series of pull requests which address the move of Kafka Streams tests from JUnit 4 to JUnit 5.

Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2022-07-21 17:27:53 +02:00
James Hughes ff7cbf264c
KAFKA-14076: Fix issues with KafkaStreams.CloseOptions (#12408)
- used static memberId was incorrect
- need to remove all threads/members from the group
- need to use admit client correctly

Add test to verify fixes.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-07-21 07:35:29 -07:00
Guozhang Wang c9b6e19b3b
KAFKA-10199: Cleanup TaskManager and Task interfaces (#12397)
In order to integrate with the state updater, we would need to refactor the TaskManager and Task interfaces. This PR achieved the following purposes:

    Separate active and standby tasks in the Tasks placeholder, plus adding pendingActiveTasks and pendingStandbyTasks into Tasks. The exposed active/standby tasks from the Tasks set would only be mutated by a single thread, and the pending tasks hold for those tasks that are assigned but cannot be actively managed yet. For now they include two scenarios: a) tasks from unknown sub-topologies and hence cannot be initialized, b) tasks that are pending for being recycled from active to standby and vice versa. Note case b) would be added in a follow-up PR.

    Extract any logic that mutates a task out of the Tasks / TaskCreators. Tasks should only be a place for maintaining the set of tasks, but not for manipulations of a task; and TaskCreators should only be used for creating the tasks, but not for anything else. These logic are all migrated into TaskManger.

    While doing 2) I noticed we have a couple of minor issues in the code where we duplicate the closing logics, so I also cleaned them up in the following way:
    a) When closing a task, we first trigger the corresponding closeClean/Dirty function; then we remove the task from Tasks bookkeeping, and for active task we also remove its task producer if EOS-V1 is used.
    b) For closing dirty, we swallow the exception from close call and the remove task producer call; for closing clean, we store the thrown exception from either close call or the remove task producer, and then rethrow at the end of the caller. The difference though is that, for the exception from close call we need to retry close it dirty; for the exception from the remove task producer we do not need to re-close it dirty.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-07-21 15:11:40 +02:00
Walker Carlson b62d8b975c
KAFKA-12699: Override the default handler for stream threads if the stream's handler is used (#12324)
Override the default handler for stream threads if the stream's handler is used. We do no want the java default handler triggering when a thread is replaced.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-07-19 13:35:26 -07:00
Guozhang Wang 693e283802
KAFKA-10199: Add RESUME in state updater (#12387)
* Need to check enforceRestoreActive / transitToUpdateStandby when resuming a paused task.
* Do not expose another getResumedTasks since I think its caller only need the getPausedTasks.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-07-19 09:44:10 -07:00
Walker Carlson 188b2bf280
Revert "KAFKA-12887 Skip some RuntimeExceptions from exception handler (#11228)" (#12421)
This reverts commit 4835c64f

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-07-19 09:17:46 -07:00
Guozhang Wang 309e0f986e
KAFKA-10199: Add PAUSE in state updater (#12386)
* Add pause action to task-updater.
* When removing a task, also check in the paused tasks in addition to removed tasks.
* Also I realized we do not check if tasks with the same id are added, so I add that check in this PR as well.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-07-18 16:42:48 -07:00
Alex Sorokoumov 4eef28018a
KAFKA-13769 Fix version check in SubscriptionJoinForeignProcessorSupplier (#12420)
This commit changes the version check from != to > as the process method
works correctly on both version 1 and 2. != incorrectly throws on v1
records.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-07-18 14:10:02 -07:00
Levani Kokhreidze edad31811c
MINOR: Fix QueryResult Javadocs (#12404)
Fixes the QueryResult javadocs.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-07-18 13:39:34 +02:00
Sanjana Kaundinya beac86f049
KAFKA-13043: Implement Admin APIs for offsetFetch batching (#10964)
This implements the AdminAPI portion of KIP-709: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=173084258. The request/response protocol changes were implemented in 3.0.0. A new batched API has been introduced to list consumer offsets for different groups. For brokers older than 3.0.0, separate requests are sent for each group.

Co-authored-by: Rajini Sivaram <rajinisivaram@googlemail.com>
Co-authored-by: David Jacot <djacot@confluent.io>

Reviewers: David Jacot <djacot@confluent.io>,  Rajini Sivaram <rajinisivaram@googlemail.com>
2022-07-14 13:47:34 +01:00
Hao Li b5d4fa7645
KAFKA-13785: [10/N][emit final] more unit test for session store and disable cache for emit final sliding window (#12370)
1. Added more unit test for RocksDBTimeOrderedSessionStore and RocksDBTimeOrderedSessionSegmentedBytesStore
2. Disable cache for sliding window if emit strategy is ON_WINDOW_CLOSE

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-07-12 10:57:11 -07:00
Matthias J. Sax 38b08dfd33
MINOR: revert KIP-770 (#12383)
KIP-770 introduced a performance regression and needs some re-design.

Needed to resolve some conflict while reverting.

This reverts commits 1317f3f77a and 0924fd3f9f.

Reviewers:  Sagar Rao <sagarmeansocean@gmail.com>, Guozhang Wang <guozhang@confluent.io>
2022-07-07 11:19:37 -07:00
Guozhang Wang 915c781243
KAFKA-10199: Remove main consumer from store changelog reader (#12337)
When store changelog reader is called by a different thread than the stream thread, it can no longer use the main consumer to get committed offsets since consumer is not thread-safe. Instead, we would remove main consumer and leverage on the existing admin client to get committed offsets.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-07-06 17:23:18 -07:00
Bruno Cadonna 00f395bb88
KAFKA-10199: Remove call to Task#completeRestoration from state updater (#12379)
The call to Task#completeRestoration calls methods on the main consumer.
The state updater thread should not access the main consumer since the
main consumer is not thread-safe. Additionally, Task#completeRestoration
changed the state of active tasks, but we decided to keep task life cycle
management outside of the state updater.

Task#completeRestoration should be called by the stream thread on
restored active tasks returned by the state udpater.

Reviewer: Guozhang Wang <guozhang@apache.org>
2022-07-06 12:36:15 +02:00
Matthew de Detrich 4e6326f889
KAFKA-13957: Fix flaky shouldQuerySpecificActivePartitionStores test (#12289)
Currently the tests fail because there is a missing predicate in the retrievableException which causes the test to fail, i.e. the current predicates

containsString("Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING"),
containsString("The state store, source-table, may have migrated to another instance"),
containsString("Cannot get state store source-table because the stream thread is STARTING, not RUNNING")

wasn't complete. Another one needed to be added, namely "The specified partition 1 for store source-table does not exist.". This is because its possible for

assertThat(getStore(kafkaStreams2, storeQueryParam2).get(key), is(nullValue()));

or

assertThat(getStore(kafkaStreams1, storeQueryParam2).get(key), is(nullValue()));

(depending on which branch) to be thrown, i.e. see

org.apache.kafka.streams.errors.InvalidStateStorePartitionException: The specified partition 1 for store source-table does not exist.

	at org.apache.kafka.streams.state.internals.WrappingStoreProvider.stores(WrappingStoreProvider.java:63)
	at org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get(CompositeReadOnlyKeyValueStore.java:53)
	at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.lambda$shouldQuerySpecificActivePartitionStores$5(StoreQueryIntegrationTest.java:223)
	at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.retryUntil(StoreQueryIntegrationTest.java:579)
	at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificActivePartitionStores(StoreQueryIntegrationTest.java:186)

This happens when the stream hasn't been initialized yet. I have run the test around 12k times using Intellij's JUnit testing framework without any flaky failures. The PR also does some minor refactoring regarding moving the list of predicates into their own functions.

Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-07-04 20:26:53 +02:00
Guozhang Wang ae570f5953
HOTFIX: Correct ordering of input buffer and enforced processing sensors (#12363)
1. As titled, fix the right constructor param ordering.
2. Also added a few more loglines.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Sagar Rao <sagarmeansocean@gmail.com>, Hao Li <1127478+lihaosky@users.noreply.github.com>
2022-07-03 10:02:59 -07:00
Bruno Cadonna a82a8e02ce
MINOR: Fix static mock usage in TaskMetricsTest (#12373)
Before this PR the calls to the static methods on StreamsMetricsImpl were just calls and not a verification on the mock. This miss happened during the switch from EasyMock to Mockito.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-02 18:48:07 -07:00
Guozhang Wang 3faa6cf6d0
MINOR: Use mock time in DefaultStateUpdaterTest (#12344)
For most tests we would need an auto-ticking mock timer to work with draining-with-timeout functions.
For tests that check for never checkpoint we need no auto-ticking timer to control exactly how much time elapsed.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-06-29 12:33:00 -07:00
Guozhang Wang ababc4261b
[9/N][Emit final] Emit final for session window aggregations (#12204)
* Add a new API for session windows to range query session window by end time (KIP related).
* Augment session window aggregator with emit strategy.
* Minor: consolidated some dup classes.
* Test: unit test on session window aggregator.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-06-29 09:22:37 -07:00
CHUN-HAO TANG 6ac7f4ea8f
KAFKA-13821: Update Kafka Streams WordCount demo to new Processor API (#12139)
https://issues.apache.org/jira/browse/KAFKA-13821

Reviewers: Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>, Bill Bejeck <bbejeck@apache.org>
2022-06-28 21:39:32 -04:00
Tom Kaszuba 025e47b833
KAFKA-13963: Clarified TopologyDescription JavaDoc for Processors API forward() calls (#12293)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-06-27 18:50:34 -07:00
Bruno Cadonna 1ceaf30039
KAFKA-10199: Expose tasks in state updater (#12312)
This PR exposes the tasks managed by the state updater. The state updater manages all tasks that were added to the state updater and that have not yet been removed from it by draining one of the output queues.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-06-24 09:33:24 -07:00
Bruno Cadonna 08e27914cc
HOTFIX: Fix NPE in StreamTask#shouldCheckpointState (#12341)
The mocks were not setup correctly in StreamTask#shouldCheckpointState
which caused a null pointer exception during test execution.
2022-06-24 12:19:22 +02:00
Guozhang Wang 925c628173
KAFKA-10199: Commit the restoration progress within StateUpdater (#12279)
During restoring, we should always commit a.k.a. write checkpoint file regardless of EOS or ALOS, since if there's a failure we would just over-restore them upon recovery so no EOS violations happened.

Also when we complete restore or remove task, we should enforce a checkpoint as well; for failing cases though, we should not write a new one.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-06-23 10:46:14 -07:00
Bruno Cadonna 8026a0edd8
MINOR: Fix static mock usage in NamedCacheMetricsTest (#12322)
Before this PR the call to `StreamsMetricsImpl.addAvgAndMinAndMaxToSensor()`
was just a call and not a verification on the mock. This miss happened
during the switch from EasyMock to Mockito.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2022-06-23 09:45:46 +02:00
Bruno Cadonna 269277f73b
MINOR: Fix static mock usage in ProcessorNodeMetricsTest (#12323)
Before this PR the calls to StreamsMetricsImpl.addInvocationRateAndCountToSensor()
were just calls and not a verification on the mock. This miss happened
during the switch from EasyMock to Mockito.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2022-06-23 09:45:22 +02:00
Bruno Cadonna 6f5843dae6
MINOR: Fix static mock usage in StateStoreMetricsTest (#12325)
Before this PR the calls to the static methods on
StreamsMetricsImpl were just calls and not a verification
on the mock. This miss happened during the switch from
EasyMock to Mockito.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2022-06-23 09:44:50 +02:00
Bruno Cadonna 4d53dd9972
KAFKA-13930: Add 3.2.0 Streams upgrade system tests (#12209)
* KAFKA-13930: Add 3.2.0 Streams upgrade system tests

Apache Kafka 3.2.0 was recently released. Now we need
to test upgrades from 3.2 to trunk in our system tests.

Reviewer: Bill Bejeck <bbejeck@apache.org>
2022-06-21 16:33:40 +02:00
A. Sophie Blee-Goldman 0928666987
MINOR: change Streams topic-level metrics tag from 'topic-name' to 'topic' (#12310)
Changes the tag name from topic-name to just topic to conform to the way this tag is named elsewhere (ie in the clients)
Also:
    - fixes a comment about dynamic topic routing
    - fixes some indentation in MockRecordCollector
    - Undoes the changes to KStreamSplitTest.scala and TestTopicsTest which are no longer necessary after this hotfix

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-06-21 13:10:36 +02:00
Guozhang Wang cfdd567955
KAFKA-13880: Remove DefaultPartitioner from StreamPartitioner (#12304)
There are some considerata embedded in this seemingly straight-forward PR that I'd like to explain here. The StreamPartitioner is used to send records to three types of topics:

1) repartition topics, where key should never be null.
2) changelog topics, where key should never be null.
3) sink topics, where only non-windowed key could be null and windowed key should still never be null.
Also, the StreamPartitioner is used as part of the IQ to determine which host contains a certain key, as determined by the case 2) above.

This PR's main goal is to remove the deprecated producer's default partitioner, while with those things in mind such that:

We want to make sure for not-null keys, the default murmur2 hash behavior of the streams' partitioner stays consistent with producer's new built-in partitioner.
For null-keys (which is only possible for non-window default stream partition, and is never used for IQ), we would fix the issue that we may never rotate to a new partitioner by setting the partition as null hence relying on the newly introduced built-in partitioner.

Reviewers: Artem Livshits <84364232+artemlivshits@users.noreply.github.com>, Matthias J. Sax <matthias@confluent.io>
2022-06-17 20:17:02 -07:00
Divij Vaidya 17637c4ad5
MINOR: Clean up tmp files created by tests (#12233)
There are a bunch of tests which do not clean up after themselves. This leads to
accumulation of files in the tmp directory of the system on which the tests are
running. 

This code change fixes some of the main culprit tests which leak the files in the
temporary directory.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Kvicii <kvicii.yu@gmail.com>
2022-06-16 16:46:07 -07:00
jnewhouse ee565f5f6b
KAFKA-13939: Only track dirty keys if logging is enabled. (#12263)
InMemoryTimeOrderedKeyValueBuffer keeps a Set of keys that have been seen in order to log them for durability. This set is never used nor cleared if logging is not enabled. Having it be populated creates a memory leak. This change stops populating the set if logging is not enabled.

Reviewers: Divij Vaidya <diviv@amazon.com>, Kvicii <42023367+Kvicii@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
2022-06-16 14:27:38 -07:00
James Hughes 683d0bbc4c
MINOR: Guard against decrementing `totalCommittedSinceLastSummary` during rebalancing. (#12299)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-06-16 09:40:08 -07:00
James Hughes 7ed3748a46
KAFKA-13873 Add ability to Pause / Resume KafkaStreams Topologies (#12161)
This PR adds the ability to pause and resume KafkaStreams instances as well as named/modular topologies (KIP-834).

Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewers: Bonnie Varghese <bvarghese@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@apache.org>, Bruno Cadonna <cadonna@apache.org>
2022-06-16 16:06:02 +02:00
Matthias J. Sax 44edad5bb5
MINOR: improve description of `commit.interval.ms` config (#12169)
Reviewers: Luke Chen <showuon@gmail.com>, Kvicii Y <@Kvicii>, Bruno Cadonna <bruno@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2022-06-14 22:29:25 -07:00
Guozhang Wang 39a555ba94
KAFKA-13846: Use the new addMetricsIfAbsent API (#12287)
Use the newly added function to replace the old addMetric function that may throw illegal argument exceptions.

Although in some cases concurrency should not be possible they do not necessarily remain always true in the future, so it's better to use the new API just to be less error-prone.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-06-14 16:04:26 -07:00
A. Sophie Blee-Goldman 3189a8648f
HOTFIX: null check keys of ProducerRecord when computing sizeInBytes (#12288)
Minor followup to #12235 that adds a null check on the record key in the new ClientUtils#producerRecordSizeInBytes utility method, as there are valid cases in which we might be sending records with null keys to the Producer, such as a simple builder.stream("non-keyed-input-topic").filter(...).to("output-topic")

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
2022-06-13 22:27:06 -07:00
vamossagar12 5cab11cf52
KAFKA-13846: Adding overloaded metricOrElseCreate method (#12121)
Reviewers: David Jacot <djacot@confluent.io>, Justine Olshan <jolshan@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-06-13 10:36:39 -07:00
Christo Lolov 6c90f3335e
KAFKA-13947: Use %d formatting for integers rather than %s (#12267)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>, Kvicii <kvicii.yu@gmail.com>
2022-06-10 13:55:52 +02:00
Divij Vaidya 0a50005408
KAFKA-13929: Replace legacy File.createNewFile() with NIO.2 Files.createFile() (#12197)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-06-10 13:28:55 +02:00
Bruno Cadonna e67408c859
KAFKA-10199: Implement removing active and standby tasks from the state updater (#12270)
This PR adds removing of active and standby tasks from the default implementation of the state updater. The PR also includes refactoring that clean up the code.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-06-09 10:28:26 -07:00
A. Sophie Blee-Goldman 1e2611aed4
MINOR: adjust logging levels in Stream tests (#12255)
Now that we've turned off logging in the brokers/zookeeper/config classes we can finally see at least some of the logs where Streams is actually doing something when trying to debug tests from a failed PR build. But I've noticed we still have some flooding of warnings from the NetworkClient and info-level junk from Metadata, so to maximize the visible useful logs we should filter out everything bu the producer/consumer client themselves (in addition to Streams) fine-grained logging

Reviewers: Luke Chen <showuon@gmail.com>, Kvicii Y
2022-06-08 02:02:40 -07:00
A. Sophie Blee-Goldman a6c5a74fdb
KAFKA-13945: add bytes/records consumed and produced metrics (#12235)
Implementation of KIP-846: Source/sink node metrics for Consumed/Produced throughput in Streams

Adds the following INFO topic-level metrics for the total bytes/records consumed and produced:

    bytes-consumed-total
    records-consumed-total
    bytes-produced-total
    records-produced-total

Reviewers: Kvicii <Karonazaba@gmail.com>, Guozhang Wang <guozhang@apache.org>, Bruno Cadonna <cadonna@apache.org>
2022-06-07 16:02:17 +02:00
Divij Vaidya 601051354b
MINOR: Correctly mark some tests as integration tests (#12223)
Also fix package name of `ListOffsetsIntegrationTest`.

Reviewers: dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-06-06 11:18:24 -07:00
Colin Patrick McCabe 4c9eeef5b2
MINOR: add timeouts to streams integration tests (#12216)
Reviewers: David Arthur <mumrah@gmail.com>
2022-05-31 14:22:13 -07:00
Bruno Cadonna 286bae4251
KAFKA-10199: Implement adding standby tasks to the state updater (#12200)
This PR adds adding of standby tasks to the default implementation of the state updater.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-05-24 16:59:14 -07:00
Sayantanu Dey 9dc332f5ca
KAFKA-13217: Reconsider skipping the LeaveGroup on close() or add an overload that does so (#12035)
This is for KIP-812:

* added leaveGroup on a new close function in kafka stream
* added logic to resolve future returned by remove member call in close method
* added max check on remainingTime value in close function


Reviewers: David Jacot <david.jacot@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2022-05-23 10:07:19 -07:00
John Roesler 3f86a183be
MINOR: Deflake OptimizedKTableIntegrationTest (#12186)
This test has been flaky due to unexpected rebalances during the test.
This change fixes it by detecting an unexpected rebalance and retrying
the test logic (within a timeout).

Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <guozhang@apache.org>
2022-05-20 09:17:39 -05:00
Guozhang Wang 46efb72600
KAFKA-13785: [7/N][Emit final] emit final for sliding window (#12135)
This is a copy PR of #12037: Implementation to emit final for sliding window agg. This is authored by lihaosky.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-05-13 19:29:00 -07:00
Alex Sorokoumov 78dd40123c
MINOR: Add upgrade tests for FK joins (#12122)
Follow up PR for KAFKA-13769.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-05-13 17:21:27 -07:00
RivenSun df507e56e2
KAFKA-13793: Add validators for configs that lack validators (#12010)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Luke Chen <showuon@gmail.com>, Chris Egerton <fearthecellos@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <divijvaidya13@gmail.com>
2022-05-09 20:29:17 +02:00
Artem Livshits f7db6031b8
KAFKA-10888: Sticky partition leads to uneven produce msg (#12049)
The design is described in detail in KIP-794
https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner.

Implementation notes:

The default partitioning logic is moved to the BuiltInPartitioner class
(there is one object per topic).  The object keeps track of how many
bytes are produced per-partition and once the amount exceeds batch.size,
switches to the next partition (note that partition switch decision is
decoupled from batching).  The object also keeps track of probability
weights that are based on the queue sizes (the larger the queue size
is the less chance for the next partition to be chosen).  The queue
sizes are calculated in the RecordAccumulator in the `ready` method,
the method already enumerates all partitions so we just add some extra
logic into the existing O(N) method.  The partition switch decision may
take O(logN), where N is the number partitions per topic, but it happens
only once per batch.size (and the logic is avoided when all queues are
of equal size).  Produce bytes accounting logic is lock-free.

When partitioner.availability.timeout.ms is non-0, RecordAccumulator
keeps stats on "node latency" which is defined as the difference between
the last time the node had a batch waiting to be send and the last time
the node was ready to take a new batch.  If this difference exceeds
partitioner.availability.timeout.ms we don't switch to that partition
until the node is ready.

Reviewers: Jun Rao <junrao@gmail.com>
2022-05-06 11:31:12 -07:00
John Roesler e3202b9999
MINOR: Fix RecordContext Javadoc (#12130)
A prior commit accidentally changed the javadoc for RecordContext.
In reality, it is not reachable from api.Processor, only Processor.

Reviewers: Guozhang Wang <guozhang@apache.org>
2022-05-06 11:31:51 -05:00
Guozhang Wang 3b08deaa76
KAFKA-13785: [8/N][emit final] time-ordered session store (#12127)
Time ordered session store implementation. I introduced AbstractRocksDBTimeOrderedSegmentedBytesStore to make it generic for RocksDBTimeOrderedSessionSegmentedBytesStore and RocksDBTimeOrderedSegmentedBytesStore.

A few minor follow-up changes:

1. Avoid extra byte array allocation for fixed upper/lower range serialization.
2. Rename some class names to be more consistent.

Authored-by: Hao Li <1127478+lihaosky@users.noreply.github.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com.com>, John Roesler <vvcephei@apache.org>
2022-05-05 16:09:16 -07:00
Bruno Cadonna ced5989ff6
KAFKA-10199: Implement adding active tasks to the state updater (#12128)
This PR adds the default implementation of the state updater. The implementation only implements adding active tasks to the state updater.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-05-05 16:00:35 -07:00
Joel Hamill 18b84d0404
MINOR: Fix typos in configuration docs (#11874)
Reviewers: Chris Egerton, Weikang Sun, Andrew Eugene Choi, Luke Chen, Guozhang Wang
2022-05-04 10:27:14 -07:00
Guozhang Wang cc2aa96ae4
KAFKA-13785: [6/N][Emit final] Copy: Emit final for TimeWindowedKStreamImpl (#12100)
This is a copy PR of #11896, authored by @lihaosky (Hao Li): Initial implementation to emit final for TimeWindowedKStreamImpl. This PR is on top of #12030 

Author: Hao Li
Reviewers: John Roesler <vvcephei@apache.org>
2022-05-03 09:42:23 -07:00
Matthias J. Sax 25457377e3
HOTFIX: fix broken trunk due to conflicting and overlapping commits (#12074)
Reviewers: Victoria Xia <victoria.xia@confluent.io>, David Arthur <mumrah@gmail.com>
2022-04-20 14:39:15 -07:00
Sayantanu Dey c5077c679c
KAFKA-13588: consolidate `changelogFor` methods to simplify the generation of internal topic names (#11703)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-04-20 11:39:03 -07:00
Hao Li d83fccd65f
KAFKA-13785: [5/N][emit final] cache for time ordered window store (#12030)
A new cache for RocksDBTimeOrderedWindowStore. Need this because RocksDBTimeOrderedWindowStore's key ordering is different from CachingWindowStore which has issues for MergedSortedCacheWindowStoreIterator

Reviewers: Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2022-04-20 11:09:13 -07:00
Jorge Esteban Quilcate Otoya fa0324485b
KAFKA-13654: Extend KStream process with new Processor API (#11993)
Updates the KStream process API to cover the use cases
of both process and transform, and deprecate the KStream transform API.

Implements KIP-820

Reviewer: John Roesler <vvcephei@apache.org>
2022-04-19 10:29:28 -05:00
Aleksandr Sorokoumov adf5cc5371
KAFKA-13769: Explicitly route FK join results to correct partitions (#11945)
Prior to this commit FK response sink routed FK results to
SubscriptionResolverJoinProcessorSupplier using the primary key.

There are cases, where this behavior is incorrect. For example,
if KTable key serde differs from the data source serde which might
happen without a key changing operation.

Instead of determining the resolver partition by serializing the PK
this patch includes target partition in SubscriptionWrapper payloads.
Default FK response-sink partitioner extracts the correct partition
from the value and routes the message accordingly.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-04-15 11:28:43 -07:00
Hao Li c93b717836
KAFKA-13542: Add rebalance reason in Kafka Streams (#12018)
Reviewers: Bruno Cadonna <bruno@confluent.io>, David Jacot <djacot@confluent.io>
2022-04-13 13:49:31 +02:00
Jorge Esteban Quilcate Otoya 0d518aaed1
MINOR: Fix SessionStore#fetchSession parameter names (#11999)
Fixes a small copy/paste error from #10390 that changed the parameter names
for fetchSession from the singular session form (eg `startTime`) to the range
form (eg `earliestSessionStartTime`).

Reviewers: John Roesler <vvcephei@apache.org>
2022-04-11 16:17:01 -05:00
Hao Li 6b2a0bcf8c
KAFKA-13785: add processor metadata to be committed with offset (#11829)
Part of KIP-825

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-03-31 09:48:21 -07:00
Bounkong Khamphousone 3c279b63fa
fix: make sliding window works without grace period (#kafka-13739) (#11928)
Fix upperbound for sliding window, making it compatible with no grace period (kafka-13739)

Added unit test for early sliding window and "normal" sliding window for both events within one time difference (small input) and above window time difference (large input).

Fixing this window interval may slightly change stream behavior but probability to happen is extremely slow and may not have a huge impact on the result given.

Reviewers Leah Thomas <lthomas@confluent.io>, Bill Bejeck <bbejeck@apache.org>
2022-03-31 10:05:53 -04:00
A. Sophie Blee-Goldman 1317f3f77a
MINOR: log warning when topology override for cache size is non-zero (#11959)
Since the topology-level cache size config only controls whether we disable the caching layer entirely for that topology, setting it to anything other than 0 has no effect. The actual cache memory is still just split evenly between the threads, and shared by all topologies.

It's possible we'll want to change this in the future, but for now we should make sure to log a warning so that users who do try to set this override to some nonzero value are made aware that it doesn't work like this.

Also includes some minor refactoring plus a fix for an off-by-one error in #11796

Reviewers: Luke Chen <showuon@gmail.com>, Walker Carlson <wcarlson@confluent.io>, Sagar Rao <sagarmeansocean@gmail.com>
2022-03-30 16:24:01 -07:00
Guozhang Wang 19a6269780
MINOR: Fix log4j entry in RepartitionTopics (#11958)
I noticed two issues in the log4j entry:

1. It's formatted as "{}...{}" + param1, param2; effectively it is one param only, and the printed line is effectively mis-aligned: we always print Subtopology [sourceTopics set] was missing source topics {}
2. Even fix 1) is not enough, since topologyName may be null. On the other hand I think the original goal is not to print the topology name but the sub-topology id since it's within the per-sub-topology loop.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-28 21:19:22 -07:00
John Roesler 7243facb8d
MINOR: Fix stream-join metadata (#11952)
#11356 inadvertently changed
the (undefined) header forwarding behavior of stream-stream joins.

This change does not define the behavior, but just restores the prior
undefined behavior for continuity's sake. Defining the header-forwarding
behavior is future work.

Reviewers: Matthias J. Sax <mjsax@apache.org>, Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>
2022-03-28 11:35:43 -05:00
Tim Patterson 110bccac4a
KAFKA-13600: Kafka Streams - Fall back to most caught up client if no caught up clients exist (#11760)
The task assignor is modified to consider the Streams client with the most caught up states if no Streams client exists that is caught up, i.e., the lag of the states on that client is less than the acceptable recovery lag.  

Unit test for case task assignment where no caught up nodes exist.
Existing unit and integration tests to verify no other behaviour has been changed

Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-03-28 16:48:39 +02:00
Luke Chen 0586f544ef
KAFKA-10405: Set purge interval explicitly in PurgeRepartitionTopicIntegrationTest (#11948)
In KIP-811, we added a new config repartition.purge.interval.ms to set repartition purge interval. In this flaky test, we expected the purge interval is the same as commit interval, which is not correct anymore (default is 30 sec). Set the purge interval explicitly to fix this issue.

Reviewers: Bruno Cadonna <cadonna@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2022-03-25 09:30:02 -07:00
John Roesler 46df7ee97c
MINOR: Add extra notice about IQv2 compatibility (#11944)
Added an extra notice about IQv2's API compatibility, as discussed in the KIP-796 vote thread.

Reviewers: Bill Bejeck <bbejeck@apache.org>, @Kvicii
2022-03-24 14:04:40 -05:00
Rohan 01533e3dd7
KAFKA-13692: include metadata wait time in total blocked time (#11805)
This patch includes metadata wait time in total blocked time. First, this patch adds a new metric for total producer time spent waiting on metadata, called metadata-wait-time-ms-total. Then, this time is included in the total blocked time computed from StreamsProducer.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-24 09:55:26 -07:00
John Roesler 322a065b90
KAFKA-13714: Fix cache flush position (#11926)
The caching store layers were passing down writes into lower store layers upon eviction, but not setting the context to the evicted records' context. Instead, the context was from whatever unrelated record was being processed at the time.

Reviewers: Matthias J. Sax <mjsax@apache.org>
2022-03-23 22:09:05 -05:00
Hao Li a3adf41d8b
[Emit final][4/N] add time ordered store factory (#11892)
Add factory to create time ordered store supplier.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-22 20:53:53 -07:00
vamossagar12 0924fd3f9f
KAFKA-13152: Replace "buffered.records.per.partition" with "input.buffer.max.bytes" (#11796)
Implements KIP-770

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-21 17:16:00 -07:00
Bruno Cadonna 4c8685e701
MINOR: Bump trunk to 3.3.0-SNAPSHOT (#11925)
Version bumps on trunk following the creation of the 3.2 release branch.

Reviewer: David Jacot <djacot@confluent.io>
2022-03-21 21:37:05 +01:00
Márton Sigmond e5eb180a6f
MINOR: Pass materialized to the inner KTable instance (#11888)
Reviewers: Luke Chen <showuon@gmail.com>
2022-03-21 17:03:04 +08:00
Ludovic DEHON df963ee0a9
MINOR: Fix incorrect log for out-of-order KTable (#11905)
Reviewers: Luke Chen <showuon@gmail.com>
2022-03-18 10:00:03 +08:00
Luke Chen fbe7fb9411
KAFKA-9847: add config to set default store type (KIP-591) (#11705)
Reviewers: Hao Li <hli@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2022-03-17 10:19:42 +08:00
Levani Kokhreidze b68463c250
KAFKA-6718 / Add rack awareness configurations to StreamsConfig (#11837)
This PR is part of KIP-708 and adds rack aware standby task assignment logic.

Rack aware standby task assignment won't be functional until all parts of this KIP gets merged.

Splitting PRs into three smaller PRs to make the review process easier to follow. Overall plan is the following:

⏭️ Rack aware standby task assignment logic #10851
⏭️ Protocol change, add clientTags to SubscriptionInfoData #10802
👉 Add required configurations to StreamsConfig (public API change, at this point we should have full functionality)

This PR implements last point of the above mentioned plan.

Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-03-16 18:02:24 +01:00
Nick Telford 9e8ace0809
KAFKA-13549: Add repartition.purge.interval.ms (#11610)
Implements KIP-811.

Add a new config `repartition.purge.interval.ms` that limits how often data is purged from repartition topics.
2022-03-15 15:55:20 -07:00
Walker Carlson f708dc58ed
MINOR: fix shouldWaitForMissingInputTopicsToBeCreated test (#11902)
This test was falling occasionally. It does appear to be a matter of the tests assuming perfecting deduplication/caching when asserting the test output records, ie a bug in the test not in the real code. Since we are not assuming that it is going to be perfect I changed the test to make sure the records we expect arrive, instead of only those arrive.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-15 13:54:48 -07:00
Matthias J. Sax 03411ca28b
KAFKA-13721: asymetric join-winodws should not emit spurious left/outer join results (#11875)
Reviewers:  Sergio Peña <sergio@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2022-03-15 09:37:01 -07:00
Guozhang Wang b916cb40bd
KAFKA-13690: Fix flaky test in EosIntegrationTest (#11887)
I found a couple of flakiness with the integration test.

IQv1 on stores failed although getting the store itself is covered with timeouts, since the InvalidStoreException is upon the query (store.all()). I changed to the util function with IQv2 whose timeout/retry covers the whole procedure. Example of such failure is: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-11802/11/tests/

With ALOS we should not check that the output, as well as the state store content is exactly as of processed once, since it is possible that during processing we got spurious task-migrate exceptions and re-processed with duplicates. I actually cannot reproduce this error locally, but from the jenkins errors it seems possible indeed. Example of such failure is: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-11433/4/tests/

Some minor cleanups.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2022-03-14 15:42:10 -07:00
Matthias J. Sax b1f36360ed
KAKFA-13699: new ProcessorContext is missing methods (#11877)
We added `currentSystemTimeMs()` and `currentStreamTimeMs()` to the
`ProcessorContext` via KIP-622, but forgot to add both to the new
`api.ProcessorContext`.

Reviewers: Ricardo Brasil <anribrasil@gmail.com>, Guozhang Wang <guozhang@confluent.io>
2022-03-14 09:22:01 -07:00
Hao Li 63ea5db9ec
KIP-825: Part 1, add new RocksDBTimeOrderedWindowStore (#11802)
Initial State store implementation for TimedWindow and SlidingWindow.

RocksDBTimeOrderedWindowStore.java contains one RocksDBTimeOrderedSegmentedBytesStore which contains index and base schema.

PrefixedWindowKeySchemas.java implements keyschema for time ordered base store and key ordered index store.

Reviewers: James Hughes, Guozhang Wang <wangguoz@gmail.com>
2022-03-11 17:51:10 -08:00
Hao Li 17988f4710
MINOR: fix flaky EosIntegrationTest.shouldCommitCorrectOffsetIfInputTopicIsTransactional[at_least_once] (#11878)
In this test, we started Kafka Streams app and then write to input topic in transaction. It's possible when streams commit offset, transaction hasn't finished yet. So the streams committed offset could be less than the eventual endOffset.

This PR moves the logic of writing to input topic before starting streams app.

Reviewers: John Roesler <vvcephei@apache.org>
2022-03-11 12:01:46 -06:00
Levani Kokhreidze 87eb0cf03c
KAFKA-6718: Update SubscriptionInfoData with clientTags (#10802)
adds ClientTags to SubscriptionInfoData

Reviewer: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-03-11 16:29:05 +08:00
Walker Carlson 4d5a28973f
Revert "KAFKA-13542: add rebalance reason in Kafka Streams (#11804)" (#11873)
This reverts commit 2ccc834faa.

This reverts commit 2ccc834. We were seeing serious regressions in our state heavy benchmarks. We saw that our state heavy benchmarks were experiencing a really bad regression. The State heavy benchmarks runs with rolling bounces with 10 nodes.

We regularly saw this exception:  java.lang.OutOfMemoryError: Java heap space                                                                                                                                                                                              

I ran through a git bisect and found this commit. We verified that the commit right before did not have the same issues as this one did. I then reverted the problematic commit and ran the benchmarks again on this commit and did not see any more issues. We are still looking into the root cause, but for now since this isn't a critical improvement so we can remove it temporarily.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, David Jacot <djacot@confluent.io>, Ismael Juma <ismael@confluent.io>
2022-03-10 13:52:05 -08:00
A. Sophie Blee-Goldman 113595cf5c
KAFKA-12648: fix flaky #shouldAddToEmptyInitialTopologyRemoveResetOffsetsThenAddSameNamedTopologyWithRepartitioning (#11868)
This test has started to become flaky at a relatively low, but consistently reproducible, rate. Upon inspection, we find this is due to IOExceptions during the #cleanUpNamedTopology call -- specifically, most often a DirectoryNotEmptyException with an ocasional FileNotFoundException

Basically, signs pointed to having returned from/completed the #removeNamedTopology future prematurely, and moving on to try and clear out the topology's state directory while there was a streamthread somewhere that was continuing to process/close its tasks.

I believe this is due to updating the thread's topology version before we perform the actual topology update, in this case specifically the act of eg clearing out a directory. If one thread updates its version and then goes to perform the topology removal/cleanup when the second thread finishes its own topology removal, this other thread will check whether all threads are on the latest version and complete any waiting futures if so -- which means it can complete the future before the first thread has actually completed the corresponding action

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-03-10 12:02:07 -08:00
A. Sophie Blee-Goldman 9c7d857713
KAFKA-12648: fix #getMinThreadVersion and include IOException + topologyName in StreamsException when topology dir cleanup fails (#11867)
Quick fix to make sure we log the actual source of the failure both in the actual log message as well as the StreamsException that we bubble up to the user's exception handler, and also to report the offending topology by filling in the StreamsException's taskId field.

Also prevents a NoSuchElementException from being thrown when trying to compute the minimum topology version across all threads when the last thread is being unregistered during shutdown.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-03-09 16:30:42 -08:00
John Roesler 717f9e2149
MINOR: Restructure ConsistencyVectorIntegrationTest (#11848)
Reviewers: YEONCHEOL JANG <@YeonCheolGit>, Matthias J. Sax <mjsax@apache.org>
2022-03-08 13:59:58 -06:00
John Roesler 10f34ce6b3
MINOR: Clarify acceptable recovery lag config doc (#11411)
Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>, Andrew Eugene Choi < @andrewchoi5 >
2022-03-08 10:42:36 -06:00
A. Sophie Blee-Goldman fc7133d52d
KAFKA-12648: fix bug where thread is re-added to TopologyMetadata when shutting down (#11857)
We used to call TopologyMetadata#maybeNotifyTopologyVersionWaitersAndUpdateThreadsTopologyVersion when a thread was being unregistered/shutting down, to check if any of the futures listening for topology updates had been waiting on this thread and could be completed. Prior to invoking this we make sure to remove the current thread from the TopologyMetadata's threadVersions map, but this thread is actually then re-added in the #maybeNotifyTopologyVersionWaitersAndUpdateThreadsTopologyVersion call.

To fix this, we should break up this method into separate calls for each of its two distinct functions, updating the version and checking for topology update completion. When unregistering a thread, we should only invoke the latter method

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-03-07 23:59:43 -08:00
A. Sophie Blee-Goldman 539f006e65
KAFKA-12648: fix NPE due to race condtion between resetting offsets and removing a topology (#11847)
While debugging the flaky NamedTopologyIntegrationTest. shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing test, I did discover one real bug. The problem was that we update the TopologyMetadata's builders map (with the known topologies) inside the #removeNamedTopology call directly, whereas the StreamThread may not yet have reached the poll() in the loop and in case of an offset reset, we get an NP.e
I changed the NPE to just log a warning for now, going forward I think we should try to tackle some tech debt by keeping the processing tasks and the TopologyMetadata in sync

Also includes a quick fix on the side where we were re-adding the topology waiter/KafkaFuture for a thread being shut down

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-03-07 11:09:18 -08:00
Tim Patterson e3ef29ea03
KAFKA-12959: Distribute standby and active tasks across threads to better balance load between threads (#11493)
Balance standby and active stateful tasks evenly across threads

Reviewer: Luke Chen <showuon@gmail.com>
2022-03-05 16:11:42 +08:00
A. Sophie Blee-Goldman 11143d4883
MINOR: fix flaky shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing (#11827)
This test has been failing somewhat regularly due to going into the ERROR state before reaching RUNNING during the startup phase. The problem is that we are reusing the DELAYED_INPUT_STREAM topics, which had previously been assumed to be uniquely owned by a particular test. We should make sure to delete and re-create these topics for any test that uses them.
2022-03-04 10:31:37 -08:00
A. Sophie Blee-Goldman 6f54faed2d
KAFKA-12648: fix #add/removeNamedTopology blocking behavior when app is in CREATED (#11813)
Currently the #add/removeNamedTopology APIs behave a little wonky when the application is still in CREATED. Since adding and removing topologies runs some validation steps there is valid reason to want to add or remove a topology on a dummy app that you don't plan to start, or a real app that you haven't started yet. But to actually check the results of the validation you need to call get() on the future, so we need to make sure that get() won't block forever in the case of no failure -- as is currently the case

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-03-04 09:58:56 -08:00
Levani Kokhreidze 62e646619b
KAFKA-6718 / Rack aware standby task assignor (#10851)
This PR is part of KIP-708 and adds rack aware standby task assignment logic.

Reviewer: Bruno Cadonna <cadonna@apache.org>, Luke Chen <showuon@gmail.com>, Vladimir Sitnikov <vladimirsitnikov.apache.org>
2022-03-03 11:37:26 +08:00
A. Sophie Blee-Goldman f089bea7ed
MINOR: set log4j.logger.kafka and all Config logger levels to ERROR for Streams tests (#11823)
Pretty much any time we have an integration test failure that's flaky or only exposed when running on Jenkins through the PR builds, it's impossible to debug if it cannot be reproduced locally as the logs attached to the test results have truncated the entire useful part of the logs. This is due to the logs being flooded at the beginning of the test when the Kafka cluster is coming up, eating up all of the allotted characters before we even get to the actual Streams test. Setting log4j.logger.kafka to ERROR greatly improves the situation and cuts down on most of the excessive logging in my local runs. To improve things even more and have some hope of getting the part of the logs we actually need, I also set the loggers for all of the Config objects to ERROR, as these print out the value of every single config (of which there are a lot) and are not useful as we can easily figure out what the configs were if necessary by just inspecting the test locally.

Reviewers:  Luke Chen <showuon@confluent.io>,  Guozhang Wang <guozhang@confluent.io>
2022-03-01 21:58:10 -08:00
John Roesler 7172f35807
MINOR: Improve test assertions for IQv2 (#11828)
Reviewer: Bill Bejeck <bbejeck@apache.org>
2022-03-01 20:30:29 -06:00
A. Sophie Blee-Goldman 84f8c90b13
KAFKA-12648: standardize startup timeout to fix some flaky NamedTopologyIntegrationTest tests (#11824)
Seen a few of the new tests added fail on PR builds lately with 

"java.lang.AssertionError: Expected all streams instances in [org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper@7fb3e6b0] to be RUNNING within 30000 ms"

We already had some tests using the 30s timeout while others were bumped all the way up to 60s, I figured we should try out a default timeout of 45s and if we still see failures in specific tests we can go from there
2022-03-01 13:15:53 -08:00
A. Sophie Blee-Goldman 6eb57f6df1
KAFKA-12738: address minor followup and consolidate integration tests of PR #11787 (#11812)
This PR addresses the remaining nits from the final review of #11787

It also deletes two integration test classes which had only one test in them, and moves the tests to another test class file to save on the time to bring up an entire embedded kafka cluster just for a single run

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-03-01 12:59:18 -08:00
Hao Li 2ccc834faa
KAFKA-13542: add rebalance reason in Kafka Streams (#11804)
Add rebalance reason in Kafka Streams.

Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-02-28 18:26:46 +01:00
Walker Carlson abb74d406a
KAFKA-13281: allow #removeNamedTopology while in the CREATED state (#11810)
We should be able to change the topologies while still in the CREATED state. We already allow adding them, but this should include removing them as well

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-02-25 19:11:06 -08:00
Walker Carlson 29317e6953
KAFKA-13281: add API to expose current NamedTopology set (#11808)
List all the named topologies that have been added to this client

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-02-25 19:04:07 -08:00
A. Sophie Blee-Goldman c2ee1411c8
KAFKA-12738: send LeaveGroup request when thread dies to optimize replacement time (#11801)
Quick followup to #11787 to optimize the impact of the task backoff by reducing the time to replace a thread. When a thread is going through a dirty close, ie shutting down from an uncaught exception, we should be sending a LeaveGroup request to make sure the broker acknowledges the thread has died and won't wait up to the `session.timeout` for it to join the group if the user opts to `REPLACE_THREAD` in the handler

Reviewers: Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>
2022-02-24 16:18:13 -08:00
A. Sophie Blee-Goldman cd4a1cb410
KAFKA-12738: track processing errors and implement constant-time task backoff (#11787)
Part 1 in the initial series of error handling for named topologies.

*Part 1: Track tasks with errors within a named topology & implement constant-time based task backoff
Part 2: Implement exponential task backoff to account for recurring errors
Part 3: Pause/backoff all tasks within a named topology in case of a long backoff/frequent errors for any individual task

Reviewers:  Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-02-24 12:10:31 -08:00
Bruno Cadonna 8d88b20b27
KAFKA-10199: Add interface for state updater (#11499)
Reviewers: Andrew Eugene Choi <andrew.choi@uwaterloo.ca>, Guozhang Wang <wangguoz@gmail.com>
2022-02-23 10:13:08 -08:00
Walker Carlson d8cf47bf28
KAFKA-13676: Commit successfully processed tasks on error (#11791)
When we hit an exception when processing tasks we should save the work we have done so far.
This will only be relevant with ALOS and EOS-v1, not EOS-v2. It will actually reduce the number of duplicated record in ALOS because we will not be successfully processing tasks successfully more than once in many cases.

This is currently enabled only for named topologies.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>
2022-02-22 23:10:05 -08:00
Rob Leland 06ca4850c5
KAFKA-13666 Don't Only ignore test exceptions for windows OS for certain tests. (#11752)
Tests are swallowing exceptions for supported operating systems, which could hide regressions.

Co-authored-by: Rob Leland <rleland@apache.org>

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-02-18 14:49:03 +01:00
A. Sophie Blee-Goldman 4c23e47bd5
MINOR: move non-management methods from TaskManager to Task Executor (#11738)
Basic refactoring with no logical changes to lay the groundwork & facilitate reviews for error handling work.

This PR just moves all methods that go beyond the management of tasks into a new TaskExecutor class, such as processing, committing, and punctuating. This breaks up the ever-growing TaskManager class so it can focus on the tracking and updating of the tasks themselves, while the TaskExecutor can focus on the actual processing. In addition to cleaning up this code this should make it easier to test this part of the code.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-02-18 00:39:41 -08:00
Bruno Cadonna 333278d9bb
MINOR: Add actual state directory to related exceptions (#11751)
For debugging it is useful to see the actual state directory when
an exception regarding the state directory is thrown.

Reviewer: Bill Bejeck <bbejeck@apache.org>
2022-02-16 20:32:00 +01:00
Matthias J. Sax c012fc411c
MINOR: improve JavaDocs for ReadOnlySessionStore (#11759)
Reviewer: Guozhang Wang <guozhang@confluent.io>
2022-02-16 08:40:47 -08:00
A. Sophie Blee-Goldman fdb98df839
KAFKA-12648: avoid modifying state until NamedTopology has passed validation (#11750)
Previously we were only verifying the new query could be added after we had already inserted it into the TopologyMetadata, so we need to move the validation upfront.

Also adds a test case for this and improves handling of NPE in case of future or undiscovered bugs.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-02-15 13:06:54 -08:00
dengziming b5b590cb67
MINOR: Use bootstrap-server instead of broker-list in doc (#10832)
* MINOR: Use bootstrap-server instead of broker-list in doc

Reviewers: Luke Chen <showuon@gmail.com>
2022-02-14 20:24:20 +08:00
Jorge Esteban Quilcate Otoya 99310360a5
KAFKA-12939: After migrating processors, search the codebase for missed migrations (#11534)
Migrated internal usages that had previously been marked with TODO suppressions.

Reviewer: John Roesler<vvcephei@apache.org>
2022-02-11 22:25:03 -06:00
Ismael Juma 7c2d672413
MINOR: Update library dependencies (Q1 2022) (#11306)
- scala 2.13: 2.13.6 -> 2.13.8
  * Support Java 18 and improve Android compatibility
  * https://www.scala-lang.org/news/2.13.7
  * https://www.scala-lang.org/news/2.13.8
- scala 2.12: 2.12.14 -> 2.12.15. 
  * The `-release` flag now works with Scala 2.12, backend parallelism
    can be enabled via `-Ybackend-parallelism N` and string interpolation
    is more efficient.
  * https://www.scala-lang.org/news/2.12.5
- gradle versions plugin: 0.38.0 -> 0.42.0
  * Minor fixes
  * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.40.0
  * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.41.0
  * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.42.0
- gradle dependency check plugin: 6.1.6 -> 6.5.3
  * Minor fixes
- gradle spotbugs plugin: 4.7.1 -> 5.0.5
  * Fixes and minor improvements
  * There were too many releases to include all the links, include the major version bump
  * https://github.com/spotbugs/spotbugs-gradle-plugin/releases/tag/5.0.0
- gradle scoverage plugin: 5.0.0 -> 7.0.0
  * Support newer Gradle versions and other improvements
  * https://github.com/scoverage/gradle-scoverage/releases/tag/6.0.0
  * https://github.com/scoverage/gradle-scoverage/releases/tag/6.1.0
  * https://github.com/scoverage/gradle-scoverage/releases/tag/7.0.0
- gradle shadow plugin: 7.0.0 -> 7.1.2
  * Support gradle toolchains and security fixes
  * https://github.com/johnrengelman/shadow/releases/tag/7.1.0
  * https://github.com/johnrengelman/shadow/releases/tag/7.1.1
  * https://github.com/johnrengelman/shadow/releases/tag/7.1.2
- bcpkix: 1.66 -> 1.70
  * Several improvements and fixes
  * https://www.bouncycastle.org/releasenotes.html
- jline: 3.12.1 -> 3.21.0
  * Various fixes and improvements
- jmh: 1.32 -> 1.34
  * Compiler blackhole enabled by default when using Java 17 and improved
    gradle incremental compilation
  * https://mail.openjdk.java.net/pipermail/jmh-dev/2021-August/003355.html
  * https://mail.openjdk.java.net/pipermail/jmh-dev/2021-December/003406.html
- scalaLogging: 3.9.3 -> 3.9.4
  * Support for Scala 3.0
- jose4j: 0.7.8 -> 0.7.9
  * Minor fixes
- junit: 5.7.1 -> 5.8.2
  * Minor improvements and fixes
  * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.0
  * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.1
  * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.2
- jqwik: 1.5.0 -> 1.6.3
  * Numerous improvements
  * https://github.com/jlink/jqwik/releases/tag/1.6.0
- mavenArtifact: 3.8.1 -> 3.8.4
- mockito: 3.12.4 -> 4.3.1
  * Removed deprecated methods, `DoNotMock` annotation and
    minor fixes/improvements
  * https://github.com/mockito/mockito/releases/tag/v4.0.0
  * https://github.com/mockito/mockito/releases/tag/v4.1.0
  * https://github.com/mockito/mockito/releases/tag/v4.2.0
  * https://github.com/mockito/mockito/releases/tag/v4.3.0
- scalaCollectionCompat: 2.4.4 -> 2.6.0
  * Minor fixes
  * https://github.com/scala/scala-collection-compat/releases/tag/v2.5.0
  * https://github.com/scala/scala-collection-compat/releases/tag/v2.6.0
- scalaJava8Compat: 1.0.0 -> 1.0.2
  * Minor changes
- scoverage: 1.4.1 -> 1.4.11
  * Support for newer Scala versions
- slf4j: 1.7.30 -> 1.7.32
  * Minor fixes, 1.7.35 automatically uses reload4j and 1.7.33/1.7.34
    cause build failures, so we stick with 1.7.32 for now.
- zstd: 1.5.0-4 -> 1.5.2-1
  * zstd 1.5.2
  * Small refinements and performance improvements
  * https://github.com/facebook/zstd/releases/tag/v1.5.1
  * https://github.com/facebook/zstd/releases/tag/v1.5.2

Checkstyle, spotBugs and spotless will be upgraded separately as they
either require non trivial code changes or they have regressions
that affect us.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-02-07 15:24:50 -08:00
Jonathan Albrecht ec05f90a3d
KAFKA-13599: Upgrade RocksDB to 6.27.3 (#11690)
RocksDB v6.27.3 has been released and it is the first release to support s390x. RocksDB is currently the only dependency in gradle/dependencies.gradle without s390x support.

RocksDB v6.27.3 has added some new options that require an update to streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter.java but no other changes are needed to upgrade.

I have run the unit/integration tests locally on s390x and also the :streams tests on x86_64 and they pass.

Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-02-02 10:56:14 +01:00
Matthias J. Sax 67cf187603 Revert "KAFKA-13152: Replace "buffered.records.per.partition" with "input.buffer.max.bytes" (#11424)"
This reverts commit 14c6030c6a.
Reason: Implemenation breaks backward compatibility
2022-02-01 14:08:11 -08:00
kurtostfeld 830d83e2cd
MINOR: Fix typo "Exsiting" -> "Existing" (#11547)
Co-authored-by: Kurt Ostfeld <kurt@samba.tv>
Reviewers: Kvicii <Karonazaba@gmail.com>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-02-01 11:09:04 +01:00
David Jacot 7215c90c5e
MINOR: Add 3.0 and 3.1 to streams system tests (#11716)
Reviewers: Bill Bejeck <bill@confluent.io>
2022-01-28 10:06:31 +01:00
vamossagar12 14c6030c6a
KAFKA-13152: Replace "buffered.records.per.partition" with "input.buffer.max.bytes" (#11424)
This PR is an implementation of: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390. The following changes have been made:

* Adding a new config input.buffer.max.bytes applicable at a topology level.
* Adding new config statestore.cache.max.bytes.
* Adding new metric called input-buffer-bytes-total.
* The per partition config buffered.records.per.partition is deprecated.
* The config cache.max.bytes.buffering is deprecated.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>
2022-01-27 21:19:04 -08:00
Matthias J. Sax af377b5f30
KAFKA-13423: GlobalThread should not log ERROR on clean shutdown (#11455)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
2022-01-27 20:40:43 -08:00
Patrick Stuedi 1a21892663
KAFKA-13605: checkpoint position in state stores (#11676)
There are cases in which a state store neither has an in-memory position built up nor has it gone through the state restoration process. If a store is persistent (i.e., RocksDB), and we stop and restart Streams, we will have neither of those continuity mechanisms available.

This patch:
* adds a test to verify that all stores correctly recover their position after a restart
* implements storage and recovery of the position for persistent stores alongside on-disk state

Reviewers: Vicky Papavasileiou <vpapavasileiou@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>
2022-01-27 09:25:04 -06:00
Vicky Papavasileiou fe72187cb1
KAFKA-13524: Add IQv2 query handling to the caching layer (#11682)
Currently, IQv2 forwards all queries to the underlying store. We add this bypass to allow handling of key queries in the cache. If a key exists in the cache, it will get answered from there.
As part of this PR, we realized we need access to the position of the underlying stores. So, I added the method getPosition to the public API and ensured all state stores implement it. Only the "leaf" stores (Rocks*, InMemory*) have an actual position, all wrapping stores access their wrapped store's position.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, John Roesler <vvcephei@apache.org>
2022-01-26 09:36:39 -06:00
Vicky Papavasileiou 868cbcb8e5
MINOR: Fix bug of empty position in windowed and session stores #11713
Reviewers: John Roesler <vvcephei@apache.org>
2022-01-25 13:46:20 -06:00
John Roesler 96fa468106
MINOR: fix NPE in iqv2 (#11702)
There is a brief window between when the store is registered and when
it is initialized when it might handle a query, but there is no context.
We treat this condition just like a store that hasn't caught up to the
desired position yet.

Reviewers: Guozhang Wang <guozhang@apache.org>, Matthias J. Sax <mjsax@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org>, Patrick Stuedi <pstuedi@apache.org>
2022-01-25 13:23:46 -06:00
A. Sophie Blee-Goldman 9d602a01be
KAFKA-12648: invoke exception handler for MissingSourceTopicException with named topologies (#11686)
Followup to #11600 to invoke the streams exception handler on the MissingSourceTopicException, without killing/replacing the thread

Reviewers: Guozhang Wang <guozhang@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
2022-01-25 10:37:35 -08:00
A. Sophie Blee-Goldman 265d3199ec
KAFKA-12648: fixes for query APIs with named topologies (#11609)
Fixes some issues with the NamedTopology version of the IQ methods that accept a topologyName argument, and adds tests for all.

Reviewers:  Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2022-01-25 05:49:23 -08:00
Sayantanu Dey d13d09fb68
KAFKA-13590:rename InternalTopologyBuilder#topicGroups (#11704)
Renamed the often confusing and opaque #topicGroups API to #subtopologyToTopicsInfo

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-01-24 21:03:37 -08:00
Aleksandr Sorokoumov 7d9b9847f1
KAFKA-6502: Update consumed offsets on corrupted records. (#11683)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-01-20 09:26:38 -08:00
A. Sophie Blee-Goldman 529dde904a
KAFKA-12648: handle MissingSourceTopicException for named topologies (#11600)
Avoid throwing a MissingSourceTopicException inside the #assign method when named topologies are used, and just remove those topologies which are missing any of their input topics from the assignment.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
2022-01-18 11:49:23 -08:00
Walker Carlson c182a431d2
MINOR: prefix topics if internal config is set (#11611)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-01-10 16:08:48 -08:00
Guozhang Wang 9078451e37
MINOR: Add num threads logging upon shutdown (#11652)
1. Add num of threads logging upon shutdown.
2. Prefix the shutdown thread with client id.

Reviewers: John Roesler <vvcephei@apache.org>
2022-01-06 11:28:27 -08:00
Richard 7567cbc857
KAFKA-13476: Increase resilience timestamp decoding Kafka Streams (#11535)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-01-05 21:38:10 -08:00
John Roesler b424553101
KAFKA-13553: Add PAPI Window and Session store tests for IQv2 (#11650)
During some recent reviews, @mjsax pointed out that StateStore layers
are constructed differently the stores are added via the PAPI vs. the DSL.

This PR adds PAPI construction for Window and Session stores to the
IQv2StoreIntegrationTest so that we can ensure IQv2 works on every
possible state store.

Reviewer: Guozhang Wang <guozhang@apache.org>
2022-01-05 23:16:33 -06:00
John Roesler 7ef8701cca
KAFKA-13553: add PAPI KV store tests for IQv2 (#11624)
During some recent reviews, @mjsax pointed out that StateStore layers
are constructed differently the stores are added via the PAPI vs. the DSL.

This PR adds KeyValueStore PAPI construction to the
IQv2StoreIntegrationTest so that we can ensure IQv2 works on every
possible state store.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, Guozhang Wang <guozhang@apache.org>
2022-01-05 21:04:37 -06:00
Patrick Stuedi b8f1cf14c3
KAFKA-13494: WindowKeyQuery and WindowRangeQuery (#11567)
Implement WindowKeyQuery and WindowRangeQuery as
proposed in KIP-806

Reviewer: John Roesler <vvcephei@apache.org>
2022-01-02 22:17:38 -06:00
John Roesler 018fb88efa
KAFKA-13557: Remove swapResult from the public API (#11617)
Follow-on from #11582 .
Removes a public API method in favor of an internal utility method.

Reviewer: Matthias J. Sax <mjsax@apache.org>
2021-12-20 19:04:08 -06:00
John Roesler 5747788659
KAFKA-13525: Implement KeyQuery in Streams IQv2 (#11582)
Implement the KeyQuery as proposed in KIP-796

Reviewers: Vicky Papavasileiou <vpapavasileiou@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <guozhang@apache.org>
2021-12-20 12:22:05 -06:00
Walker Carlson 247c271353
MINOR: retry when deleting offsets for named topologies (#11604)
When this was made I didn't expect deleteOffsetsResult to be set if an exception was thrown. But it is and to retry we need to reset it to null. Changing the KafkaStreamsNamedTopologyWrapper for remove topology when resetting offsets to retry upon GroupSubscribedToTopicException and swallow/complete upon GroupIdNotFoundException

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@ache.>
2021-12-16 19:39:55 -08:00
Vicky Papavasileiou b38f6ba5cc
KAFKA-13479: Implement range and scan queries (#11598)
Implement the RangeQuery as proposed in KIP-805

Reviewers: John Roesler <vvcephei@apache.org>
2021-12-16 11:09:01 -06:00
Walker Carlson 04787334a5
MINOR: Update log and method name in TopologyMetadata (#11589)
Update an unclear log message and method name in TopologyMetadata

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>, Luke Chen <showuon@confluent.io>
2021-12-15 19:43:40 -08:00
John Roesler acd1f9c563
KAFKA-13522: add position tracking and bounding to IQv2 (#11581)
* Fill in the Position response in the IQv2 result.
* Enforce PositionBound in IQv2 queries.
* Update integration testing approach to leverage consistent queries.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, Vicky Papavasileiou <vpapavasileiou@confluent.io>, Guozhang Wang <guozhang@apache.org>
2021-12-11 01:00:59 -06:00
A. Sophie Blee-Goldman 1e459271d7
KAFKA-12648: fix IllegalStateException in ClientState after removing topologies (#11591)
Fix for one of the causes of failure in the NamedTopologyIntegrationTest: org.apache.kafka.streams.errors.StreamsException: java.lang.IllegalStateException: Must initialize prevActiveTasks from ownedPartitions before initializing remaining tasks.

This exception could occur if a member sent in a subscription where all of its ownedPartitions were from a named topology that is no longer recognized by the group leader, eg because it was just removed from the client. We should filter each ClientState based on the current topology only so the assignor only processes the partitions/tasks it can identify. The member with the out-of-date tasks will eventually clean them up when the #removeNamedTopology API is invoked on them

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-12-10 14:26:27 -08:00
A. Sophie Blee-Goldman d5eb3c10ec
HOTFIX: fix failing StreamsMetadataStateTest tests (#11590)
Followup to #11562 to fix broken tests in StreamsMetadataStateTest

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-12-09 16:19:56 -08:00
Vicky Papavasileiou e1dba7af57
MINOR: Cleanup for #11513 (#11585)
Clean up some minor things that were left over from PR #11513

Reviewer: John Roesler <vvcephei@apache.org>
2021-12-09 13:23:01 -06:00
Tamara Skokova 133b515b5e
KAFKA-13507: GlobalProcessor ignores user specified names (#11573)
Use the name specified via consumed parameter in InternalStreamsBuilder#addGlobalStore method for initializing the source name and processor name. If not specified, the names are generated.

Reviewers: Luke Chen <showuon@gmail.com>, Bill Bejeck <bbejeck@apache.org>
2021-12-09 09:42:00 -05:00
Tolga H. Dur e20f102298
KAFKA-12648: extend IQ APIs to work with named topologies (#11562)
In the new NamedTopology API being worked on, state store names and their uniqueness requirement is going to be scoped only to the owning topology, rather than to the entire app. In other words, two different named topologies can have different state stores with the same name.

This is going to cause problems for some of the existing IQ APIs which require only a name to resolve the underlying state store. We're now going to need to take in the topology name in addition to the state store name to be able to locate the specific store a user wants to query

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-12-09 03:54:28 -08:00
Vicky Papavasileiou 7acd12d6e3
KAFKA-13506: Write and restore position to/from changelog (#11513)
Introduces changelog headers to pass position information
to standby and restoring stores. The headers are guarded by an internal
config, which defaults to `false` for backward compatibility. Once IQv2
is finalized, we will make that flag a public config.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, John Roesler <vvcephei@apache.org>
2021-12-08 11:58:35 -06:00
Bruno Cadonna 68c3018a5a
MINOR: Fix internal topic manager tests (#11574)
When the unit tests of the internal topic manager test
are executed on a slow machine (like sometimes in automatic builds)
they sometimes fail with a timeout exception instead of the expected
exception. To fix this behavior, this commit replaces the use of
system time with mock time.

Reviewer: John Roesler <vvcephei@apache.org>
2021-12-07 18:23:52 +01:00
Walker Carlson 965ec40c0a
KAFKA-12648: Make changing the named topologies have a blocking option (#11479)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-12-03 11:32:55 -08:00
John Roesler 14c2449050
KAFKA-13491: IQv2 framework (#11557)
Implements the major part of the IQv2 framework as proposed in KIP-796.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, Vicky Papavasileiou <vpapavasileiou@confluent.io>, Bruno Cadonnna <cadonna@apache.org>
2021-12-03 12:53:31 -06:00
Patrick Stuedi 62f73c30d3
KAFKA-13498: track position in remaining state stores (#11541)
Reviewers: Vicky Papavasileiou <vpapavasileiou@confluent.io>, John Roesler<vvcephei@apache.org>
2021-12-01 11:49:10 -06:00
Bruno Cadonna 4fed0001ec
MINOR: Fix system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance (#11532)
Log messages were changed in the AssignorConfiguration (#11490) that are
also used for verification in system test
StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance.

This commit fixes the test and adds comments to the log messages
that point to the test that needs to be updated in case of
changes to the log messages.

Reviewers: John Roesler <vvcephei@apache.org>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2021-11-25 10:48:09 +01:00
Patrick Stuedi 23e9818e62
KAFKA-13480: Track Position in KeyValue stores (#11514)
Add position tracking to KeyValue stores in support of KIP-796

Reviewers: John Roesler <vvcephei@apache.org>
2021-11-24 18:28:00 -06:00
Jorge Esteban Quilcate Otoya 1e0916580f
KAFKA-13117: migrate TupleForwarder and CacheFlushListener to new Record API (#11481)
* Migrate TupleForwarder and CacheFlushListener to new Processor API
* Update the affected Processors

Reviewers: John Roesler <vvcephei@apache.org>
2021-11-22 21:34:59 -06:00
Bruno Cadonna 9285820df0
MINOR: Set mock correctly in RocksDBMetricsRecorderTest (#11462)
With a nice mock in RocksDBMetricsRecorderTest#shouldCorrectlyHandleHitRatioRecordingsWithZeroHitsAndMisses() and RocksDBMetricsRecorderTest#shouldCorrectlyHandleAvgRecordingsWithZeroSumAndCount() were green although getTickerCount() was never called. The tests were green because EasyMock returns 0 for a numerical return value by default if no expectation is specified. Thus, commenting out the expectation for getTickerCount() did not change the result of the test.

This commit changes the mock to a default mock and fixes the expectation to expect getAndResetTickerCount(). Now, commenting out the expectation leads to a test failure.

Reviewers: Luizfrf3 <lf.fonseca@hotmail.com>, Guozhang Wang <wangguoz@gmail.com>
2021-11-18 18:14:36 +01:00
Luke Chen 1b4cffdcb7
KAFKA-13439: Deprecate eager rebalance protocol in kafka stream (#11490)
Deprecate eager rebalancing protocol in kafka streams and log warning message when upgrade.from is set to 2.3 or lower. Also add a note in upgrade doc to prepare users for the removal of eager rebalancing support

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-11-17 03:05:19 -08:00
Matthias J. Sax 30d1989db1
MINOR: update Kafka Streams standby task config (#11404)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Antony Stubbs <antony@confluent.io>, James Galasyn <jim.galasyn@confluent.io>
2021-11-16 17:34:49 -08:00
Patrick Stuedi ffbef88cd7
Add recordMetadata() to StateStoreContext (#11498)
Implements KIP-791

Reviewers: John Roesler <vvcephei@apache.org>
2021-11-16 10:51:41 -06:00
mkandaswamy babc54333c
MINOR: Improve KafkaStreamsTest: testInitializesAndDestroysMetricsReporters (#11494)
Add additional asserts for KafkaStreamsTest: testInitializesAndDestroysMetricsReporters to help diagnose if it flakily fails in the future.
- MockMetricsReporter gets initialized only once during KafkaStreams construction, so make assert check stricter by ensuring initDiff is one.
- Assert KafkaStreams is not running before we validate whether MockMetricsMetricsReporter close count got incremented after streams close.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2021-11-16 10:50:15 +01:00
A. Sophie Blee-Goldman 908a6d2ad7
KAFKA-12648: introduce TopologyConfig and TaskConfig for topology-level overrides (#11272)
Most configs that are read and used by Streams today originate from the properties passed in to the KafkaStreams constructor, which means they get applied universally across all threads, tasks, subtopologies, and so on. The only current exception to this is the topology.optimization config which is parsed from the properties that get passed in to StreamsBuilder#build. However there are a handful of configs that could also be scoped to the topology level, allowing users to configure each NamedTopology independently of the others, where it makes sense to do so.

This PR refactors the handling of these configs by interpreting the values passed in via KafkaStreams constructor as the global defaults, which can then be overridden for individual topologies via the properties passed in when building the NamedTopology. More topology-level configs may be added in the future, but this PR covers the following:

max.task.idle.ms
task.timeout.ms
buffered.records.per.partition
default.timestamp.extractor.class
default.deserialization.exception.handler

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Walker Carlson <wcarlson@confluent.io>, Luke Chen <showuon@confluent.io>
2021-11-10 11:27:59 -08:00
Jorge Esteban Quilcate Otoya 807c5b4d28
KAFKA-10543: Convert KTable joins to new PAPI (#11412)
* Migrate KTable joins to new Processor API.
* Migrate missing KTableProcessorSupplier implementations.
* Replace KTableProcessorSupplier with new Processor API implementation.

Reviewers: John Roesler <vvcephei@apache.org>
2021-11-08 14:48:54 -06:00
Victoria Xia 01e6a6ebf2
KAFKA-13261: Add support for custom partitioners in foreign key joins (#11368)
Implements KIP-775.

Co-authored-by: Tomas Forsman <tomas-forsman@users.noreply.github.com>
2021-11-03 10:55:24 -07:00
Luizfrf3 252a40ea1f
KAFKA-8941: Add RocksDB Metrics that Could not be Added due to RocksD… (#11441)
This PR adds some RocksDB metrics that could not be added in KIP-471 due to RocksDB version. The new metrics are extracted using Histogram data provided by RocksDB API, and the old ones were extracted using Tickers. The new metrics added are memtable-flush-time-(avg|min|max) and compaction-time-(avg|min|max).

Reviewer: Bruno Cadonnna <cadonna@apache.org>
2021-11-03 12:18:28 +01:00
David Jacot 3aef0a5ceb
MINOR: Bump trunk to 3.2.0-SNAPSHOT (#11458)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2021-11-02 13:38:54 +01:00
A. Sophie Blee-Goldman 22aa9d2ce1
KAFKA-12648: fill in javadocs for the StreamsException class with new guarantees (#11436)
Minor followup to #11405 / KIP-783 to write down the new guarantees we're providing about the meaning of a StreamsException in the javadocs of that class

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-10-27 10:47:57 -07:00
Andrew Patterson 86e83de742
kafka-12994: Migrated SlidingWindowsTest to new API (#11379)
As raised in KAFKA-12994, All tests that use the old API should be either eliminated or migrated to the new API in order to remove the @SuppressWarnings("deprecation") annotations.

This PR migrates the SlidingWindowsTest to the new API.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-10-21 16:50:37 -07:00
A. Sophie Blee-Goldman b534124224
KAFKA-12648: Wrap all exceptions thrown to handler as StreamsException & add TaskId field (#11405)
To help users distinguish which task an exception was thrown from, and which NamedTopology if it exists, we add a TaskId field to the StreamsException class. We then make sure that all exceptions thrown to the handler are wrapped as StreamsExceptions, to help the user simplify their handling code as they know they will always need to unwrap the thrown exception exactly once.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, John Roesler <vvcephei@apache.org>
2021-10-21 16:21:20 -07:00
Jorge Esteban Quilcate Otoya 68223d3227
KAFKA-10539: Convert KStreamImpl joins to new PAPI (#11356)
Part of the migration to new Processor API, this PR converts KStream to KStream joins.

Depends #11315

Reviewers: John Roesler <vvcephei@apache.org>
2021-10-18 15:34:08 -05:00
Lucas Bradstreet da38a1df27
MINOR: "partition" typos and method doc arg fix (#11298)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Luke Chen <showuon@gmail.com>
2021-10-18 10:44:11 +02:00
Matthias J. Sax f58d79d549
KAFKA-13345: Use "delete" cleanup policy for windowed stores if duplicates are enabled (#11380)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Luke Chen <showuon@gmail.com>
2021-10-14 22:06:30 -07:00
Matthias J. Sax 7fbe482289
HOTFIX: suppress deprecation warnings to fix compilation errors (#11400)
Reviewers: Bill Bejeck <bill@confluent.io>
2021-10-14 18:58:20 -07:00
Luke Chen 65b01a0464
KAFKA-13212: add support infinite query for session store (#11234)
Add support for infinite range query for SessionStore.

Reviewers: Patrick Stuedi <pstuedi@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-10-12 16:14:38 -07:00
Luke Chen d1415866cc
KAFKA-13021: disallow grace called after grace set via new API (#11188)
Disallow calling grace() if it was already set via ofTimeDifferenceAndGrace/WithNoGrace(). Add the check to disallow grace called after grace set via new API, and add tests for them.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-10-12 16:05:22 -07:00
Guozhang Wang e31a53a2cd
KAFKA-13319: Do not commit empty offsets on producer (#11362)
We observed on the broker side that txn-offset-commit request with empty topics are received. After checking the source code I found there's on place on Streams which is unnecessarily sending empty offsets. This PR cleans up the streams layer logic a bit to not send empty offsets, and at the same time also guard against empty offsets at the producer layer as well.

Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2021-10-12 13:33:23 -07:00
Luke Chen 769882d910
MINOR: remove unneeded size and add lock coarsening to inMemoryKeyValueStore (#11370)
Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-10-06 18:32:10 -07:00
Jorge Esteban Quilcate Otoya ce83e5be66
KAFKA-10540: Migrate KStream aggregate operations (#11315)
As part of the migration of KStream/KTable operations to the new
Processor API https://issues.apache.org/jira/browse/KAFKA-8410,
this PR includes the migration of KStream aggregate/reduce operations.

Reviewers: John Roesler <vvcephei@apache.org>
2021-09-30 11:40:40 -05:00
A. Sophie Blee-Goldman 6d7a785956
MINOR: expand logging and improve error message during partition count resolution (#11364)
Recently a user hit this TaskAssignmentException due to a bug in their regex that meant no topics matched the pattern subscription, which in turn meant that it was impossible to resolve the number of partitions of the downstream repartition since there was no upstream topic to get the partition count for. Debugging this was pretty difficult and ultimately came down to stepping through the code line by line, since even with TRACE logging we only got a partial picture.

We should expand the logging to make sure the TRACE logging hits both conditional branches, and improve the error message with a suggestion for what to look for should someone hit this in the future

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2021-09-29 18:19:43 -07:00
Luke Chen 361b7845c6
KAFKA-13309: fix InMemorySessionStore#fetch/backwardFetch order issue (#11337)
In #9139, we added backward iterator on SessionStore. But there is a bug that when fetch/backwardFetch the key range, if there are multiple records in the same session window, we can't return the data in the correct order.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman<apache.org>
2021-09-28 17:31:29 -07:00
vamossagar12 7e57300148
KAFKA-12486: Enforce Rebalance when a TaskCorruptedException is throw… (#11076)
This PR aims to utilize HighAvailabilityTaskAssignor to avoid downtime on corrupted tasks. The idea is that, when we hit TaskCorruptedException on an active task, a rebalance is triggered after we've wiped out the corrupted state stores. This will allow the assignor to temporarily redirect this task to another client who can resume work on the task while the original owner works on restoring the state from scratch.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-09-28 16:50:16 -07:00
Andrew Patterson 24a335b338
KAFKA-12994: Migrate TimeWindowsTest to new API (#11215)
As raised in KAFKA-12994, All tests that use the old API should be either eliminated or migrated to the new API in order to remove the @SuppressWarnings("deprecation") annotations. This PR will migrate over all the relevant tests in TimeWindowsTests.java

Reviewers: Anna Sophie Blee-Goldman
2021-09-28 15:58:49 -07:00
Walker Carlson 4eb386f6e0
KAFKA-13296: warn if previous assignment has duplicate partitions (#11347)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Luke Chen <showuon@gmail.com>
2021-09-25 11:41:52 -07:00
Jorge Esteban Quilcate Otoya d15969af28
KAFKA-10544: Migrate KTable aggregate and reduce (#11316)
As part of the migration of KStream/KTable operations to the new Processor API https://issues.apache.org/jira/browse/KAFKA-8410, this PR includes the migration of KTable aggregate/reduce operations.

Reviewers: John Roesler <vvcephei@apache.org>
2021-09-22 14:25:41 -05:00
Luke Chen b61ec0003f
KAFKA-13211: add support for infinite range query for WindowStore (#11227)
Add support for infinite range query for WindowStore. Story JIRA: https://issues.apache.org/jira/browse/KAFKA-13210

Reviewers: Patrick Stuedi <pstuedi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2021-09-22 09:14:58 -07:00
andy0x01 5a6f19b2a1
KAFKA-13246: StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread now waits for the client state to go to REBALANCING/RUNNING after adding/removing a thread and waits for state RUNNING before querying the state store. (#11334)
KAFKA-13246: StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread does not gate on stream state well

The test now waits for the client to transition to REBALANCING/RUNNING after adding/removing a thread as well as to transition to RUNNING before querying the state store.

Reviewers: singingMan <@3schwartz>, Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>
2021-09-21 11:18:15 -05:00
Guozhang Wang 1b0294dfc4
MINOR: Let the list-store return null in putifabsent (#11335)
This is to make sure that even if logging is disabled, we would still return null in order to workaround the deserialization issue for stream-stream left/outer joins.

Reviewers: Matthias J. Sax <mjsax@apache.org>
2021-09-17 12:05:21 -07:00
Guozhang Wang a0c7e6d8b4
KAFKA-13216: Use a KV with list serde for the shared store (#11252)
This is an alternative approach in parallel to #11235. After several unsuccessful trials to improve its efficiency i've come up with a larger approach, which is to use a kv-store instead as the shared store, which would store the value as list. The benefits of this approach are:

Only serde once that compose <timestamp, byte, key>, at the outer metered stores, with less byte array copies.
Deletes are straight-forward with no scan reads, just a single call to delete all duplicated <timestamp, byte, key> values.
Using a KV store has less space amplification than a segmented window store.
The cons though:

Each put call would be a get-then-write to append to the list; also we would spend a few more bytes to store the list (most likely a singleton list, and hence just 4 more bytes).
It's more complicated definitely.. :)
The main idea is that since the shared store is actively GC'ed by the expiration logic, not based on time retention, and since that the key format is in <timestamp, byte, key>, the range expiration query is quite efficient as well.

Added testing covering for the list stores (since we are still use kv-store interface, we cannot leverage on the get() calls in the stream-stream join, instead we use putIfAbsent and range only). Another minor factoring piggy-backed is to let toList to always close iterator to avoid leaking.

Reviewers: Sergio Peña <sergio@confluent.io>, Matthias J. Sax <mjsax@apache.org>
2021-09-16 16:44:32 -07:00
Luke Chen 9628c1278e
KAFKA-13264: fix inMemoryWindowStore backward fetch not in reversed order (#11292)
When introducing backward iterator for WindowStroe in #9138, we forgot to make "each segment" in reverse order (i.e. in descendingMap) in InMemoryWindowStore. Fix it and add integration tests for it.

Currently, in Window store, we store records in [segments -> [records] ].

For example:
window size = 500,
input records:

key: "a", value: "aa", timestamp: 0 ==> will be in [0, 500] window
key: "b", value: "bb", timestamp: 10 ==> will be in [0, 500] window
key: "c", value: "cc", timestamp: 510 ==> will be in [500, 1000] window

So, internally, the "a" and "b" will be in the same segment, and "c" in another segments.
segments: [0 /* window start */, records], [500, records].
And the records for window start 0 will be "a" and "b".
the records for window start 500 will be "c".

Before this change, we did have a reverse iterator for segments, but not in "records". So, when doing backwardFetchAll, we'll have the records returned in order: "c", "a", "b", which should be "c", "b", "a" obviously.

Reviewers: Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-09-13 14:40:54 -07:00
Oliver Hutchison a03bda61e0
KAFKA-13249: Always update changelog offsets before writing the checkpoint file (#11283)
When using EOS checkpointed offsets are not updated to the latest offsets from the changelog because the maybeWriteCheckpoint method is only ever called when commitNeeded=false. This change will force the update if enforceCheckpoint=true .

I have also added a test which verifies that both the state store and the checkpoint file are completely up to date with the changelog after the app has shutdown.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-09-13 14:15:22 -07:00
Josep Prat 286126f9a5
KAFKA-13243: KIP-773 Differentiate metric latency measured in ms and ns (#11302)
KAFKA-13243: KIP-773 Differentiate metric latency measured in ms and ns

Implementation of KIP-773

Deprecates inconsistent metrics bufferpool-wait-time-total,
io-waittime-total, and iotime-total.
Introduces new metrics bufferpool-wait-time-ns-total,
io-wait-time-ns-total, and io-time-ns-total with the same semantics as
before.
Adds metrics (old and new) in ops.html.
Adds upgrade guide for these metrics.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Tom Bentley <tbentley@redhat.com>
2021-09-08 18:00:58 +01:00
Tomer Wizman fb77da941a
KAFKA-12766 - Disabling WAL-related Options in RocksDB (#11250)
Description
Streams disables the write-ahead log (WAL) provided by RocksDB since it replicates the data in changelog topics. Hence, it does not make much sense to set WAL-related configs for RocksDB.

Proposed solution
Ignore any WAL-related configuration and state in the log that we are ignoring them.

Co-authored-by: Tomer Wizman <tomer.wizman@personetics.com>
Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewers: Boyang Chen <boyang@apache.org>, Bruno Cadonna <cadonna@apache.org>
2021-09-08 13:57:08 +02:00
Christo Lolov 6472e79092
KAFKA-12994 Migrate JoinWindowsTest and SessionWindowsTest to new API (#11214)
As detailed in KAFKA-12994, unit tests using the old API should be either removed or migrated to the new API.
This PR migrates relevant tests in JoinWindowsTest.java and SessionWindowsTest.java.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-09-07 19:50:18 -07:00
Jorge Esteban Quilcate Otoya 5f89ce5f20
KAFKA-13201: Convert KTable suppress to new PAPI (#11213)
Migrate Suppress as part of the migration of KStream/KTable
 operations to the new Processor API (KAFKA-8410)

Reviewers: John Roesler <vvcephei@apache.org>
2021-09-07 17:17:44 -05:00
CHUN-HAO TANG 89ee72b5b5
KAFKA-13088: Replace EasyMock with Mockito for ForwardingDisabledProcessorContextTest (#11051)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-09-06 08:18:52 -07:00
Yanwen(Jason) Lin 7c64077b34
KAFKA-13032: add NPE checker for KeyValueMapper (#11241)
Currently KStreamMap and KStreamFlatMap classes will throw NPE if the call to KeyValueMapper#apply() return null. This commit checks whether the result of KeyValueMapper#apply() is null and throws a more meaningful error message for better debugging.

Two unit tests are also added to check if we successfully captured nulls.

Reviewers: Josep Prat <josep.prat@aiven.io>,  Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2021-09-06 14:03:16 +02:00
Josep Prat 4835c64f89
KAFKA-12887 Skip some RuntimeExceptions from exception handler (#11228)
Instead of letting all RuntimeExceptions go through and be processed by the uncaught exception handler, IllegalStateException and IllegalArgumentException are not passed through and fail fast. In this PR when setting the uncaught exception handler we check if the exception is in an "exclude list", if so, we terminate the client, otherwise we continue as usual.

Added test checking this new case. Added integration test checking that user defined exception handler is not used when an IllegalStateException is thrown.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-09-01 09:58:36 -07:00
Rohan a5ce43781e
MINOR: add units to metrics descriptions + test fix post KAFKA-13229 (#11286)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-08-31 11:44:45 -07:00
Rohan 01ab888dbd
KAFKA-13229: add total blocked time metric to streams (KIP-761) (#11149)
* Add the following producer metrics:
flush-time-total: cumulative sum of time elapsed during in flush.
txn-init-time-total: cumulative sum of time elapsed during in initTransactions.
txn-begin-time-total: cumulative sum of time elapsed during in beginTransaction.
txn-send-offsets-time-total: cumulative sum of time elapsed during in sendOffsetsToTransaction.
txn-commit-time-total: cumulative sum of time elapsed during in commitTransaction.
txn-abort-time-total: cumulative sum of time elapsed during in abortTransaction.

* Add the following consumer metrics:
commited-time-total: cumulative sum of time elapsed during in committed.
commit-sync-time-total: cumulative sum of time elapsed during in commitSync.

* Add a total-blocked-time metric to streams that is the sum of:
consumer’s io-waittime-total
consumer’s iotime-total
consumer’s committed-time-total
consumer’s commit-sync-time-total
restore consumer’s io-waittime-total
restore consumer’s iotime-total
admin client’s io-waittime-total
admin client’s iotime-total
producer’s bufferpool-wait-time-total
producer's flush-time-total
producer's txn-init-time-total
producer's txn-begin-time-total
producer's txn-send-offsets-time-total
producer's txn-commit-time-total
producer's txn-abort-time-total

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-08-30 15:39:25 -07:00
dengziming 1d22b0d706
KAFKA-10774; Admin API for Describe topic using topic IDs (#9769)
Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>
2021-08-28 09:00:36 +01:00
Walker Carlson 49aed781d8
KAFKA-13128: extract retry checker and update with retriable exception causing flaky StoreQueryIntegrationTest (#11275)
Add a new case to the list of possible retriable exceptions for the flaky tests to take care of threads starting up

Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman
2021-08-27 20:12:12 -07:00
Andy Lapidas 84b111f968
KAFKA-12963: Add processor name to error (#11262)
This PR adds the processor name to the ClassCastException exception text in process()

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-27 20:06:49 -07:00
A. Sophie Blee-Goldman d9bb988954
MINOR: remove unused Properties from GraphNode#writeToTopology (#11263)
The GraphNode#writeToTopology method accepts a Properties input parameter, but never uses it in any of its implementations. We can remove this parameter to clean things up and help make it clear that writing nodes to the topology doesn't involve the app properties.

Reviewers: Bruno Cadonna <cadonna@confluent.io>
2021-08-26 17:19:03 -07:00
Luke Chen 844c1259a9
MINOR: Optimize the OrderedBytes#upperRange for not all query cases (#11181)
Currently in OrderedBytes#upperRange method, we'll check key bytes 1 by 1, to see if there's a byte value >= first timestamp byte value, so that we can skip the following key bytes, because we know compareTo will always return 0 or 1. However, in most cases, the first timestamp byte is always 0, more specifically the upperRange is called for both window store and session store. For former, the suffix is in timestamp, Long.MAX_VALUE and for latter the suffix is in Long.MAX_VALUE, timestamp. For Long.MAX_VALUE the first digit is not 0, for timestamp it could be 0 or not, but as long as it is up to "now" (i.e. Aug. 23rd) then the first byte should be 0 since the value is far smaller than what a long typed value could have. So in practice for window stores, that suffix's first byte has a large chance to be 0, and hence worth optimizing for.

This PR optimizes the not all query cases by not checking the key byte 1 by 1 (because we know the unsigned integer will always be >= 0), instead, put all bytes and timestamp directly. So, we won't have byte array copy in the end either.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-08-26 14:37:34 -07:00
A. Sophie Blee-Goldman 53277a92a6
HOTFIX: decrease session timeout in flaky NamedTopologyIntegrationTest (#11259)
Decrease session timeout back to 10s to improve test flakiness

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-08-25 21:52:32 -07:00
Phil Hardwick a594747d75
HOTFIX: Fix null pointer when getting metric value in MetricsReporter (#11248)
The alive stream threads metric relies on the threads field as a monitor object for
its synchronized block. When the alive stream threads metric is registered it isn't
initialised so any call to get the metric value before it is initialised will result
in a null pointer exception.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>
2021-08-23 13:21:38 -07:00
John Roesler 45ecaa19f8
MINOR: Set session timeout back to 10s for Streams system tests (#11236)
We increased the default session timeout to 30s in KIP-735:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout

Since then, we are observing sporadic system test failures
due to rebalances taking longer than the test timeout.
Rather than increase the test wait times, we can just override
the session timeout to a value more appropriate in the testing
domain.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-20 11:27:54 -05:00
Luke Chen 2bfd0ae2e9
MINOR: update the branch(split) doc and java doc and tests (#11195)
Reviewers: Ivan Ponomarev <iponomarev@mail.ru>, Matthias J. Sax <matthias@confluent.io>
2021-08-10 13:37:59 -07:00
Guozhang Wang 35ebdd7ed0
MINOR: Fix flaky shouldCreateTopicWhenTopicLeaderNotAvailableAndThenTopicNotFound (#11155)
1. This test is taking two iterations since the firs iteration is designed to fail due to unknow topic leader. However both the timeout and the backoff are set to 50ms, while the actual SYSTEM time is used. This means we are likely to timeout before executing the second iteration. I thought about using a mock time but dropped that idea as it may forgo the purpose of this test, instead I set the backoff time to 10ms so that we are almost unlikely to hit this error anymore.

2. Found a minor issue while fixing this which is that when we have non-empty not-ready topics, but the topics-to-create is empty (which is possible as the test shouldCreateTopicWhenTopicLeaderNotAvailableAndThenTopicNotFound itself illustrates), we still call an empty create-topic function. Though it does not incur any round-trip it would still waste some cycles, so I branched it off and hence also simplified some unit tests.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@confluent.io>
2021-08-10 12:27:44 -07:00
Walker Carlson 9565a529e0
KAFKA-12779: rename namedTopology in TaskId to topologyName #11192
Update to KIP-740.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Israel Ekpo <israelekpo@gmail.com>
2021-08-09 15:19:21 -07:00
John Roesler f16a9499ec
MINOR: Increase smoke test production time (#11190)
We've seen a few failures recently due to the driver finishing
the production of data and verifying the results before the
whole cluster is even running.

Reviewers: Leah Thomas <lthomas@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <mjsax@apache.org>
2021-08-09 14:19:35 -05:00
Josep Prat 83f0ae3821
KAFKA-12862: Update Scala fmt library and apply fixes (#10784)
Updates the scala fmt to the latest stable version.
Applies all the style fixes (all source code changes are done by scala 
fmt).
Removes setting about dangling parentheses as `true` is already the
default.

Reviewer: John Roesler <john@confluent.io>
2021-08-09 12:05:31 +02:00
A. Sophie Blee-Goldman 6854eb8332
KAFKA-12648: Pt. 3 - addNamedTopology API (#10788)
Pt. 1: #10609
Pt. 2: #10683
Pt. 3: #10788

In Pt. 3 we implement the addNamedTopology API. This can be used to update the processing topology of a running Kafka Streams application without resetting the app, or even pausing/restarting the process. It's up to the user to ensure that this API is called on every instance of an application to ensure all clients are able to run the newly added NamedTopology. 

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-08-06 00:18:27 -07:00
Walker Carlson be8820cdac
MINOR: Port fix to other StoreQueryIntegrationTests (#11153)
Port the fix from #11129 to the other store-query tests.

Reviewers: John Roesler <vvcephei@apache.org>
2021-08-05 16:39:33 -05:00
Guozhang Wang e5df7fd90b
MINOR: Should commit a task if the consumer position advanced as well (#11151)
Reviewers: John Roesler <vvcephei@apache.org>
2021-08-05 13:41:46 -07:00
Walker Carlson d414de3779
MINOR: use relative counts for restores (#11176)
Use a relative count from using 0 for totalNumbRestores to prevent flakiness.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-05 00:32:25 -07:00
Kamal Chandraprakash a103c95a31
KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests. (#10602)
Also adjusted the acceptable recovery lag to stabilize Streams tests.

Reviewers: Justine Olshan <jolshan@confluent.io>, Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>
2021-08-04 17:31:10 -05:00
dengziming 4eb72add11
MINOR: Replace EasyMock with Mockito in test-utils module (#11157)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-08-03 17:26:46 -07:00
A. Sophie Blee-Goldman 5d52de2ccf
KAFKA-12648: minor followup from Pt. 2 and some new tests (#11146)
Addresses the handful of remaining feedback from Pt. 2, plus adds two new tests: one verifying a multi-topology application with a FKJ and its internal topics, another to make sure IQ works with named topologies (though note that there is a bit more work left for IQ to be fully supported, will be tackled after Pt. 3

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-07-30 10:56:57 -07:00
Patrick Stuedi 22541361b7
Add support for infinite endpoints for range queries (#11120)
Add support to open endpoint range queries in key-value stores

Implements: KIP-763

Reviewers: Almog Gavra <almog@confluent.io>, Luke Chen <showuon@gmail.com>, John Roesler <vvcephei@apache.org>
2021-07-29 21:52:16 -05:00
Patrick Stuedi c074b67395
Fix for flaky test in StoreQueryIntegrationTest (#11129)
Fix a bug in StoreQueryIntegrationTest::shouldQueryOnlyActivePartitionStoresByDefault that causes the test to fail in the case of a client rebalancing. The changes in this PR make sure the test keeps re-trying after a rebalancing operation, instead of failing.

Reviewers: Luke Chen <showuon@gmail.com>, John Roesler <vvcephei@apache.org>
2021-07-29 16:26:00 -05:00
Luke Chen a2aa3b9ea5
close TopologyTestDriver to release resources (#11143)
Close TopologyTestDriver to release resources

Reviewers: Bill Bejeck <bbejeck@apache.org>
2021-07-29 15:50:57 -04:00
Jorge Esteban Quilcate Otoya 3190ebd1e6
KAFKA-10542: Migrate KTable mapValues, passthrough, and source to new Processor API (#11099)
As part of the migration of KStream/KTable operations to the new Processor API (KAFKA-8410), this PR includes the migration of KTable:
* mapValues,
* passthrough,
* and source operations.

Reviewers: John Roesler <vvcephei@apache.org>
2021-07-28 20:58:15 -05:00
Walker Carlson 58402a6fe8
MINOR: add helpful error message (#11139)
I noticed that replace thread actions would not be logged unless the user added a log in the handler. I think it would be very useful for debugging.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-07-28 11:38:31 -07:00
A. Sophie Blee-Goldman 4710a49146
KAFKA-12648: Pt. 2 - Introduce TopologyMetadata to wrap InternalTopologyBuilders of named topologies (#10683)
Pt. 1: #10609
Pt. 2: #10683
Pt. 3: #10788

The TopologyMetadata is next up after Pt. 1 #10609. This PR sets up the basic architecture for running an app with multiple NamedTopologies, though the APIs to add/remove them dynamically are not implemented until Pt. 3

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-07-28 11:18:56 -07:00
Luke Chen 818cbfba6d
KAFKA-13125: close KeyValueIterator instances in internals tests (part 2) (#11107)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-26 16:26:02 -07:00
Luke Chen ded66d92a4
KAFKA-13124: close KeyValueIterator instance in internals tests (part 1) (#11106)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-26 16:25:22 -07:00
Luke Chen f9aeebed05
KAFKA-13123: close KeyValueIterator instances in example code and tests (#11105)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-26 16:23:04 -07:00
A. Sophie Blee-Goldman 246a8afb63
MINOR: factor state checks into descriptive methods and clarify javadocs (#11123)
Just a bit of minor cleanup that (a) does some prepwork for another PR I'm working on, (b) updates the javadocs & exception messages to report a more useful error to the user and describe what they actually need to do, and (c) hopefully makes these state checks more future-proof by defining methods for each kind of check in one place that can be easily updated instead of tracking down every individual check.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Luke Chen <showuon@gmail.com>
2021-07-26 15:53:30 -07:00
A. Sophie Blee-Goldman 8b1eca1c58
KAFKA-13126: guard against overflow when computing `joinGroupTimeoutMs` (#11111)
Setting the max.poll.interval.ms to MAX_VALUE causes overflow when computing the joinGroupTimeoutMs and results in the JoinGroup timeout being set to the request.timeout.ms instead, which is much lower.

This can easily make consumers drop out of the group, since they must rejoin now within 30s (by default) yet have no obligation to almost ever call poll() given the high max.poll.interval.ms, especially when each record takes a long time to process or the `max.poll.records` is also very large. We just need to check for overflow and fix it to Integer.MAX_VALUE when it occurs.

Reviewers: Luke Chen <showuon@gmail.com>, John Roesler <vvcephei@apache.org>
2021-07-23 16:22:41 -07:00
A. Sophie Blee-Goldman 7fbc6b73aa
KAFKA-13021: clarify KIP-633 javadocs and address remaining feedback (#11114)
There were a few followup things to address from #10926, most importantly a number of updates to the javadocs. Also includes a few missing verification checks.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Israel Ekpo
2021-07-23 16:14:37 -07:00
A. Sophie Blee-Goldman 03edcdd972
KAFKA-13128: wait for all keys to be fully processed in #shouldQueryStoresAfterAddingAndRemovingStreamThread (#11113)
This test is flaky due to waiting on all records to be processed for only a single key before issuing IQ lookups and asserting whether data was found.

Reviewers:  Phil Hardwick, Walker Carlson <wcarlson@confluent.io>
2021-07-23 14:56:46 -07:00
A. Sophie Blee-Goldman d99562a145
HOTFIX: Set session interval back to 10s for StreamsCooperativeRebalanceUpgradeTest (#11103)
This test is hitting pretty frequent timeouts after bouncing a node and waiting for it to come back and fully rejoin the group. It seems to now take 45s for the initial JoinGroup to succeed, which I suspect is due to the new default session.interval.ms (which was recently changed to 45s). Let's try fixing this config to the old value of 10s and see if that helps it rejoin in time.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2021-07-22 19:27:07 +02:00
Phil Hardwick 02dc615c1e
KAFKA-13096: Ensure queryable store providers is up to date after adding stream thread (#10921)
When a new thread is added the queryable store providers continues to use the store providers it was given when KafkaStreams was instantiated. This means IQ will start performing lookups against an out-of-date list of threads, and may eventually become completely broken. We must make sure the QueryableStoreProvider is updated when threads are added and removed.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-21 13:23:15 -07:00
leah 89286668eb
MINOR: add serde configs to properly set serdes in failing StreamsStaticMembershipTest (#11093)
After changing the default serde to be null, some system tests started failing. This test didn't explicitly pass in a serde and didn't set the default config so when the test was trying to setup the source node it wasn't able to find any config to use and threw a config exception.

 Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@confluent.io>
2021-07-21 12:54:59 -07:00
Luke Chen ad59e3b622
MINOR: update doc to reflect the grace period change (#11100)
We removed default 24 hours grace period in KIP-633, and deprecate some grace methods, but we forgot to update the stream docs.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2021-07-21 13:26:41 +02:00
Bruno Cadonna 9b3687e0ac
HOTFIX: Modify system test config to reduce time to stable task assignment. (#11090)
Currently, we verify the startup of a Streams client by checking the transition
from REBALANCING to RUNNING and if the client processed some records
in the EOS system test. However, if the Streams client only
has standby tasks assigned as it can happen if the client is catching 
up by using warm-up replicas, the client will never process
records within the timeout of the startup verification. Hence, the test 
will fail although everything is fine. This commit fixes this by reducing
the time to the next probing rebalance and by increasing the number of 
max warm-up replicas. In such a way, the catch up of the client and the 
following processing of records should still be within the startup verification 
timeout of the client.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-21 07:58:14 +02:00
Walker Carlson eeb788d1b9
KAFKA-13010: Retry getting tasks incase of rebalance for TaskMetadata tests (#11083)
If there is a cooperative rebalance the tasks might not be assigned to a thread at all for a very short timeframe, causing this test to fail. We can just retry getting the metadata until the group has finished rebalancing and all tasks are assigned

Reviewers: Bruno Cadonna <cadonna@apache.org>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Josep Prat <josep.prat@aiven.io>
2021-07-20 14:56:07 -07:00
CHUN-HAO TANG 47a0974f5a
KAFKA-13082: Replace EasyMock with Mockito for ProcessorContextTest (#11045)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-20 11:28:11 -07:00
Matthias J. Sax 3e38038278
HOTFIX: Init stream-stream left/outer join emit interval correctly (#11055)
Follow up to #10917

The fix from #10917 intended to reduce the emit frequency to save the creation cost of RocksDB iterators. However, we incorrectly initialized the "timer" with timestamp zero, and thus, the timer was always in the past and we did try to emit left/outer join result too often.

This PR fixes the initialization of the emit interval timer to current wall-clock time to effectively 'enable' the fix from #10917.

Reviewers: Sergio Peña <sergio@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-07-16 13:30:19 -07:00
Guozhang Wang 13b2df733a
MINOR: Default GRACE with Old API should set as 24H minus window-size / inactivity-gap (#10953)
In 2.8 and before, we computed the default grace period with Math.max(maintainDurationMs - sizeMs, 0); in method gracePeriodMs() in TimeWindows, SessionWindows, and JoinWindows. That means that the default grace period has never been 24 hours but 24 hours - window size. Since gracePeriodMs() is used to compute the retention time of the changelog topic for the corresponding window state store and the segments for the window state store it is important to keep the same computation for the deprecated methods. Otherwise, Streams app that run with 2.8 and before might not be compatible with Streams 3.0 because the retention time of the changelog topics created with older Streams apps will be smaller than the assumed retention time for Streams apps in 3.0. For example, with a window size of 10 hours, an old Streams app would have created a changelog topic with retention time 10 hours (window size) + 14 hours (default grace period, 24 hours - 10 hours). A 3.0 Streams app would assume a retention time of 10 hours (window size) + 24 hours (deprecated default grace period as currently specified on trunk). In the presence of failures, where a state store needs to recreated, records might get lost, because before the failure the state store of a 3.0 Streams app contained 10 hours + 24 hours of records whereas the changelog topic that was created with the old Streams app would only contain 10 hours + 14 hours of records.

All this happened due to us always stating that the default grace period was 24 hours although it was not completely correct and a connected and unfortunate misunderstanding when we removed deprecated windows APIs (#10378).

Co-authors: Bruno Cadonna <cadonna@apache.org>
Reviewers: Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <cadonna@apache.org>
2021-07-16 11:22:26 -07:00
Guozhang Wang 3e3264760b
KAFKA-10847: Remove internal config for enabling the fix (#10941)
Also update the upgrade guide indicating about the grace period KIP and its indication on the fix with throughput impact.

Reviewers: Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2021-07-15 10:58:15 -07:00
vamossagar12 f413435585
KAFKA-12925: adding presfixScan operation for missed implementations (#10877)
The new prefixScan API may still throw UnsupportedVersionOperationException due to some missing implementations in vast store hierarchy of Streams, this PR adds those missing overrides and expands the test coverage.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-14 15:55:50 -07:00
John Gray 5e5d5bff3b
KAFKA-13037: "Thread state is already PENDING_SHUTDOWN" log spam
Demote this from INFO to debug since it's absolutely spamming the logs

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-13 21:00:23 -07:00
CHUN-HAO TANG f3017683a2
KAFKA-13075: Consolidate RocksDBStoreTest and RocksDBKeyValueStoreTest (#11034)
Consolidate the RocksDBStoreTest and RocksDBKeyValueStoreTest files into a single test class for the RocksDBStore.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-13 13:34:40 -07:00
John Roesler a08e0cfe65
KAFKA-8410: Update the docs to reference the new PAPI (#10994)
Reviewers: Jim Galasyn <jim.galasyn@confluent.io>, Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2021-07-13 10:23:50 -05:00
John Roesler bfdef11b97
KAFKA-12360: Document new time semantics (#11003)
Update the docs for task idling, since the semantics have
changed in 3.0.

Reviewers: Jim Galasyn <jim.galasyn@confluent.io>, Luke Chen <showuon@gmail.com>, Boyang Chen <boyang@apache.org>
2021-07-12 16:16:29 -05:00
Bruno Cadonna 332db13047
HOTFIX: Fix verification of version probing (#10943)
Fixes and improves version probing in system test test_version_probing_upgrade().
2021-07-12 18:50:25 +02:00
A. Sophie Blee-Goldman 66e8b8b413
MINOR: StreamsPartitionAssignor should log the individual members of each client (#10996)
Log the specific StreamThreads participating in the rebalance for each client in the Streams application

Reviewers: Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>
2021-07-08 11:11:39 -07:00
Konstantine Karantasis d2a05d71c0
Bump trunk to 3.1.0-SNAPSHOT (#10981)
Typical version bumps on trunk following the creation of the 3.0 release branch.

Reviewer: Randall Hauch <rhauch@gmail.com>
2021-07-06 14:28:13 -07:00
dengziming 08757d0d19
MINOR: Add default serde in stream test to fix QA ERROR (#10958)
We changed the default serde in Streams to be null in #10813, but forgot to add some in tests, for example TestTopicsTest and TopologyTestDriverTest.

Reviewers: David Jacot <djacot@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2021-07-02 20:02:20 +02:00
Matthias J. Sax a095e1fd8c
KAFKA-10847: improve throughput of stream-stream join with spurious left/outer join fix (#10917)
The fix to avoid spurious left/outer stream-stream join results, showed
very low throughput for RocksDB, due to excessive creation of iterators.
Instead of trying to emit left/outer stream-stream join result for every
input record, this PR adds tracking of the lower timestamp bound of
left/outer join candidates, and only tries to emit them (and create an
iterator) if they are potentially old enough.

Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <guozhang@confluent.io>, Sergio Peña <sergio@confluent.io>
2021-07-01 15:46:22 -07:00
leah 4fd71a7ef1
KAFKA-9559: Change default serde to be `null` (#10813)
Implements KIP-741

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-07-01 14:58:29 -07:00
Justine Olshan cee2e975d1
KAFKA-13011; Update deleteTopics Admin API (#10892)
This patch adds two new apis to support topic deletion using topic IDs or names. It uses a new class `TopicCollection` to keep a collection of topics defined either by names or IDs. Finally, it modifies `DeleteTopicsResult` to support both names and IDs and deprecates the old methods which have become ambiguous. Eventually we will want to deprecate the old `deleteTopics` apis as well, but this patch does not do so.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-06-30 23:20:21 -07:00
Israel Ekpo b3905d9f71
KAFKA-8613: New APIs for Controlling Grace Period for Windowed Operations (#10926)
Implements KIP-633.

Grace-period is an important parameter and its best to make it the user's responsibility to set it expliclity. Thus, we move off to provide a default and make it a mandatory parameter when creating a window.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Luke Chen <showuon@gmail.com>, Matthias J. Sax <matthias@confluent.io>
2021-06-30 17:09:19 -07:00
Bruno Cadonna 19edbda164
Avoid increasing app ID when test is executed multiple times (#10939)
The integration test TaskMetadataIntegrationTest will increase
the length of the app ID when its test methods are called multiple
times in one execution. This is for example the case if you
repeatedly run the test until failure in IntelliJ IDEA. This might
also lead to exceptions because the state directory depends on the
app ID and directory names have a length limit.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-06-30 09:55:27 +02:00
Bruno Cadonna dc6805df73
MINOR: Improve test of log messages for dropped records (#10920)
Reviewers: Luke Chen <showuon@gmail.com>,  Boyang Chen <boyang@apache.org>
2021-06-29 13:00:35 +02:00
Juan Gonzalez-Zurita cfcabc368c
KAFKA-12718: SessionWindows are closed too early (#10824)
Session windows should not be close directly when "window end" time is reached, but "window close" time should be "window-end + gap + grace-period".

Reviewer: Matthias J. Sax <matthias@confluent.io>
2021-06-28 15:39:49 -07:00
Matthias J. Sax 2540b77769
KAFKA-12909: add missing tests (#10893)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-06-28 15:32:16 -07:00
Matthias J. Sax 670630ae5b
KAFKA-12951: restore must terminate for tx global topic (#10894)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Luke Chen <showuon@gmail.com>, Gasparina Damien <d.gasparina@gmail.com>
2021-06-28 14:10:25 -07:00
Josep Prat 6655a09e99
KAFKA-12849: KIP-744 TaskMetadata ThreadMetadata StreamsMetadata as API (#10840)
Implementation of KIP-744.

Creates new Interfaces for TaskMetadata, ThreadMetadata, and
StreamsMetadata, providing internal implementations for each of them.

Deprecates current TaskMetadata, ThreadMetadata under o.a.k.s.processor,
and SreamsMetadata under a.o.k.s.state.

Updates references on internal classes from deprecated classes to new interfaces.

Deprecates methods on KafkaStreams returning deprecated ThreadMeatada and
StreamsMetadata, and provides new ones returning the new interfaces.

Update Javadocs referencing to deprecated classes and methods to point
to the right ones.

Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@apache.org>
2021-06-25 18:31:49 +02:00
Geordie b0cfd1f4ca
KAFKA-12336 Custom stream naming does not work while calling stream[K… (#10190)
Custom stream naming does not work while calling stream[K, V](topicPattern: Pattern)

Reviewers: Bill Bejeck <bbejeck@apache.org>
2021-06-24 12:07:22 -04:00
Lee Dongjin 7da881ffbe
KAFKA-12928: Add a check whether the Task's statestore is actually a directory (#10862)
Throw an exception if a state directory exists as a regular file

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Luke Chen <showuon@gmail.com>
2021-06-22 16:35:30 -07:00
John Roesler c3475081c5
KAFKA-10546: Deprecate old PAPI (#10869)
* Deprecate the old Processor API
* Suppress warnings on all internal usages of the old API
  (which will be migrated in other child tickets of KAFKA-8410)
* Add new KStream#process methods, since KAFKA-10603 has not seen any action.
2021-06-22 09:17:11 -05:00
Bruno Cadonna 2ad9350cc1
MINOR: Remove obsolete variables for metric sensors (#10912)
This is a clean-up that we missed for "KIP-743: Remove config value 0.10.0-2.4 of Streams built-in metrics version config"

Reviewers: Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2021-06-22 14:10:28 +02:00
Bruno Cadonna b38cadca23
MINOR: Remove log warning for RocksDB 6+ upgrade (#10911)
Reviewers: Boyang Chen <boyang@apache.org>
2021-06-22 13:17:39 +02:00
Ismael Juma d27a84f70c
KAFKA-12945: Remove port, host.name and related configs in 3.0 (#10872)
They have been deprecated since 0.10.0. Full list of removes configs:
* port
* host.name
* advertised.port
* advertised.host.name

Also adjust tests to take the removals into account. Some tests were
no longer relevant and have been removed.

Finally, took the chance to:
* Clean up unnecessary usage of `KafkaConfig$.MODULE$` in
related files.
* Add missing `Test` annotations to `AdvertiseBrokerTest` and
make necessary changes for the tests to pass.

Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>
2021-06-17 05:32:34 -07:00
dengziming 1a16fc139a
KAFKA-10437: Update WordCount examples to use new PAPI (#10701)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-06-16 22:24:11 -07:00
Matthias J. Sax 96767a60db
KAFKA-12909: disable spurious left/outer stream-stream join fix for old JoinWindows API (#10861)
We changed the behavior of left/outer stream-stream join via KAFKA-10847.
To avoid a breaking change during an upgrade, we need to disable this
fix by default.

We only enable the fix if users opt-in expliclity by changing their
code. We leverage KIP-633 (KAFKA-8613) that offers a new JoinWindows
API with mandatory grace-period to enable the fix.

Reviewers: Sergio Peña <sergio@confluent.io>, Israel Ekpo <israelekpo@gmail.com>, Guozhang Wang <guozhang@confluent.io>
2021-06-16 09:25:16 -07:00
Colin Patrick McCabe 135de5801e
KAFKA-12877: Make flexibleVersions mandatory (#10804)
Many Kafka protocol JSON files were accidentally configured to not use
flexible versions, since it was not on by default.  This PR requires
JSON files to specify a flexibleVersions value. If the JSON file does
not specify the flexibleVersions value, display an error message
suggesting the correct value to use for new messages.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-06-15 16:04:30 -07:00
Matthias J. Sax 01967e48a2
KAFKA-12914: StreamSourceNode should return `null` topic name for pattern subscription (#10846)
Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-06-15 00:59:55 -07:00
John Roesler 987391958d
MINOR: enable EOS during smoke test IT (#10870)
This IT has been failing on trunk recently. Enabling EOS during the integration test
makes it easier to be sure that the test's assumptions are really true during verification
and should make the test more reliable.

I also noticed that in the actual system test file, we are using the deprecated property
name "beta" instead of "v2".

Reviewers: Boyang Chen <boyang@apache.org>
2021-06-13 21:35:02 -05:00
Luke Chen 4724083a32
KAFKA-8940: decrease session timeout to make test faster and reliable (#10871)
While there might still be some issue about the test as described here by @ableegoldman , but I found the reason why this test failed quite frequently recently. It's because we increased the session timeout to 45 sec in KIP-735.

The reason why increasing session timeout affected this test is because in this test, we will keep adding new stream clients and remove old one, to maintain only 3 stream clients alive. The problem here is, when old stream closed, we won't trigger rebalance immediately due to the stream clients are all static members as described in KIP-345, which means, we will trigger trigger group rebalance only when session.timeout expired. That said, when old client closed, we'll have at least 45 sec with some tasks not working.

Also, in this test, we have 2 timeout conditions to fail this test before verification passed:

1. 6 minutes timeout
2. polling 30 times (each with 5 seconds) without getting any data. (that is, 5 * 30 = 150 sec without consuming any data)

For (1), in my test under 45 session timeout, we'll create 8 stream clients, which means, we'll have 5 clients got closed. And each closed client need 45 sec to trigger rebalance, so we'll have 45 * 5 = 225 sec (~4 mins) of the time having some tasks not working.
For (2), during new client created and old client closed, it need some time to do rebalance. With 45 session timeout, we only got ~100 sec left. In slow jenkins env, it might reach the 30 retries without getting any data timeout.

Therefore, decreasing session timeout can make this test completes faster and more reliable.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-06-13 09:49:05 -07:00
Josep Prat 787b4fe955
MINOR: clean up unneeded `@SuppressWarnings` (#10855)
Reviewers: Luke Chen <showuon@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2021-06-13 19:00:14 +08:00
Luke Chen 8eecb91419
KAFKA-9295: revert session timeout to default value (#10736)
Revert the hard-coded increased session timeout now that the default is 45s

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-06-10 13:57:25 -07:00
wycccccc 69d507590e
KAFKA-12924 Replace EasyMock and PowerMock with Mockito in streams metrics tests (#10850)
Reviewers: John Roesler <vvcephei@apache.org>, Ismael Juma <ijuma@apache.org>
2021-06-10 15:21:46 -05:00
Lee Dongjin 7cd09b6a14
KAFKA-10585: Kafka Streams should clean up the state store directory from cleanup (#9414)
1. Update StateDirectory#clean
  - Delete application's statestore directory in cleanup process if it is empty.
2. Add Tests
  - StateDirectoryTest#shouldDeleteAppDirWhenCleanUpIfEmpty: asserting the empty application directory is deleted with StateDirectory#clean.
  - StateDirectoryTest#shouldNotDeleteAppDirWhenCleanUpIfNotEmpty: asserting the non-empty application directory is not deleted with StateDirectory#clean and appropriate log message is generated.
  - Add Integration test: StateDirectoryIntegrationTest
3. Improve EOSUncleanShutdownIntegrationTest: test all available cases regarding cleanup process on unclean shutdown.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <guozhang@apache.org>
2021-06-10 14:59:57 -05:00
Josep Prat d496103864
MINOR: Small optimizations and removal of unused code in Streams (#10856)
Remove unused methods in internal classes
Mark fields that can be final as final
Remove unneeded generic type annotation
Convert single use fields to local final variables
Use method reference in lambdas when it's more readable

Reviewers: Matthias J. Sax <mjsax@apache.org>, Bruno Cadonna <cadonna@apache.org>
2021-06-10 16:51:31 +02:00
wycccccc 8b6752d183
KAFKA-12905: Replace EasyMock and PowerMock with Mockito for NamedCacheMetricsTest (#10835)
* Development of EasyMock and PowerMock has stagnated while Mockito continues to be actively developed. With the new Java cadence, it's a problem to depend on libraries that do bytecode generation and are not actively maintained. In addition, Mockito is also easier to use.KAFKA-7438

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2021-06-10 12:18:19 +02:00
Matthias J. Sax 953ec98100
MINOR: Improve Kafka Streams JavaDocs with regard to record metadata (#10810)
Reviewers: Luke Chen <howuon@gmail.com>, Josep Prat <josep.prat@aiven.io>, John Roesler <john@confluent.io>
2021-06-09 22:51:36 -07:00
Jason Gustafson a75b5c635b
KAFKA-12874; Increase default consumer session timeout to 45s (#10803)
This patch increases the default consumer session timeout to 45s as documented in KIP-735: https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout.

Reviewers: Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>, David Jacot <djacot@confluent.io>
2021-06-09 15:09:31 -07:00
Matthias J. Sax b5f7ce8b7b
KAFKA-12815: Update JavaDocs of ValueTransformerWithKey (#10731)
Reviewers: Luke Chen <howuon@gmail.com>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-06-07 22:04:33 -07:00
A. Sophie Blee-Goldman 48379bd6e5
KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure (#10609)
This PR includes adding the NamedTopology to the Subscription/AssignmentInfo, and to the StateDirectory so it can place NamedTopology tasks within the hierarchical structure with task directories under the NamedTopology parent dir.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-06-07 15:38:12 -07:00
Guozhang Wang b2d463aa12
KAFKA-8897 Follow-up: Consolidate the global state stores (#10646)
1. When register state stores, add the store to globalStateStores before calling any blocking calls that may throw errors, so that upon closing we would close the stores as well.
2. Remove the other list as a local field, and call topology.globalStateStores when needed to get the list.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2021-06-04 08:30:14 -07:00
Viswanathan Ranganathan 93dca8ebd9
KAFKA-12749: Changelog topic config on suppressed KTable lost (#10664)
Refactored logConfig to be passed appropriately when using shutDownWhenFull or emitEarlyWhenFull. Removed the constructor that doesn't accept a logConfig parameter so you're forced to specify it explicitly, whether it's empty/unspecified or not.

Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewers: Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2021-06-03 20:00:19 +02:00
Vito Jeng c2c08b41f2
MINOR: Apply try-with-resource to KafkaStreamsTest (#10668)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Matthias J. Sax <matthias@confluent.io>
2021-06-02 21:13:09 -07:00
Bruno Cadonna cfe642edee
KAFKA-12519: Remove built-in Streams metrics for versions 0.10.0-2.4 (#10765)
As specified in KIP-743, this PR removes the built-in metrics
in Streams that are superseded by the refactoring proposed in KIP-444.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Luke Chen <showuon@gmail.com>
2021-06-01 14:05:08 +02:00
John Roesler f207bac20c
KAFKA-8410: KTableProcessor migration groundwork (#10744)
* Lay the groundwork for migrating KTable Processors to the new PAPI.
* Migrate the KTableFilter processor to prove that the groundwork works.

This is an effort to help break up #10507 into multiple PRs.

Reviewers: Boyang Chen <boyang@apache.org>
2021-05-28 14:59:35 -05:00
Ismael Juma e4b3a3cdeb
MINOR: Adjust parameter ordering of `waitForCondition` and `retryOnExceptionWithTimeout` (#10759)
New parameters in overloaded methods should appear later apart from
lambdas that should always be last.
2021-05-27 06:25:00 -07:00
Josep Prat a02b19cb77
KAFKA-12796: Removal of deprecated classes under streams-scala (#10710)
Removes previously deprecated methods in older KIPs

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-05-27 11:30:15 +02:00
A. Sophie Blee-Goldman bbe170af70
MINOR: deprecate TaskMetadata constructor and add KIP-740 notes to upgrade guide (#10755)
Quick followup to KIP-740 to actually deprecate this constructor, and update the upgrade guide with what we changed in KIP-740. I also noticed the TaskId#parse method had been modified previously, and should be re-added to the public TaskId class. It had no tests, so now it does

Reviewers: Matthias J. Sax <mjsax@confluent.io>, Luke Chen <showuon@gmail.com>
2021-05-26 10:35:12 -07:00
Matthias J. Sax 77573d88c8
MINOR: add window verification to sliding-window co-group test (#10745)
Reviewers: Luke Chen <showuon@gmail.com>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-05-25 23:43:33 -07:00
Luke Chen efb7cda178
MINOR: update java doc for deprecated methods (#10722)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-23 18:33:01 -07:00
Ismael Juma 47796d2f87
MINOR: Fix deprecation warnings in SlidingWindowedCogroupedKStreamImplTest (#10703)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-22 14:22:42 -07:00
Boyang Chen ae8b784537
KAFKA-12499: add transaction timeout verification (#10482)
This PR tries to add the check for transaction timeout for a comparison against commit interval of streams. If transaction timeout is smaller than commit interval, stream should crash and inform user to update their commit interval to be larger or equal to the given transaction timeout, or vise versa.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-05-21 15:05:39 -07:00
Josep Prat b46e17b1d7
KAFKA-12808: Remove Deprecated Methods under StreamsMetrics (#10724)
Removal of methods already deprecated since 2.5.
Adapt test to use the new alternative method.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-05-21 12:37:35 +02:00
A. Sophie Blee-Goldman b56d9e4416
KAFKA-12779: KIP-740, Clean up public API in TaskId and fix TaskMetadata#taskId() (#10735)
As described in KIP-740, we clean up the public TaskId class and introduce new APIs to return it from TaskMetadata

Reviewers: Guozhang Wang <guozhang@confluent.io>
2021-05-20 15:01:23 -07:00
Josep Prat 26b5352260
KAFKA-12814: Remove Deprecated Method StreamsConfig getConsumerConfigs (#10737)
Removes method deprecated via KIP-276.

Reviewer: Matthias J. Sax <matthias@confluent.io>
2021-05-20 14:32:51 -07:00
Josep Prat e23ede1ece
KAFKA-12809: Remove deprecated methods of Stores factory (#10729)
Removes methods deprecated via KIP-319 and KIP-358.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-19 16:07:35 -07:00
Josep Prat 0af37730fc
KAFKA-12813: Remove deprecated schedule method in ProcessorContext (#10730)
Removes methods deprecated via KIP-358.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-19 16:03:01 -07:00
Matthias J. Sax 476eccb968
KAFKA-12815: Preserve context for KTable.transformValues when getting value from upstream state store (#10720)
Reviewers: Victoria Xia <victoria.xia@confluent.io>, John Roesler <john@confluent.io>
2021-05-19 14:58:46 -07:00
Josep Prat b58da356be
KAFKA-12810: Remove deprecated TopologyDescription.Source#topics (#10727)
Removes methods that were deprecated via KIP-321.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-05-19 11:49:26 -07:00
Luke Chen e11f249327
KAFKA-9295: increase session timeout to fix flaky KTableKTableForeignKeyInnerJoinMultiIntegrationTest (#10715)
Increase session timeout to fix flaky KTableKTableForeignKeyInnerJoinMultiIntegrationTest.shouldInnerJoinMultiPartitionQueryable

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-18 23:00:43 -07:00
A. Sophie Blee-Goldman 3a42baa260
HOTFIX: undo renaming of public part of Subtopology API (#10713)
In #10676 we renamed the internal Subtopology class that implemented the TopologyDescription.Subtopology interface. By mistake, we also renamed the interface itself, which is a public API. This wasn't really the intended point of that PR, so rather than do a retroactive KIP, let's just reverse the renaming.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-05-18 12:59:20 -07:00
A. Sophie Blee-Goldman cc6f4c49a9
KAFKA-12574: remove internal Producer config and auto downgrade logic (#10675)
Minor followup to #10573. Removes this internal Producer config which was only ever used to avoid a very minor amount of work to downgrade the consumer group metadata in the txn commit request in Kafka Streams

Reviewers: Ismael Juma <ismael@juma.me.uk>, Matthias J. Sax <mjsax@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-05-17 10:25:35 -07:00
vamossagar12 b9acc492a5
KAFKA-12313: KIP-725: Streamlining configs for Windowed Deserialisers (#10542)
This PR aims to streamline the configurations for WindowedDeserialisers as described in KIP-725. It deprecates default.windowed.key.serde.inner and default.windowed.value.serde.inner configs in StreamConfig and adds windowed.inner.class.serde. 

Reviewers: Anna Sophie Blee-Goldman<ableegoldman@apache.org>
2021-05-17 10:17:31 -07:00
Walker Carlson f2785f3c4f
KAFKA-12754: Improve endOffsets for TaskMetadata (#10634)
Improve endOffsets for TaskMetadata by updating immediately after polling a new batch

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-14 12:17:31 -07:00
A. Sophie Blee-Goldman 4153e754f1
MINOR: prevent cleanup() from being called while Streams is still shutting down (#10666)
Currently KafkaStreams#cleanUp only throw an IllegalStateException if the state is RUNNING or REBALANCING, however the application could be in the process of shutting down in which case StreamThreads may still be running. We should also throw if the state is PENDING_ERROR or PENDING_SHUTDOWN

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-05-13 16:16:35 -07:00
Daniyar Yeralin 6d1ae8bc00
KAFKA-8326: Introduce List Serde (#6592)
Introduce List serde for primitive types or custom serdes with a serializer and a deserializer according to KIP-466

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias J. Sax <mjsax@conflunet.io>, John Roesler <roesler@confluent.io>, Michael Noll <michael@confluent.io>
2021-05-13 15:54:00 -07:00
A. Sophie Blee-Goldman 4b2736570c
KAFKA-12648: MINOR - Add TopologyMetadata.Subtopology class for subtopology metadata (#10676)
Introduce a Subtopology class to wrap the topicGroupId and namedTopology metadata.

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-05-13 11:25:18 -07:00
Vito Jeng fae0784ce3
KAFKA-5876: KIP-216 Part 4, Apply InvalidStateStorePartitionException for Interactive Queries (#10657)
KIP-216, part 4 - apply InvalidStateStorePartitionException

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-10 17:29:58 -07:00
Guozhang Wang 25f4ee879c
KAFKA-12747: Fix flakiness in shouldReturnUUIDsWithStringPrefix (#10643)
Consecutive UUID generation could result in same prefix.

Reviewers: Josep Prat <josep.prat@aiven.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-10 12:32:51 -07:00
Jorge Esteban Quilcate Otoya 8f8f914efc
KAFKA-12536: Add Instant-based methods to ReadOnlySessionStore (#10390)
Implements: KIP-666 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-666%3A+Add+Instant-based+methods+to+ReadOnlySessionStore)

Reviewers: John Roesler <vvcephei@apache.org>
2021-05-07 13:24:41 -05:00
Sergio Peña 45d7440c15
KAFKA-10847: Set StreamsConfig on InternalTopologyDriver before writing topology (#10640)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2021-05-06 17:27:23 -07:00
Bruno Cadonna 90fc875e24
KAFKA-8897: Upgrade RocksDB to 6.19.3 (#10568)
This PR upgrades RocksDB to 6.19.3. After the upgrade the Gradle build exited with code 134 due to SIGABRT signals ("Pure virtual function called!") coming from the C++ part of RocksDB. This error was caused by RocksDB state stores not properly closed in Streams' code. This PR adds the missing closings and updates the RocksDB option adapter.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-05-06 15:29:26 -07:00
Jorge Esteban Quilcate Otoya 12a1e68aeb
KAFKA-12451: Remove deprecation annotation on long-based read operations in WindowStore (#10296)
Complete https://cwiki.apache.org/confluence/display/KAFKA/KIP-667%3A+Remove+deprecated+methods+from+ReadOnlyWindowStore by removing deprecation annotation on long-based read operations in WindowStore.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-05-06 11:25:23 -07:00
Bruno Cadonna 94be57d610
MINOR: Fix formatting in RelationalSmokeTest (#10639)
Fixes formatting in RelationalSmokeTest.

Reviewers: Leah Thomas <lthomas@confluent.io>
2021-05-06 17:25:02 +02:00
leah 03690d7a1f
MINOR: Stop using hamcrest in system tests (#10631)
We currently use hamcrest imports to check the outputs of the RelationalSmokeTest, but with the new gradle updates, the proper hamcrest imports are no longer included in the test jar.

This is a bit of a workaround to remove the hamcrest usage so we can get system tests up and running again. Potential follow-up could be to update the way we create the test-jar to pull in the proper dependencies.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-05-06 12:05:36 +02:00
Luke Chen 80aea23beb
KAFKA-9295: increase startup timeout for flaky test in KTableKTableForeignKeyInnerJoinMultiIntegrationTest (#10635)
Try to address the extreme flakiness of shouldInnerJoinMultiPartitionQueryable since the recent test cleanup. Since we need to wait for 3 streams reach RUNNING state, it makes sense to increase the waiting time to make the test more reliable.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-05 21:58:27 -07:00
Matthias J. Sax 6a5992a814
KAFKA-8531: Change default replication factor config (#10532)
Implements KIP-733

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-05-05 16:16:06 -07:00
Sergio Peña d915ce58d2
KAFKA-10847: Set shared outer store to an in-memory store when in-memory stores are supplied (#10613)
When users supply in-memory stores for left/outer joins, then the internal shared outer store must be switch to in-memory store too. This will allow users who want to keep all stores in memory to continue doing so.

Added unit tests to validate topology and left/outer joins work fine with an in-memory shared store.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-05-05 10:21:43 -07:00
vamossagar12 9a71468cb0
KAFKA-10767: Adding test cases for all, reverseAll and reverseRange for ThreadCache (#9779)
The test cases for ThreaCache didn't have the corresponding unit tests for all, reverseAll and reverseRange methods. This PR aims to add the same.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-05-05 12:26:51 +02:00
Jorge Esteban Quilcate Otoya 45f24c4195
KAFKA-12450: Remove deprecated methods from ReadOnlyWindowStore (#10294)
Implement first part of https://cwiki.apache.org/confluence/display/KAFKA/KIP-667%3A+Remove+deprecated+methods+from+ReadOnlyWindowStore.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-05-04 09:23:30 -07:00
Sergio Peña 62221edaff
KAFKA-10847: Add internal flag to disable KAFKA-10847 fix (#10612)
Adds an internal flag that can be used to disable the fixes in KAFKA-10847. It defaults to true if the flag is not set or has an invalid boolean value.

The flag is named __enable.kstreams.outer.join.spurious.results.fix__. This flag is considered internal only. It is a temporary flag that will be used to help users to disable the join fixes while they do a transition from the previous semantics of left/outer joins. The flag may be removed in future releases.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-05-03 14:10:05 -07:00
Vito Jeng 816f5c3b86
KAFKA-5876: KIP-216 Part 3, Apply StreamsNotStartedException for Interactive Queries (#10597)
KIP-216 Part 3: Throw StreamsNotStartedException if KafkaStreams state is CREATED

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-03 13:53:35 -07:00
Guozhang Wang bee3cf7d98
MINOR: Remove unused Utils.delete (#10622)
Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-01 08:32:14 -07:00
Guozhang Wang 3ec6317ee6
KAFKA-12683: Remove deprecated UsePreviousTimeOnInvalidTimestamp (#10557)
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-01 08:31:41 -07:00
A. Sophie Blee-Goldman 16b2ce7da7
KAFKA-12648: basic skeleton API for NamedTopology (#10615)
Just the API for NamedTopology.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2021-04-30 22:46:00 -07:00
Valery Kokorev e454becb33
KAFKA-12396: added null check for state stores key (#10548)
Reviewers: Bruno Cadonna <bruno@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2021-04-29 19:47:26 -07:00
A. Sophie Blee-Goldman 9dbf2226cd
MINOR: clean up some remaining locking stuff in StateDirectory (#10608)
Minor followup to #10342 that I noticed while working on the NamedTopology stuff. Cleans up a few things:

We no longer need locking for the global state directory either, since it's contained within the top-level state directory lock. Definitely less critical than the task directory locking, since it's less vulnerable to IOExceptions given that it's just locked and unlocked once during the application lifetime, but nice to have nonetheless
Clears out misc. usages of the LOCK_FILE_NAME that no longer apply. This has the awesome side effect of finally being able to actually delete obsolete task directories, whereas previously we had to leave behind the empty directory due to a ridiculous Windows bug (though I'm sure they would claim "it's not a bug it's a feature" 😉 )
Lazily delete old-and-now-unused lock files in the StateDirectory#taskDirIsEmpty method to clean up the state directory for applications that upgraded from an older version that still used task locking

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-04-29 12:30:48 -07:00
Sergio Peña bf359f8e29
KAFKA-10847: Fix spurious results on left/outer stream-stream joins (#10462)
Fixes the issue with https://issues.apache.org/jira/browse/KAFKA-10847.

To fix the above problem, the left/outer stream-stream join processor uses a buffer to hold non-joined records for some time until the window closes, so they are not processed if a join is found during the join window time. If the window of a record closes and a join was not found, then this should be emitted and processed by the consequent topology processor.

A new time-ordered window store is used to temporary hold records that do not have a join and keep the records keys ordered by time. The KStreamStreamJoin has a reference to this new store . For every non-joined record seen, the processor writes it to this new state store without processing it. When a joined record is seen, the processor deletes the joined record from the new state store to prevent further processing.

Records that were never joined at the end of the window + grace period are emitted to the next topology processor. I use the stream time to check for the expiry time for determinism results . The KStreamStreamJoin checks for expired records and emit them every time a new record is processed in the join processor.

The new state store is shared with the left and right join nodes. The new store needs to serialize the record keys using a combined key of <joinSide-recordKey>. This key combination helps to delete the records from the other join if a joined record is found. Two new serdes are created for this, KeyAndJoinSideSerde which serializes a boolean value that specifies the side where the key is found, and ValueOrOtherValueSerde that serializes either V1 or V2 based on where the key was found.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-04-28 17:57:28 -07:00
A. Sophie Blee-Goldman 3805f3706f
KAFKA-12574: KIP-732, Deprecate eos-alpha and replace eos-beta with eos-v2 (#10573)
Deprecates the following 

1. StreamsConfig.EXACTLY_ONCE
2. StreamsConfig.EXACTLY_ONCE_BETA
3. Producer#sendOffsetsToTransaction(Map offsets, String consumerGroupId)

And introduces a new StreamsConfig.EXACTLY_ONCE_V2 config. Additionally, this PR replaces usages of the term "eos-beta" throughout the code with the term "eos-v2"

Reviewers: Matthias J. Sax <mjsax@confluent.io>
2021-04-28 13:22:15 -07:00
JoelWee 0ea440b2af
KAFKA-6435: KIP-623 Add internal topics option to streamResetter (#8923)
Allow user to specify subset of internal topics to clean up with application reset tool

Reviewers: Boyang Chen <boyang@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>
2021-04-27 15:44:53 -07:00
ketulgupta1995 d949f1094e
KAFKA-12344 Support SlidingWindows in the Scala API (#10519)
Support SlidingWindows in the Scala API

Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-04-26 11:38:48 -07:00
Vito Jeng c972ac929e
KAFKA-5876: Apply UnknownStateStoreException for Interactive Queries (#9821)
KIP-216: IQ should throw different exceptions for different errors, Part 2

Reviewers: Matthias J. Sax <mjsax@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@confluent.io>
2021-04-23 12:09:32 -07:00
high.lee 105243afb1
KAFKA-10283; Consolidate client-level and consumer-level assignment within ClientState (#9640)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-04-23 10:06:55 -07:00
Walker Carlson 72236f343d
KAFKA-12691: Add case where task can be considered idling (#10565)
Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-04-21 19:43:18 -07:00
A. Sophie Blee-Goldman a24e78b34a
HOTFIX: remove unimplemented SPILL_TO_DISK buffer full strategy (#10571)
Remove enum for the spill-to-disk strategy since this hasn't been implemented.

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-04-20 19:14:26 -07:00
high.lee 8360db41da
MINOR: Modify unnecessary access specifiers (#9861)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-04-20 13:58:15 +08:00
Sergio Peña 15c24da888
KAFKA-10847: Delete Time-ordered duplicated records using deleteRange() internally (#10537)
This PR changes the TimeOrderedKeySchema composite key from time-seq-key -> time-key-seq to allow deletion of duplicated time-key records using the RocksDB deleteRange API. It also removes all duplicates when put(key, null) is called. Currently, the put(key, null) was a no-op, which was causing problems because there was no way to delete any keys when duplicates are allowed.

The RocksDB deleteRange(keyFrom, keyTo) deletes a range of keys from keyFrom (inclusive) to keyTo (exclusive). To make keyTo inclusive, I incremented the end key by one when calling the RocksDBAccessor.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-04-18 11:18:09 -07:00
Guozhang Wang 035449eb55
KAFKA-12633: Remove deprecated APIs in TopologyTestDriver (#10508)
As well as related test classes.

Reviewers: John Roesler <vvcephei@apache.org>
2021-04-18 10:46:01 -07:00
Ismael Juma 89933f21f2
KAFKA-12612: Remove `checksum` from ConsumerRecord/RecordMetadata for 3.0 (#10470)
The methods have been deprecated since 0.11 without replacement since
message format 2 moved the checksum to the record batch (instead of the
record).

Unfortunately, we did not deprecate the constructors that take a checksum
(even though we intended to) so we cannot remove them. I have deprecated
them for removal in 4.0 and added a single non deprecated constructor to
`ConsumerRecord` and `RecordMetadata` that take all remaining parameters.
`ConsumerRecord` could do with one additional convenience constructor, but
that requires a KIP and hence should be done separately.

Also:
* Removed `ChecksumMessageFormatter`, which is technically not public
API, but may have been used with the console consumer.
* Updated all usages of `ConsumerRecord`/`RecordMetadata` constructors
to use the non deprecated ones.
* Added tests for deprecated `ConsumerRecord/`RecordMetadata`
constructors.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2021-04-14 14:38:37 -07:00
Matthias J. Sax b163706dc9
KAFKA-12650: fix NPE in InternalTopicManagerTest (#10529)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Luke Chen (@showuon), Bruno Cadonna <bruno@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2021-04-14 11:45:45 -07:00
Luke Chen 232ffc358a
KAFKA-9295: improve KTableKTableForeignKeyInnerJoinMultiIntegrationTest (#10409)
Wait for Streams to get to RUNNING before proceeding with the test, some general cleanup

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-04-13 11:41:12 -07:00
dengziming 88eb24db40
KAFKA-12637: Remove deprecated PartitionAssignor interface (#10512)
Remove PartitionAssignor and related classes, update docs and move unit test

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-04-12 18:37:01 -07:00
A. Sophie Blee-Goldman c608d8480e
KAFKA-7606: Remove deprecated options from StreamsResetter (#10411)
Remove deprecated --zookeeper and --execute flags

Reviewers: Matthias J. Sax <mjsax@confluent.io>
2021-04-12 18:24:18 -07:00
Marco Aurelio Lotz e677782112
KAFKA-9527: fix NPE when using time-based argument for Stream Resetter Tool (#10042)
Reviewers: Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>, Matthias J. Sax <matthias@confluent.io>
2021-04-12 09:28:23 -07:00
Matthias J. Sax 872b44455c
HOTFIX: delete removed WindowedStore.put() method (#10517)
Reviewers: Boyang Chen <boyang@confluent.io>
2021-04-09 14:58:44 -07:00
Jorge Esteban Quilcate Otoya c9cab2beb8
KAFKA-8410: Migrate KStream Stateless operators to new Processor API (#10381)
Migrate KStream stateless operators to new Processor API.
Following PRs will complete migration of KStream stateful operators and KTable.
No expected functionality changes.

Reviewers: John Roesler <vvcephei@apache.org>
2021-04-09 14:49:54 -05:00
Luke Chen f76b8e4938
KAFKA-9831: increase max.poll.interval.ms to avoid unexpected rebalance (#10301)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-04-09 12:19:14 -07:00
Jorge Esteban Quilcate Otoya db0323e9ba
KAFKA-12449: Remove deprecated WindowStore#put (#10293)
Removes `WindowStore#put(K,V)` that was deprecated via KIP-474.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-04-09 11:49:37 -07:00
Guozhang Wang aa0f450dad
KAFKA-12630: Remove deprecated KafkaClientSupplier#getAdminClient in Streams (#10502)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-04-08 08:39:51 -07:00
A. Sophie Blee-Goldman e9c5a3995a
MINOR: un-deprecate StreamsConfig overloads to support dependency injection (#10484)
In #5344 it came to our attention that the StreamsConfig overloads of the KafkaStreams constructors are actually quite useful for dependency injection, providing a cleaner way to configure dependencies and better type safety.

Reviewers: Matthias J. Sax <mjsax@confluent.io>
2021-04-07 21:13:32 -07:00
Guozhang Wang 3ca5a3bb78
KAFKA-12568: Remove deprecated APIs in KStream, KTable and Joined (#10421)
This is related to KIP-307 / KIP-372 / KIP-479.

Reviewers: John Roesler <vvcephei@apache.org>
2021-04-07 17:38:43 -07:00
Guozhang Wang 04f47c54c2
KAFKA-12527: Remove deprecated PartitionGrouper annotation (#10380)
A quick follow-up rebased on https://github.com/apache/kafka/pull/10302 to remove deprecated annotation.
2021-04-07 13:57:12 -07:00
high.lee 00ec646c5a
KAFKA-7785: move internal DefaultPartitionGrouper (#10302)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-04-07 13:36:39 -07:00
Sergio Peña 37493d1e18
KAFKA-10847: Add new RocksDBTimeOrderedWindowStore that persists (time-key)-value records (#10331)
This new store is more efficient when calling range queries with only time parameters, like `fetch(from, to)`. For range queries using key ranges, then the current RocksDBWindowStore should be used.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-04-07 13:35:12 -07:00
Ismael Juma 2f36001987
KAFKA-12579: Remove various deprecated clients classes/methods for 3.0 (#10438)
* Remove `ExtendedSerializer` and `ExtendedDeserializer`, deprecated since 2.1.
The extra functionality was also made available in `Serializer` and `Deserializer`.
* Remove `close(long, TimeUnit)` from the producer, consumer and admin client,
deprecated since 2.0 for the consumer and 2.2 for the rest. The replacement is `close(Duration)`.
* Remove `ConsumerConfig.addDeserializerToConfig` and `ProducerConfig.addSerializerToConfig`,
deprecated since 2.7 with no replacement. These methods were not intended to be public API
and are likely not used much (if at all).
* Remove `NoOffsetForPartitionException.partition()`, deprecated since 0.11. `partitions()`
should be used instead.
* Remove `MessageFormatter.init(Properties)`, deprecated since 2.7. The `configure(Map)`
method should be used instead.
* Remove `kafka.common.MessageFormatter`, deprecated since 2.7.
`org.apache.kafka.common.MessageFormatter` should be used instead.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2021-04-06 08:54:53 -07:00
John Roesler 4ed7f2cd01
KAFKA-12593: Fix Apache License headers (#10452)
* Standardize license headers in scala, python, and gradle files.
* Relocate copyright attribution to the NOTICE.
* Add a license header check to `spotless` for scala files.

Reviewers: Ewen Cheslack-Postava <ewencp@apache.org>, Matthias J. Sax <mjsax@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org
2021-04-01 10:38:37 -05:00
A. Sophie Blee-Goldman 3eff8d39f1
HOTFIX: move rebalanceInProgress check to skip commit during handleCorrupted (#10444)
Minor followup to #10407 -- we need to extract the rebalanceInProgress check down into the commitAndFillInConsumedOffsetsAndMetadataPerTaskMap method which is invoked during handleCorrupted, otherwise we may attempt to commit during a a rebalance which will fail

Reviewers: Matthias J. Sax <mjsax@confluent.io>
2021-03-30 18:55:38 -07:00
A. Sophie Blee-Goldman 0189298d86
KAFKA-12288: remove task-level filesystem locks (#10342)
The filesystem locks don't protect access between StreamThreads, only across different instances of the same Streams application. Running multiple processes in the same physical state directory is not supported, and as of PR #9978 it's explicitly guarded against), so there's no reason to continue locking the task directories with anything heavier than an in-memory map.

Reviewers: Rohan Desai <rodesai@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-03-30 17:02:42 -07:00
ketulgupta1995 617ee00322
KAFKA-12509 Tighten up StateDirectory thread locking (#10418)
Modified LockAndOwner class to have Thread reference instead of just name

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-03-30 12:13:53 -07:00
Walker Carlson e971d94eb3
KAFKA-8784: remove default close for RocksDBConfigSetter (#10416)
Remove the default close implementation for RocksDBConfigSetter to avoid accidental memory leaks via C++ backed objects which are constructed but not closed by the user

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-03-29 14:26:39 -07:00
A. Sophie Blee-Goldman fb2eef94a4
KAFKA-12523: handle TaskCorruption and TimeoutException during handleCorruption and handleRevocation (#10407)
Need to handle TaskCorruptedException and TimeoutException that can be thrown from offset commit during handleRevocation or handleCorruption

Reviewers: Matthias J. Sax <mjsax@confluent.org>, Guozhang Wang <guozhang@confluent.io>
2021-03-29 14:06:22 -07:00
Guozhang Wang d5fd491bf7
KAFKA-7106: remove deprecated Windows APIs (#10378)
1. Remove all deprecated APIs in KIP-328.
2. Remove deprecated APIs in Windows in KIP-358.

Reviewers: John Roesler <vvcephei@apache.org>
2021-03-28 12:33:40 -07:00
Guozhang Wang b8058829bb
KAFKA-12562: Remove deprecated APIs in KafkaStreams and returned state classes (#10412)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-03-28 12:20:31 -07:00
Walker Carlson f91d592a27
KAFKA-12537: fix application shutdown corner case with only one thread (#10387)
When in EOS the run loop terminates on that thread before the shutdown can be called. This is a problem for EOS single thread applications using the application shutdown feature.

I changed it so in all cases with a single thread, the dying thread will spin up a new thread to communicate the shutdown and terminate the dying thread. Also @ableegoldman refactored the catch blocks in runloop.

Co-authored-by: A. Sophie Blee-Goldman <ableegoldman@gmail.com>

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-03-26 19:55:27 -07:00
Matthias J. Sax 03f13d1c41
KAFKA-12452: Remove deprecated overloads of ProcessorContext#forward (#10300)
ProcessorContext#forward was changed via KIP-251 in 2.0.0 release. This PR removes the old and deprecated overloads.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-03-25 19:49:18 -07:00
John Roesler 9ef52dd2db
KAFKA-12508: Disable KIP-557 (#10397)
A major issue has been raised that this implementation of
emit-on-change is vulnerable to a number of data-loss bugs
in the presence of recovery with dirty state under at-least-once
semantics. This should be fixed in the future when we implement
a way to avoid or clean up the dirty state under at-least-once,
at which point it will be safe to re-introduce KIP-557 and
complete it.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2021-03-25 14:42:26 -05:00
John Roesler 9bf5c57997
KAFKA-12435: Fix javadoc errors (#10392)
There were errors while generating javadoc for the streams:test-utils module
because the included TopologyTestDriver imported some excluded classes.

This fixes the errors by inlining the previously excluded packages.

Reviewers: Chia-Ping Tsai <chia7712@apache.org>, Ismael Juma <ijuma@apache.org>
2021-03-24 13:55:27 -05:00
Chia-Ping Tsai 9af81955c4
KAFKA-12173 Migrate streams:streams-scala module to JUnit 5 (#9858)
1. replace org.junit.Assert by org.junit.jupiter.api.Assertions
2. replace org.junit by org.junit.jupiter.api
3. replace Before by BeforeEach
4. replace After by AfterEach
5. remove ExternalResource from all scala modules
6. add explicit AfterClass/BeforeClass to stop/start EmbeddedKafkaCluster

Noted that this PR does not migrate stream module to junit 5 so it does not introduce callback of junit 5 to deal with beforeAll/afterAll. The next PR of migrating stream module can replace explicit beforeAll/afterAll by junit 5 extension. Or we can keep the beforeAll/afterAll if it make code more readable.

Reviewers: John Roesler <vvcephei@apache.org>
2021-03-25 01:04:39 +08:00
Guozhang Wang 7071ded2a6
KAFKA-12524: Remove deprecated segments() (#10379)
Reviewers: Boyang Chen <boyang@confluent.io>
2021-03-23 21:05:42 -07:00
wenbingshen e0cbd0fa66
MINOR: Remove duplicate definition about 'the' from kafka project (#10370)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-03-23 10:44:55 +08:00
Boyang Chen 80f373d34f
(Cherry-pick) KAFKA-9274: handle TimeoutException on task reset (#10000) (#10372)
This PR was removed by accident in trunk and 2.8, bringing it back.

Co-authored-by: Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-03-22 13:39:29 -07:00
Bill Bejeck a290c8e1df
KAFKA-3745: Add access to read-only key in value joiner (#10150)
This PR implements adding read-only access to the key for KStream.join as described in KIP-149

This PR as it stands does not affect the Streams Scala API. Updating the Streams Scala API will be done in a follow-up PR.
Additionally, the original KIP did not include the KTable API, but I don't see any reason why we wouldn't want the same functionality there as well, this will be done in an additional follow-up PR after updating the existing KIP.

Reviewers: Matthias J. Sax <mjsax@apache.org>
2021-03-20 22:19:01 -04:00
A. Sophie Blee-Goldman 13b4ca8795
KAFKA-12500: fix memory leak in thread cache (#10355)
Need to exclude threads in PENDING_SHUTDOWN from the num live threads computation used to compute the new cache size per thread. Also adds some logging to help follow what's happening when a thread is added/removed/replaced.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Walker Carlson <wcarlson@confluent.io>, John Roesler <john@confluent.io>
2021-03-19 18:11:07 -07:00
Bruno Cadonna 5dfbefcb37
KAFKA-12508: Emit records with same value and same timestamp (#10360)
Emit on change introduced in Streams with KIP-557 might lead to
data loss if a record is put into a source KTable and emitted
downstream and then a failure happens before the offset could be
committed. After Streams rereads the record, it would find a record
with the same key, value and timestamp in the KTable (i.e. the same
record that was put into the KTable before the failure) and not
forward it downstreams. Hence, the record would never be processed
downstream of the KTable which breaks at-least-once and exactly-once
processing guarantees.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-03-19 15:51:19 -07:00
Walker Carlson 367eca083b
KAFKA-12503: inform threads to resize their cache instead of doing so for them (#10356)
Make it so threads do not directly resize other thread's caches

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-03-18 19:34:39 -07:00
Bruno Cadonna 4be0033e62
KAFKA-10357: Add setup method to internal topics (#10317)
For KIP-698, we need a way to setup internal topics without validating them. This PR adds a setup method to the InternalTopicManager for that purpose.

Reviewers: Rohan Desai <rohan@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-03-18 09:52:08 -07:00
Walker Carlson 336d26accf
HOTFIX: timeout issue in removeStreamThread() (#10321)
Timeout is a duration not a point in time.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-03-15 20:12:37 -07:00
Boyang Chen 8da65936d7
revert stream logging level back to ERROR (#10320)
An accidental change of logging level for streams from #9579, correcting it.

Reviewers: Bill Bejeck <bbejeck@gmail.com>
2021-03-15 11:00:35 -07:00
A. Sophie Blee-Goldman 4fe4cdc4a6
KAFKA-12462: proceed with task revocation in case of thread in PENDING_SHUTDOWN (#10311)
Always invoke TaskManager#handleRevocation when the thread is in PENDING_SHUTDOWN

Reviewers: Walker Carlson <wcarlson@confluent.io>
2021-03-12 20:06:54 -08:00
Bruno Cadonna b519117b22
KAFKA-10357: Add missing repartition topic validation (#10305)
Reviewers: Rohan Desai <rohan@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-03-12 08:59:41 -08:00
Bruno Cadonna 800d9b5abc
KAFKA-10357: Add validation method for internal topics (#10266)
For KIP-698, we need a way to validate internal topics before we create them. This PR adds a validation method to the InternalTopicManager for that purpose.

Reviewers: Rohan Desai <rohan@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-03-11 09:55:30 -08:00
Matthias J. Sax 50518fa8e0
KAFKA-12441: remove deprecated method StreamsBuilder#addGlobalStore (#10284)
The method StreamsBuilder#addGlobalStore was simplified via KIP-233 in 1.1.0 release. This PR removes the old and deprecated overload.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-03-10 16:14:44 -08:00
Rohit Deshpande 029f5a136a
KAFKA-10062: Add a methods to retrieve the current timestamps as known by the Streams app (#9744)
Implements KIP-622.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-03-10 14:26:24 -08:00
Lee Dongjin e6f8f5d0ae
MINOR: Remove unused variables, methods, parameters, unthrown exceptions, and fix typos (#9457)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com
2021-03-10 13:21:30 +08:00
Walker Carlson 207bb0826b
KAFKA-12347: updating TaskMetadata (#10211)
added committed offset, high watermark and idle duration to the taskMetadata.

Reviewers: Boyang Chen <boyang@confluent.io>
2021-03-05 11:27:25 -08:00
A. Sophie Blee-Goldman 23b61ba383
KAFKA-12375: don't reuse thread.id until a thread has fully shut down (#10215)
Always grab a new thread.id and verify that a thread has fully shut down to DEAD before removing from the `threads` list and making that id available again

Reviewers: Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
2021-03-02 16:28:15 -08:00
Bruno Cadonna a848e0c420
KAFKA-10357: Extract setup of changelog from Streams partition assignor (#10163)
To implement the explicit user initialization of Kafka Streams as
described in KIP-698, we first need to extract the code for the
setup of the changelog topics from the Streams partition assignor
so that it can also be called outside of a rebalance.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>
2021-03-02 12:00:00 -08:00
vamossagar12 4c5867a39b
KAFKA-10766: Unit test cases for RocksDBRangeIterator (#9717)
This PR aims to add unit test cases for RocksDBRangeIterator which were missing.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-03-02 09:55:14 -08:00
vamossagar12 b2075a0946
KAFKA-12289: Adding test cases for prefix scan in InMemoryKeyValueStore (#10052)
Co-authored-by: Bruno Cadonna <bruno@confluent.io>

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-03-02 09:53:27 -08:00
John Roesler a92b986c85
KAFKA-12268: Implement task idling semantics via currentLag API (#10137)
Implements KIP-695

Reverts a previous behavior change to Consumer.poll and replaces
it with a new Consumer.currentLag API, which returns the client's
currently cached lag.

Uses this new API to implement the desired task idling semantics
improvement from KIP-695.

Reverts fdcf8fbf72 / KAFKA-10866: Add metadata to ConsumerRecords (#9836)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <guozhang@apache.org>
2021-03-02 08:20:47 -06:00
Luke Chen 020ead4b97
MINOR: Format the revoking active log output in `StreamsPartitionAssignor` (#10242)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2021-03-02 10:32:21 +01:00
Guozhang Wang 22dbf89886
KAFKA-12323 Follow-up: Refactor the unit test a bit (#10205)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-03-01 12:11:11 -08:00
Guozhang Wang 7a0e371e0e
MINOR: Remove stack trace of the lock exception in a debug log4j (#10231)
Although the lock exception log is at the DEBUG level only, many people were confused with stack traces that something serious happened; plus, in the source code there is only one call path that can lead to the capture of LockException at task manager/stream thread, so even for debugging purposes there’s no extra information we can get from anyways.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2021-03-01 11:32:05 -08:00
dengziming e2bffb9086
MINOR: Word count should account for extra whitespaces between words (#10229)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-03-01 11:02:28 -08:00
Walker Carlson 958f90e710
KAFKA-12375: fix concurrency issue in application shutdown (#10213)
Need to ensure that `enforceRebalance` is used in a thread safe way

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-02-26 12:17:28 -08:00
Matthias J. Sax e2a0d0c90e
MINOR: bump release version to 3.0.0-SNAPSHOT (#10186)
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2021-02-24 17:49:18 -08:00
Guozhang Wang f75efb96fa
KAFKA-12323: Set timestamp in record context when punctuate (#10170)
We need to preserve the timestamp when punctuating so that downstream operators would retain it via context.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-02-23 20:41:02 -08:00
CHUN-HAO TANG 954c090ffc
MINOR: apply Utils.isBlank to code base (#10124)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-02-20 11:44:29 +08:00
Matthias J. Sax b35ca4349d
KAFKA-9274: Throw TaskCorruptedException instead of TimeoutException when TX commit times out (#10072)
Part of KIP-572: follow up work to PR #9800. It's not save to retry a TX commit after a timeout, because it's unclear if the commit was successful or not, and thus on retry we might get an IllegalStateException. Instead, we will throw a TaskCorruptedException to retry the TX if the commit failed.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-02-19 13:36:07 -08:00
Bruno Cadonna 5b761e66cc
MINOR: Correct warning about increasing capacity when insufficient nodes to assign standby tasks (#10151)
We should only recommend to increase the number of KafkaStreams instances, not the number of threads, since a standby task can never be placed on the same instance as an active task regardless of the thread count

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-02-19 12:26:12 -08:00
Marco Aurelio Lotz c8112b5ecd
KAFKA-9524: increase retention time for window and grace periods longer than one day (#10091)
Reviewers: Victoria Xia <victoria.xia@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2021-02-18 18:18:53 -08:00
Matthias J. Sax b50a78b4ac
TRIVIAL: fix JavaDocs formatting (#10134)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Bill Bejeck <bill@confluent.io>
2021-02-18 16:02:25 -08:00
Matthias J. Sax d4de383f5f
KAFKA-12272: Fix commit-interval metrics (#10102)
Reviewer: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-02-11 16:49:05 -08:00
Lee Dongjin 83ec80988a
MINOR: Remove always-passing validation in TestRecordTest#testProducerRecord (#9930)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-02-11 23:21:22 +08:00
dengziming 3769bc21b5
MINOR: replace hard-coding utf-8 with StandardCharsets.UTF_8 (#10079)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-02-09 10:06:01 +08:00
John Roesler 1f240ce179 bump to 2.9 development version 2021-02-07 09:25:36 -06:00
Boyang Chen d2cb2dc45d
KAFKA-9751: Forward CreateTopicsRequest for FindCoordinator/Metadata when topic creation is needed (#9579)
Consolidate auto topic creation logic to either forward a CreateTopicRequest or handling the creation directly as AutoTopicCreationManager, when handling FindCoordinator/Metadata request.

Co-authored-by: Jason Gustafson <jason@confluent.io>

Reviewers: Jason Gustafson <jason@confluent.io>
2021-02-06 13:04:30 -08:00
Matthias J. Sax 0bc394cc1d
KAFKA-9274: handle TimeoutException on task reset (#10000)
Part of KIP-572: We move the offset reset for the internal "main consumer" when we revive a corrupted task, from the "task cleanup" code path, to the "task init" code path. For this case, we have already logic in place to handle TimeoutException that might be thrown by consumer#committed() method call.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-02-05 17:25:50 -08:00
Matthias J. Sax 470e6f2b9a
KAFKA-9274: Add timeout handling for `StreamPartitioner` (#9997)
Part of KIP-572: When a custom `StreamPartitioner` is used, we need to get the number of partitions of output topics from the producer. This `partitionFor(topic)` call may through a `TimeoutException` that we now handle gracefully.

Reviewers: John Roesler <john@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-02-04 19:02:56 -08:00
Arjun Satish c8dc74e16d
MINOR: Word count should account for extra whitespaces between words (#10044)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-02-04 18:54:08 -08:00
Ivan Ponomarev 5552da3a20
KAFKA-5488: Add type-safe split() operator (#9107)
Implements KIP-418, that deprecated the `branch()` operator in favor of the newly added and type-safe `split()` operator.

Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>
2021-02-04 16:23:35 -08:00
A. Sophie Blee-Goldman 706f5097b7
KAFKA-10716: persist UUID in state directory for stable processId across restarts (#9978)
To stabilize the task assignment across restarts of the JVM we need some way to persist the process-specific UUID. We can just write it to a file in the state directory, and initialize it from there or create a new one if no prior UUID exists.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Leah Thomas <lthomas@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-02-03 18:01:26 -08:00
vamossagar12 51833bf37c
KAFKA-10648: Add Prefix Scan support to State Stores (#9508)
Add prefix scan support to State stores. Currently, only RocksDB and InMemory key value stores are being supported.

Reviewers: Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-02-03 10:26:30 -08:00
leah f5a2fbac6d
KAFKA-10366 & KAFKA-9649: Implement KIP-659 to allow TimeWindowedDeserializer and TimeWindowedSerde to handle window size (#9253)
See KIP details and discussions here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-659%3A+Improve+TimeWindowedDeserializer+and+TimeWindowedSerde+to+handle+window+size

Deprecates methods that allow users to skip setting a window size when one is needed. Adds a window size streams config to allow the timeWindowedDeserializer to calculate window end time.

Reviewers: Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2021-02-01 16:20:35 -08:00
CHUN-HAO TANG 47b417260c
MINOR: fix @link tags in javadoc (#9939)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-02-01 12:40:02 +08:00
Walker Carlson bd6c212538
KAFKA-12247: add timeout and static group rebalance to remove thread (#9984)
Add timeout to remove thread, and trigger thread to explicitly leave the group even in case of static membership

Reviewers: Bruno Cadonna <bruno@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-01-29 12:35:34 -08:00
Lee Dongjin e3ff4b0870
KAFKA-10604: Fix Streams default state.dir (#9420)
Make the default state store directory location to follow
OS-specific temporary directory settings or java.io.tmpdir
JVM parameter, with Utils#getTempDir.

Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>
2021-01-29 09:45:13 -06:00
mathieu 550d8b8260
KAFKA-8744: Update Scala API to give names to processors (#9738)
As it's only API extension to match the java API with Named object with lots of duplication, I only tested the logic once.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2021-01-28 09:57:18 -05:00
Chia-Ping Tsai a5deb5326f
MINOR: remove duplicate code of serializing auto-generated data (#9964)
Reviewers: David Jacot <djacot@confluent.io>
2021-01-28 17:14:28 +08:00
John Roesler 4d28391480
KAFKA-10867: Improved task idling (#9840)
Use the new ConsumerRecords.metadata() API to implement
improved task idling as described in KIP-695

Reviewers: Guozhang Wang <guozhang@apache.org>
2021-01-27 21:57:20 -06:00
John Roesler fdcf8fbf72
KAFKA-10866: Add metadata to ConsumerRecords (#9836)
Expose fetched metadata via the ConsumerRecords
object as described in KIP-695.

Reviewers: Guozhang Wang <guozhang@apache.org>
2021-01-27 18:18:38 -06:00
dengziming c830bce570
MINOR: Fix meaningless message in assertNull validation (#9965)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-01-27 11:14:31 +08:00
Walker Carlson 647c609cef
KAFKA-10555: Improve client state machine (#9720)
Implements KIP-696: Add new state PENDING_ERROR to KafkaStreams client.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <bruno@confluent.io>
2021-01-26 11:43:49 -08:00
Andy Wilkinson e1a4dccc15
KAFKA-12190: Fix setting of file permissions on non-POSIX filesystems (#9947)
Previously, StateDirectory used PosixFilePermissions to configure its directories' permissions which fails on Windows as its file system is not POSIX-compliant. This PR updates StateDirectory to fall back to the File API on non-POSIX-compliant file systems. 

Reviewers: Luke Chen <43372967+showuon@users.noreply.github.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-01-25 11:33:05 -08:00
Walker Carlson 667a6b2d26
MINOR: fix record time in test shouldWipeOutStandbyStateDirectoryIfCheckpointIsMissing (#9948)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2021-01-22 15:55:05 -08:00
Bruno Cadonna 7b06a2417d
MINOR: Restore interrupt status when closing (#9863)
We do not always own the thread that executes the close()  method, i.e., we do not know the interruption policy of the thread. Thus, we should not swallow the interruption. The least we can do is restoring the interruption status before the current thread exits this method.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-01-22 12:15:52 -08:00
Bruno Cadonna 019cd4ab80
KAFKA-10357: Extract setup of repartition topics from Streams partition assignor (#9848)
KIP-698: extract the code for the setup of the repartition topics from the Streams partition assignor so that it can also be called outside of a rebalance.

Reviewers: Leah Thomas <lthomas@confluent.io> , Guozhang Wang <guozhang@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-01-22 10:03:39 -08:00
A. Sophie Blee-Goldman 45550e98f0
MINOR: log 2min processing summary of StreamThread loop (#9941)
Remove all INFO-level logging from the main StreamThread loop in favor of a summary with a 2min interval

Reviewers: Walker Carlson <carlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-01-21 16:32:55 -08:00
Matthias J. Sax 92e72f7bf9
KAFKA-12185: fix ConcurrentModificationException in newly added Tasks container class (#9940)
Reviewers: Guozhang Wang <guozhand@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-01-21 08:52:34 -08:00
Luke Chen 462c89e0b4
KAFKA-12211: don't change perm for base/state dir when no persistent store (#9904)
If a user doesn't have Persistent Stores, we won't create base dir and state dir and should not try to set permissions on them.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-01-20 11:37:56 -08:00
Chia-Ping Tsai 38b320a1f2
HOTFIX: fix RocksDBMetricsTest (#9935)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-01-20 22:58:04 +08:00
Matthias J. Sax 0158e1d719
MINOR: Add 'task container' class to KafkaStreams TaskManager (#9835)
Kafka Streams' TaskManager is a central class that grew quite big. This
PR breaks out a new 'task container' class to descope what TaskManager
does. In follow up PRs, we plan to move more methods from TaskManager
to the new 'Tasks.java' class and also improve task-type type safety.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2021-01-19 19:28:46 -08:00
Bruno Cadonna add160d522
KAFKA-9924: Add docs for RocksDB properties-based metrics (#9895)
Document the new properties-based metrics for RocksDB

Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-01-19 13:30:23 -08:00
Lee Dongjin 4c6f900673
KAFKA-12219: Add 'synchronized' keyword to InMemoryKeyValueStore#[reverseRange, reverseAll] (#9923)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-01-19 10:34:46 -08:00
Luke Chen 130274b7f6
KAFKA-10017: fix uncaucht-exception handling in EosBetaUpgradeIntegrationTest (#9733)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-01-19 10:31:14 -08:00
Chia-Ping Tsai 6752f28254
KAFKA-12195 Fix synchronization issue happening in KafkaStreams (#9887)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-01-19 17:54:34 +08:00
Luke Chen 277c4371c7
KAFKA-12194: use stateListener to catch each state change (#9888)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <bruno@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2021-01-19 13:38:30 +08:00
Chia-Ping Tsai 04827dad51
KAFKA-12171: Migrate streams:test-utils module to JUnit 5 (#9856)
* replace `org.junit.Assert` by `org.junit.jupiter.api.Assertions`
* replace `org.junit` by `org.junit.jupiter.api`
* replace `org.junit.runners.Parameterized` by `org.junit.jupiter.params.ParameterizedTest`
* replace `org.junit.runners.Parameterized.Parameters` by `org.junit.jupiter.params.provider.{Arguments, MethodSource}`
* replace `Before` by `BeforeEach`
* replace `After` by `AfterEach`

Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-01-13 21:23:48 -08:00
A. Sophie Blee-Goldman fec6831b10
MINOR: reduce StreamThread INFO logging during low traffic (#9875)
Avoid spamming the logs at the INFO level in a tight loop when there are no new records being polled

Reviewers: Walker Carlson <wcarlson@confluent.io>, Leah Thomas <lthomas@confluent.io>
2021-01-13 14:32:09 -08:00
Bruno Cadonna ee5ef89a71
MINOR: Fix flaky test shouldQuerySpecificActivePartitionStores (#9873)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-01-13 10:01:14 -08:00
Boyang Chen 94a0aac81d
MINOR: Add restoration time tracking (#9830)
Add Stream restoration time tracking log

Reviewers: John Roesler <vvcephei@apache.org>
2021-01-13 09:05:23 -08:00
Chia-Ping Tsai bed4c6a33b
KAFKA-12172 Migrate streams:examples module to JUnit 5 (#9857)
This PR includes following changes.
1. replace org.junit.Assert by org.junit.jupiter.api.Assertions
2. replace org.junit by org.junit.jupiter.api
3. replace Before by BeforeEach
4. replace After by AfterEach

Reviewers: Ismael Juma <ismael@confluent.io>
2021-01-13 21:02:13 +08:00
Walker Carlson aedb53a4e6
KAFKA-10500: Add KafkaStreams#removeStreamThread (#9695)
Add the ability to remove running threads

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-01-11 12:38:02 -08:00
dengziming 119a2d9127
MINOR: Substitute assertEquals(null) with assertNull (#9852)
Reviewers: David Jacot <djacot@confluent.io>
2021-01-10 20:06:37 +01:00
Chia-Ping Tsai 913a019d6c
MINOR: replace test "expected" parameter by assertThrows (#9520)
This PR includes following changes.

1. @Test(expected = Exception.class) is replaced by assertThrows
2. remove reference to org.scalatest.Assertions
3. change the magic code from 1 to 2 for testAppendAtInvalidOffset to test ZSTD
4. rename testMaybeAddPartitionToTransactionXXXX to testNotReadyForSendXXX
5. increase maxBlockMs from 1s to 3s to avoid unexpected timeout from TransactionsTest#testTimeout

Reviewers: Ismael Juma <ismael@confluent.io>
2021-01-10 20:20:13 +08:00
Matthias J. Sax 94f9b919ab
KAFKA-9566: Improve DeserializationExceptionHandler JavaDocs (#9837)
Reviewers: John Roesler <john@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2021-01-07 15:59:33 -08:00
Matthias J. Sax 22e8e71156
KAFKA-9274: Fix commit-TimeoutException handling for EOS (#9800)
If EOS is enabled and the TX commit fails with a timeout,
we should not process more messages (what is ok for non-EOS)
because we don't really know the status of the TX.
If the commit was indeed successful, we won't have an open TX
can calling send() would fail with an fatal error.

Instead, we should retry the (idempotent) commit of the TX,
and start a new TX afterwards.

Reviewers: Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>
2021-01-06 14:01:02 -08:00
Matthias J. Sax 393139064f
MINOR: code cleanup for Kafka Streams task interface (#9801)
Reviewer: John Roesler <john@confluent.io>
2021-01-06 13:51:07 -08:00
Matthias J. Sax 2e0c686ec1
MINOR: improve KafkaStreams replication factor documentation (#9829)
Reviewers: Jason Gustafson <jason@confluent.io>, Jim Galasyn <jim.galasyn@confluent.io>
2021-01-06 11:31:05 -08:00
fml2 baea94d926
KAFKA-10722: Improve JavaDoc for KGroupedStream.aggregate and other similar methods (#9606)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-12-23 15:50:52 -08:00
Matthias J. Sax a0e0028b16
MINOR: add test for repartition/source-topic/changelog optimization (#9668)
If topology optimization is enabled, KafkaStreams does not create store changelog topics but re-uses source input topics if possible. However, this optimization should not be applied to internal repartition topics, because those are actively purged.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
2020-12-23 11:56:55 -08:00
Dongxu Wang 5b06e9690b
MINOR: Use ApiUtils' methods static imported consistently (#9763)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2020-12-24 00:22:30 +08:00
Chia-Ping Tsai 9aeacb425b
KAFKA-10815 EosTestDriver#verifyAllTransactionFinished should break loop if all partitions are verified (#9706)
Reviewers: Matthias J. Sax <mjsax@apache.org>
2020-12-23 10:04:36 +08:00
Bill Bejeck b6891f6729
MINOR: Kafka Streams updates for 2.7.0 release (#9773)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-12-22 14:34:59 -08:00
APaMio 1670362236
MINOR: Replace Collection.toArray(new T[size]) by Collection.toArray(new T[0]) (#9750)
This PR is based on the research of https://shipilev.net/blog/2016/arrays-wisdom-ancients

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2020-12-21 17:38:33 +08:00
Matthias J. Sax b689507c41
MINOR: fix error message (#9730)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2020-12-17 17:09:45 -08:00
leah f4272fd5d3
KAFKA-9126: KIP-689: StreamJoined changelog configuration (#9708)
Add withLoggingEnabled and withLoggingDisabled for StreamJoined
to give StreamJoined the same flexibility as Materialized

Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>
2020-12-16 16:19:33 -06:00
leah 793117d455
KAFKA-10417: Update Cogrouped processor to work with suppress() and joins (#9727)
Correct the implementation of the Cogroup
processor to implement KTableProcessorSupplier.

Change the cogrouped processor from PassThrough to
KTablePassThrough to allow for sending old values.
KTablePassThrough extends KTableProcessorSupplier instead of
ProcessorSupplier to implement sending old values and the view() method.

Reviewers: Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>
2020-12-15 14:53:56 -06:00
Walker Carlson d5dc7dfe00
KAFKA-10810: Replace stream threads (#9697)
StreamThreads can now be replaced in the streams uncaught exception handler

Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>, Leah Thomas <lthomas@confluent.io>
2020-12-11 15:08:21 -06:00
Matthias J. Sax 567a2ec737
KAFKA-10017: fix flaky EOS-beta upgrade test (#9688)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-12-10 17:32:30 -08:00
Boyang Chen 310e240abd
throw corresponding invalid producer epoch (#9700)
As suggested, ensure InvalidProducerEpoch gets caught properly on stream side.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-12-09 20:15:43 -08:00
mowczare cd95ce4ace
MINOR: fix typo "intervall" to "interval" (#5435)
Co-authored-by: Chia-Ping Tsai <chia7712@gmail.com>

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2020-12-09 20:15:34 +08:00
leah 78a986bf59
MINOR: Clean up streams metric sensors (#9696)
Reviewers: Bruno Cadonna <bruno@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2020-12-09 17:51:31 +08:00
Walker Carlson 9ece7fe372
KAFKA-10500: Allow people to add new StreamThread at runtime (#9615)
Part of KIP-663.

Reviewers: Bruno Cadonna <bruno@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-12-04 12:21:03 -07:00
Chia-Ping Tsai b9640a71c4
HOTFIX: fix failed build caused by StreamThreadTest (#9691)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-12-04 11:55:42 -07:00
Rohit Deshpande 4e9c7fc8a5
KAFKA-10629: TopologyTestDriver should not require a Properties argument (#9660)
Implements KIP-680.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Matthias J. Sax <matthias@confluent.io>
2020-12-04 11:51:12 -07:00
Luke Chen 20ae73b051
KAFKA-10665: close all kafkaStreams before purgeLocalStreamsState (#9674)
The flaky tests are because we forgot to close the kafkaStreams before purgeLocalStreamsState, so that sometimes there will be some tmp files be created/deleted during streams running(ex: checkpoint.tmp), and caused the DirectoryNotEmptyException or NoSuchFileException be thrown.

Reviewers:  Levani Kokhreidze, Bill Bejeck <bbejeck@apache.org>
2020-12-04 09:04:50 -05:00
leah 4cc6d204ec
KAFKA-10500: Add failed-stream-threads metric for adding + removing stream threads (#9614)
Part of KIP-663.

Reviewer: Bruno Cadonna <bruno@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-12-04 00:56:11 -07:00
Bruno Cadonna 7c68531a1f
MINOR: Fix flaky test shouldQueryOnlyActivePartitionStoresByDefault (#9681)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-12-03 13:59:32 -08:00
A. Sophie Blee-Goldman dc55be2d92
KAFKA-6687: restrict DSL to allow only Streams from the same source topics (#9609)
Followup to PR #9582, need to restrict DSL so only KStreams can be created from the same set of topic(s)s but not KTables, which can be tackled as followup work

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
2020-12-01 11:26:16 -08:00
Bruno Cadonna 9211ff6ffd
MINOR: Increase unit test coverage of method ProcessorTopology#updateSourceTopics() (#9654)
The unit tests for method ProcessorTopology#updateSourceTopics() did not cover all
code paths.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2020-12-01 10:58:57 -08:00
A. Sophie Blee-Goldman dbf4e63ae7
KAFKA-10758: ProcessorTopology should only consider its own nodes when updating regex source topics (#9648)
We should ignore any source nodes that aren't part of the ProcessorTopology's subtopology when updating its source topics after a change in the topic metadata.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Matthias J. Sax <mjsax@confluent.io>
2020-11-24 17:59:29 -08:00
Luke Chen 2a409200dc
KAFKA-10754: fix flaky tests by waiting kafka streams be in running state before assert (#9629)
The flaky test is because we didn't wait for the streams become RUNNING before verifying the state becoming ERROR state. This fix explicitly wait for the streams become RUNNING state. Also, put the 2nd stream into try resource block so it will be closed after the test.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2020-11-24 13:59:16 -08:00
Matthias J. Sax 72918a9816
MINOR: change default TX timeout only if EOS is enabled (#9618)
Reviewer: Boyang Chen <boyang@confluent.io>
2020-11-22 14:00:31 -08:00
Matthias J. Sax af4e34867b MINOR: fix formatting 2020-11-21 11:34:47 -08:00
dengziming 502a544ee7
KAFKA-10757: Fix compilation failure in StreamThreadTest (#9636)
Add missing `null` to `TaskManager` constructor.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-11-20 21:53:34 -08:00
Matthias J. Sax 351a22a12e
KAFKA-10755: Should consider commit latency when computing next commit timestamp (#9634)
Reviewer: Guozhang Wang <guozhang@confluent.io>
2020-11-20 18:55:40 -08:00
Luke Chen 179ecdf49e
KAFKA-10628: remove all the unnecessary parameters from the tests which are using TopologyTestDriver (#9507)
1. remove unneeded javadoc content.
2. Replace containsKey/setProperty with putIfAbsent
3. refactor the constructor of TopologyTestDriverTest

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2020-11-19 11:23:41 +08:00
Matthias J. Sax 192db666c9
KAFKA-9274: Handle TimeoutException on commit (#9570)
- part of KIP-572
 - when KafkaStreams commits a task, a TimeoutException should not kill
   the thread but `task.timeout.ms` should be triggered and the commit
   should be retried in the next loop

Reviewer: John Roesler <john@confluent.io>
2020-11-18 16:23:43 -08:00
Walker Carlson d12fbb7c00
KAFKA-10500: Allow resizing of StreamThread state store caches (#9572)
- part of KIP-663

Reviewer: Bruno Cadonna <bruno@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-11-18 15:45:40 -08:00
Justine Olshan 28c57b273a
KAFKA-10618: Rename UUID to Uuid and make it more efficient (#9566)
As decided in KIP-516, the UUID class should be named Uuid. Change all instances of
org.apache.kafka.common.UUID to org.apache.kafka.common.Uuid.

Also modify Uuid so that it stores two `long` fields instead of wrapping java.util.UUID
to reduce memory usage.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-11-18 00:58:20 -08:00
Walker Carlson 5899f5fc4a
KAFKA-9331: Add a streams specific uncaught exception handler (#9487)
This PR introduces a streams specific uncaught exception handler that currently has the option to close the client or the application. If the new handler is set as well as the old handler (java thread handler) will be ignored and an error will be logged.
The application shutdown is achieved through the rebalance protocol.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Leah Thomas <lthomas@confluent.io>, John Roesler <john@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2020-11-17 22:55:09 -08:00
Boyang Chen e7090173ee
KAFKA-10687: make ProduceRespone only returns INVALID_PRODUCER_EPOCH (#9569)
Ensures INVALID_PRODUCER_EPOCH recognizable from client side, and ensure the ProduceResponse always uses the old error code as INVALID_PRODUCER_EPOCH.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-11-17 19:56:38 -08:00
A. Sophie Blee-Goldman e71cb7ab11
KAFKA-10689: fix windowed FKJ topology and put checks in assignor to avoid infinite loops (#9568)
Fix infinite loop in assignor when trying to resolve the number of partitions in a topology with a windowed FKJ. Also adds a check to this loop to break out and fail the application if we detect that we are/will be stuck in an infinite loop

Reviewers: Matthias Sax <matthias@confluent.io>
2020-11-17 16:57:53 -08:00
fml2 53026b799c
Improve JavaDoc (#9594)
In the JavaDoc, the implemented interface was described inaccurately.
Also, the ordered list was formatted as plain text, not as html "ol".

Reviewers: Boyang Chen <boyang@confluent.io>
2020-11-14 21:03:12 -08:00
A. Sophie Blee-Goldman cb3dc6785b
KAFKA-6687: rewrite topology to allow reading the same topic multiple times in the DSL (#9582)
Rewrite DSL topology to allow reading a topic or pattern multiple times

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Leah Thomas <lthomas@confluent.io>
2020-11-13 14:35:26 -08:00
A. Sophie Blee-Goldman cfc813537e
MINOR: demote "Committing task offsets" log to DEBUG (#9489)
Demote "committing offsets" log message to DEBUG and promote/add summarizing INFO level logs in the main StreamThread loop

Reviewers: Boyang Chen <boyang@confluent.io>, Walker Carlson <wcarlson@confluent.io>, John Roesler <john@confluent.io>
2020-11-13 13:25:43 -08:00
leah 0b9c7512bf
KAFKA-10705: Make state stores not readable by others (#9583)
Change permissions on the folders for the state store so they're no readable or writable by "others", but still accessible by owner and group members.

Reviewers: Bruno Cadonna <bruno@confluent.io>,  Walker Carlson <wcarlson@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2020-11-13 10:44:09 -08:00
Walker Carlson c3d0140a55
KAFKA-10500: Makes the Stream thread list resizable (#9543)
Change the StreamThreads to be held in a List so that we
can dynamically change the number of threads easily.

Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>
2020-11-05 15:02:00 -06:00
Igor Soarez b3e77dfad9
KAFKA-10036: Improve handling and documentation of Suppliers (#9000)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-11-03 17:30:15 -08:00
A. Sophie Blee-Goldman c75fd66875
KAFKA-10651: read offsets directly from checkpoint for uninitialized tasks (#9515)
Read offsets directly from the checkpoint file if a task is uninitialized or closed

Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>
2020-10-30 13:36:35 -07:00
A. Sophie Blee-Goldman 92e947c3a1
KAFKA-10664: Delete existing checkpoint when writing empty offsets (#9534)
Delete the existing checkpoint file if told to write empty offsets map to ensure that corrupted offsets are not re-initialized from

Reviewers: Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@apache.org>
2020-10-30 13:28:31 -07:00
John Roesler dc4d3ecbb6
MINOR: Fix flaky shouldRejectNonExistentStoreName (#9426)
Fix flaky test by making sure Streams is
running before making assertions about IQ.

Reviewers: Lee Dongjin <dongjin@apache.org>, Guozhang Wang <guozhang@apache.org>, Chia-Ping Tsai <chia7712@apache.org>
2020-10-29 12:23:55 -05:00
John Roesler 933a813950
KAFKA-10638: Fix QueryableStateIntegrationTest (#9521)
This test has been observed to have flaky failures.
Apparently, in the failed runs, Streams had entered a rebalance
before some of the assertions were made. We recently made
IQ a little stricter on whether it would return errors instead of
null responses in such cases:
KAFKA-10598: Improve IQ name and type checks (#9408)

As a result, we have started seeing failures now instead of
silently executing an invalid test (I.e., it was asserting the
return to be null, but the result was null for the wrong
reason).

Now, if the test discovers that Streams is no longer running,
it will repeat the verification until it actually gets a valid
positive or negative result.

Reviewers: Chia-Ping Tsai <chia7712@apache.org>
2020-10-29 11:57:31 -05:00
Rohan 35deb2f7c4
MINOR: call super.close() when closing RocksDB options (#9498)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-10-28 11:29:12 -07:00
Guozhang Wang f1a7097ccd
KAFKA-10616: Always call prepare-commit before suspending for active tasks (#9464)
Today for active tasks we the following active task suspension:

1. closeAndRevive in handleTaskCorruption.
2. closeClean in assignor#onAssignment.
3. closeClean in shutdown.
4. closeDirty in assignor#onAssignment.
5. closeDirty in listener#onPartitionsLost.
6. closeDirty in shutdown.
7. suspend in listener#onPartitionsRevoked.

Among those, 1/4/5/6 do not call prepareCommit which would stateManager#flushCache and may cause illegal state manager. This PR would require a prepareCommit triggered before suspend.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2020-10-26 14:24:05 -07:00
John Roesler 58bd0a6ee3
MINOR: TopologyTestDriver should not require dummy parameters (#9477)
TopologyTestDriver comes with a paper cut that it passes through a
config requirement that application.id and bootstrap.servers must be
configured. But these configs are not required in the context of
TopologyTestDriver specifically. This change relaxes the requirement.

Reviewers: Boyang Chen <boyang@apache.org>, Matthias J. Sax <mjsax@apache.org>
2020-10-22 08:19:01 -05:00
A. Sophie Blee-Goldman 5dc94b1ff4
MINOR: distinguish between missing source topics and internal assignment errors (#9446)
Introduce an ASSIGNMENT_ERROR code to distinguish from INCOMPLETE_SOURCE_TOPIC_METADATA and shut down all members in case of an unexpected exception during task assignment.

Reviewers: Matthias J. Sax <mjsax@apache.org>,  John Roesler <vvcephei@apache.org>
2020-10-21 09:06:25 -07:00
Justine Olshan 67bc4f08fe
KAFKA-10618: Add UUID class, use in protocols (part of KIP-516) (#9454)
In order to support topic IDs, we need to create a public UUID class. This class will be used in protocols. This PR creates the class, modifies code to use the class in the message protocol and changes the code surrounding the existing messages/json that used the old UUID class.

SimpleExampleMessage was used only for testing, so all usages of UUID have been switched to the new class.

SubscriptionInfoData uses UUID for processId extensively. It also utilizes java.util.UUID implementation of Comparable so that UUIDs can be ordered. This functionality was not necessary for the UUIDs used for topic IDs converted to java.util.UUID on the boundary of SubscriptionInfoData. Sorting was used only for testing, though, so this still may be changed.

Also added tests for the methods of the new UUID class. The existing SimpleExampleMessage tests should be sufficient for testing the new UUID class in message protocols.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2020-10-21 10:17:12 +01:00
Michael Bingham c3c65de3f0
KAFKA-10564: fix flaky test (#9466)
Minor update to fix flaky state directory test by advancing the MockTime.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, A. Sophie Blee-Goldman <ableegoldman@apache.org>
2020-10-20 19:16:23 -07:00
Bill Bejeck 50a5671135
MINOR: Clean-up streams javadoc warnings (#9461)
Reviewers: Matthias J. Sax <mjsax@apache.org>,  John Roesler <john@confluent.io>
2020-10-20 19:30:49 -04:00
Levani Kokhreidze d87cd00f5a
KAFKA-10454 / Update copartitionSourceGroups when optimization algorithm is triggered (#9237)
Fix KAFKA-10454 bug

Main issue was that when optimization algorithm was removing repartition nodes, corresponding copartitionSourceGroups was never updated. As a result, copartition enforcer wasn't able to do the checks and set proper number of partitions.

Test ensures that whenever optimization is set, changelog topic for the table is not created. And whenever optimization is turned off, appropriate changelog topic for the table is created.

Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
2020-10-20 18:39:37 -04:00
Thorsten Hake bb47519a42
KAFKA-10515: Properly initialize nullable Serdes with default values (#9338)
Also introduced the notion of WrappingNullableSerdes (aligned to the concept
of WrappingNullableSerializer and WrappingNullableDeserializer) and centralized
initialization in WrappingNullables.

The added integeration test KTableKTableForeignKeyJoinDistributedTest tests
whether all serdes are now correctly set on all stream clients.

Reviewers: John Roesler <vvcephei@apache.org>
2020-10-20 17:03:39 -05:00
John Roesler 659b05f78a
KAFKA-10605: Deprecate old PAPI registration methods (#9448)
Add deprecation annotations to the methods replaced in KIP-478.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2020-10-19 15:29:27 -05:00
leah 4d14d6a96c
KAFKA-10455: Ensure that probing rebalances always occur (#9383)
Add dummy data to subscriptionUserData to make sure that
it is different each time a member rejoins.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>, John Roesler <vvcephei@apache.org>
2020-10-19 13:29:35 -05:00
Matthias J. Sax aef6cd6e99
KAFKA-9274: Add timeout handling for state restore and StandbyTasks (#9368)
* Part of KIP-572
* If a TimeoutException happens during restore of active tasks, or updating standby tasks, we need to trigger task.timeout.ms timeout.

Reviewers: John Roesler <john@confluent.io>
2020-10-19 11:07:56 -07:00
Matthias J. Sax e8ad80ebe1
MINOR: remove explicit passing of AdminClient into StreamsPartitionAssignor (#9384)
Currently, we pass multiple object reference (AdminClient,TaskManager, and a few more) into StreamsPartitionAssignor. Furthermore, we (miss)use TaskManager#mainConsumer() to get access to the main consumer (we need to do this, to avoid a cyclic dependency).

This PR unifies how object references are passed into a single ReferenceContainer class to
 - not "miss use" the TaskManager as reference container
 - unify how object references are passes

Note: we need to use a reference container to avoid cyclic dependencies, instead of using a config for each passed reference individually.

Reviewers: John Roesler <john@confluent.io>
2020-10-15 16:10:27 -07:00
vamossagar12 a85802faa1
KAFKA-10559: Not letting TimeoutException shutdown the app during internal topic validation (#9432)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-10-15 15:10:21 -07:00
Andy Coates 40ad4fe0ae
KAFKA-10494: Eager handling of sending old values (#9415)
Nodes that are materialized should not forward requests to `enableSendingOldValues` to parent nodes, as they themselves can handle fulfilling this request. However, some instances of `KTableProcessorSupplier` were still forwarding requests to parent nodes, which was causing unnecessary materialization of table sources.

The following instances of `KTableProcessorSupplier` have been updated to not forward `enableSendingOldValues` to parent nodes if they themselves are materialized and can handle sending old values downstream:

 * `KTableFilter`
 * `KTableMapValues`
 * `KTableTransformValues`

Other instances of `KTableProcessorSupplier` have not be modified for reasons given below:
 * `KTableSuppressProcessorSupplier`: though it has a `storeName` field, it didn't seem right for this to handle sending old values itself. Its only job is to suppress output.
 * `KTableKTableAbstractJoin`: doesn't have a store name, i.e. it is never materialized, so can't handle the call itself.
 * `KTableKTableJoinMerger`: table-table joins already have materialized sources, which are sending old values. It would be an unnecessary performance hit to have this class do a lookup to retrieve the old value from its store.
 * `KTableReduce`: is always materialized and already handling the call without forwarding
 * `KTableAggregate`: is always materialized and already handling the call without forwarding

Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-10-13 11:19:05 -07:00
John Roesler 27b0e35e7a
KAFKA-10437: Implement new PAPI support for test-utils (#9396)
Implements KIP-478 for the test-utils module:
* adds mocks of the new ProcessorContext and StateStoreContext
* adds tests that all stores and store builders are usable with the new mock
* adds tests that the new Processor api is usable with the new mock
* updates the demonstration Processor to the new api

Reviewers: Guozhang Wang <guozhang@apache.org>
2020-10-13 11:15:22 -05:00
John Roesler 1f8ac6e6fe
KAFKA-10598: Improve IQ name and type checks (#9408)
Previously, we would throw a confusing error, "the store has migrated,"
when users ask for a store that is not in the topology at all, or when the
type of the store doesn't match the QueryableStoreType parameter.

Adds an up-front check that the requested store is registered and also
a better error message when the QueryableStoreType parameter
doesn't match the store's type.

Reviewers: Guozhang Wang <guozhang@apache.org>
2020-10-12 09:34:32 -05:00
Xavier Léauté 7947c18b57 MINOR update comments and docs to be gender-neutral
While this is not technically part of KIP-629, I believe this makes our codebase more inclusive as well.

cc gwenshap

Author: Xavier Léauté <xvrl@apache.org>

Reviewers: Gwen Shapira

Closes #9398 from xvrl/neutral-term
2020-10-08 17:05:15 -07:00
voffcheg109 5fc3f73f08
KAFKA-7334: Suggest changing config for state.dir in case of FileNotFoundException (#9380)
Add additional warning logs and improve existing log messages for `FileNotFoundException` and if /tmp is used as state directory.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-10-08 12:20:21 -07:00
Dima Reznik cc54000e72
KAFKA-10271: Performance regression while fetching a key from a single partition (#9020)
StreamThreadStateStoreProvider excessive loop over calling internalTopologyBuilder.topicGroups(), which is synchronized, thus causing significant performance degradation to the caller, especially when store has many partitions.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2020-10-08 10:12:33 -07:00
Jorge Esteban Quilcate Otoya d0e6943bdd
KAFKA-9929: Support backward iterator on SessionStore (#9139)
Implements KIP-617 for `SessionStore`

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>
2020-10-08 08:08:24 -05:00
bill 4d3036bb4e Updating trunk versions after cutting branch for 2.7 2020-10-08 07:47:36 -04:00
John Roesler 2804257fe2
KAFKA-10562: Properly invoke new StateStoreContext init (#9388)
* all wrapping stores should pass StateStoreContext init through to the same
  method on the wrapped store and not translate it to ProcessorContext init
* base-level stores should handle StateStoreContext init so that callers passing
  a non-InternalProcessorContext implementation will be able to initialize the store
* extra tests are added to verify the desired behavior

Reviewers: Guozhang Wang <guozhang@apache.org>
2020-10-07 23:06:53 -05:00
Lee Dongjin 8d4bbf22ad
MINOR: trivial cleanups, javadoc errors, omitted StateStore tests, etc. (#8130)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-10-07 19:08:31 -07:00
Michael Bingham 250c71b532
KAFKA-10564: only process non-empty task directories when internally cleaning obsolete state stores (#9373)
Avoid continuous repeated logging by not trying to clean empty task directories, which are longer fully deleted during internal cleanup as of https://issues.apache.org/jira/browse/KAFKA-6647.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-10-07 15:48:35 -07:00
Matthias J. Sax 65c29a9dec
KAFKA-9274: fix incorrect default value for `task.timeout.ms` config (#9385)
- part of KIP-572
 - also add handler method to trigger/reset the timeout on a task

Reviewer: John Roesler <john@confluent.io>
2020-10-07 12:17:47 -07:00
Sharath Bhat a8b5f5a430
KAFKA-10362: When resuming Streams active task with EOS, the checkpoint file is deleted (#9247)
Deleted the checkpoint file before the transition from SUSPENDED state to RESTORING state

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-10-07 11:20:06 -07:00
Jorge Esteban Quilcate Otoya 5a5d1c34b9
KAFKA-9929: fix: add missing default implementations (#9321)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-10-06 11:21:23 -07:00
Guozhang Wang 53a35c1de3
MINOR: Refactor unit tests around RocksDBConfigSetter (#9358)
* Extract the mock RocksDBConfigSetter into a separate class.
* De-dup unit tests covering RocksDBConfigSetter.

Reviewers: Boyang Chen <boyang@confluent.io>
2020-10-06 09:09:54 -07:00
John Roesler 69790a1463
KAFKA-10535: Split ProcessorContext into Processor/StateStore/Record Contexts (#9361)
Migrate different components of the old ProcessorContext interface
into separate interfaces that are more appropriate for their usages.
See KIP-478 for the details.

Reviewers: Guozhang Wang <guozhang@apache.org>, Paul Whalen <pgwhalen@gmail.com>
2020-10-02 18:49:12 -05:00
leah 95986a8f48
MINOR: Fix KStreamKTableJoinTest and StreamTaskTest (#9357)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-09-30 20:07:23 -07:00
Matthias J. Sax a15387f34d
KAFKA-9274: Revert deprecation of `retries` for producer and admin clients (#9333)
Reviewer: John Roesler <john@confluent.io>
2020-09-30 12:13:34 -07:00
Igor Soarez d1c82a9baf
KAFKA-10205: Documentation and handling of non deterministic Topologies (#9064)
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-09-30 12:06:11 -07:00
JoelWee 6e0a10b41a
KAFKA-10277: Allow null keys with non-null mappedKey in KStreamKGlobalTable join (#9186)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-09-30 11:59:11 -07:00
Luke Chen e1ab1433ad
KAFKA-10351: Add tests for IOExceptions for GlobalStateManagerImpl/OffsetCheckpoint (#9121)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-09-30 11:26:48 -07:00
manijndl7 2fda4458b4
KAFKA-6585: Consolidate duplicated logic on reset tools (#9255)
Reviewers: Navinder Pal Singh Brar <navinder_brar@yahoo.com>, Matthias J. Sax <matthias@confluent.io>
2020-09-30 10:13:01 -07:00
Micah Paul Ramos 340cd07bab
KAFKA-9584: Fix Headers ConcurrentModificationException in Streams (#8181)
Avoid forwarding a shared reference to the record context in punctuate calls.
Note, this fix isn't airtight, since all processors triggered by a single punctuate
call will still see the same reference to the record context. It's also not a terribly
principled approach, since the context is still technically not defined, but this
is about the best we can do without significant refactoring. We will probably
follow up with a more comprehensive solution, but this should avoid the issue
for most programs.

Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>
2020-09-28 14:56:36 -05:00
A. Sophie Blee-Goldman bd462df203
MINOR: standardize rebalance related logging for easy discovery & debugging (#9295)
Some minor logging adjustments to standardize the grammar of rebalance related messages and make it easy to query the logs for quick debugging results

Guozhang Wang <wangguoz@gmail.com>
2020-09-25 20:29:17 -07:00
Andy Coates dc81d442df
KAFKA-10077: Filter downstream of state-store results in spurious tombstones (#9156)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-09-25 13:58:04 -07:00