kafka

Commit Graph

Author	SHA1	Message	Date
John Roesler	46df7ee97c	MINOR: Add extra notice about IQv2 compatibility (#11944 ) Added an extra notice about IQv2's API compatibility, as discussed in the KIP-796 vote thread. Reviewers: Bill Bejeck <bbejeck@apache.org>, @Kvicii	2022-03-24 14:04:40 -05:00
Rohan	01533e3dd7	KAFKA-13692: include metadata wait time in total blocked time (#11805 ) This patch includes metadata wait time in total blocked time. First, this patch adds a new metric for total producer time spent waiting on metadata, called metadata-wait-time-ms-total. Then, this time is included in the total blocked time computed from StreamsProducer. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-24 09:55:26 -07:00
John Roesler	322a065b90	KAFKA-13714: Fix cache flush position (#11926 ) The caching store layers were passing down writes into lower store layers upon eviction, but not setting the context to the evicted records' context. Instead, the context was from whatever unrelated record was being processed at the time. Reviewers: Matthias J. Sax <mjsax@apache.org>	2022-03-23 22:09:05 -05:00
Hao Li	a3adf41d8b	[Emit final][4/N] add time ordered store factory (#11892 ) Add factory to create time ordered store supplier. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-22 20:53:53 -07:00
vamossagar12	0924fd3f9f	KAFKA-13152: Replace "buffered.records.per.partition" with "input.buffer.max.bytes" (#11796 ) Implements KIP-770 Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-21 17:16:00 -07:00
Ludovic DEHON	df963ee0a9	MINOR: Fix incorrect log for out-of-order KTable (#11905 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-03-18 10:00:03 +08:00
Luke Chen	fbe7fb9411	KAFKA-9847: add config to set default store type (KIP-591) (#11705 ) Reviewers: Hao Li <hli@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>	2022-03-17 10:19:42 +08:00
Levani Kokhreidze	b68463c250	KAFKA-6718 / Add rack awareness configurations to StreamsConfig (#11837 ) This PR is part of KIP-708 and adds rack aware standby task assignment logic. Rack aware standby task assignment won't be functional until all parts of this KIP gets merged. Splitting PRs into three smaller PRs to make the review process easier to follow. Overall plan is the following: ⏭️ Rack aware standby task assignment logic #10851 ⏭️ Protocol change, add clientTags to SubscriptionInfoData #10802 👉 Add required configurations to StreamsConfig (public API change, at this point we should have full functionality) This PR implements last point of the above mentioned plan. Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-03-16 18:02:24 +01:00
Nick Telford	9e8ace0809	KAFKA-13549: Add repartition.purge.interval.ms (#11610 ) Implements KIP-811. Add a new config `repartition.purge.interval.ms` that limits how often data is purged from repartition topics.	2022-03-15 15:55:20 -07:00
Walker Carlson	f708dc58ed	MINOR: fix shouldWaitForMissingInputTopicsToBeCreated test (#11902 ) This test was falling occasionally. It does appear to be a matter of the tests assuming perfecting deduplication/caching when asserting the test output records, ie a bug in the test not in the real code. Since we are not assuming that it is going to be perfect I changed the test to make sure the records we expect arrive, instead of only those arrive. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-15 13:54:48 -07:00
Matthias J. Sax	03411ca28b	KAFKA-13721: asymetric join-winodws should not emit spurious left/outer join results (#11875 ) Reviewers: Sergio Peña <sergio@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2022-03-15 09:37:01 -07:00
Guozhang Wang	b916cb40bd	KAFKA-13690: Fix flaky test in EosIntegrationTest (#11887 ) I found a couple of flakiness with the integration test. IQv1 on stores failed although getting the store itself is covered with timeouts, since the InvalidStoreException is upon the query (store.all()). I changed to the util function with IQv2 whose timeout/retry covers the whole procedure. Example of such failure is: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-11802/11/tests/ With ALOS we should not check that the output, as well as the state store content is exactly as of processed once, since it is possible that during processing we got spurious task-migrate exceptions and re-processed with duplicates. I actually cannot reproduce this error locally, but from the jenkins errors it seems possible indeed. Example of such failure is: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-11433/4/tests/ Some minor cleanups. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>	2022-03-14 15:42:10 -07:00
Matthias J. Sax	b1f36360ed	KAKFA-13699: new ProcessorContext is missing methods (#11877 ) We added `currentSystemTimeMs()` and `currentStreamTimeMs()` to the `ProcessorContext` via KIP-622, but forgot to add both to the new `api.ProcessorContext`. Reviewers: Ricardo Brasil <anribrasil@gmail.com>, Guozhang Wang <guozhang@confluent.io>	2022-03-14 09:22:01 -07:00
Hao Li	63ea5db9ec	KIP-825: Part 1, add new RocksDBTimeOrderedWindowStore (#11802 ) Initial State store implementation for TimedWindow and SlidingWindow. RocksDBTimeOrderedWindowStore.java contains one RocksDBTimeOrderedSegmentedBytesStore which contains index and base schema. PrefixedWindowKeySchemas.java implements keyschema for time ordered base store and key ordered index store. Reviewers: James Hughes, Guozhang Wang <wangguoz@gmail.com>	2022-03-11 17:51:10 -08:00
Hao Li	17988f4710	MINOR: fix flaky EosIntegrationTest.shouldCommitCorrectOffsetIfInputTopicIsTransactional[at_least_once] (#11878 ) In this test, we started Kafka Streams app and then write to input topic in transaction. It's possible when streams commit offset, transaction hasn't finished yet. So the streams committed offset could be less than the eventual endOffset. This PR moves the logic of writing to input topic before starting streams app. Reviewers: John Roesler <vvcephei@apache.org>	2022-03-11 12:01:46 -06:00
Levani Kokhreidze	87eb0cf03c	KAFKA-6718: Update SubscriptionInfoData with clientTags (#10802 ) adds ClientTags to SubscriptionInfoData Reviewer: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-03-11 16:29:05 +08:00
Walker Carlson	4d5a28973f	Revert "KAFKA-13542: add rebalance reason in Kafka Streams (#11804 )" (#11873 ) This reverts commit `2ccc834faa`. This reverts commit `2ccc834`. We were seeing serious regressions in our state heavy benchmarks. We saw that our state heavy benchmarks were experiencing a really bad regression. The State heavy benchmarks runs with rolling bounces with 10 nodes. We regularly saw this exception: java.lang.OutOfMemoryError: Java heap space I ran through a git bisect and found this commit. We verified that the commit right before did not have the same issues as this one did. I then reverted the problematic commit and ran the benchmarks again on this commit and did not see any more issues. We are still looking into the root cause, but for now since this isn't a critical improvement so we can remove it temporarily. Reviewers: Bruno Cadonna <cadonna@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, David Jacot <djacot@confluent.io>, Ismael Juma <ismael@confluent.io>	2022-03-10 13:52:05 -08:00
A. Sophie Blee-Goldman	113595cf5c	KAFKA-12648: fix flaky #shouldAddToEmptyInitialTopologyRemoveResetOffsetsThenAddSameNamedTopologyWithRepartitioning (#11868 ) This test has started to become flaky at a relatively low, but consistently reproducible, rate. Upon inspection, we find this is due to IOExceptions during the #cleanUpNamedTopology call -- specifically, most often a DirectoryNotEmptyException with an ocasional FileNotFoundException Basically, signs pointed to having returned from/completed the #removeNamedTopology future prematurely, and moving on to try and clear out the topology's state directory while there was a streamthread somewhere that was continuing to process/close its tasks. I believe this is due to updating the thread's topology version before we perform the actual topology update, in this case specifically the act of eg clearing out a directory. If one thread updates its version and then goes to perform the topology removal/cleanup when the second thread finishes its own topology removal, this other thread will check whether all threads are on the latest version and complete any waiting futures if so -- which means it can complete the future before the first thread has actually completed the corresponding action Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-10 12:02:07 -08:00
A. Sophie Blee-Goldman	9c7d857713	KAFKA-12648: fix #getMinThreadVersion and include IOException + topologyName in StreamsException when topology dir cleanup fails (#11867 ) Quick fix to make sure we log the actual source of the failure both in the actual log message as well as the StreamsException that we bubble up to the user's exception handler, and also to report the offending topology by filling in the StreamsException's taskId field. Also prevents a NoSuchElementException from being thrown when trying to compute the minimum topology version across all threads when the last thread is being unregistered during shutdown. Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-09 16:30:42 -08:00
John Roesler	717f9e2149	MINOR: Restructure ConsistencyVectorIntegrationTest (#11848 ) Reviewers: YEONCHEOL JANG <@YeonCheolGit>, Matthias J. Sax <mjsax@apache.org>	2022-03-08 13:59:58 -06:00
John Roesler	10f34ce6b3	MINOR: Clarify acceptable recovery lag config doc (#11411 ) Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>, Andrew Eugene Choi < @andrewchoi5 >	2022-03-08 10:42:36 -06:00
A. Sophie Blee-Goldman	fc7133d52d	KAFKA-12648: fix bug where thread is re-added to TopologyMetadata when shutting down (#11857 ) We used to call TopologyMetadata#maybeNotifyTopologyVersionWaitersAndUpdateThreadsTopologyVersion when a thread was being unregistered/shutting down, to check if any of the futures listening for topology updates had been waiting on this thread and could be completed. Prior to invoking this we make sure to remove the current thread from the TopologyMetadata's threadVersions map, but this thread is actually then re-added in the #maybeNotifyTopologyVersionWaitersAndUpdateThreadsTopologyVersion call. To fix this, we should break up this method into separate calls for each of its two distinct functions, updating the version and checking for topology update completion. When unregistering a thread, we should only invoke the latter method Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-07 23:59:43 -08:00
A. Sophie Blee-Goldman	539f006e65	KAFKA-12648: fix NPE due to race condtion between resetting offsets and removing a topology (#11847 ) While debugging the flaky NamedTopologyIntegrationTest. shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing test, I did discover one real bug. The problem was that we update the TopologyMetadata's builders map (with the known topologies) inside the #removeNamedTopology call directly, whereas the StreamThread may not yet have reached the poll() in the loop and in case of an offset reset, we get an NP.e I changed the NPE to just log a warning for now, going forward I think we should try to tackle some tech debt by keeping the processing tasks and the TopologyMetadata in sync Also includes a quick fix on the side where we were re-adding the topology waiter/KafkaFuture for a thread being shut down Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-07 11:09:18 -08:00
Tim Patterson	e3ef29ea03	KAFKA-12959: Distribute standby and active tasks across threads to better balance load between threads (#11493 ) Balance standby and active stateful tasks evenly across threads Reviewer: Luke Chen <showuon@gmail.com>	2022-03-05 16:11:42 +08:00
A. Sophie Blee-Goldman	11143d4883	MINOR: fix flaky shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing (#11827 ) This test has been failing somewhat regularly due to going into the ERROR state before reaching RUNNING during the startup phase. The problem is that we are reusing the DELAYED_INPUT_STREAM topics, which had previously been assumed to be uniquely owned by a particular test. We should make sure to delete and re-create these topics for any test that uses them.	2022-03-04 10:31:37 -08:00
A. Sophie Blee-Goldman	6f54faed2d	KAFKA-12648: fix #add/removeNamedTopology blocking behavior when app is in CREATED (#11813 ) Currently the #add/removeNamedTopology APIs behave a little wonky when the application is still in CREATED. Since adding and removing topologies runs some validation steps there is valid reason to want to add or remove a topology on a dummy app that you don't plan to start, or a real app that you haven't started yet. But to actually check the results of the validation you need to call get() on the future, so we need to make sure that get() won't block forever in the case of no failure -- as is currently the case Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-04 09:58:56 -08:00
Levani Kokhreidze	62e646619b	KAFKA-6718 / Rack aware standby task assignor (#10851 ) This PR is part of KIP-708 and adds rack aware standby task assignment logic. Reviewer: Bruno Cadonna <cadonna@apache.org>, Luke Chen <showuon@gmail.com>, Vladimir Sitnikov <vladimirsitnikov.apache.org>	2022-03-03 11:37:26 +08:00
A. Sophie Blee-Goldman	f089bea7ed	MINOR: set log4j.logger.kafka and all Config logger levels to ERROR for Streams tests (#11823 ) Pretty much any time we have an integration test failure that's flaky or only exposed when running on Jenkins through the PR builds, it's impossible to debug if it cannot be reproduced locally as the logs attached to the test results have truncated the entire useful part of the logs. This is due to the logs being flooded at the beginning of the test when the Kafka cluster is coming up, eating up all of the allotted characters before we even get to the actual Streams test. Setting log4j.logger.kafka to ERROR greatly improves the situation and cuts down on most of the excessive logging in my local runs. To improve things even more and have some hope of getting the part of the logs we actually need, I also set the loggers for all of the Config objects to ERROR, as these print out the value of every single config (of which there are a lot) and are not useful as we can easily figure out what the configs were if necessary by just inspecting the test locally. Reviewers: Luke Chen <showuon@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2022-03-01 21:58:10 -08:00
John Roesler	7172f35807	MINOR: Improve test assertions for IQv2 (#11828 ) Reviewer: Bill Bejeck <bbejeck@apache.org>	2022-03-01 20:30:29 -06:00
A. Sophie Blee-Goldman	84f8c90b13	KAFKA-12648: standardize startup timeout to fix some flaky NamedTopologyIntegrationTest tests (#11824 ) Seen a few of the new tests added fail on PR builds lately with "java.lang.AssertionError: Expected all streams instances in [org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper@7fb3e6b0] to be RUNNING within 30000 ms" We already had some tests using the 30s timeout while others were bumped all the way up to 60s, I figured we should try out a default timeout of 45s and if we still see failures in specific tests we can go from there	2022-03-01 13:15:53 -08:00
A. Sophie Blee-Goldman	6eb57f6df1	KAFKA-12738: address minor followup and consolidate integration tests of PR #11787 (#11812 ) This PR addresses the remaining nits from the final review of #11787 It also deletes two integration test classes which had only one test in them, and moves the tests to another test class file to save on the time to bring up an entire embedded kafka cluster just for a single run Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-01 12:59:18 -08:00
Hao Li	2ccc834faa	KAFKA-13542: add rebalance reason in Kafka Streams (#11804 ) Add rebalance reason in Kafka Streams. Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-02-28 18:26:46 +01:00
Walker Carlson	abb74d406a	KAFKA-13281: allow #removeNamedTopology while in the CREATED state (#11810 ) We should be able to change the topologies while still in the CREATED state. We already allow adding them, but this should include removing them as well Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2022-02-25 19:11:06 -08:00
Walker Carlson	29317e6953	KAFKA-13281: add API to expose current NamedTopology set (#11808 ) List all the named topologies that have been added to this client Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2022-02-25 19:04:07 -08:00
A. Sophie Blee-Goldman	c2ee1411c8	KAFKA-12738: send LeaveGroup request when thread dies to optimize replacement time (#11801 ) Quick followup to #11787 to optimize the impact of the task backoff by reducing the time to replace a thread. When a thread is going through a dirty close, ie shutting down from an uncaught exception, we should be sending a LeaveGroup request to make sure the broker acknowledges the thread has died and won't wait up to the `session.timeout` for it to join the group if the user opts to `REPLACE_THREAD` in the handler Reviewers: Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>	2022-02-24 16:18:13 -08:00
A. Sophie Blee-Goldman	cd4a1cb410	KAFKA-12738: track processing errors and implement constant-time task backoff (#11787 ) Part 1 in the initial series of error handling for named topologies. *Part 1: Track tasks with errors within a named topology & implement constant-time based task backoff Part 2: Implement exponential task backoff to account for recurring errors Part 3: Pause/backoff all tasks within a named topology in case of a long backoff/frequent errors for any individual task Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-02-24 12:10:31 -08:00
Bruno Cadonna	8d88b20b27	KAFKA-10199: Add interface for state updater (#11499 ) Reviewers: Andrew Eugene Choi <andrew.choi@uwaterloo.ca>, Guozhang Wang <wangguoz@gmail.com>	2022-02-23 10:13:08 -08:00
Walker Carlson	d8cf47bf28	KAFKA-13676: Commit successfully processed tasks on error (#11791 ) When we hit an exception when processing tasks we should save the work we have done so far. This will only be relevant with ALOS and EOS-v1, not EOS-v2. It will actually reduce the number of duplicated record in ALOS because we will not be successfully processing tasks successfully more than once in many cases. This is currently enabled only for named topologies. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>	2022-02-22 23:10:05 -08:00
Rob Leland	06ca4850c5	KAFKA-13666 Don't Only ignore test exceptions for windows OS for certain tests. (#11752 ) Tests are swallowing exceptions for supported operating systems, which could hide regressions. Co-authored-by: Rob Leland <rleland@apache.org> Reviewer: Bruno Cadonna <cadonna@apache.org>	2022-02-18 14:49:03 +01:00
A. Sophie Blee-Goldman	4c23e47bd5	MINOR: move non-management methods from TaskManager to Task Executor (#11738 ) Basic refactoring with no logical changes to lay the groundwork & facilitate reviews for error handling work. This PR just moves all methods that go beyond the management of tasks into a new TaskExecutor class, such as processing, committing, and punctuating. This breaks up the ever-growing TaskManager class so it can focus on the tracking and updating of the tasks themselves, while the TaskExecutor can focus on the actual processing. In addition to cleaning up this code this should make it easier to test this part of the code. Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-02-18 00:39:41 -08:00
Bruno Cadonna	333278d9bb	MINOR: Add actual state directory to related exceptions (#11751 ) For debugging it is useful to see the actual state directory when an exception regarding the state directory is thrown. Reviewer: Bill Bejeck <bbejeck@apache.org>	2022-02-16 20:32:00 +01:00
Matthias J. Sax	c012fc411c	MINOR: improve JavaDocs for ReadOnlySessionStore (#11759 ) Reviewer: Guozhang Wang <guozhang@confluent.io>	2022-02-16 08:40:47 -08:00
A. Sophie Blee-Goldman	fdb98df839	KAFKA-12648: avoid modifying state until NamedTopology has passed validation (#11750 ) Previously we were only verifying the new query could be added after we had already inserted it into the TopologyMetadata, so we need to move the validation upfront. Also adds a test case for this and improves handling of NPE in case of future or undiscovered bugs. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-02-15 13:06:54 -08:00
Jorge Esteban Quilcate Otoya	99310360a5	KAFKA-12939: After migrating processors, search the codebase for missed migrations (#11534 ) Migrated internal usages that had previously been marked with TODO suppressions. Reviewer: John Roesler<vvcephei@apache.org>	2022-02-11 22:25:03 -06:00
Jonathan Albrecht	ec05f90a3d	KAFKA-13599: Upgrade RocksDB to 6.27.3 (#11690 ) RocksDB v6.27.3 has been released and it is the first release to support s390x. RocksDB is currently the only dependency in gradle/dependencies.gradle without s390x support. RocksDB v6.27.3 has added some new options that require an update to streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter.java but no other changes are needed to upgrade. I have run the unit/integration tests locally on s390x and also the :streams tests on x86_64 and they pass. Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-02-02 10:56:14 +01:00
Matthias J. Sax	67cf187603	Revert "KAFKA-13152: Replace "buffered.records.per.partition" with "input.buffer.max.bytes" (#11424 )" This reverts commit `14c6030c6a`. Reason: Implemenation breaks backward compatibility	2022-02-01 14:08:11 -08:00
kurtostfeld	830d83e2cd	MINOR: Fix typo "Exsiting" -> "Existing" (#11547 ) Co-authored-by: Kurt Ostfeld <kurt@samba.tv> Reviewers: Kvicii <Karonazaba@gmail.com>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2022-02-01 11:09:04 +01:00
vamossagar12	14c6030c6a	KAFKA-13152: Replace "buffered.records.per.partition" with "input.buffer.max.bytes" (#11424 ) This PR is an implementation of: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390. The following changes have been made: * Adding a new config input.buffer.max.bytes applicable at a topology level. * Adding new config statestore.cache.max.bytes. * Adding new metric called input-buffer-bytes-total. * The per partition config buffered.records.per.partition is deprecated. * The config cache.max.bytes.buffering is deprecated. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>	2022-01-27 21:19:04 -08:00
Matthias J. Sax	af377b5f30	KAFKA-13423: GlobalThread should not log ERROR on clean shutdown (#11455 ) Reviewers: Guozhang Wang <guozhang@confluent.io>, Bruno Cadonna <cadonna@confluent.io>	2022-01-27 20:40:43 -08:00
Patrick Stuedi	1a21892663	KAFKA-13605: checkpoint position in state stores (#11676 ) There are cases in which a state store neither has an in-memory position built up nor has it gone through the state restoration process. If a store is persistent (i.e., RocksDB), and we stop and restart Streams, we will have neither of those continuity mechanisms available. This patch: * adds a test to verify that all stores correctly recover their position after a restart * implements storage and recovery of the position for persistent stores alongside on-disk state Reviewers: Vicky Papavasileiou <vpapavasileiou@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>	2022-01-27 09:25:04 -06:00
Vicky Papavasileiou	fe72187cb1	KAFKA-13524: Add IQv2 query handling to the caching layer (#11682 ) Currently, IQv2 forwards all queries to the underlying store. We add this bypass to allow handling of key queries in the cache. If a key exists in the cache, it will get answered from there. As part of this PR, we realized we need access to the position of the underlying stores. So, I added the method getPosition to the public API and ensured all state stores implement it. Only the "leaf" stores (Rocks, InMemory) have an actual position, all wrapping stores access their wrapped store's position. Reviewers: Patrick Stuedi <pstuedi@apache.org>, John Roesler <vvcephei@apache.org>	2022-01-26 09:36:39 -06:00
Vicky Papavasileiou	868cbcb8e5	MINOR: Fix bug of empty position in windowed and session stores #11713 Reviewers: John Roesler <vvcephei@apache.org>	2022-01-25 13:46:20 -06:00
John Roesler	96fa468106	MINOR: fix NPE in iqv2 (#11702 ) There is a brief window between when the store is registered and when it is initialized when it might handle a query, but there is no context. We treat this condition just like a store that hasn't caught up to the desired position yet. Reviewers: Guozhang Wang <guozhang@apache.org>, Matthias J. Sax <mjsax@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org>, Patrick Stuedi <pstuedi@apache.org>	2022-01-25 13:23:46 -06:00
A. Sophie Blee-Goldman	9d602a01be	KAFKA-12648: invoke exception handler for MissingSourceTopicException with named topologies (#11686 ) Followup to #11600 to invoke the streams exception handler on the MissingSourceTopicException, without killing/replacing the thread Reviewers: Guozhang Wang <guozhang@confluent.io>, Bruno Cadonna <cadonna@confluent.io>	2022-01-25 10:37:35 -08:00
A. Sophie Blee-Goldman	265d3199ec	KAFKA-12648: fixes for query APIs with named topologies (#11609 ) Fixes some issues with the NamedTopology version of the IQ methods that accept a topologyName argument, and adds tests for all. Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-01-25 05:49:23 -08:00
Sayantanu Dey	d13d09fb68	KAFKA-13590:rename InternalTopologyBuilder#topicGroups (#11704 ) Renamed the often confusing and opaque #topicGroups API to #subtopologyToTopicsInfo Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2022-01-24 21:03:37 -08:00
Aleksandr Sorokoumov	7d9b9847f1	KAFKA-6502: Update consumed offsets on corrupted records. (#11683 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2022-01-20 09:26:38 -08:00
A. Sophie Blee-Goldman	529dde904a	KAFKA-12648: handle MissingSourceTopicException for named topologies (#11600 ) Avoid throwing a MissingSourceTopicException inside the #assign method when named topologies are used, and just remove those topologies which are missing any of their input topics from the assignment. Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@confluent.io>	2022-01-18 11:49:23 -08:00
Walker Carlson	c182a431d2	MINOR: prefix topics if internal config is set (#11611 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-01-10 16:08:48 -08:00
Guozhang Wang	9078451e37	MINOR: Add num threads logging upon shutdown (#11652 ) 1. Add num of threads logging upon shutdown. 2. Prefix the shutdown thread with client id. Reviewers: John Roesler <vvcephei@apache.org>	2022-01-06 11:28:27 -08:00
Richard	7567cbc857	KAFKA-13476: Increase resilience timestamp decoding Kafka Streams (#11535 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2022-01-05 21:38:10 -08:00
John Roesler	b424553101	KAFKA-13553: Add PAPI Window and Session store tests for IQv2 (#11650 ) During some recent reviews, @mjsax pointed out that StateStore layers are constructed differently the stores are added via the PAPI vs. the DSL. This PR adds PAPI construction for Window and Session stores to the IQv2StoreIntegrationTest so that we can ensure IQv2 works on every possible state store. Reviewer: Guozhang Wang <guozhang@apache.org>	2022-01-05 23:16:33 -06:00
John Roesler	7ef8701cca	KAFKA-13553: add PAPI KV store tests for IQv2 (#11624 ) During some recent reviews, @mjsax pointed out that StateStore layers are constructed differently the stores are added via the PAPI vs. the DSL. This PR adds KeyValueStore PAPI construction to the IQv2StoreIntegrationTest so that we can ensure IQv2 works on every possible state store. Reviewers: Patrick Stuedi <pstuedi@apache.org>, Guozhang Wang <guozhang@apache.org>	2022-01-05 21:04:37 -06:00
Patrick Stuedi	b8f1cf14c3	KAFKA-13494: WindowKeyQuery and WindowRangeQuery (#11567 ) Implement WindowKeyQuery and WindowRangeQuery as proposed in KIP-806 Reviewer: John Roesler <vvcephei@apache.org>	2022-01-02 22:17:38 -06:00
John Roesler	018fb88efa	KAFKA-13557: Remove swapResult from the public API (#11617 ) Follow-on from #11582 . Removes a public API method in favor of an internal utility method. Reviewer: Matthias J. Sax <mjsax@apache.org>	2021-12-20 19:04:08 -06:00
John Roesler	5747788659	KAFKA-13525: Implement KeyQuery in Streams IQv2 (#11582 ) Implement the KeyQuery as proposed in KIP-796 Reviewers: Vicky Papavasileiou <vpapavasileiou@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <guozhang@apache.org>	2021-12-20 12:22:05 -06:00
Walker Carlson	247c271353	MINOR: retry when deleting offsets for named topologies (#11604 ) When this was made I didn't expect deleteOffsetsResult to be set if an exception was thrown. But it is and to retry we need to reset it to null. Changing the KafkaStreamsNamedTopologyWrapper for remove topology when resetting offsets to retry upon GroupSubscribedToTopicException and swallow/complete upon GroupIdNotFoundException Reviewers: Anna Sophie Blee-Goldman <ableegoldman@ache.>	2021-12-16 19:39:55 -08:00
Vicky Papavasileiou	b38f6ba5cc	KAFKA-13479: Implement range and scan queries (#11598 ) Implement the RangeQuery as proposed in KIP-805 Reviewers: John Roesler <vvcephei@apache.org>	2021-12-16 11:09:01 -06:00
Walker Carlson	04787334a5	MINOR: Update log and method name in TopologyMetadata (#11589 ) Update an unclear log message and method name in TopologyMetadata Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>, Luke Chen <showuon@confluent.io>	2021-12-15 19:43:40 -08:00
John Roesler	acd1f9c563	KAFKA-13522: add position tracking and bounding to IQv2 (#11581 ) * Fill in the Position response in the IQv2 result. * Enforce PositionBound in IQv2 queries. * Update integration testing approach to leverage consistent queries. Reviewers: Patrick Stuedi <pstuedi@apache.org>, Vicky Papavasileiou <vpapavasileiou@confluent.io>, Guozhang Wang <guozhang@apache.org>	2021-12-11 01:00:59 -06:00
A. Sophie Blee-Goldman	1e459271d7	KAFKA-12648: fix IllegalStateException in ClientState after removing topologies (#11591 ) Fix for one of the causes of failure in the NamedTopologyIntegrationTest: org.apache.kafka.streams.errors.StreamsException: java.lang.IllegalStateException: Must initialize prevActiveTasks from ownedPartitions before initializing remaining tasks. This exception could occur if a member sent in a subscription where all of its ownedPartitions were from a named topology that is no longer recognized by the group leader, eg because it was just removed from the client. We should filter each ClientState based on the current topology only so the assignor only processes the partitions/tasks it can identify. The member with the out-of-date tasks will eventually clean them up when the #removeNamedTopology API is invoked on them Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2021-12-10 14:26:27 -08:00
A. Sophie Blee-Goldman	d5eb3c10ec	HOTFIX: fix failing StreamsMetadataStateTest tests (#11590 ) Followup to #11562 to fix broken tests in StreamsMetadataStateTest Reviewers: Walker Carlson <wcarlson@confluent.io>	2021-12-09 16:19:56 -08:00
Vicky Papavasileiou	e1dba7af57	MINOR: Cleanup for #11513 (#11585 ) Clean up some minor things that were left over from PR #11513 Reviewer: John Roesler <vvcephei@apache.org>	2021-12-09 13:23:01 -06:00
Tamara Skokova	133b515b5e	KAFKA-13507: GlobalProcessor ignores user specified names (#11573 ) Use the name specified via consumed parameter in InternalStreamsBuilder#addGlobalStore method for initializing the source name and processor name. If not specified, the names are generated. Reviewers: Luke Chen <showuon@gmail.com>, Bill Bejeck <bbejeck@apache.org>	2021-12-09 09:42:00 -05:00
Tolga H. Dur	e20f102298	KAFKA-12648: extend IQ APIs to work with named topologies (#11562 ) In the new NamedTopology API being worked on, state store names and their uniqueness requirement is going to be scoped only to the owning topology, rather than to the entire app. In other words, two different named topologies can have different state stores with the same name. This is going to cause problems for some of the existing IQ APIs which require only a name to resolve the underlying state store. We're now going to need to take in the topology name in addition to the state store name to be able to locate the specific store a user wants to query Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-12-09 03:54:28 -08:00
Vicky Papavasileiou	7acd12d6e3	KAFKA-13506: Write and restore position to/from changelog (#11513 ) Introduces changelog headers to pass position information to standby and restoring stores. The headers are guarded by an internal config, which defaults to `false` for backward compatibility. Once IQv2 is finalized, we will make that flag a public config. Reviewers: Patrick Stuedi <pstuedi@apache.org>, John Roesler <vvcephei@apache.org>	2021-12-08 11:58:35 -06:00
Bruno Cadonna	68c3018a5a	MINOR: Fix internal topic manager tests (#11574 ) When the unit tests of the internal topic manager test are executed on a slow machine (like sometimes in automatic builds) they sometimes fail with a timeout exception instead of the expected exception. To fix this behavior, this commit replaces the use of system time with mock time. Reviewer: John Roesler <vvcephei@apache.org>	2021-12-07 18:23:52 +01:00
Walker Carlson	965ec40c0a	KAFKA-12648: Make changing the named topologies have a blocking option (#11479 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2021-12-03 11:32:55 -08:00
John Roesler	14c2449050	KAFKA-13491: IQv2 framework (#11557 ) Implements the major part of the IQv2 framework as proposed in KIP-796. Reviewers: Patrick Stuedi <pstuedi@apache.org>, Vicky Papavasileiou <vpapavasileiou@confluent.io>, Bruno Cadonnna <cadonna@apache.org>	2021-12-03 12:53:31 -06:00
Patrick Stuedi	62f73c30d3	KAFKA-13498: track position in remaining state stores (#11541 ) Reviewers: Vicky Papavasileiou <vpapavasileiou@confluent.io>, John Roesler<vvcephei@apache.org>	2021-12-01 11:49:10 -06:00
Bruno Cadonna	4fed0001ec	MINOR: Fix system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance (#11532 ) Log messages were changed in the AssignorConfiguration (#11490) that are also used for verification in system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance. This commit fixes the test and adds comments to the log messages that point to the test that needs to be updated in case of changes to the log messages. Reviewers: John Roesler <vvcephei@apache.org>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2021-11-25 10:48:09 +01:00
Patrick Stuedi	23e9818e62	KAFKA-13480: Track Position in KeyValue stores (#11514 ) Add position tracking to KeyValue stores in support of KIP-796 Reviewers: John Roesler <vvcephei@apache.org>	2021-11-24 18:28:00 -06:00
Jorge Esteban Quilcate Otoya	1e0916580f	KAFKA-13117: migrate TupleForwarder and CacheFlushListener to new Record API (#11481 ) * Migrate TupleForwarder and CacheFlushListener to new Processor API * Update the affected Processors Reviewers: John Roesler <vvcephei@apache.org>	2021-11-22 21:34:59 -06:00
Bruno Cadonna	9285820df0	MINOR: Set mock correctly in RocksDBMetricsRecorderTest (#11462 ) With a nice mock in RocksDBMetricsRecorderTest#shouldCorrectlyHandleHitRatioRecordingsWithZeroHitsAndMisses() and RocksDBMetricsRecorderTest#shouldCorrectlyHandleAvgRecordingsWithZeroSumAndCount() were green although getTickerCount() was never called. The tests were green because EasyMock returns 0 for a numerical return value by default if no expectation is specified. Thus, commenting out the expectation for getTickerCount() did not change the result of the test. This commit changes the mock to a default mock and fixes the expectation to expect getAndResetTickerCount(). Now, commenting out the expectation leads to a test failure. Reviewers: Luizfrf3 <lf.fonseca@hotmail.com>, Guozhang Wang <wangguoz@gmail.com>	2021-11-18 18:14:36 +01:00
Luke Chen	1b4cffdcb7	KAFKA-13439: Deprecate eager rebalance protocol in kafka stream (#11490 ) Deprecate eager rebalancing protocol in kafka streams and log warning message when upgrade.from is set to 2.3 or lower. Also add a note in upgrade doc to prepare users for the removal of eager rebalancing support Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-11-17 03:05:19 -08:00
Matthias J. Sax	30d1989db1	MINOR: update Kafka Streams standby task config (#11404 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Antony Stubbs <antony@confluent.io>, James Galasyn <jim.galasyn@confluent.io>	2021-11-16 17:34:49 -08:00
Patrick Stuedi	ffbef88cd7	Add recordMetadata() to StateStoreContext (#11498 ) Implements KIP-791 Reviewers: John Roesler <vvcephei@apache.org>	2021-11-16 10:51:41 -06:00
mkandaswamy	babc54333c	MINOR: Improve KafkaStreamsTest: testInitializesAndDestroysMetricsReporters (#11494 ) Add additional asserts for KafkaStreamsTest: testInitializesAndDestroysMetricsReporters to help diagnose if it flakily fails in the future. - MockMetricsReporter gets initialized only once during KafkaStreams construction, so make assert check stricter by ensuring initDiff is one. - Assert KafkaStreams is not running before we validate whether MockMetricsMetricsReporter close count got incremented after streams close. Reviewer: Bruno Cadonna <cadonna@apache.org>	2021-11-16 10:50:15 +01:00
A. Sophie Blee-Goldman	908a6d2ad7	KAFKA-12648: introduce TopologyConfig and TaskConfig for topology-level overrides (#11272 ) Most configs that are read and used by Streams today originate from the properties passed in to the KafkaStreams constructor, which means they get applied universally across all threads, tasks, subtopologies, and so on. The only current exception to this is the topology.optimization config which is parsed from the properties that get passed in to StreamsBuilder#build. However there are a handful of configs that could also be scoped to the topology level, allowing users to configure each NamedTopology independently of the others, where it makes sense to do so. This PR refactors the handling of these configs by interpreting the values passed in via KafkaStreams constructor as the global defaults, which can then be overridden for individual topologies via the properties passed in when building the NamedTopology. More topology-level configs may be added in the future, but this PR covers the following: max.task.idle.ms task.timeout.ms buffered.records.per.partition default.timestamp.extractor.class default.deserialization.exception.handler Reviewers: Guozhang Wang <wangguoz@gmail.com>, Walker Carlson <wcarlson@confluent.io>, Luke Chen <showuon@confluent.io>	2021-11-10 11:27:59 -08:00
Jorge Esteban Quilcate Otoya	807c5b4d28	KAFKA-10543: Convert KTable joins to new PAPI (#11412 ) * Migrate KTable joins to new Processor API. * Migrate missing KTableProcessorSupplier implementations. * Replace KTableProcessorSupplier with new Processor API implementation. Reviewers: John Roesler <vvcephei@apache.org>	2021-11-08 14:48:54 -06:00
Victoria Xia	01e6a6ebf2	KAFKA-13261: Add support for custom partitioners in foreign key joins (#11368 ) Implements KIP-775. Co-authored-by: Tomas Forsman <tomas-forsman@users.noreply.github.com>	2021-11-03 10:55:24 -07:00
Luizfrf3	252a40ea1f	KAFKA-8941: Add RocksDB Metrics that Could not be Added due to RocksD… (#11441 ) This PR adds some RocksDB metrics that could not be added in KIP-471 due to RocksDB version. The new metrics are extracted using Histogram data provided by RocksDB API, and the old ones were extracted using Tickers. The new metrics added are memtable-flush-time-(avg\|min\|max) and compaction-time-(avg\|min\|max). Reviewer: Bruno Cadonnna <cadonna@apache.org>	2021-11-03 12:18:28 +01:00
A. Sophie Blee-Goldman	22aa9d2ce1	KAFKA-12648: fill in javadocs for the StreamsException class with new guarantees (#11436 ) Minor followup to #11405 / KIP-783 to write down the new guarantees we're providing about the meaning of a StreamsException in the javadocs of that class Reviewers: Bruno Cadonna <cadonna@apache.org>	2021-10-27 10:47:57 -07:00
Andrew Patterson	86e83de742	kafka-12994: Migrated SlidingWindowsTest to new API (#11379 ) As raised in KAFKA-12994, All tests that use the old API should be either eliminated or migrated to the new API in order to remove the @SuppressWarnings("deprecation") annotations. This PR migrates the SlidingWindowsTest to the new API. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-10-21 16:50:37 -07:00
A. Sophie Blee-Goldman	b534124224	KAFKA-12648: Wrap all exceptions thrown to handler as StreamsException & add TaskId field (#11405 ) To help users distinguish which task an exception was thrown from, and which NamedTopology if it exists, we add a TaskId field to the StreamsException class. We then make sure that all exceptions thrown to the handler are wrapped as StreamsExceptions, to help the user simplify their handling code as they know they will always need to unwrap the thrown exception exactly once. Reviewers: Walker Carlson <wcarlson@confluent.io>, Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, John Roesler <vvcephei@apache.org>	2021-10-21 16:21:20 -07:00
Jorge Esteban Quilcate Otoya	68223d3227	KAFKA-10539: Convert KStreamImpl joins to new PAPI (#11356 ) Part of the migration to new Processor API, this PR converts KStream to KStream joins. Depends #11315 Reviewers: John Roesler <vvcephei@apache.org>	2021-10-18 15:34:08 -05:00
Lucas Bradstreet	da38a1df27	MINOR: "partition" typos and method doc arg fix (#11298 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Luke Chen <showuon@gmail.com>	2021-10-18 10:44:11 +02:00
Matthias J. Sax	f58d79d549	KAFKA-13345: Use "delete" cleanup policy for windowed stores if duplicates are enabled (#11380 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Luke Chen <showuon@gmail.com>	2021-10-14 22:06:30 -07:00
Matthias J. Sax	7fbe482289	HOTFIX: suppress deprecation warnings to fix compilation errors (#11400 ) Reviewers: Bill Bejeck <bill@confluent.io>	2021-10-14 18:58:20 -07:00
Luke Chen	65b01a0464	KAFKA-13212: add support infinite query for session store (#11234 ) Add support for infinite range query for SessionStore. Reviewers: Patrick Stuedi <pstuedi@apache.org>, Guozhang Wang <wangguoz@gmail.com>	2021-10-12 16:14:38 -07:00
Luke Chen	d1415866cc	KAFKA-13021: disallow grace called after grace set via new API (#11188 ) Disallow calling grace() if it was already set via ofTimeDifferenceAndGrace/WithNoGrace(). Add the check to disallow grace called after grace set via new API, and add tests for them. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2021-10-12 16:05:22 -07:00
Guozhang Wang	e31a53a2cd	KAFKA-13319: Do not commit empty offsets on producer (#11362 ) We observed on the broker side that txn-offset-commit request with empty topics are received. After checking the source code I found there's on place on Streams which is unnecessarily sending empty offsets. This PR cleans up the streams layer logic a bit to not send empty offsets, and at the same time also guard against empty offsets at the producer layer as well. Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>	2021-10-12 13:33:23 -07:00
Luke Chen	769882d910	MINOR: remove unneeded size and add lock coarsening to inMemoryKeyValueStore (#11370 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>	2021-10-06 18:32:10 -07:00
Jorge Esteban Quilcate Otoya	ce83e5be66	KAFKA-10540: Migrate KStream aggregate operations (#11315 ) As part of the migration of KStream/KTable operations to the new Processor API https://issues.apache.org/jira/browse/KAFKA-8410, this PR includes the migration of KStream aggregate/reduce operations. Reviewers: John Roesler <vvcephei@apache.org>	2021-09-30 11:40:40 -05:00
A. Sophie Blee-Goldman	6d7a785956	MINOR: expand logging and improve error message during partition count resolution (#11364 ) Recently a user hit this TaskAssignmentException due to a bug in their regex that meant no topics matched the pattern subscription, which in turn meant that it was impossible to resolve the number of partitions of the downstream repartition since there was no upstream topic to get the partition count for. Debugging this was pretty difficult and ultimately came down to stepping through the code line by line, since even with TRACE logging we only got a partial picture. We should expand the logging to make sure the TRACE logging hits both conditional branches, and improve the error message with a suggestion for what to look for should someone hit this in the future Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2021-09-29 18:19:43 -07:00
Luke Chen	361b7845c6	KAFKA-13309: fix InMemorySessionStore#fetch/backwardFetch order issue (#11337 ) In #9139, we added backward iterator on SessionStore. But there is a bug that when fetch/backwardFetch the key range, if there are multiple records in the same session window, we can't return the data in the correct order. Reviewers: Anna Sophie Blee-Goldman <ableegoldman<apache.org>	2021-09-28 17:31:29 -07:00
vamossagar12	7e57300148	KAFKA-12486: Enforce Rebalance when a TaskCorruptedException is throw… (#11076 ) This PR aims to utilize HighAvailabilityTaskAssignor to avoid downtime on corrupted tasks. The idea is that, when we hit TaskCorruptedException on an active task, a rebalance is triggered after we've wiped out the corrupted state stores. This will allow the assignor to temporarily redirect this task to another client who can resume work on the task while the original owner works on restoring the state from scratch. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-09-28 16:50:16 -07:00
Andrew Patterson	24a335b338	KAFKA-12994: Migrate TimeWindowsTest to new API (#11215 ) As raised in KAFKA-12994, All tests that use the old API should be either eliminated or migrated to the new API in order to remove the @SuppressWarnings("deprecation") annotations. This PR will migrate over all the relevant tests in TimeWindowsTests.java Reviewers: Anna Sophie Blee-Goldman	2021-09-28 15:58:49 -07:00
Walker Carlson	4eb386f6e0	KAFKA-13296: warn if previous assignment has duplicate partitions (#11347 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, Luke Chen <showuon@gmail.com>	2021-09-25 11:41:52 -07:00
Jorge Esteban Quilcate Otoya	d15969af28	KAFKA-10544: Migrate KTable aggregate and reduce (#11316 ) As part of the migration of KStream/KTable operations to the new Processor API https://issues.apache.org/jira/browse/KAFKA-8410, this PR includes the migration of KTable aggregate/reduce operations. Reviewers: John Roesler <vvcephei@apache.org>	2021-09-22 14:25:41 -05:00
Luke Chen	b61ec0003f	KAFKA-13211: add support for infinite range query for WindowStore (#11227 ) Add support for infinite range query for WindowStore. Story JIRA: https://issues.apache.org/jira/browse/KAFKA-13210 Reviewers: Patrick Stuedi <pstuedi@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	2021-09-22 09:14:58 -07:00
andy0x01	5a6f19b2a1	KAFKA-13246: StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread now waits for the client state to go to REBALANCING/RUNNING after adding/removing a thread and waits for state RUNNING before querying the state store. (#11334 ) KAFKA-13246: StoreQueryIntegrationTest#shouldQueryStoresAfterAddingAndRemovingStreamThread does not gate on stream state well The test now waits for the client to transition to REBALANCING/RUNNING after adding/removing a thread as well as to transition to RUNNING before querying the state store. Reviewers: singingMan <@3schwartz>, Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>	2021-09-21 11:18:15 -05:00
Guozhang Wang	1b0294dfc4	MINOR: Let the list-store return null in putifabsent (#11335 ) This is to make sure that even if logging is disabled, we would still return null in order to workaround the deserialization issue for stream-stream left/outer joins. Reviewers: Matthias J. Sax <mjsax@apache.org>	2021-09-17 12:05:21 -07:00
Guozhang Wang	a0c7e6d8b4	KAFKA-13216: Use a KV with list serde for the shared store (#11252 ) This is an alternative approach in parallel to #11235. After several unsuccessful trials to improve its efficiency i've come up with a larger approach, which is to use a kv-store instead as the shared store, which would store the value as list. The benefits of this approach are: Only serde once that compose <timestamp, byte, key>, at the outer metered stores, with less byte array copies. Deletes are straight-forward with no scan reads, just a single call to delete all duplicated <timestamp, byte, key> values. Using a KV store has less space amplification than a segmented window store. The cons though: Each put call would be a get-then-write to append to the list; also we would spend a few more bytes to store the list (most likely a singleton list, and hence just 4 more bytes). It's more complicated definitely.. :) The main idea is that since the shared store is actively GC'ed by the expiration logic, not based on time retention, and since that the key format is in <timestamp, byte, key>, the range expiration query is quite efficient as well. Added testing covering for the list stores (since we are still use kv-store interface, we cannot leverage on the get() calls in the stream-stream join, instead we use putIfAbsent and range only). Another minor factoring piggy-backed is to let toList to always close iterator to avoid leaking. Reviewers: Sergio Peña <sergio@confluent.io>, Matthias J. Sax <mjsax@apache.org>	2021-09-16 16:44:32 -07:00
Luke Chen	9628c1278e	KAFKA-13264: fix inMemoryWindowStore backward fetch not in reversed order (#11292 ) When introducing backward iterator for WindowStroe in #9138, we forgot to make "each segment" in reverse order (i.e. in descendingMap) in InMemoryWindowStore. Fix it and add integration tests for it. Currently, in Window store, we store records in [segments -> [records] ]. For example: window size = 500, input records: key: "a", value: "aa", timestamp: 0 ==> will be in [0, 500] window key: "b", value: "bb", timestamp: 10 ==> will be in [0, 500] window key: "c", value: "cc", timestamp: 510 ==> will be in [500, 1000] window So, internally, the "a" and "b" will be in the same segment, and "c" in another segments. segments: [0 /* window start */, records], [500, records]. And the records for window start 0 will be "a" and "b". the records for window start 500 will be "c". Before this change, we did have a reverse iterator for segments, but not in "records". So, when doing backwardFetchAll, we'll have the records returned in order: "c", "a", "b", which should be "c", "b", "a" obviously. Reviewers: Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>	2021-09-13 14:40:54 -07:00
Oliver Hutchison	a03bda61e0	KAFKA-13249: Always update changelog offsets before writing the checkpoint file (#11283 ) When using EOS checkpointed offsets are not updated to the latest offsets from the changelog because the maybeWriteCheckpoint method is only ever called when commitNeeded=false. This change will force the update if enforceCheckpoint=true . I have also added a test which verifies that both the state store and the checkpoint file are completely up to date with the changelog after the app has shutdown. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>	2021-09-13 14:15:22 -07:00
Josep Prat	286126f9a5	KAFKA-13243: KIP-773 Differentiate metric latency measured in ms and ns (#11302 ) KAFKA-13243: KIP-773 Differentiate metric latency measured in ms and ns Implementation of KIP-773 Deprecates inconsistent metrics bufferpool-wait-time-total, io-waittime-total, and iotime-total. Introduces new metrics bufferpool-wait-time-ns-total, io-wait-time-ns-total, and io-time-ns-total with the same semantics as before. Adds metrics (old and new) in ops.html. Adds upgrade guide for these metrics. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Tom Bentley <tbentley@redhat.com>	2021-09-08 18:00:58 +01:00
Tomer Wizman	fb77da941a	KAFKA-12766 - Disabling WAL-related Options in RocksDB (#11250 ) Description Streams disables the write-ahead log (WAL) provided by RocksDB since it replicates the data in changelog topics. Hence, it does not make much sense to set WAL-related configs for RocksDB. Proposed solution Ignore any WAL-related configuration and state in the log that we are ignoring them. Co-authored-by: Tomer Wizman <tomer.wizman@personetics.com> Co-authored-by: Bruno Cadonna <cadonna@apache.org> Reviewers: Boyang Chen <boyang@apache.org>, Bruno Cadonna <cadonna@apache.org>	2021-09-08 13:57:08 +02:00
Christo Lolov	6472e79092	KAFKA-12994 Migrate JoinWindowsTest and SessionWindowsTest to new API (#11214 ) As detailed in KAFKA-12994, unit tests using the old API should be either removed or migrated to the new API. This PR migrates relevant tests in JoinWindowsTest.java and SessionWindowsTest.java. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-09-07 19:50:18 -07:00
Jorge Esteban Quilcate Otoya	5f89ce5f20	KAFKA-13201: Convert KTable suppress to new PAPI (#11213 ) Migrate Suppress as part of the migration of KStream/KTable operations to the new Processor API (KAFKA-8410) Reviewers: John Roesler <vvcephei@apache.org>	2021-09-07 17:17:44 -05:00
CHUN-HAO TANG	89ee72b5b5	KAFKA-13088: Replace EasyMock with Mockito for ForwardingDisabledProcessorContextTest (#11051 ) Reviewers: Ismael Juma <ismael@juma.me.uk>	2021-09-06 08:18:52 -07:00
Yanwen(Jason) Lin	7c64077b34	KAFKA-13032: add NPE checker for KeyValueMapper (#11241 ) Currently KStreamMap and KStreamFlatMap classes will throw NPE if the call to KeyValueMapper#apply() return null. This commit checks whether the result of KeyValueMapper#apply() is null and throws a more meaningful error message for better debugging. Two unit tests are also added to check if we successfully captured nulls. Reviewers: Josep Prat <josep.prat@aiven.io>, Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2021-09-06 14:03:16 +02:00
Josep Prat	4835c64f89	KAFKA-12887 Skip some RuntimeExceptions from exception handler (#11228 ) Instead of letting all RuntimeExceptions go through and be processed by the uncaught exception handler, IllegalStateException and IllegalArgumentException are not passed through and fail fast. In this PR when setting the uncaught exception handler we check if the exception is in an "exclude list", if so, we terminate the client, otherwise we continue as usual. Added test checking this new case. Added integration test checking that user defined exception handler is not used when an IllegalStateException is thrown. Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2021-09-01 09:58:36 -07:00
Rohan	a5ce43781e	MINOR: add units to metrics descriptions + test fix post KAFKA-13229 (#11286 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2021-08-31 11:44:45 -07:00
Rohan	01ab888dbd	KAFKA-13229: add total blocked time metric to streams (KIP-761) (#11149 ) * Add the following producer metrics: flush-time-total: cumulative sum of time elapsed during in flush. txn-init-time-total: cumulative sum of time elapsed during in initTransactions. txn-begin-time-total: cumulative sum of time elapsed during in beginTransaction. txn-send-offsets-time-total: cumulative sum of time elapsed during in sendOffsetsToTransaction. txn-commit-time-total: cumulative sum of time elapsed during in commitTransaction. txn-abort-time-total: cumulative sum of time elapsed during in abortTransaction. * Add the following consumer metrics: commited-time-total: cumulative sum of time elapsed during in committed. commit-sync-time-total: cumulative sum of time elapsed during in commitSync. * Add a total-blocked-time metric to streams that is the sum of: consumer’s io-waittime-total consumer’s iotime-total consumer’s committed-time-total consumer’s commit-sync-time-total restore consumer’s io-waittime-total restore consumer’s iotime-total admin client’s io-waittime-total admin client’s iotime-total producer’s bufferpool-wait-time-total producer's flush-time-total producer's txn-init-time-total producer's txn-begin-time-total producer's txn-send-offsets-time-total producer's txn-commit-time-total producer's txn-abort-time-total Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2021-08-30 15:39:25 -07:00
dengziming	1d22b0d706	KAFKA-10774; Admin API for Describe topic using topic IDs (#9769 ) Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>	2021-08-28 09:00:36 +01:00
Walker Carlson	49aed781d8	KAFKA-13128: extract retry checker and update with retriable exception causing flaky StoreQueryIntegrationTest (#11275 ) Add a new case to the list of possible retriable exceptions for the flaky tests to take care of threads starting up Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman	2021-08-27 20:12:12 -07:00
Andy Lapidas	84b111f968	KAFKA-12963: Add processor name to error (#11262 ) This PR adds the processor name to the ClassCastException exception text in process() Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-08-27 20:06:49 -07:00
A. Sophie Blee-Goldman	d9bb988954	MINOR: remove unused Properties from GraphNode#writeToTopology (#11263 ) The GraphNode#writeToTopology method accepts a Properties input parameter, but never uses it in any of its implementations. We can remove this parameter to clean things up and help make it clear that writing nodes to the topology doesn't involve the app properties. Reviewers: Bruno Cadonna <cadonna@confluent.io>	2021-08-26 17:19:03 -07:00
Luke Chen	844c1259a9	MINOR: Optimize the OrderedBytes#upperRange for not all query cases (#11181 ) Currently in OrderedBytes#upperRange method, we'll check key bytes 1 by 1, to see if there's a byte value >= first timestamp byte value, so that we can skip the following key bytes, because we know compareTo will always return 0 or 1. However, in most cases, the first timestamp byte is always 0, more specifically the upperRange is called for both window store and session store. For former, the suffix is in timestamp, Long.MAX_VALUE and for latter the suffix is in Long.MAX_VALUE, timestamp. For Long.MAX_VALUE the first digit is not 0, for timestamp it could be 0 or not, but as long as it is up to "now" (i.e. Aug. 23rd) then the first byte should be 0 since the value is far smaller than what a long typed value could have. So in practice for window stores, that suffix's first byte has a large chance to be 0, and hence worth optimizing for. This PR optimizes the not all query cases by not checking the key byte 1 by 1 (because we know the unsigned integer will always be >= 0), instead, put all bytes and timestamp directly. So, we won't have byte array copy in the end either. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2021-08-26 14:37:34 -07:00
A. Sophie Blee-Goldman	53277a92a6	HOTFIX: decrease session timeout in flaky NamedTopologyIntegrationTest (#11259 ) Decrease session timeout back to 10s to improve test flakiness Reviewers: Walker Carlson <wcarlson@confluent.io>	2021-08-25 21:52:32 -07:00
Phil Hardwick	a594747d75	HOTFIX: Fix null pointer when getting metric value in MetricsReporter (#11248 ) The alive stream threads metric relies on the threads field as a monitor object for its synchronized block. When the alive stream threads metric is registered it isn't initialised so any call to get the metric value before it is initialised will result in a null pointer exception. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>	2021-08-23 13:21:38 -07:00
John Roesler	45ecaa19f8	MINOR: Set session timeout back to 10s for Streams system tests (#11236 ) We increased the default session timeout to 30s in KIP-735: https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout Since then, we are observing sporadic system test failures due to rebalances taking longer than the test timeout. Rather than increase the test wait times, we can just override the session timeout to a value more appropriate in the testing domain. Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>	2021-08-20 11:27:54 -05:00
Luke Chen	2bfd0ae2e9	MINOR: update the branch(split) doc and java doc and tests (#11195 ) Reviewers: Ivan Ponomarev <iponomarev@mail.ru>, Matthias J. Sax <matthias@confluent.io>	2021-08-10 13:37:59 -07:00
Guozhang Wang	35ebdd7ed0	MINOR: Fix flaky shouldCreateTopicWhenTopicLeaderNotAvailableAndThenTopicNotFound (#11155 ) 1. This test is taking two iterations since the firs iteration is designed to fail due to unknow topic leader. However both the timeout and the backoff are set to 50ms, while the actual SYSTEM time is used. This means we are likely to timeout before executing the second iteration. I thought about using a mock time but dropped that idea as it may forgo the purpose of this test, instead I set the backoff time to 10ms so that we are almost unlikely to hit this error anymore. 2. Found a minor issue while fixing this which is that when we have non-empty not-ready topics, but the topics-to-create is empty (which is possible as the test shouldCreateTopicWhenTopicLeaderNotAvailableAndThenTopicNotFound itself illustrates), we still call an empty create-topic function. Though it does not incur any round-trip it would still waste some cycles, so I branched it off and hence also simplified some unit tests. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@confluent.io>	2021-08-10 12:27:44 -07:00
Walker Carlson	9565a529e0	KAFKA-12779: rename namedTopology in TaskId to topologyName #11192 Update to KIP-740. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Israel Ekpo <israelekpo@gmail.com>	2021-08-09 15:19:21 -07:00
John Roesler	f16a9499ec	MINOR: Increase smoke test production time (#11190 ) We've seen a few failures recently due to the driver finishing the production of data and verifying the results before the whole cluster is even running. Reviewers: Leah Thomas <lthomas@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <mjsax@apache.org>	2021-08-09 14:19:35 -05:00
A. Sophie Blee-Goldman	6854eb8332	KAFKA-12648: Pt. 3 - addNamedTopology API (#10788 ) Pt. 1: #10609 Pt. 2: #10683 Pt. 3: #10788 In Pt. 3 we implement the addNamedTopology API. This can be used to update the processing topology of a running Kafka Streams application without resetting the app, or even pausing/restarting the process. It's up to the user to ensure that this API is called on every instance of an application to ensure all clients are able to run the newly added NamedTopology. Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2021-08-06 00:18:27 -07:00
Walker Carlson	be8820cdac	MINOR: Port fix to other StoreQueryIntegrationTests (#11153 ) Port the fix from #11129 to the other store-query tests. Reviewers: John Roesler <vvcephei@apache.org>	2021-08-05 16:39:33 -05:00
Guozhang Wang	e5df7fd90b	MINOR: Should commit a task if the consumer position advanced as well (#11151 ) Reviewers: John Roesler <vvcephei@apache.org>	2021-08-05 13:41:46 -07:00
Walker Carlson	d414de3779	MINOR: use relative counts for restores (#11176 ) Use a relative count from using 0 for totalNumbRestores to prevent flakiness. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-08-05 00:32:25 -07:00
A. Sophie Blee-Goldman	5d52de2ccf	KAFKA-12648: minor followup from Pt. 2 and some new tests (#11146 ) Addresses the handful of remaining feedback from Pt. 2, plus adds two new tests: one verifying a multi-topology application with a FKJ and its internal topics, another to make sure IQ works with named topologies (though note that there is a bit more work left for IQ to be fully supported, will be tackled after Pt. 3 Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2021-07-30 10:56:57 -07:00
Patrick Stuedi	22541361b7	Add support for infinite endpoints for range queries (#11120 ) Add support to open endpoint range queries in key-value stores Implements: KIP-763 Reviewers: Almog Gavra <almog@confluent.io>, Luke Chen <showuon@gmail.com>, John Roesler <vvcephei@apache.org>	2021-07-29 21:52:16 -05:00
Patrick Stuedi	c074b67395	Fix for flaky test in StoreQueryIntegrationTest (#11129 ) Fix a bug in StoreQueryIntegrationTest::shouldQueryOnlyActivePartitionStoresByDefault that causes the test to fail in the case of a client rebalancing. The changes in this PR make sure the test keeps re-trying after a rebalancing operation, instead of failing. Reviewers: Luke Chen <showuon@gmail.com>, John Roesler <vvcephei@apache.org>	2021-07-29 16:26:00 -05:00
Luke Chen	a2aa3b9ea5	close TopologyTestDriver to release resources (#11143 ) Close TopologyTestDriver to release resources Reviewers: Bill Bejeck <bbejeck@apache.org>	2021-07-29 15:50:57 -04:00
Jorge Esteban Quilcate Otoya	3190ebd1e6	KAFKA-10542: Migrate KTable mapValues, passthrough, and source to new Processor API (#11099 ) As part of the migration of KStream/KTable operations to the new Processor API (KAFKA-8410), this PR includes the migration of KTable: * mapValues, * passthrough, * and source operations. Reviewers: John Roesler <vvcephei@apache.org>	2021-07-28 20:58:15 -05:00
Walker Carlson	58402a6fe8	MINOR: add helpful error message (#11139 ) I noticed that replace thread actions would not be logged unless the user added a log in the handler. I think it would be very useful for debugging. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2021-07-28 11:38:31 -07:00
A. Sophie Blee-Goldman	4710a49146	KAFKA-12648: Pt. 2 - Introduce TopologyMetadata to wrap InternalTopologyBuilders of named topologies (#10683 ) Pt. 1: #10609 Pt. 2: #10683 Pt. 3: #10788 The TopologyMetadata is next up after Pt. 1 #10609. This PR sets up the basic architecture for running an app with multiple NamedTopologies, though the APIs to add/remove them dynamically are not implemented until Pt. 3 Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2021-07-28 11:18:56 -07:00
Luke Chen	818cbfba6d	KAFKA-13125: close KeyValueIterator instances in internals tests (part 2) (#11107 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2021-07-26 16:26:02 -07:00
Luke Chen	ded66d92a4	KAFKA-13124: close KeyValueIterator instance in internals tests (part 1) (#11106 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2021-07-26 16:25:22 -07:00

1 2 3 4 5 ...

2334 Commits