kafka

Commit Graph

Author	SHA1	Message	Date
Stanislav Vodetskyi	7e683852b4	MINOR: unpin ducktape dependency to always use the newest version (py3 edition) (#11884 ) Ensures we always have the latest published ducktape version. This way whenever we release a new one, we won't have to cherry pick a bunch of commits across a bunch of branches.	2022-03-11 17:48:19 +05:30
Levani Kokhreidze	87eb0cf03c	KAFKA-6718: Update SubscriptionInfoData with clientTags (#10802 ) adds ClientTags to SubscriptionInfoData Reviewer: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-03-11 16:29:05 +08:00
xuexiaoyue	f025a93c7c	MINOR: Fix comments in TransactionsTest (#11880 ) Reviewer: Luke Chen <showuon@gmail.com>	2022-03-11 15:42:44 +08:00
Lucas Bradstreet	dc36dedd28	MINOR: jmh.sh swallows compile errors (#11870 ) jmh.sh runs tasks in quiet mode which swallows compiler errors. This is a pain and I frequently have to edit the shell script to see the error. Reviewers: Ismael Juma <ismael@confluent.io>, Bill Bejeck <bbejeck@apache.org>	2022-03-10 18:18:41 -05:00
Walker Carlson	4d5a28973f	Revert "KAFKA-13542: add rebalance reason in Kafka Streams (#11804 )" (#11873 ) This reverts commit `2ccc834faa`. This reverts commit `2ccc834`. We were seeing serious regressions in our state heavy benchmarks. We saw that our state heavy benchmarks were experiencing a really bad regression. The State heavy benchmarks runs with rolling bounces with 10 nodes. We regularly saw this exception: java.lang.OutOfMemoryError: Java heap space I ran through a git bisect and found this commit. We verified that the commit right before did not have the same issues as this one did. I then reverted the problematic commit and ran the benchmarks again on this commit and did not see any more issues. We are still looking into the root cause, but for now since this isn't a critical improvement so we can remove it temporarily. Reviewers: Bruno Cadonna <cadonna@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, David Jacot <djacot@confluent.io>, Ismael Juma <ismael@confluent.io>	2022-03-10 13:52:05 -08:00
A. Sophie Blee-Goldman	113595cf5c	KAFKA-12648: fix flaky #shouldAddToEmptyInitialTopologyRemoveResetOffsetsThenAddSameNamedTopologyWithRepartitioning (#11868 ) This test has started to become flaky at a relatively low, but consistently reproducible, rate. Upon inspection, we find this is due to IOExceptions during the #cleanUpNamedTopology call -- specifically, most often a DirectoryNotEmptyException with an ocasional FileNotFoundException Basically, signs pointed to having returned from/completed the #removeNamedTopology future prematurely, and moving on to try and clear out the topology's state directory while there was a streamthread somewhere that was continuing to process/close its tasks. I believe this is due to updating the thread's topology version before we perform the actual topology update, in this case specifically the act of eg clearing out a directory. If one thread updates its version and then goes to perform the topology removal/cleanup when the second thread finishes its own topology removal, this other thread will check whether all threads are on the latest version and complete any waiting futures if so -- which means it can complete the future before the first thread has actually completed the corresponding action Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-10 12:02:07 -08:00
aSemy	38e3787d76	Minor typo: "result _is_ a" > "result _in_ a" (#11876 ) Reviewers Bill Bejeck <bbejeck@apache.org>	2022-03-10 14:03:12 -05:00
RivenSun	84b41b9d3a	KAFKA-13689: Revert AbstractConfig code changes (#11863 ) Reviewer: Luke Chen <showuon@gmail.com>	2022-03-10 10:54:10 +08:00
Vincent Jiang	798275f254	KAFKA-13717: skip coordinator lookup in commitOffsetsAsync if offsets is empty (#11864 ) Reviewer: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2022-03-10 10:52:05 +08:00
A. Sophie Blee-Goldman	9c7d857713	KAFKA-12648: fix #getMinThreadVersion and include IOException + topologyName in StreamsException when topology dir cleanup fails (#11867 ) Quick fix to make sure we log the actual source of the failure both in the actual log message as well as the StreamsException that we bubble up to the user's exception handler, and also to report the offending topology by filling in the StreamsException's taskId field. Also prevents a NoSuchElementException from being thrown when trying to compute the minimum topology version across all threads when the last thread is being unregistered during shutdown. Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-09 16:30:42 -08:00
Randall Hauch	d2d49f6421	KAFKA-12879: Remove extra sleep (#11872 )	2022-03-09 15:11:46 -06:00
Philip Nee	ddcee81043	KAFKA-12879: Addendum to reduce flakiness of tests (#11871 ) This is an addendum to the KAFKA-12879 (#11797) to fix some tests that are somewhat flaky when a build machine is heavily loaded (when the timeouts are too small). - Add an if check to void sleep(0) - Increase timeout in the tests	2022-03-09 14:37:48 -06:00
Philip Nee	28393be6d7	KAFKA-12879: Revert changes from KAFKA-12339 and instead add retry capability to KafkaBasedLog (#11797 ) Fixes the compatibility issue regarding KAFKA-12879 by reverting the changes to the admin client from KAFKA-12339 (#10152) that retry admin client operations, and instead perform the retries within Connect's `KafkaBasedLog` during startup via a new `TopicAdmin.retryEndOffsets(..)` method. This method delegates to the existing `TopicAdmin.endOffsets(...)` method, but will retry on `RetriableException` until the retry timeout elapses. This change should be backward compatible to the KAFKA-12339 so that when Connect's `KafkaBasedLog` starts up it will retry attempts to read the end offsets for the log's topic. The `KafkaBasedLog` existing thread already has its own retry logic, and this is not changed. Added more unit tests, and thoroughly tested the new `RetryUtil` used to encapsulate the parameterized retry logic around any supplied function.	2022-03-09 12:39:28 -06:00
Jason Koch	2367c8994b	KAFKA-13630: Reduce amount of time that producer network thread holds batch queue lock (#11722 ) Hold the `deque` lock for only as long as is required to collect and make a decision in `ready()` and `drain()` loops. Once this is done, remaining work can be done without lock, so release it. This allows producers to continue appending. For an application with with a single producer thread and a high send() rate, this change reduces spinlock CPU cycles from 14.6% to 2.5% of the send() path, or more clearly a 12.1% improvement in efficiency for the send() path by reducing the duration of contention events with the network thread. Note that this application was executed with Java 8, which has a slower crc32c implementation. Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Artem Livshits <84364232+artemlivshits@users.noreply.github.com>	2022-03-09 05:41:06 -08:00
Adam Kotwasinski	add11eed75	MINOR: Correct logging and Javadoc in FetchSessionHandler (#11843 ) Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <david.jacot@gmail.com>, Luke Chen <showuon@gmail.com>	2022-03-09 16:51:26 +08:00
David Jacot	69926b5193	MINOR: Clean up AlterIsrManager code (#11832 ) Reviewers: Justine Olshan <jolshan@confluent.io>, Jason Gustafson <jason@confluent.io>	2022-03-09 07:31:07 +01:00
John Roesler	717f9e2149	MINOR: Restructure ConsistencyVectorIntegrationTest (#11848 ) Reviewers: YEONCHEOL JANG <@YeonCheolGit>, Matthias J. Sax <mjsax@apache.org>	2022-03-08 13:59:58 -06:00
Vincent Jiang	b27000ec6a	MINOR: Fix flaky test cases SocketServerTest.remoteCloseWithoutBufferedReceives and SocketServerTest.remoteCloseWithIncompleteBufferedReceive (#11861 ) When a socket is closed, corresponding channel should be retained only if there is complete buffered requests. Reviewers: David Jacot <djacot@confluent.io>	2022-03-08 19:03:11 +01:00
John Roesler	10f34ce6b3	MINOR: Clarify acceptable recovery lag config doc (#11411 ) Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>, Andrew Eugene Choi < @andrewchoi5 >	2022-03-08 10:42:36 -06:00
A. Sophie Blee-Goldman	fc7133d52d	KAFKA-12648: fix bug where thread is re-added to TopologyMetadata when shutting down (#11857 ) We used to call TopologyMetadata#maybeNotifyTopologyVersionWaitersAndUpdateThreadsTopologyVersion when a thread was being unregistered/shutting down, to check if any of the futures listening for topology updates had been waiting on this thread and could be completed. Prior to invoking this we make sure to remove the current thread from the TopologyMetadata's threadVersions map, but this thread is actually then re-added in the #maybeNotifyTopologyVersionWaitersAndUpdateThreadsTopologyVersion call. To fix this, we should break up this method into separate calls for each of its two distinct functions, updating the version and checking for topology update completion. When unregistering a thread, we should only invoke the latter method Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-07 23:59:43 -08:00
Luke Chen	1848f049e1	KAFKA-13710: bring the InvalidTimestampException back for record error (#11853 ) Reviewers: Guozhang Wang <guozhang@confluent.io>, Ricardo Brasil <anribrasil@gmail.com>	2022-03-08 14:28:16 +08:00
A. Sophie Blee-Goldman	539f006e65	KAFKA-12648: fix NPE due to race condtion between resetting offsets and removing a topology (#11847 ) While debugging the flaky NamedTopologyIntegrationTest. shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing test, I did discover one real bug. The problem was that we update the TopologyMetadata's builders map (with the known topologies) inside the #removeNamedTopology call directly, whereas the StreamThread may not yet have reached the poll() in the loop and in case of an offset reset, we get an NP.e I changed the NPE to just log a warning for now, going forward I think we should try to tackle some tech debt by keeping the processing tasks and the TopologyMetadata in sync Also includes a quick fix on the side where we were re-adding the topology waiter/KafkaFuture for a thread being shut down Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-07 11:09:18 -08:00
Mickael Maison	bbb2dc54a0	KAFKA-13671: Add ppc64le build stage (#11833 ) Reviewers: David Arthur <mumrah@gmail.com>	2022-03-07 10:18:54 +01:00
Tim Patterson	e3ef29ea03	KAFKA-12959: Distribute standby and active tasks across threads to better balance load between threads (#11493 ) Balance standby and active stateful tasks evenly across threads Reviewer: Luke Chen <showuon@gmail.com>	2022-03-05 16:11:42 +08:00
RivenSun	0dac4b4267	KAFKA-13689: printing unused and unknown logs separately (#11800 ) Differentiate between unused and unknown configs during log output. Reviewer: Luke Chen <showuon@gmail.com>	2022-03-05 16:08:14 +08:00
RivenSun	3be978464c	KAFKA-13694: Log more specific information when the verification record fails on brokers. (#11830 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-04 10:45:44 -08:00
A. Sophie Blee-Goldman	11143d4883	MINOR: fix flaky shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing (#11827 ) This test has been failing somewhat regularly due to going into the ERROR state before reaching RUNNING during the startup phase. The problem is that we are reusing the DELAYED_INPUT_STREAM topics, which had previously been assumed to be uniquely owned by a particular test. We should make sure to delete and re-create these topics for any test that uses them.	2022-03-04 10:31:37 -08:00
A. Sophie Blee-Goldman	6f54faed2d	KAFKA-12648: fix #add/removeNamedTopology blocking behavior when app is in CREATED (#11813 ) Currently the #add/removeNamedTopology APIs behave a little wonky when the application is still in CREATED. Since adding and removing topologies runs some validation steps there is valid reason to want to add or remove a topology on a dummy app that you don't plan to start, or a real app that you haven't started yet. But to actually check the results of the validation you need to call get() on the future, so we need to make sure that get() won't block forever in the case of no failure -- as is currently the case Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-04 09:58:56 -08:00
Vincent Jiang	95dbba9fe5	KAFKA-13706: Remove closed connections from MockSelector.ready (#11839 ) Reviewers: David Jacot <djacot@confluent.io>	2022-03-04 09:51:53 +01:00
wangyap	ae76b9d45a	KAFKA-13466: delete unused config batch.size in kafka-console-producer.sh (#11517 ) delete unused config batch.size in kafka-console-producer.sh Reviewer: Andrew Eugene Choi <andrew.choi@uwaterloo.ca>, Luke Chen <showuon@gmail.com>,	2022-03-04 09:47:23 +08:00
Justin Lee	f5d8fb2b0b	(docs) Add JavaDocs for org.apache.kafka.common.security.oauthbearer.secured (#11811 ) Reviewers: Luke Chen <showuon@confluent.io>, Jun Rao <junrao@gmail.com>	2022-03-03 10:13:01 -08:00
Luke Chen	7c280c1d5f	KAFKA-13673: disable idempotence when config conflicts (#11788 ) Disable idempotence when conflicting config values for acks, retries and max.in.flight.requests.per.connection are set by the user. For the former two configs, we log at info level when we disable idempotence due to conflicting configs. For the latter, we log at warn level since it's due to an implementation detail that is likely to be surprising. This mitigates compatibility impact of enabling idempotence by default. Added unit tests to verify the change in behavior. Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Mickael Maison <mickael.maison@gmail.com>	2022-03-03 05:40:41 -08:00
Mickael Maison	029a14b530	KAFKA-13510: Connect APIs to list all connector plugins and retrieve their configs (#11572 ) Implements KIP-769: https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions Reviewers: Tom Bentley <tbentley@redhat.com>, Chris Egerton <fearthecellos@gmail.com>	2022-03-03 14:28:50 +01:00
Chris Egerton	066cdc8c62	KAFKA-10000: Add producer fencing API to admin client (KIP-618) (#11777 ) * KAFKA-10000: Add producer fencing API to admin client Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>	2022-03-03 10:27:17 +00:00
Levani Kokhreidze	62e646619b	KAFKA-6718 / Rack aware standby task assignor (#10851 ) This PR is part of KIP-708 and adds rack aware standby task assignment logic. Reviewer: Bruno Cadonna <cadonna@apache.org>, Luke Chen <showuon@gmail.com>, Vladimir Sitnikov <vladimirsitnikov.apache.org>	2022-03-03 11:37:26 +08:00
Colin Patrick McCabe	07553d13f7	MINOR: create KafkaConfigSchema and TimelineObject (#11809 ) Create KafkaConfigSchema to encapsulate the concept of determining the types of configuration keys. This is useful in the controller because we can't import KafkaConfig, which is part of core. Also introduce the TimelineObject class, which is a more generic version of TimelineInteger / TimelineLong. Reviewers: David Arthur <mumrah@gmail.com>	2022-03-02 14:26:31 -08:00
A. Sophie Blee-Goldman	f089bea7ed	MINOR: set log4j.logger.kafka and all Config logger levels to ERROR for Streams tests (#11823 ) Pretty much any time we have an integration test failure that's flaky or only exposed when running on Jenkins through the PR builds, it's impossible to debug if it cannot be reproduced locally as the logs attached to the test results have truncated the entire useful part of the logs. This is due to the logs being flooded at the beginning of the test when the Kafka cluster is coming up, eating up all of the allotted characters before we even get to the actual Streams test. Setting log4j.logger.kafka to ERROR greatly improves the situation and cuts down on most of the excessive logging in my local runs. To improve things even more and have some hope of getting the part of the logs we actually need, I also set the loggers for all of the Config objects to ERROR, as these print out the value of every single config (of which there are a lot) and are not useful as we can easily figure out what the configs were if necessary by just inspecting the test locally. Reviewers: Luke Chen <showuon@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2022-03-01 21:58:10 -08:00
John Roesler	7172f35807	MINOR: Improve test assertions for IQv2 (#11828 ) Reviewer: Bill Bejeck <bbejeck@apache.org>	2022-03-01 20:30:29 -06:00
A. Sophie Blee-Goldman	84f8c90b13	KAFKA-12648: standardize startup timeout to fix some flaky NamedTopologyIntegrationTest tests (#11824 ) Seen a few of the new tests added fail on PR builds lately with "java.lang.AssertionError: Expected all streams instances in [org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper@7fb3e6b0] to be RUNNING within 30000 ms" We already had some tests using the 30s timeout while others were bumped all the way up to 60s, I figured we should try out a default timeout of 45s and if we still see failures in specific tests we can go from there	2022-03-01 13:15:53 -08:00
A. Sophie Blee-Goldman	6eb57f6df1	KAFKA-12738: address minor followup and consolidate integration tests of PR #11787 (#11812 ) This PR addresses the remaining nits from the final review of #11787 It also deletes two integration test classes which had only one test in them, and moves the tests to another test class file to save on the time to bring up an entire embedded kafka cluster just for a single run Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-03-01 12:59:18 -08:00
Kowshik Prakasam	67e99a4236	MINOR: Ensure LocalLog.flush is thread safe to recoveryPoint changes (#11814 ) Issue: Imagine a scenario where two threads T1 and T2 are inside UnifiedLog.flush() concurrently: KafkaScheduler thread T1 -> The periodic work calls LogManager.flushDirtyLogs() which in turn calls UnifiedLog.flush(). For example, this can happen due to log.flush.scheduler.interval.ms here. KafkaScheduler thread T2 -> A UnifiedLog.flush() call is triggered asynchronously during segment roll here. Supposing if thread T1 advances the recovery point beyond the flush offset of thread T2, then this could trip the check within LogSegments.values() here for thread T2, when it is called from LocalLog.flush() here. The exception causes the KafkaScheduler thread to die, which is not desirable. Fix: We fix this by ensuring that LocalLog.flush() is immune to the case where the recoveryPoint advances beyond the flush offset. Reviewers: Jun Rao <junrao@gmail.com>	2022-03-01 10:55:17 -08:00
Marc Löhe	14faea4aab	KAFKA-8659: fix SetSchemaMetadata failing on null value and schema (#7082 ) Make SetSchemaMetadata SMT ignore records with null value and valueSchema or key and keySchema. The transform has been unit tested for handling null values gracefully while still providing the necessary validation for non-null values. Reviewers: Konstantine Karantasis<konstantine@confluent.io>, Bill Bejeck <bbejeck@apache.org>	2022-03-01 10:10:43 -05:00
Hao Li	2ccc834faa	KAFKA-13542: add rebalance reason in Kafka Streams (#11804 ) Add rebalance reason in Kafka Streams. Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-02-28 18:26:46 +01:00
Jason Gustafson	5f91aa7b4c	KAFKA-13698; KRaft authorizer should use host address instead of name (#11807 ) Use `InetAddress.getHostAddress` in `StandardAuthorizer` instead of `InetAddress.getHostName`. Reviewers: Colin Patrick McCabe <cmccabe@confluent.io>	2022-02-26 10:52:34 -08:00
Walker Carlson	abb74d406a	KAFKA-13281: allow #removeNamedTopology while in the CREATED state (#11810 ) We should be able to change the topologies while still in the CREATED state. We already allow adding them, but this should include removing them as well Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2022-02-25 19:11:06 -08:00
Walker Carlson	29317e6953	KAFKA-13281: add API to expose current NamedTopology set (#11808 ) List all the named topologies that have been added to this client Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2022-02-25 19:04:07 -08:00
Jason Gustafson	2c90447a59	KAFKA-13697; KRaft authorizer should support AclOperation.ALL (#11806 ) KRaft authorizer should support AclOperation.ALL correctly. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2022-02-25 15:43:21 -08:00
A. Sophie Blee-Goldman	c2ee1411c8	KAFKA-12738: send LeaveGroup request when thread dies to optimize replacement time (#11801 ) Quick followup to #11787 to optimize the impact of the task backoff by reducing the time to replace a thread. When a thread is going through a dirty close, ie shutting down from an uncaught exception, we should be sending a LeaveGroup request to make sure the broker acknowledges the thread has died and won't wait up to the `session.timeout` for it to join the group if the user opts to `REPLACE_THREAD` in the handler Reviewers: Walker Carlson <wcarlson@confluent.io>, John Roesler <vvcephei@apache.org>	2022-02-24 16:18:13 -08:00
Zhang Hongyi	15ebad54b4	MINOR: Skip fsync on parent directory to start Kafka on ZOS (#11793 ) Reviewers: Cong Ding <cong@ccding.com>, Jun Rao <junrao@gmail.com>	2022-02-24 13:26:23 -08:00
A. Sophie Blee-Goldman	cd4a1cb410	KAFKA-12738: track processing errors and implement constant-time task backoff (#11787 ) Part 1 in the initial series of error handling for named topologies. *Part 1: Track tasks with errors within a named topology & implement constant-time based task backoff Part 2: Implement exponential task backoff to account for recurring errors Part 3: Pause/backoff all tasks within a named topology in case of a long backoff/frequent errors for any individual task Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>	2022-02-24 12:10:31 -08:00

1 2 3 4 5 ...

9811 Commits All Branches Search

9811 Commits

All Branches