In KAFKA-13310, we tried to fix an issue where consumer#poll(duration) could return later than the provided duration. That happens because, when a rebalance is needed, we synchronously commit the current offsets before rebalancing, and if the offset commit takes too long, consumer#poll spends more time than the provided duration. To fix that, we replaced the synchronous commit with an asynchronous commit before the rebalance (i.e. in onPrepareJoin).
However, in this ticket we found that the async commit keeps sending a new commit request on each Consumer#poll, because the offset commit never completes in time. The impact is that the existing consumer gets kicked out of the group after the rebalance timeout without rejoining it. That is, suppose consumer A is in group G and consumer B then joins the group; after the rebalance, only consumer B is in the group.
Besides, another bug was found while fixing this one. Before KAFKA-13310, we committed offsets synchronously with the rebalanceTimeout, retrying on retriable errors until the timeout. After KAFKA-13310, we thought we still had retries, but the retry happens after the partitions have been revoked. That is, even if the retried offset commit succeeds, some partition offsets are left uncommitted, and after the rebalance other consumers consume overlapping records.
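For reference, a minimal sketch using only the public consumer API (the timeout value and callback handling are illustrative, not the internal coordinator code) of the two commit styles discussed above: a synchronous commit blocks the polling thread for up to its timeout, while an asynchronous commit returns immediately and only reports the outcome later through a callback.
```java
import java.time.Duration;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitStylesSketch {
    // Blocks the caller, but never longer than the given timeout.
    static void commitBounded(KafkaConsumer<String, String> consumer,
                              Map<TopicPartition, OffsetAndMetadata> offsets) {
        consumer.commitSync(offsets, Duration.ofMillis(500));
    }

    // Returns immediately; completion (or failure) is only observed later in the callback,
    // which is why an async commit issued right before a rebalance may never finish in time.
    static void commitNonBlocking(KafkaConsumer<String, String> consumer,
                                  Map<TopicPartition, OffsetAndMetadata> offsets) {
        consumer.commitAsync(offsets, (committed, exception) -> {
            if (exception != null) {
                // e.g. log and rely on a later commit attempt
            }
        });
    }
}
```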
Reviewers: RivenSun <riven.sun@zoom.us>, Luke Chen <showuon@gmail.com>
Make sure to ack all records for which the produce request failed when a connector's `errors.tolerance` config property is set to `all`. Acking is essential so that the task continues to commit future record offsets properly and removes the records from internal tracking, preventing a memory leak.
Reviewers: Chris Egerton <fearthecellos@gmail.com>, Randall Hauch <rhauch@gmail.com>
Currently, the preferredReplicaImbalanceCount calculation has a race that can make the count go negative when topic deletions are initiated simultaneously. This PR addresses the problem by ensuring cleanPreferredReplicaImbalanceMetric is called only once per topic-deletion procedure.
Reviewers: Luke Chen <showuon@gmail.com>
- Different objects should be considered unique even when they have the same content, in order to support logout (see the sketch after this list)
- Added comments for SaslExtensions re: removal of equals and hashCode
- Also swapped out the use of mocks in exchange for *real* SaslExtensions instances so that we exercise the default equals() and hashCode() methods.
- Updates to implement equals and hashCode, with tests added in SaslExtensionsTest to confirm the behavior
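For illustration, a minimal identity-equality sketch with a hypothetical `ExtensionsHolder` class standing in for `SaslExtensions`: since equals() and hashCode() are not overridden, two instances with identical content remain distinct map keys, so removing one (a single logout) leaves the other session untouched.
```java
import java.util.HashMap;
import java.util.Map;

public class IdentityEqualityDemo {
    // Hypothetical stand-in for SaslExtensions; no equals()/hashCode() overrides,
    // so Object's identity semantics apply.
    static final class ExtensionsHolder {
        private final Map<String, String> extensions;
        ExtensionsHolder(Map<String, String> extensions) { this.extensions = Map.copyOf(extensions); }
    }

    public static void main(String[] args) {
        ExtensionsHolder a = new ExtensionsHolder(Map.of("traceId", "x"));
        ExtensionsHolder b = new ExtensionsHolder(Map.of("traceId", "x"));

        Map<ExtensionsHolder, String> sessions = new HashMap<>();
        sessions.put(a, "session-1");
        sessions.put(b, "session-2");

        System.out.println(sessions.size());          // 2: same content, still distinct keys
        sessions.remove(a);                           // "logout" of one session only
        System.out.println(sessions.containsKey(b));  // true: the other session survives
    }
}
```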
Co-authored-by: Purshotam Chauhan <pchauhan@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
KIP-800 added the `reason` field to the JoinGroupRequest and the LeaveGroupRequest as a means to provide more information to the group coordinator. In https://issues.apache.org/jira/browse/KAFKA-13998, we discovered that the size of the field is limited to 32767 chars by our serialisation mechanism. At the moment, the field, whether provided directly by the user or constructed internally, is set directly regardless of its length.
This patch sends only the first 255 chars of the user-provided or internally generated reason on the wire. Given the purpose of this field, that seems acceptable and should still give operators enough information to understand the cause of a rebalance.
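As a rough sketch of the truncation described here (the helper name and the 255-character limit are taken from this description, not from Kafka's internal code):
```java
public final class ReasonTruncation {
    private static final int MAX_REASON_LENGTH = 255;

    // Cap a user-provided or internally generated reason before putting it on the wire.
    static String truncateReason(String reason) {
        if (reason == null || reason.length() <= MAX_REASON_LENGTH)
            return reason;
        return reason.substring(0, MAX_REASON_LENGTH);
    }
}
```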
Reviewers: David Jacot <djacot@confluent.io>
When cleaning a topic with transactional data, if the keys used in the user data happen to conflict with the keys in the transaction markers, it is possible for the markers to get removed before the corresponding data from the transaction is removed. This results in a hanging transaction or the loss of the transaction's atomicity since it would effectively get bundled into the next transaction in the log. Currently control records are excluded when building the offset map, but not when doing the cleaning. This patch fixes the problem by checking for control batches in the `shouldRetainRecord` callback.
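A toy sketch of the retention rule described above, with simplified stand-in types rather than the real log cleaner code: the key-based retention check is only applied to regular data batches, while control batches (transaction markers) are never dropped by it.
```java
import java.util.List;
import java.util.Set;

public class RetainControlBatchSketch {
    record Batch(boolean isControl, String key) { }

    static boolean shouldRetain(Batch batch, Set<String> keysWithNewerValues) {
        if (batch.isControl())
            return true;                                   // never let the key check drop transaction markers
        return !keysWithNewerValues.contains(batch.key()); // normal compaction rule for data batches
    }

    public static void main(String[] args) {
        List<Batch> segment = List.of(new Batch(true, null), new Batch(false, "k1"));
        segment.forEach(b -> System.out.println(shouldRetain(b, Set.of("k1"))));
    }
}
```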
Reviewers: Jun Rao <junrao@gmail.com>
What:
When a certificate is rotated on a broker via dynamic configuration and the previous certificate expires, the broker-to-controller connection starts failing with "SSL Handshake failed".
Why:
A similar fix was performed earlier in #6721, but when BrokerToControllerChannelManager was introduced in v2.7, we didn't enable dynamic reconfiguration for its channel.
Summary of testing strategy (including rationale)
Add a test which fails prior to the fix done in the PR and succeeds afterwards. The bug wasn't caught earlier because there was no test coverage to validate the scenario.
Reviewers: Luke Chen <showuon@gmail.com>
The `reason` field cannot contain more than 32767 chars. We did not expect to ever reach this limit, but it turns out to be possible if the message provided in the `Throwable` somehow contains the entire stack trace. This patch ensures that the reason crafted from exceptions remains small.
Co-authored-by: David Jacot <djacot@confluent.io>
Reviewers: Bruno Cadonna <cadonna@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org>, David Jacot <djacot@confluent.io>
This is another way of fixing KAFKA-13563 other than #11631.
Instead of letting the consumer always try to discover the coordinator in poll under either mode (subscribe / assign), we defer clearing the discovery future to async commits only. More specifically, under manual assign mode, there are only three places where we need the coordinator:
* commitAsync (either by the consumer itself or triggered by the caller); this is what we want to fix.
* commitSync, where we already try to re-discover the coordinator.
* committed (either by the consumer itself based on the reset policy, or triggered by the caller), where we already try to re-discover the coordinator.
The benefit is that in manual assign mode, if none of the above three operations is triggered, we never discover the coordinator. The original fix in #11631 would let the consumer discover the coordinator even if none of these operations is required.
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
When the LogManager starts up and loads logs, we expect to catch any IOException (e.g. an out-of-space error) and mark the log dir as offline. Later, the offline logDir is handled in ReplicaManager, so that the cleanShutdown file won't be created when all logDirs are offline. The reason the broker shuts down with a cleanShutdown file after the disk is full is that during loadLogs and log recovery we write the leader-epoch-checkpoint file, and if any IOException is thrown there, we wrap it as a KafkaStorageException and rethrow it. Since we don't catch KafkaStorageException, the exception is caught elsewhere and the broker goes down the clean-shutdown path.
This PR fixes the issue by catching KafkaStorageExceptions with an IOException cause during loadLogs and marking the logDir as offline, so that ReplicaManager handles the offline logDirs.
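A minimal sketch of the pattern described above, with hypothetical helper names (`recoverLogsIn`, `markOffline`) rather than the real LogManager code: an IOException wrapped in a KafkaStorageException is treated the same way as a bare IOException, so the log dir is marked offline instead of the error escaping down the clean-shutdown path.
```java
import java.io.IOException;
import org.apache.kafka.common.errors.KafkaStorageException;

public class LoadLogsSketch {
    void loadLogDir(String logDir) {
        try {
            recoverLogsIn(logDir);
        } catch (IOException e) {
            markOffline(logDir, e);                // e.g. out of space while writing checkpoint files
        } catch (KafkaStorageException e) {
            if (e.getCause() instanceof IOException)
                markOffline(logDir, e);            // same handling for the wrapped IOException
            else
                throw e;                           // unrelated storage errors still propagate
        }
    }

    // Hypothetical stand-ins for the real recovery and offline-handling steps.
    void recoverLogsIn(String logDir) throws IOException { }
    void markOffline(String logDir, Exception cause) { }
}
```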
Reviewers: Jun Rao <jun@confluent.io>, Alok Thatikunta <alok123thatikunta@gmail.com>
Minor change to use ' and not LEFT SINGLE QUOTATION MARK in this log message, as it's the only place where we use such a quote and it can break ingestion pipelines.
Reviewers: Kvicii <Karonazaba@gmail.com>, Divij Vaidya <diviv@amazon.com>, Konstantine Karantasis <k.karantasis@gmail.com>
When running our Connect system tests with JDK 10+, we hit the error
AttributeError: 'ClusterNode' object has no attribute 'version'
because util.py attempts to check the version variable for non-Kafka service objects.
Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
The KRaft implementation of the `CreatePartitions` API ignores the `validateOnly` flag in the
request and creates the partitions if the validations are successful. Fixed the behavior
so that partitions are not created when the `validateOnly` flag is true.
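As a usage sketch of the intended behavior (the topic name, partition count, and bootstrap address are illustrative), `validateOnly(true)` should only run the validations and must not actually increase the partition count:
```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.CreatePartitionsOptions;
import org.apache.kafka.clients.admin.NewPartitions;

public class ValidateOnlyExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Validates the request (topic exists, count is an increase, etc.) without creating partitions.
            admin.createPartitions(
                    Map.of("my-topic", NewPartitions.increaseTo(6)),
                    new CreatePartitionsOptions().validateOnly(true))
                .all()
                .get();
        }
    }
}
```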
Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
* Replace `log4j` with `reload4j` in `copyDependantLibs`. Since we have
some projects with an explicit `reload4j` dependency, it was already
included in the final release tar - i.e. that was effectively
a workaround for this bug.
* Exclude `log4j` and `slf4j-log4j12` transitive dependencies for
`streams:upgrade-system-tests`. Versions 0100 and 0101
had a transitive dependency on `log4j` and `slf4j-log4j12` via
`zkclient` and `zookeeper`. This avoids classpath conflicts that lead
to [NoSuchFieldError](https://github.com/qos-ch/reload4j/issues/41) in
system tests.
Reviewers: Jason Gustafson <jason@confluent.io>
Conceptually, the ordering is defined by the producer id, producer epoch
and the sequence number. This set should generally only have entries
for the same producer id and epoch, but there is one case where
we can have conflicting `remove` calls and hence we add this as
a temporary safe fix.
We'll follow up with a fix that ensures the original intended invariant.
Reviewers: Jason Gustafson <jason@confluent.io>, David Jacot
<djacot@confluent.io>, Luke Chen <showuon@gmail.com>
The bug was introduced in #11689, where an additional onAcknowledgement call was made via the InterceptorCallback class. This is undesirable since onSendError will attempt to call onAcknowledgement once more.
Reviewers: Jun Rao <junrao@gmail.com>
In KIP-811, we added a new config, repartition.purge.interval.ms, to set the repartition purge interval. This flaky test expected the purge interval to be the same as the commit interval, which is no longer correct (the default is 30 sec). Set the purge interval explicitly to fix the issue.
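A sketch of setting the purge interval explicitly in the test's Streams configuration rather than assuming it follows the commit interval (the application id, bootstrap address, and 100 ms values are illustrative):
```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class PurgeIntervalConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "purge-interval-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100L);
        // KIP-811 config; set explicitly instead of relying on it matching the commit interval.
        props.put("repartition.purge.interval.ms", 100L);
        return props;
    }
}
```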
Reviewers: Bruno Cadonna <cadonna@apache.org>, Guozhang Wang <wangguoz@gmail.com>
When a log entry is appended to a Kafka topic using KafkaLog4jAppender, the producer.send operation
may hit a deadlock if the producer network thread also tries to append a log entry at the same log level.
This issue is triggered when idempotence is enabled for the KafkaLog4jAppender and the producer
tries to acquire the TransactionManager lock.
This is a temporary workaround to avoid deadlocks by disabling idempotence explicitly in
KafkaLog4jAppender.
Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>
In a comparator, objects that are not equal need to have a stable order; otherwise, binary search may not find the objects. Improve the producer batch comparator accordingly.
Reviewers: Luke Chen <showuon@gmail.com>
With KAFKA-13527 / KIP-784 we introduced a new top-level error code for
the DescribeLogDirs API for versions 3 and above. However, the change
regressed the error handling for versions less than 3 since the response
converter fails to write the non-zero error code out (rightly) for
versions lower than 3 and drops the response to the client, which
eventually times out instead of receiving an empty log dirs response and
processing it as a cluster authorization failure.
With this change, the API conditionally propagates the error code out to
the client if the request API version is 3 and above. This keeps the
semantics of the error handling the same for all versions and restores
the behavior for older versions.
See current behavior in the broker log:
```bash
[ERROR] 2022-04-08 01:22:56,406 [data-plane-kafka-request-handler-10] kafka.server.KafkaApis - [KafkaApi-0] Unexpected error handling request RequestHeader(apiKey=DESCRIBE_LOG_DIRS, apiVersion=0, clientId=sarama, correlationId=1) -- DescribeLogDirsRequestData(topics=null)
org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default errorCode at version 0
[ERROR] 2022-04-08 01:22:56,407 [data-plane-kafka-request-handler-10] kafka.server.KafkaRequestHandler - [Kafka Request Handler 10 on Broker 0], Exception when handling request
org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default errorCode at version 0
```
Reviewers: Ismael Juma <ismael@juma.me.uk>
This regressed in ca375d8004 due to a typo. We need tests
for our builds. :)
I verified that passing the commitId via `-PcommitId=123`
works correctly.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Fixes a regression introduced in https://github.com/apache/kafka/pull/11452. Following [KIP-480](https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner), the `Partitioner` will receive a callback when a batch has been completed so that it can choose another partition. Because of this, we have to wait until the batch has been successfully appended to the accumulator before adding the partition in `TransactionManager.maybeAddPartition`. This is still safe because the `Sender` cannot dequeue a batch from the accumulator until it has been added to the transaction successfully.
Reviewers: Artem Livshits <84364232+artemlivshits@users.noreply.github.com>, David Jacot <djacot@confluent.io>, Tom Bentley <tbentley@redhat.com>
Fixes a bug in the comparator used to sort producer inflight batches for a topic partition. This can cause batches in the map `inflightBatchesBySequence` to be removed incorrectly: i.e. one batch may be removed by another batch with the same sequence number. This leads to an `IllegalStateException` when the inflight request finally returns. This patch fixes the comparator to check equality of the `ProducerBatch` instances if the base sequences match.
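A self-contained sketch of the idea, where `Batch` stands in for the real ProducerBatch and the identity-based tie-break is illustrative rather than the exact code in the patch: two entries only compare as equal when they are genuinely the same batch, so a removal keyed by sequence cannot evict a different batch.
```java
import java.util.Comparator;

public class BatchComparatorSketch {
    static final class Batch {
        final int baseSequence;
        Batch(int baseSequence) { this.baseSequence = baseSequence; }
    }

    // Order by base sequence first; on a tie, only identical batches compare as equal.
    static final Comparator<Batch> BY_SEQUENCE_THEN_IDENTITY = (b1, b2) -> {
        if (b1.baseSequence != b2.baseSequence)
            return Integer.compare(b1.baseSequence, b2.baseSequence);
        if (b1.equals(b2))
            return 0;
        return Integer.compare(System.identityHashCode(b1), System.identityHashCode(b2));
    };
}
```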
Reviewers: Jason Gustafson <jason@confluent.io>
Fix the upper bound for sliding windows, making it compatible with no grace period (KAFKA-13739).
Added unit tests for the early sliding window and the "normal" sliding window, covering both events within one time difference (small input) and beyond the window time difference (large input).
Fixing this window interval may slightly change stream behavior, but the probability of that happening is extremely low and it should not have a big impact on the results.
Reviewers: Leah Thomas <lthomas@confluent.io>, Bill Bejeck <bbejeck@apache.org>
Partitions are assigned to fetcher threads based on their hash modulo the number of fetcher threads. When we resize the fetcher thread pool, we basically re-distribute all the partitions based on the new fetcher thread pool size. The issue is that the logic that resizes the fetcher thread pool updates the `fetcherThreadMap` while iterating over it. The `Map` does not give any guarantee in this case - especially when the underlying map is re-hashed - and that led to not iterating over all the fetcher threads during the process, thus leaving some partitions in the wrong fetcher threads.
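An illustrative sketch of the remedy, with generic names rather than the real fetcher-manager code: re-distribution by hash modulo the new pool size is done against a snapshot of the current entries, so the live map is never mutated while being iterated.
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FetcherResizeSketch {
    public static void main(String[] args) {
        Map<Integer, List<String>> partitionsByFetcher = new HashMap<>();
        partitionsByFetcher.put(0, new ArrayList<>(List.of("topic-0", "topic-2")));
        partitionsByFetcher.put(1, new ArrayList<>(List.of("topic-1", "topic-3")));

        int newPoolSize = 3;

        // Snapshot first: adding/removing entries while iterating a HashMap can skip or repeat entries.
        List<String> allPartitions = new ArrayList<>();
        partitionsByFetcher.values().forEach(allPartitions::addAll);
        partitionsByFetcher.clear();

        // Re-distribute every partition based on the new pool size.
        for (String partition : allPartitions) {
            int fetcherId = Math.floorMod(partition.hashCode(), newPoolSize);
            partitionsByFetcher.computeIfAbsent(fetcherId, id -> new ArrayList<>()).add(partition);
        }
        System.out.println(partitionsByFetcher);
    }
}
```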
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
KIP-800 introduced a mechanism to pass a reason in the join group request and in the leave group request. A default reason is used unless one is provided by the user; in the latter case, the custom reason is prefixed by the default one.
When we tried to use this in Kafka Streams, we noted a significant performance degradation; see https://github.com/apache/kafka/pull/11873. It is not clear whether the prefixing is the root cause of the issue or not. To be on the safe side, I think that we should remove the prefixing. It does not bring much anyway, as we are still able to distinguish a custom reason from the default one on the broker side.
This patch removes prefixing of the user-provided reason. If the user provides a reason, it is used directly; if the reason is empty or null, the default reason is used.
Reviewers: Luke Chen <showuon@gmail.com>, <jeff.kim@confluent.io>, Hao Li <hli@confluent.io>
In KIP-815 we replaced KafkaConsumer with AdminClient in GetOffsetShell. In the previous implementation, partitions were simply ignored if there was no offset for them; now we print -1 instead. This PR fixes that inconsistency.
Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>
With this change, the non-production-grade connectors that are meant to be used for demos and quick starts are no longer included by default in the CLASSPATH and plugin.path of Connect deployments. The packages of these connectors will still be shipped with the Apache Kafka distribution and will be available for explicit inclusion.
The changes have been tested through the system tests and the existing unit and integration tests.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Randall Hauch <rhauch@gmail.com>