kafka

Commit Graph

Author	SHA1	Message	Date
Ritika Reddy	c6b44b5d66	Cherry Pick KAFKA-19367 to 4.0 (#19958 ) `0b2e410d61` Bug fix in 4.0 Conflicts: - The Transaction Coordinator had some conflicts, mainly with the transaction states. Ex: ongoing in 4.0 is TransactionState.ONGOING in 4.1. - The TransactionCoordinatorTest file had conflicts w.r.t the 2PC changes from KIP-939 in 4.1 and the above mentioned state changes Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits <alivshits@confluent.io>	2025-06-14 11:40:00 -07:00
Alyssa Huang	ded7653066	MINOR: Fix some Request toString methods (#19655 ) (#19689 ) Reviewers: Colin P. McCabe <cmccabe@apache.org> ``` Conflicts: clients/src/main/java/org/apache/kafka/common/requests/AlterUserScramCredentialsRequest.java - import statement clients/src/main/java/org/apache/kafka/common/requests/IncrementalAlterConfigsRequest.java - import statement core/src/test/scala/unit/kafka/server/KafkaApisTest.scala - different logging and metadatacache instantiation ``` Cherry-Picked-From: `042be5b9ac` Cherry-Picked-By: Alyssa Huang <ahuang@confluent.io> Cherry-Picked-At: Mon May 12 11:01:47 2025 -0700	2025-05-27 15:37:29 -07:00
David Jacot	d7d7876989	KAFKA-19274; Group Coordinator Shards are not unloaded when `__consumer_offsets` topic is deleted (#19713 ) Group Coordinator Shards are not unloaded when `__consumer_offsets` topic is deleted. The unloading is scheduled but it is ignored because the epoch is equal to the current epoch: ``` [2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Scheduling unloading of metadata for __consumer_offsets-0 with epoch OptionalInt[0] (org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime) [2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Scheduling unloading of metadata for __consumer_offsets-1 with epoch OptionalInt[0] (org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime) [2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Ignored unloading metadata for __consumer_offsets-0 in epoch OptionalInt[0] since current epoch is 0. (org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime) [2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Ignored unloading metadata for __consumer_offsets-1 in epoch OptionalInt[0] since current epoch is 0. (org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime) ``` This patch fixes the issue by not setting the leader epoch in this case. The coordinator expects the leader epoch to be incremented when the resignation code is called. When the topic is deleted, the epoch is not incremented. Therefore, we must not use it. Note that this is aligned with deleted partitions are handled too. Reviewers: Dongnuo Lyu <dlyu@confluent.io>, José Armando García Sancio <jsancio@apache.org>	2025-05-15 19:12:42 +02:00
Kamal Chandraprakash	c2068878c9	KAFKA-19131: Adjust remote storage reader thread maximum pool size to avoid illegal argument (#19629 ) The remote storage reader thread pool use same count for both maximum and core size. If users adjust the pool size larger than original value, it throws `IllegalArgumentException`. Updated both value to fix the issue. cherry-pick PR: #19532 cherry-pick commit: `965743c35b` --------- Signed-off-by: PoAn Yang <payang@apache.org> Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang <payang@apache.org> Co-authored-by: PoAn Yang <payang@apache.org>	2025-05-04 18:53:16 +05:30
Colin Patrick McCabe	0297ba2c67	KAFKA-19192; Old bootstrap checkpoint files cause problems updated servers (#19545 ) Old bootstrap.metadata files cause problems with server that include KAFKA-18601. When the server tries to read the bootstrap.checkpoint file, it will fail if the metadata.version is older than 3.3-IV3 (feature level 7). This causes problems when these clusters are upgraded. This PR makes it possible to represent older MVs in BootstrapMetadata objects without causing an exception. An exception is thrown only if we attempt to access the BootstrapMetadata. This ensures that only the code path in which we start with an empty metadata log checks that the metadata version is 7 or newer. Reviewers: José Armando García Sancio <jsancio@apache.org>, Ismael Juma <ismael@juma.me.uk>, PoAn Yang <payang@apache.org>, Liu Zeyu <zeyu.luke@gmail.com>, Alyssa Huang <ahuang@confluent.io>	2025-04-24 16:55:22 -04:00
Rajini Sivaram	f98dec9440	KAFKA-19147: Start authorizer before group coordinator to ensure coordinator authorizes regex topics (#19488 ) [KAFKA-18813](https://issues.apache.org/jira/browse/KAFKA-18813) added `Topic:Describe` authorization of topics matching regex patterns to the group coordinator since it was difficult to authorize these in the broker when processing consumer heartbeats using the new protocol. But group coordinator is started in `BrokerServer` before the authorizer is created. And hence group coordinator doesn't have an authorizer and never performs authorization. As a result, topics that are not authorized for `Describe` may be assigned to consumers. This potentially leaks information about topic existence, topic id and partition count to users who are not authorized to describe a topic. This PR starts authorizer earlier to ensure that authorization is performed by the group coordinator. Also adds integration tests for verification. Note that we still have a second issue when members have different permissions. If regex is resolved by a member with permission to more topics, unauthorized topics may be assigned to members with lower permissions. In this case, we still return assignment containing topic id and partitions to the member without `Topic:Describe` access. This is not addressed by this PR, but an integration test that illustrates the issue has been added so that we can verify when the issue is fixed. Reviewers: David Jacot <david.jacot@gmail.com>	2025-04-16 13:08:37 +01:00
Azhar Ahmed	143fcb1d7c	KAFKA-19071: Fix doc for remote.storage.enable (#19345 ) As of 3.9, Kafka allows disabling remote storage on a topic after it was enabled. It allows subsequent enabling and disabling too. However the documentation says otherwise and needs to be corrected. Doc: https://kafka.apache.org/39/documentation/#topicconfigs_remote.storage.enable Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>	2025-04-14 11:10:22 +08:00
José Armando García Sancio	83f6a1d7e6	KAFKA-18991; Missing change for cherry-pick	2025-04-09 12:52:15 -04:00
TengYao Chi	952c8a5e94	KAFKA-18991: FetcherThread should match leader epochs between fetch request and fetch state (#19223 ) This PR fixes a potential issue where the `FetchResponse` returns `divergingEndOffsets` with an older leader epoch. This can lead to committed records being removed from the follower's log, potentially causing data loss. In detail: `processFetchRequest` gets the requested leader epoch of partition data by `topicPartition` and compares it with the leader epoch of the current fetch state. If they don't match, the response is ignored. Reviewers: Jun Rao <junrao@gmail.com>	2025-04-09 12:32:22 -04:00
José Armando García Sancio	4dbe4739bd	KAFKA-18723; Better handle invalid records during replication (#18852 ) For the KRaft implementation there is a race between the network thread, which read bytes in the log segments, and the KRaft driver thread, which truncates the log and appends records to the log. This race can cause the network thread to send corrupted records or inconsistent records. The corrupted records case is handle by catching and logging the CorruptRecordException. The inconsistent records case is handle by only appending record batches who's partition leader epoch is less than or equal to the fetching replica's epoch and the epoch didn't change between the request and response. For the ISR implementation there is also a race between the network thread and the replica fetcher thread, which truncates the log and appends records to the log. This race can cause the network thread send corrupted records or inconsistent records. The replica fetcher thread already handles the corrupted record case. The inconsistent records case is handle by only appending record batches who's partition leader epoch is less than or equal to the leader epoch in the FETCH request. Reviewers: Jun Rao <junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai <chia7712@apache.org>	2025-04-09 11:15:52 -04:00
Jorge Esteban Quilcate Otoya	617c96cea4	KAFKA-15931: Cancel RemoteLogReader gracefully (#19331 ) Backports `f24945b519` to 4.0 Instead of reopening the transaction index, it cancels the RemoteFetchTask without interrupting it--avoiding to close the TransactionIndex channel. This will lead to complete the execution of the remote fetch but ignoring the results. Given that this is considered a rare case, we could live with this. If it becomes a performance issue, it could be optimized. Reviewers: Jun Rao <junrao@gmail.com>	2025-04-01 16:22:53 -07:00
PoAn Yang	4dd893ba21	KAFKA-806 Index may not always observe log.index.interval.bytes (#18842 ) Currently, each log.append() will add at most 1 index entry, even when the appended data is larger than log.index.interval.bytes. One potential issue is that if a follower restarts after being down for a long time, it may fetch data much bigger than log.index.interval.bytes at a time. This means that fewer index entries are created, which can increase the fetch time from the consumers. (cherry picked from commit `e124d3975b`) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-03-20 16:38:54 +08:00
Colin Patrick McCabe	5c0b8bef07	KAFKA-18920: The kcontrollers must set kraft.version in ApiVersionsResponse (#19127 ) The kafka controllers need to set kraft.version in their ApiVersionsResponse messages according to the current kraft.version reported by the Raft layer. Instead, currently they always set it to 0. Also remove FeatureControlManager.latestFinalizedFeatures. It is not needed and it does a lot of copying. Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-03-07 13:47:05 -08:00
Logan Zhu	9311a25c2f	KAFKA-18886 add behavior change of CreateTopicPolicy and AlterConfigPolicy to zk2kraft (#19087 ) 1. Updated JavaDoc to reflect that CreateTopicPolicy and AlterConfigPolicy run on the controller in KRaft mode. 2. Modified Behavioral Change Reference in the HTML docs to include this change. 3. add warning message to KafkaConfig if the config of broker node has policy configs Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-03-05 15:15:42 +08:00
Mahsa Seifikar	43a7a3acfd	MINOR: Prevent broker fencing by adjusting resendExponentialBackoff in BrokerLifecycleManager (#19061 ) This PR reduces `maxInterval` for `resendExponentialBackoff` in `BrokerLifecycleManager` class from `broker.session.timeout.ms` to half of its value. Setting `maxInterval` to `broker.session.timeout.ms` caused brokers to be fenced if a resend attempt occurred near the timeout threshold, leading to unnecessary broker fencing. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2025-03-04 12:06:53 -08:00
David Jacot	1b424d2f3d	KAFKA-18916; Resolved regular expressions must update the group by topics data structure (#19088 ) When regular expressions are resolved, they do not update the group by topics data structure. Hence, topic changes (e.g. deletion) do not trigger a rebalance of the group. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>	2025-03-04 15:32:10 +01:00
Dongnuo Lyu	8e44ddccb5	KAFKA-18813: ConsumerGroupHeartbeat API and ConsumerGroupDescribe API… (#19042 ) … must check topic describe (#18989) This patch filters out the topic describe unauthorized topics from the ConsumerGroupHeartbeat and ConsumerGroupDescribe response. In ConsumerGroupHeartbeat, - if the request has `subscribedTopicNames` set, we directly check the authz in `KafkaApis` and return a topic auth failure in the response if any of the topics is denied. - Otherwise, we check the authz only if a regex refresh is triggered and we do it based on the acl of the consumer that triggered the refresh. If any of the topic is denied, we filter it out from the resolved subscription. In ConsumerGroupDescribe, we check the authz of the coordinator response. If any of the topic in the group is denied, we remove the described info and add a topic auth failure to the described group. (similar to the group auth failure) (cherry picked from commit `36f19057e1`) Reviewers: David Jacot <djacot@confluent.io>, Lianet Magrans <lmagrans@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>, Chia-Ping Tsai <chia7712@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, TengYao Chi <kitingiao@gmail.com>	2025-02-26 16:53:46 -05:00
PoAn Yang	eacf49f320	KAFKA-18281: Kafka is improperly validating non-advertised listeners for routable controller addresses (#18387 ) When a cluster is configured with a dynamic controller quorum, KRaft replica's endpoint are computed using the advertised.listeners property and not the quorum.controller.voters property. This change in the configuration makes it difficult to keeping all previous node configurations compatible with the new endpoint discovery functionality. The least intrusive solution is to rely on Kafka's reverse hostname lookup when the hostname is not specified. The effective advertised controller listener now remove '0.0.0.0' hostname if the endpoint came from the listener configuration and not the advertised.listener configuration. Reviewers: José Armando García Sancio <jsancio@apache.org>, Alyssa Huang <ahuang@confluent.io>	2025-02-24 19:52:28 -07:00
Ismael Juma	4e9d2feabc	MINOR: Remove request log space added inadvertently (#19011 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-02-23 13:19:25 -08:00
Calvin Liu	8383c88a70	MINOR: Move the ELR default version to 4.1 (#18954 ) - ELR is enabled (ELRV_1) by default if the cluster is created with its bootstrap metadata version >= IBP_4_1_IV0. - ELRV_1 can be manually enabled iff the metadata version is >= IBP_4_0_IV1. Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>, David Jacot <djacot@confluent.io>	2025-02-21 16:13:47 +01:00
TengYao Chi	2f181c92d3	KAFKA-18737 KafkaDockerWrapper setup functions fails due to storage format command (#18844 ) The current Docker Hub documentation for Kafka is based on the use of static voters. Since Kafka 4.0 utilizes dynamic voters, users following the doc of docker hub may encounter unexpected behavior. Due to the limited time available for the 4.0.0 release, a simple and quick solution is to revert to using static voters within the Docker image. This can be achieved by adding a configuration file with static voter definitions to the kafka/docker folder, keeping it separate from the main kafka/config directory. This approach allows us to encourage the use of dynamic voters in typical deployments while maintaining compatibility within the Docker image. Reviewers: Vedarth Sharma <142404391+VedarthConfluent@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-21 20:45:23 +08:00
Shivsundar R	c7db98e15e	KAFKA-18829: Added check before converting to IMPLICIT mode (#18964 ) (Cherry-pick) (#18982 ) Cherry-picked `3603c8fe35` into 4.0. This was a bug fix to address https://issues.apache.org/jira/browse/KAFKA-18829. Now, we will only move to IMPLICIT mode in `ShareConsumerImpl`, if there were any records to be acknowledged, and if the next `poll()`/`commitAsync()`/`commitSync()` was called. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-02-21 09:28:41 +00:00
TengYao Chi	d3791c39e3	KAFKA-18831 Migrating to log4j2 introduce behavior changes of adjusting level dynamically (#18969 ) fix the following behavior changes. 1) in log4j 1, users can't change the logger by parent if the logger is declared by properties explicitly. For example, `org.apache.kafka.controller` has level explicitly in the properties. Hence, we can't use "org.apache.kafka=INFO" to change the level of `org.apache.kafka.controller` to INFO. By contrast, log4j2 allows us to change all child loggers by the parent logger. 2) in log4j2, we can change the level of root to impact all loggers' level. By contrast, log4j 1 can't. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-02-21 16:16:38 +08:00
Calvin Liu	8a08ca676f	MINOR: Deflake EligibleLeaderReplicasIntegrationTest (#18923 ) Make sure to give enough time for the partition ISR updates. Reviewers: David Jacot <djacot@confluent.io>	2025-02-20 14:14:54 +01:00
Ismael Juma	3fbbd0a3ee	KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845 ) 3.3.0 was the first KRaft release that was deemed production-ready and also when KIP-778 (KRaft to KRaft upgrades) landed. Given that, it's reasonable for 4.x to only support upgrades from 3.3.0 or newer (the metadata version also needs to be set to "3.3" or newer before upgrading). Noteworthy changes: 1. `AlterPartition` no longer includes topic names, which makes it possible to simplify `AlterParitionManager` logic. 2. Metadata versions older than `IBP_3_3_IV3` have been removed and `IBP_3_3_IV3` is now the minimum version. 3. `MINIMUM_BOOTSTRAP_VERSION` has been removed. 4. Removed `isLeaderRecoverySupported`, `isNoOpsRecordSupported`, `isKRaftSupported`, `isBrokerRegistrationChangeRecordSupported` and `isInControlledShutdownStateSupported` - these are always `true` now. Also removed related conditional code. 5. Removed default metadata version or metadata version fallbacks in multiple places - we now fail-fast instead of potentially using an incorrect metadata version. 6. Update `MetadataBatchLoader.resetToImage` to set `hasSeenRecord` based on whether image is empty - this was a previously existing issue that became more apparent after the changes in this PR. 7. Remove `ibp` parameter from `BootstrapDirectory` 8. A number of tests were not useful anymore and have been removed. I will update the upgrade notes via a separate PR as there are a few things that need changing and it would be easier to do so that way. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluen.io>, Ken Huang <s7133700@gmail.com>	2025-02-19 05:57:04 -08:00
PoAn Yang	546d9ce39b	KAFKA-18773 Migrate the log4j1 config to log4j 2 for native image and README (#18872 ) - update reflection-config.json and resource-config.json to include log4j2 and jackson - remove unused jackson scala library - fix the incorrect path of log4j2.yaml - adopt workaround (--standalone) to make this PR work and it will be fixed by KAFKA-18737) Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-19 00:49:23 +08:00
PoAn Yang	c4ea05f684	KAFKA-18784 Fix ConsumerWithLegacyMessageFormatIntegrationTest (#18889 ) In PR #18267, we removed old message format for cases in ConsumerWithLegacyMessageFormatIntegrationTest. Although test cases can pass, they don't fulfill original purpose. We can't send old message format since 4.0, so I change cases to append old records by ReplicaManager directly. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-02-17 20:43:49 +08:00
Ming-Yen Chung	6abb4775b9	KAFKA-18790 Fix testCustomQuotaCallback (#18906 ) Frequently updating the trust store can cause unexpected termination of the AsyncConsumer background thread. 1. To resolve this issue, reuse the same AdminClient instead of recreating it. 2. Add error logging when fail to initialize resources for the consumer network thread. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-02-15 03:08:30 +08:00
Calvin Liu	720f4f446d	KAFKA-18654[2/2]: Transction V2 retry add partitions on the server side when handling produce request. (#18810 ) During the transaction commit phase, it is normal to hit CONCURRENT_TRANSACTION error before the transaction markers are fully propagated. Instead of letting the client to retry the produce request, it is better to retry on the server side. Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>	2025-02-13 09:33:20 -08:00
PoAn Yang	da641dcc62	KAFKA-18771: fix Flaky test KRaftClusterTest .testDescribeQuorumRequestToControllers (#18859 ) The case testDescribeQuorumRequestToControllers shutdowns raft client but not the controller. This makes client has chance to send a request to the controller and get NOT_LEADER_OR_FOLLOWER error. However, if the raft client finishes shutdown before handling the request, the request will not be handled. Shutdown the controller before doing KafkaFuture#get for the client request, so we can make sure the request is handled by another controller eventually. Signed-off-by: PoAn Yang <payang@apache.org> Reviewers: Luke Chen <showuon@gmail.com>	2025-02-12 16:18:18 +08:00
Edoardo Comar	7fb3879799	KAFKA-18758: NullPointerException in shutdown following InvalidConfigurationException (#18833 ) * KAFKA-18758: NullPointerException in shutdown following InvalidConfigurationException Add checks for null in shutdown as BrokerLifecycleManager is not instantiaited if LogManager constructor throws an Exception	2025-02-11 10:18:10 +00:00
TengYao Chi	438fbf5f8e	KAFKA-18396: Migrate log4j1 configuration to log4j2 in KafkaDockerWrapper (#18394 ) After log4j migration, we need to update the logging configuration in KafkaDockerWrapper from log4j1 to log4j2. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2025-02-11 13:26:47 +05:30
Ken Huang	8e8423fed0	KAFKA-18366 Remove KafkaConfig.interBrokerProtocolVersion (#18820 ) Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-11 06:24:44 +08:00
Jhen-Yung Hsu	2452d67f2e	KAFKA-18743 Remove leader.imbalance.per.broker.percentage as it is not supported by Kraft (#18821 ) Remove `leader.imbalance.per.broker.percentage` from config. Add `leader.imbalance.per.broker.percentage` to release note Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-02-11 04:02:24 +08:00
Ken Huang	ad6db0952b	KAFKA-18225 ClientQuotaCallback#updateClusterMetadata is unsupported by kraft (#18196 ) This commit ensures that the ClientQuotaCallback#updateClusterMetadata method is executed in KRaft mode. This method is triggered whenever a topic or cluster metadata change occurs. However, in KRaft mode, the current implementation of the updateClusterMetadata API is inefficient due to the requirement of creating a full Cluster object. To address this, a follow-up issue (KAFKA-18239) has been created to explore more efficient mechanisms for providing cluster information to the ClientQuotaCallback without incurring the overhead of a full Cluster object creation. Reviewers: Mickael Maison <mickael.maison@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-11 01:07:28 +08:00
PoAn Yang	b3837f831e	KAFKA-17833: Convert DescribeAuthorizedOperationsTest to use KRaft (#18252 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2025-02-07 15:45:05 +01:00
Colin Patrick McCabe	b6e6a3c68a	KAFKA-18360 Remove zookeeper configurations (#18566 ) Remove broker.id.generation.enable and reserved.broker.max.id, which are not used in KRaft mode. Remove inter.broker.protocol.version, which is not used in KRaft mode. Reviewers: PoAn Yang <payang@apache.org>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-06 22:22:36 +08:00
Ken Huang	cf8d3ac49e	KAFKA-18530 Remove ZooKeeperInternals (#18641 ) Since zk has been removed in 4.0, config handlers no longer need to handle the "<default>" value. This PR streamlines the config update process by eliminating the unnecessary string checks for "<default>" Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-06 18:56:52 +08:00
TengYao Chi	f3d2607cf4	MINOR: Remove unused QuotaConfgHandler (#18617 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-02-06 18:54:06 +08:00
mingdaoy	8b0ef93bb4	KAFKA-18499 Clean up zookeeper from LogConfig (#18583 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-02-06 18:49:52 +08:00
Ming-Yen Chung	10105f9cf3	MINOR: Fix wrong config property in KafkaConfigTest (#18815 ) Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-06 17:10:17 +08:00
Kuan-Po Tseng	4c9d335bcb	KAFKA-18206: EmbeddedKafkaCluster must set features (#18189 ) related to KAFKA-18206, set features in EmbeddedKafkaCluster in both streams and connect module, note that this PR also fix potential transaction with empty records in sendPrivileged method as transaction version 2 doesn't allow this kind of scenario. Reviewers: Justine Olshan <jolshan@confluent.io>	2025-02-05 09:17:48 -08:00
TengYao Chi	6684319185	KAFKA-18645: New consumer should align close timeout handling with classic consumer (#18702 ) Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-05 09:11:09 -05:00
kevin-wu24	98d238aef0	KAFKA-16524; Metrics for KIP-853 (#18304 ) This change implement some of the metrics enumerated in KIP-853. The KafkaRaftMetrics object now exposes number-of-voters, number-of-observers and uncommitted-voter-change. The number-of-observers and uncommitted-voter-change metrics are only present on the active controller or leader, since it does not make sense for other replicas to report these metrics. In order to make these two metrics thread-safe, KafkaRaftMetrics needs to be passed into LeaderState, and therefore QuorumState. This introduces a circularity since the KafkaRaftMetrics constructor takes in QuorumState. To break the circularity for now, the logic using QuorumState will be moved to the KafkaRaftMetrics#initialize method. The BrokerServerMetrics object now exposes ignored-static-voters. The ControllerServerMetrics object now exposes IgnoredStaticVoters. To implement both metrics for "ignored static voters", this PR introduces the ExternalKRaftMetrics interface, which allows for higher layer metrics objects to be accessible within the raft module. Reviewers: José Armando García Sancio <jsancio@apache.org>	2025-02-04 14:19:11 -08:00
Calvin Liu	226532a966	KAFKA-18635: reenable the unclean shutdown detection (#18277 ) We need to re-enable the unclean shutdown detection when in ELR mode, which was inadvertently removed during the development process. Reviewers: David Mao <dmao@confluent.io>, Jun Rao <junrao@gmail.com>	2025-02-04 14:07:27 -08:00
David Arthur	0d2fd09f83	KAFKA-16446: Improve controller event duration logging (#15622 ) There are times when the controller has a high event processing time, such as during startup, or when creating a topic with many partitions. We can see these processing times in the p99 metric (kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs), however it's difficult to see exactly which event is causing high processing time. With DEBUG logs, we see every event along with its processing time. Even with this, it's a bit tedious to find the event with a high processing time. This PR logs all events which take longer than 2 seconds at ERROR level. This will help identify events that are taking far too long, and which could be disruptive to the operation of the controller. The slow event logging looks like this: ``` [2024-12-20 15:03:39,754] ERROR [QuorumController id=1] Exceptionally slow controller event createTopics took 5240 ms. (org.apache.kafka.controller.EventPerformanceMonitor) ``` Also, every 60 seconds, it logs some event time statistics, including average time, maximum time, and the name of the event which took the longest. This periodic message looks like this: ``` [2024-12-20 15:35:04,798] INFO [QuorumController id=1] In the last 60000 ms period, 333 events were completed, which took an average of 12.34 ms each. The slowest event was handleCommit[baseOffset=0], which took 41.90 ms. (org.apache.kafka.controller.EventPerformanceMonitor) ``` An operator can disable these logs by adding the following to their log4j config: ``` org.apache.kafka.controller.EventPerformanceMonitor=OFF ``` Reviewers: Colin P. McCabe <cmccabe@apache.org>	2025-02-04 12:55:30 -08:00
kevin-wu24	32064d5b42	KAFKA-18305: validate controller.listener.names is not in inter.broker.listener.name for kcontrollers (#18222 ) When inter.broker.listener is explicitly set, validate that it is not in the set of controller.listener.names. Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>	2025-02-04 12:28:28 -08:00
Sean Quah	dbf4cef852	KAFKA-18690: Keep leader metadata for RE2J-assigned partitions (#18777 ) Reviewers: Lianet Magrans <lmagrans@confluent.io>	2025-02-04 13:23:06 -05:00
Justine Olshan	64a26f9c7f	KAFKA-18691: Flaky test testFencingOnTransactionExpiration (#18793 ) It appears this test was failing because the transaction was never aborting and the concurrent transactions errors would not go away. `ccab9eb` introduced the test failure because it requires the transaction to complete, but I suspect the lack of completion was happening before the change. The timeout for the write is based on the transactional timeout, and 100ms seemed too small -- thus the requests to update the state would often repeatedly time out. Also removed the loop since it was not necessary. Reviewers: Jeff Kim <jeff.kim@confluent.io>, Calvin Liu <caliu@confluent.io>	2025-02-04 09:00:16 -08:00
Luke Chen	8026d6b3e8	KAFKA-18230: Handle not controller or not leader error in admin client (#18165 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-02-04 16:52:11 +01:00

5538 Commits