kafka

Commit Graph

Author	SHA1	Message	Date
Lan Ding	abbb6b3c13	KAFKA-19471: Enable acknowledgement for a record which could not be deserialized (#20148) This patch mainly includes two improvements: 1. Update currentFetch when `pollForFetches()` throws an exception. 2. Add an overload `KafkaShareConsumer.acknowledge(String topic, int partition, long offset, AcknowledgeType type)`. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-07-27 22:35:04 +01:00
Apoorv Mittal	d350f603a4	KAFKA-18265: Move inflight batch and state classes from SharePartition (2/N) (#20230) Another refactor PR to move in-flight batch and state out of SharePartition. This PR concludes the refactoring; subsequent PRs for this ticket will involve code cleanups and better lock handling. However, the intent is to keep PRs small so they can be reviewed easily. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-07-23 23:01:23 +01:00
Apoorv Mittal	a663ce3f45	KAFKA-18265: Move acquisition lock classes from share partition (1/N) (#20227) While working on KAFKA-19476, I realized that we need to refactor SharePartition for read/write lock handling. I have started some work in the area. For the initial PR, I have moved the AcquisitionLockTimeout class out of SharePartition. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-07-23 20:21:42 +01:00
Kamal Chandraprakash	93adaea599	KAFKA-19523: Gracefully handle error while building remoteLogAuxState (#20201) Improve the error handling while building the remote-log-auxiliary state when a follower node with an empty disk begins to synchronise with the leader. If the topic has remote storage enabled, the ReplicaFetcherThread attempts to build the remote-log-auxiliary state. Note that the remote-log-auxiliary state is built only when the leader-log-start-offset is non-zero and the leader-log-start-offset is not equal to the leader-local-log-start-offset. When the LeaderAndIsr request is received, ReplicaManager#becomeLeaderOrFollower first invokes 'makeFollowers', followed by the RemoteLogManager#onLeadershipChange call. As a result, when the ReplicaFetcherThread initiates RemoteLogManager#fetchRemoteLogSegmentMetadata, the partition may not have been initialized yet, and a retriable exception is thrown. This patch introduces RetriableRemoteStorageException to handle the error gracefully. After the patch: ``` [2025-07-19 19:28:20,934] INFO [ReplicaFetcher replicaId=3, leaderId=1, fetcherId=0] Could not build remote log auxiliary state for orange-1 due to error: RemoteLogManager is not ready for partition: orange-1 (kafka.server.ReplicaFetcherThread) [2025-07-19 19:28:20,934] INFO [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Could not build remote log auxiliary state for orange-0 due to error: RemoteLogManager is not ready for partition: orange-0 (kafka.server.ReplicaFetcherThread) ``` Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>	2025-07-23 19:29:31 +05:30
Chang-Chi Hsu	8a5549ca9b	MINOR: Rename waitForTopic to waitTopicCreation (#20216) Changes: Rename `waitForTopic` to `waitTopicCreation` for better clarity. Reasons: To align with `waitTopicDeletion`. Reference: https://github.com/apache/kafka/pull/20108/files#r2221659660 Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>	2025-07-22 21:02:57 +08:00
Federico Valeri	f5fcc4188f	KAFKA-19503: Deprecate MX4j support (#20208) This feature adds maintenance burden and potential security concerns while providing no apparent value to the Kafka community. See [KIP-1193](https://cwiki.apache.org/confluence/x/dAxJFg) for more details. Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang <s7133700@gmail.com> --------- Signed-off-by: Federico Valeri <fedevaleri@gmail.com>	2025-07-22 20:36:24 +08:00
Apoorv Mittal	f52f2b99e5	KAFKA-19476: Removing AtomicBoolean for findNextFetchOfffset (1/N) (#20207) The PR changes the findNextFetchOffset variable from AtomicBoolean to a plain boolean, as it is always accessed while holding a lock. This also improves handling of the `writeShareGroupState` method response: the complete lock is no longer required, only a lock on a sub-section. Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield <aschofield@confluent.io>	2025-07-21 13:12:13 +01:00
Lan Ding	ef07b5fad1	KAFKA-19461: Add share group admin integration tests to PlaintextAdminIntegrationTest (#20103) Add integration tests for `Admin.deleteShareGroupOffsets`, `Admin.alterShareGroupOffsets` and `Admin.listShareGroupOffsets` to `PlaintextAdminIntegrationTest`. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-07-21 09:08:26 +01:00
Dongnuo Lyu	50598191dc	MINOR: Add tests on TxnOffsetCommit and EndTxnMarker protection against invalid producer epoch when TV2 is used (#20024) This patch adds an API-level integration test for producer epoch verification when processing transactional offset commits and end txn markers. Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi <kitingiao@gmail.com>, Sean Quah <squah@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-21 06:34:29 +08:00
Lan Ding	9a2f202a1e	MINOR: Move ClientQuotasRequestTest to server module (#20053) 1. Move ClientQuotasRequestTest to server module. 2. Rewrite ClientQuotasRequestTest in Java. Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-20 23:14:55 +08:00
Lan Ding	9572d19c59	KAFKA-19509: Improve error message when release version is wrong (#20185) Improve the error message in kafka-storage.sh when an incorrect release-version is given. Specifically, following the behavior of kafka-features.sh, when an incorrect release-version is entered, the tool now returns the currently supported versions to the user. Reviewers: TengYao Chi <frankvicky@apache.org>, Yung <yungyung7654321@gmail.com>	2025-07-18 11:39:55 +08:00
Elizabeth Bennett	f81853ca88	KAFKA-19441: encapsulate MetadataImage in GroupCoordinator/ShareCoordinator (#20061) The MetadataImage has a lot of stuff in it, and it gets passed around in many places in the new GroupCoordinator. This makes it difficult to understand what metadata the group coordinator actually relies on and makes it too easy to use metadata in ways it wasn't meant to be used. This change encapsulates the MetadataImage in an interface (`CoordinatorMetadataImage`) that indicates and controls what metadata the group coordinator actually uses. Now it is much easier to see at a glance what dependencies the GroupCoordinator has on the metadata. Also, the new level of indirection allows more flexibility in how the GroupCoordinator is provided the metadata it needs.	2025-07-18 08:16:54 +08:00
Gaurav Narula	12761c07ae	KAFKA-19458: resume cleaning on future replica dir change (#20082) `ReplicaManager#alterReplicaLogDirs` does not resume the log cleaner while handling an `AlterReplicaLogDirs` request for a topic partition which already has an `AlterReplicaLogDirs` in progress, leading to a resource leak where cleaning for the topic partition remains paused even after the log directory has been altered. This change ensures we invoke `LogManager#resumeCleaning` if the future replica directory has changed. Reviewers: Jun Rao <junrao@gmail.com>	2025-07-17 13:13:09 -07:00
Calvin Liu	9412051dc6	MINOR: Bump LATEST_PRODUCTION to 4.1IV1 and Use MV to enable ELR (#20137) Remove isEligibleLeaderReplicasV1Enabled so that ELR is enabled when the MV is at least 4.1IV1. Also bump the latest production MV to 4.1IV1. Reviewers: Paolo Patierno <ppatierno@live.com>, Jun Rao <junrao@gmail.com>	2025-07-17 11:53:10 -07:00
Logan Zhu	d03878c7fb	MINOR: Migrate CoordinatorLoaderImpl from Scala to Java (#20089) ### Summary of Changes - Rewrote both `CoordinatorLoaderImpl` and `CoordinatorLoaderImplTest` in Java, replacing their original Scala implementations. - Removed the direct dependency on `ReplicaManager` and replaced it with functional interfaces for `partitionLogSupplier` and `partitionLogEndOffsetSupplier`. - Preserved original logic and test coverage during migration. Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-18 01:51:46 +08:00
Gaurav Narula	7e9df7d03d	KAFKA-19505: allow mocking UnifiedLog#topicId in ReplicaManagerTest (#20167) The mocked value for `UnifiedLog#topicId` was set up incorrectly, which caused a test failure. Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Satish Duggana <satishd@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-17 10:40:00 +08:00
Sanskar Jhajharia	65a9337739	MINOR: Add ShareFetch quota session verification test (#20164) ### Background As part of the KIP-932 implementation, ShareFetch requests need to integrate properly with Kafka's quota system. This requires that ShareFetch requests extract and pass the correct session information (Principal, client address, client ID) to quota managers, ensuring consistent quota enforcement between ShareFetch and traditional Fetch requests. ### Changes This PR adds `testHandleShareFetchRequestQuotaTagsVerification()`, `testHandleShareAcknowledgeRequestQuotaTagsVerification` and `testHandleShareFetchWithAcknowledgementQuotaTagsVerification` to `KafkaApisTest`, which verify quota tag extraction and session handling for ShareFetch and ShareAcknowledge requests. - Ensures ShareFetch/ShareAck requests are properly constructed with the correct client ID, principal, client address, and API key - Verifies the request context contains the expected session information - Uses `ArgumentCaptor` to capture the exact `Session` and `RequestChannel.Request` objects passed to quota managers - Verifies both `quotas.fetch.maybeRecordAndGetThrottleTimeMs()` and `quotas.request.maybeRecordAndGetThrottleTimeMs()` are called with the correct parameters when needed. - Validates that the captured `RequestChannel.Request` object maintains the correct request context information - Ensures the client ID passed to quota managers matches the test-defined value - Verifies that when acknowledgements are piggybacked on fetch requests, the quotas are applied only once, not twice. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>	2025-07-16 09:56:01 +01:00
Sanskar Jhajharia	9f092420f1	MINOR: Extend Quota Tests for ShareFetch requests (#20163) ### Summary Extends RequestQuotaTest to include ShareFetch API quota testing, ensuring compliance with KIP-932. ### Key Changes - New test: testShareFetchUsesSameFetchSensor() - Verifies ShareFetch and Fetch use the same FETCH quota sensor - New test: testResponseThrottleTimeWhenBothShareFetchAndRequestQuotasViolated() - Tests ShareFetch throttling behaviour - Request builder: Added ApiKeys.SHARE_FETCH case with proper ShareFetch request construction - Some minor cleanup of Collections usage Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>	2025-07-14 21:28:25 +01:00
Kevin Wu	a64f5bf6ab	KAFKA-19254 Add generic feature level metrics (#20021) This PR adds the following metrics for each of the supported production features (`metadata.version`, `kraft.version`, `transaction.version`, etc.): `kafka.server:type=MetadataLoader,name=FinalizedLevel,featureName=X` `kafka.server:type=node-metrics,name=maximum-supported-level,feature-name=X` `kafka.server:type=node-metrics,name=minimum-supported-level,feature-name=X` Reviewers: Josep Prat <josep.prat@aiven.io>, PoAn Yang <payang@apache.org>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Lan Ding <isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-15 04:27:04 +08:00
Rajani K	a61a37f7dd	KAFKA-19452: Fix flaky test LogRecoveryTest.testHWCheckpointWithFailuresMultipleLogSegments (#20121) The `testHWCheckpointWithFailuresMultipleLogSegments` test in `LogRecoveryTest` was failing intermittently due to a race condition during its failure simulation. In successful runs, the follower broker would restart and rejoin the In-Sync Replica (ISR) set before the old leader's failure was fully processed. This allowed for a clean and timely leader election to the now in-sync follower. However, in the failing runs, the follower did not rejoin the ISR before the leader election was triggered. With no replicas in the ISR and unclean leader election disabled by default for the test, the controller correctly refused to elect a new leader, causing the test to time out. This commit fixes the flakiness by overriding the controller configuration for this test to explicitly enable unclean leader election. This allows the out-of-sync replica to be promoted to leader, making the test deterministic and stable. Reviewers: Jun Rao <junrao@gmail.com>	2025-07-14 09:42:00 -07:00
Luke Chen	e1ff387605	KAFKA-14915: Allow reading from remote storage for multiple partitions in one fetchRequest (#20045) This PR enables reading from remote storage for multiple partitions in one fetchRequest. The main changes are: 1. `DelayedRemoteFetch` now accepts multiple remoteFetchTasks and other metadata. 2. `DelayedRemoteFetch` waits until all remote fetches are done, whether they succeed or fail. 3. In `ReplicaManager#fetchMessage`, we create one `DelayedRemoteFetch`, pass the metadata of multiple remote fetches to it, and watch all of them. 4. Added tests Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, Satish Duggana <satishd@apache.org>	2025-07-14 19:42:08 +05:30
Apoorv Mittal	986322dc36	MINOR: Moving the rollback out of lock in share partition (#20153) Move the rollback out of the lock: if the persister returns an already-completed future for the write state, the same data-plane-request-handler thread should not call the purgatory's safeTryAndComplete while holding SharePartition's write lock. Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>	2025-07-11 15:22:03 +01:00
Jhen-Yung Hsu	007fe6e92a	KAFKA-19466 LogConcurrencyTest should close the log when the test completes (#20110) - Fix testUncommittedDataNotConsumedFrequentSegmentRolls() and testUncommittedDataNotConsumed(), which call createLog() but never close the log when the tests complete. - Move LogConcurrencyTest to the Storage module and rewrite it in Java. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-07-10 01:01:42 +08:00
Gaurav Narula	36b9bb94f1	KAFKA-19474 Move WARN log on log truncation below HWM (#20106) #5608 introduced a regression where the check for `targetOffset < log.highWatermark` to emit a `WARN` log was made incorrectly after truncating the log. This change moves the check for `targetOffset < log.highWatermark` to `UnifiedLog#truncateTo` and ensures we emit a `WARN` log on truncation below the replica's HWM by both the `ReplicaFetcherThread` and `ReplicaAlterLogDirsThread`. Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-09 09:55:02 +08:00
Jonah Hooper	d86ba7f54a	KAFKA-18681: Created GetReplicaLogInfo RPCs (#19664) Creates the GetReplicaLogInfoRequest and GetReplicaLogInfoResponse RPCs. The information returned by brokers through these RPCs will be used to aid unclean recovery by selecting the longest logs. Reviewers: Alyssa Huang <ahuang@confluent.io>, Calvin Liu <caliu@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, TaiJuWu <tjwu1217@gmail.com>	2025-07-08 10:41:01 -07:00
Jhen-Yung Hsu	dde0b8cd92	MINOR: Prevent unnecessary test runs - KAFKA-19042 follow-up (#20122) PlaintextConsumerTest should extend AbstractConsumerTest instead of BaseConsumerTest. Otherwise, those tests will be executed in both `clients-integration-tests` and `core` (see https://github.com/apache/kafka/pull/20081/files#r2190749592). Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-07-08 07:42:15 +08:00
Ken Huang	a399852ced	KAFKA-19042 Move PlaintextConsumerTest to client-integration-tests module (#20081) Rewrite PlaintextConsumerTest in Java using the new test infra and move it to the client-integration-tests module. Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-08 01:41:59 +08:00
Bolin Lin	e8ee7fc210	KAFKA-19315 Move ControllerMutationQuotaManager to server module (#19807) Migrate ControllerMutationQuotaManager to a Java implementation and move it to the server module, including ClientQuotaManager and associated files. Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-07 01:55:38 +08:00
Andrew Schofield	729f9ccf06	KAFKA-19440: Handle top-level errors in AlterShareGroupOffsets RPC (#20049) While testing the code in https://github.com/apache/kafka/pull/19820, it became clear that the error handling problems were due to the underlying Admin API. This PR fixes the error handling for top-level errors in the AlterShareGroupOffsets RPC. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Lan Ding <isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>	2025-07-03 11:00:56 +01:00
Luke Chen	eb378da99c	KAFKA-19462: Count fetch size when remote fetch (#20088) Estimate the fetch size for remote fetches to avoid exceeding the `fetch.max.bytes` config. We don't want to query the remoteLogMetadata during API handling, so we assume each remote fetch can return up to `max.partition.fetch.bytes`. Tests added. Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2025-07-03 10:45:59 +08:00
Abhinav Dixit	7cb370b786	KAFKA-19463: nextFetchOffset does not take ongoing state transition into account (#20080) ### About The `nextFetchOffset` function in `SharePartition` updates the fetch offsets without considering batches/offsets which might be undergoing a state transition. This can cause the fetch offset to be updated to the wrong value. ### Testing The new code has been tested with unit tests. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>	2025-07-02 18:09:43 +01:00
Yunchi Pang	42041f4772	MINOR: Refactor createResponseConfig to avoid collection copy and conversion (#19867) issue: https://github.com/apache/kafka/pull/19687/files#r2094574178 Why: - To improve performance by avoiding redundant temporary collections and repeated method calls. - To make the utility more flexible for inputs from both Java and Scala. What: - Refactored `createResponseConfig` in `ConfigHelper.scala` by overloading the method to accept both Java maps and `AbstractConfig`. - Extracted helper functions to `ConfigHelperUtils` in the server module. Reviewers: Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-07-02 21:32:11 +08:00
Tsung-Han Ho (Miles Ho)	ad934d3202	MINOR: Remove threadNamePrefix parameter from ReplicaManager and ReplicaFetcherManager (#20069) - remove `threadNamePrefix` from `ReplicaManager` constructor - update `BrokerServer` to use updated constructor - remove `threadNamePrefix` from `ReplicaFetcherManager` Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi <frankvicky@apache.org>	2025-07-01 20:36:50 +08:00
TaiJuWu	bd14ed21b4	KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower (#20037) This PR does the following: 1. Remove ReplicaManager#becomeLeaderOrFollower. 2. Remove `LeaderAndIsrRequest` and `LeaderAndIsrResponse`. 3. Migrate `LeaderAndIsrRequest.PartitionState` to the server-common module and rename it to `PartitionState`. 4. Remove `ControllerEpoch` from PartitionState. 5. Remove `isShuttingDown` from BrokerServer and ReplicaManager. Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-06-30 01:20:49 +08:00
TaiJuWu	a95522a5ba	KAFKA-19042 Rewrite ConsumerBounceTest by Java (#19822) This PR does the following: 1) Rewrites consumerBounceTest in Java. 2) Moves the test to clients-integration-test. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-06-30 00:40:36 +08:00
Xuan-Zhang Gong	05b6e81688	KAFKA-19420 Don't export SocketServer from ClusterInstance (#20002) Refactor the code related to SocketServer. SocketServer is an internal class, and normally the integration tests should not use it directly. [KAFKA-19239](https://issues.apache.org/jira/browse/KAFKA-19239) will add a new helper to expose the bound ports, so tests that need to send raw requests can leverage it without accessing the SocketServer. Reviewers: PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-06-27 21:12:57 +08:00
Ken Huang	b919836551	KAFKA-17662: config.providers configuration missing from the docs (#18930) Ensure the config.providers configuration is documented for all components supporting it. Reviewers: Mickael Maison <mickael.maison@gmail.com>, Greg Harris <gharris1727@gmail.com>, Matthias J. Sax <mjsax@apache.org>	2025-06-27 14:13:55 +02:00
Apoorv Mittal	96ef1c520a	KAFKA-19436: Restrict cache update for ongoing batch/offset state (#20041) Stress testing showed that, on acquisition lock timeout, some offsets were not found in the cache. The cache can be updated from different acknowledgement calls, so if one state transition is still in progress and a parallel acknowledgement triggers a cache update, the cache can be updated incorrectly before the first transition has finished. Because the cache is only updated for Archived and Acknowledged records, neither this issue nor the existing implementation should hamper queue functionality. However, the cache might be updated too early if the persister call later fails, and in some scenarios the issue triggers error logs reporting offsets not found in the cache when the acquisition lock times out. Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield <aschofield@confluent.io>	2025-06-26 15:08:15 +01:00
David Jacot	f6a78c4c2b	KAFKA-19246; OffsetFetch API does not return group level errors correctly with version 1 (#19704) The OffsetFetch API does not support top-level errors in version 1. Hence, the top-level error must be returned at the partition level. Side note: It is a tad annoying that we create error responses in multiple places (e.g. KafkaApis, GroupCoordinatorService). There was a reason for this, but I cannot remember it. Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Sean Quah <squah@confluent.io>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>	2025-06-26 06:29:43 -07:00
Sanskar Jhajharia	56aeaa4c44	MINOR: Cleanup ShareFetchAcknowledgeRequestTest (#19852) Now that Kafka supports Java 17, this PR cleans up the ShareFetchAcknowledgeRequestTest. The changes mostly include: - Collections.singletonList() is replaced with List.of() - Get rid of all asJava conversions Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-06-26 13:56:18 +08:00
Mahsa Seifikar	7aaba96cc1	KAFKA-19282: Update quotaTypesEnabled on quota removal in ClientQuotaManager (#19742) In the `kafka.server.ClientQuotaManager` class, `quotaTypesEnabled` is not updated when a quota is removed via the `removeQuota` method in the `DefaultQuotaCallback` class. This field is set when quotas are added in `updateQuota`, but it is never changed or cleared. As a result, if all quotas are removed dynamically, the system may incorrectly assume quotas are still active, which leads to unnecessary metric creation or updates until the broker is restarted. Reviewers: Jonah Hooper <jhooper@confluent.io>, Hailey Ni <hni@confluent.io>, Alyssa Huang <ahuang@confluent.io>, David Jacot <djacot@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	2025-06-25 21:29:46 +01:00
Jing-Jia Hung	5e23df0c8d	KAFKA-18486 Migrate tests to use applyDelta instead of becomeLeaderOrFollower for testInconsistentIdReturnsError and others (#20014) Continues the migration effort for KAFKA-18486 by replacing usage of the deprecated `becomeLeaderOrFollower` API with `applyDelta` in several test cases. #### Updated tests: - `testInconsistentIdReturnsError` - `testMaybeAddLogDirFetchers` - `testMaybeAddLogDirFetchersPausingCleaning` - `testSuccessfulBuildRemoteLogAuxStateMetrics` - `testVerificationForTransactionalPartitionsOnly` - `testBecomeFollowerWhenLeaderIsUnchangedButMissedLeaderUpdate` Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-06-25 20:02:27 +08:00
Rajini Sivaram	33a1648c44	MINOR: Fix response for consumer group describe with empty group id (#20030) ConsumerGroupDescribe with an empty group id returns a response containing `null` groupId in a non-nullable field. Since the response cannot be serialized, this results in UNKNOWN_SERVER_ERROR being returned to the client. This PR sets the group id in the response to an empty string instead and adds request tests for empty group id. Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	2025-06-25 10:33:44 +01:00
Colin Patrick McCabe	6b2013a001	KAFKA-19294: Fix BrokerLifecycleManager RPC timeouts (#19745) Previously, we could wait for up to half of the broker session timeout for an RPC to complete, and then delay by up to half of the broker session timeout. When taken together, these two delays could lead to brokers erroneously missing heartbeats. This change removes exponential backoff for heartbeats sent from the broker to the controller. The load caused by heartbeats is not heavy, and controllers can easily time out heartbeats when the queue length is too long. Additionally, we now set the maximum RPC time to the length of the broker period. This minimizes the impact of heavy load. Reviewers: José Armando García Sancio <jsancio@apache.org>, David Arthur <mumrah@gmail.com>	2025-06-24 16:23:25 -07:00
Ken Huang	023833fe1f	KAFKA-18778 Fix the inconsistent lastest supported version in StorageTool.scala and FutureCommand (#19157) To maintain code consistency, `MetadataVersion#fromVersionString` uses `latestTesting()` as the latest version. Therefore, in the tools, we also need to maintain consistency by updating the outer logging to use `latestTesting()`. See the discussion: https://github.com/apache/kafka/pull/18845#discussion_r1950706791 Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-06-25 01:59:52 +08:00
Apoorv Mittal	1ca8779bee	MINOR: Correcting client error for fenced share partition (#20023) Correct the error when SharePartition is fenced. Reviewers: Abhinav Dixit <adixit@confluent.io>, Sushant Mahajan <smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>	2025-06-24 09:46:14 +01:00
Sushant Mahajan	3d4407ff9d	MINOR: Change exceptions for few error codes in SharePartition. (#20020) * The `SharePartition` class wraps the errors received from `PersisterStateManager` to be sent to the client. * In this PR, we are categorizing the errors a bit better. * Some exception messages in `PersisterStateManager` have been updated to show the share partition key. * Tests have been updated wherever needed. Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>	2025-06-23 19:27:15 +01:00
Ming-Yen Chung	b38573fcaa	KAFKA-18486 Remove becomeLeaderOrFollower from testPartition, testPreferredReplicaAs (#20009) Replace `leaderAndIsrRequest` and `becomeLeaderOrFollower` with `TopicsDelta`, `MetadataImage` and `ReplicaManager#applyDelta` for the following tests: * testPartitionListener * testPartitionMarkedOfflineIfLogCantBeCreated * testPartitionMetadataFileNotCreated * testPartitionsWithLateTransactionsCount * testPreferredReplicaAsFollower * testPreferredReplicaAsLeader * testPreferredReplicaAsLeaderWhenSameRackFollowerIsOutOfIsr * testProducerIdCountMetrics Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-06-23 16:42:30 +08:00
Bolin Lin	3404f65cdb	KAFKA-19324 Make org.apache.kafka.common.test.TestUtils package-private to prevent cross-module access (#19884) Description * Replace `org.apache.kafka.common.test.TestUtils` with `org.apache.kafka.test.TestUtils` in outer package modules to standardize test utility usage * Move `waitUntilLeaderIsElectedOrChangedWithAdmin` method from `org.apache.kafka.test.TestUtils` to `ClusterInstance` and refactor for better code organization * Add `org.apache.kafka.test.TestUtils` dependency to `transaction-coordinator` import control Reviewers: PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-06-22 22:47:40 +08:00
S.Y. Wang	22bef988d4	KAFKA-18926 KafkaPrincipalBuilder should extend KafkaPrincipalSerde (#19987) In KRaft, custom KafkaPrincipalBuilder instances must implement KafkaPrincipalSerde to support the forwarding mechanism. Currently, this requirement is not enforced and relies on the developer's attention. With this patch, we can prevent incorrect implementations at compile time. Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-06-22 22:01:03 +08:00

5960 Commits
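
The KAFKA-19462 entry (eb378da99c) says that, because remote log metadata is not queried during API handling, each remote read is assumed to return up to `max.partition.fetch.bytes` so that a fetch stays within `fetch.max.bytes`. The Java below is a minimal, hypothetical sketch of that budgeting idea only. `PartitionRead`, `planFetch`, and the policy of skipping a remote read whose worst-case size no longer fits are assumptions made for illustration; this is not Kafka's `ReplicaManager` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: budget a fetch response against fetch.max.bytes,
// charging remote reads at their worst case (max.partition.fetch.bytes).
public class RemoteFetchBudgetSketch {

    /** Hypothetical view of one partition in a fetch request. */
    public record PartitionRead(String partition, boolean remote, int availableLocalBytes) {}

    /** Returns the partitions that fit in the budget, in request order. */
    public static List<String> planFetch(List<PartitionRead> reads, int fetchMaxBytes, int maxPartitionFetchBytes) {
        List<String> planned = new ArrayList<>();
        int remaining = fetchMaxBytes;
        for (PartitionRead read : reads) {
            if (remaining == 0) {
                break; // budget used up; later partitions wait for the next fetch
            }
            if (read.remote()) {
                // Remote size is unknown without querying remote log metadata,
                // so charge a full max.partition.fetch.bytes (assumed policy).
                if (maxPartitionFetchBytes > remaining) {
                    continue; // worst case would exceed fetch.max.bytes, so skip it
                }
                remaining -= maxPartitionFetchBytes;
            } else {
                // Local reads are bounded by available data, the per-partition
                // limit, and whatever budget is left.
                remaining -= Math.min(read.availableLocalBytes(), Math.min(maxPartitionFetchBytes, remaining));
            }
            planned.add(read.partition());
        }
        return planned;
    }

    public static void main(String[] args) {
        List<PartitionRead> reads = List.of(
                new PartitionRead("t-0", false, 10),
                new PartitionRead("t-1", true, 0),
                new PartitionRead("t-2", true, 0),
                new PartitionRead("t-3", false, 5));
        System.out.println(planFetch(reads, 100, 60)); // prints [t-0, t-1, t-3]
    }
}
```

With a 100-byte budget and a 60-byte per-partition limit, the first remote read is charged 60 bytes, leaving 30, so the second remote read is skipped while the small local read after it still fits. Charging the worst case trades some response fill for never overshooting the budget.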