Commit Graph

5927 Commits

Author SHA1 Message Date
TaiJuWu bd14ed21b4
KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower (#20037)
The PR do following:

1. Remove  ReplicaManager#becomeLeaderOrFollower.
2. Remove `LeaderAndIsrRequest` and `LeaderAndIsrResponse`
3. Migrate `LeaderAndIsrRequest.PartitionState` to server-common module
and change to `PartitionState`
4. Remove `ControllerEpoch` from PartitionState
5. Remove `isShuttingDown` from BrokerServer and ReplicaManager

Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-30 01:20:49 +08:00
TaiJuWu a95522a5ba
KAFKA-19042 Rewrite ConsumerBounceTest by Java (#19822)
This PR does the following:
1) Rewrites consumerBounceTest in Java.
2) Moves the test to clients-integration-test.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-30 00:40:36 +08:00
Xuan-Zhang Gong 05b6e81688
KAFKA-19420 Don't export SocketServer from ClusterInstance (#20002)
CI / build (push) Waiting to run Details
Fixup PR Labels / fixup-pr-labels (needs-attention) (push) Has been cancelled Details
Fixup PR Labels / fixup-pr-labels (triage) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.7.2) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.8.1) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.9.1) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (4.0.0) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (latest) (push) Has been cancelled Details
Fixup PR Labels / needs-attention (push) Has been cancelled Details
Refactor the code related to SocketServer  SocketServer is an internal
class, and normally the integration tests should not use it directly.
[KAFKA-19239](https://issues.apache.org/jira/browse/KAFKA-19239) will
add a new helper to expose the bound ports, and so the tests that need
to send raw request can leverage it without accessing the SocketServer.

Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
 <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-27 21:12:57 +08:00
Ken Huang b919836551
KAFKA-17662: config.providers configuration missing from the docs (#18930)
Ensure the config.providers configuration is documented for all
components supporting it

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Greg Harris
<gharris1727@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2025-06-27 14:13:55 +02:00
Apoorv Mittal 96ef1c520a
KAFKA-19436: Restrict cache update for ongoing batch/offset state (#20041)
CI / build (push) Waiting to run Details
In the stress testing it was noticed that on acquisition lock timeout,
some offsets were not found in the cache. The cache can be tried to be
updated in different acknowledgement calls hence if there is an ongoing
transition which is not yet finished but another parallel
acknowledgement triggers the cache update then the cache can be updated
incorrectly, while first transition is not yet finished.

Though the cache update happens for Archived and Acknowldeged records
hence this issue or existing implementation should not hamper the queues
functionality. But it might update the cache early when persister call
might fail or this issue triggers error logs with offset not found in
cache when acquisition lock timeouts (in some scenarios).

Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield
 <aschofield@confluent.io>
2025-06-26 15:08:15 +01:00
David Jacot f6a78c4c2b
KAFKA-19246; OffsetFetch API does not return group level errors correctly with version 1 (#19704)
The OffsetFetch API does not support top level errors in version 1.
Hence, the top level error must be returned at the partition level.

Side note: It is a tad annoying that we create error response in
multiple places (e.g. KafkaApis, Group CoordinatorService). There were a
reason for this but I cannot remember.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Sean Quah <squah@confluent.io>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-06-26 06:29:43 -07:00
Sanskar Jhajharia 56aeaa4c44
MINOR: Cleanup ShareFetchAcknowledgeRequestTest (#19852)
CI / build (push) Waiting to run Details
Now that Kafka supports Java 17, this PR cleans up the
ShareFetchAcknowledgeRequestTest.
The changes mostly include:
- Collections.singletonList() is replaced with List.of()
- Get rid of all asJava conversions

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-26 13:56:18 +08:00
Mahsa Seifikar 7aaba96cc1
KAFKA-19282: Update quotaTypesEnabled on quota removal in ClientQuotaManager (#19742)
CI / build (push) Waiting to run Details
In `kafka.server.ClientQuotaManager` class, `quotaTypesEnabled` is not updated when a quota is removed via `removeQuota` method in `DefaultQuotaCallback` class. This field is set when quotas are added in `updateQuota` but it's never changed or cleared. So in case all the quotas have been removed dynamically, the system may incorrectly assume the quotas are active, which leads to unnecessary metric creation or updates until the broker is restarted.

Reviewers: Jonah Hooper <jhooper@confluent.io>, Hailey Ni <hni@confluent.io>, Alyssa Huang <ahuang@confluent.io>, David Jacot <djacot@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2025-06-25 21:29:46 +01:00
Jing-Jia Hung 5e23df0c8d
KAFKA-18486 Migrate tests to use applyDelta instead of becomeLeaderOrFollower for testInconsistentIdReturnsError and others (#20014)
continues the migration effort for KAFKA-18486 by replacing usage of the
deprecated `becomeLeaderOrFollower` API with `applyDelta` in several
test cases.

#### Updated tests:
- `testInconsistentIdReturnsError`
- `testMaybeAddLogDirFetchers`
- `testMaybeAddLogDirFetchersPausingCleaning`
- `testSuccessfulBuildRemoteLogAuxStateMetrics`
- `testVerificationForTransactionalPartitionsOnly`
- `testBecomeFollowerWhenLeaderIsUnchangedButMissedLeaderUpdate`

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu
 <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-25 20:02:27 +08:00
Rajini Sivaram 33a1648c44
MINOR: Fix response for consumer group describe with empty group id (#20030)
ConsumerGroupDescribe with an empty group id returns a response containing `null` groupId in a non-nullable field. Since the response cannot be serialized, this results in UNKNOWN_SERVER_ERROR being returned to the client. This PR sets the group id in the response to an empty string instead and adds request tests for empty group id.

Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-25 10:33:44 +01:00
Colin Patrick McCabe 6b2013a001
KAFKA-19294: Fix BrokerLifecycleManager RPC timeouts (#19745)
CI / build (push) Waiting to run Details
Previously, we could wait for up to half of the broker session timeout
for an RPC to complete, and then delay by up to half of the broker
session timeout. When taken together, these two delays could lead to
brokers erroneously missing heartbeats.

This change removes exponential backoff for heartbeats sent from the
broker to the controller. The load caused by heartbeats is not heavy,
and controllers can easily time out heartbeats when the queue length is
too long. Additionally, we now set the maximum RPC time to the length of
the broker period. This minimizes the impact of heavy load.

Reviewers: José Armando García Sancio <jsancio@apache.org>, David Arthur <mumrah@gmail.com>
2025-06-24 16:23:25 -07:00
Ken Huang 023833fe1f
KAFKA-18778 Fix the inconsistent lastest supported version in StorageTool.scala and FutureCommand (#19157)
To maintain code consistency, `MetadataVersion#fromVersionString` uses
`latestTesting()` as the latest version. Therefore, in the tools, we
also need to maintain consistency by updating the outer logging to use
`latestTesting()`.

See the discussion:
https://github.com/apache/kafka/pull/18845#discussion_r1950706791

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-25 01:59:52 +08:00
Apoorv Mittal 1ca8779bee
MINOR: Correcting client error for fenced share partition (#20023)
Correct the error when SharePartition is fenced.

Reviewers: Abhinav Dixit <adixit@confluent.io>, Sushant Mahajan
 <smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-06-24 09:46:14 +01:00
Sushant Mahajan 3d4407ff9d
MINOR: Change exceptions for few error codes in SharePartition. (#20020)
CI / build (push) Waiting to run Details
* The `SharePartition` class wraps the errors received from
`PersisterStateManager` to be sent to the client.
* In this PR, we are categorizing the errors a bit better.
* Some exception messages in `PersisterStateManager` have been updated
to show the share partition key.
* Tests have been updated wherever needed.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal
 <apoorvmittal10@gmail.com>
2025-06-23 19:27:15 +01:00
Ming-Yen Chung b38573fcaa
KAFKA-18486 Remove becomeLeaderOrFollower from testPartition*, testPreferredReplicaAs* (#20009)
Replace `leaderAndIsrRequest` and `becomeLeaderOrFollower` with
`TopicsDelta`, `MetadataImage` and `ReplicaManager#applyDelta` for the
following tests:
* testPartitionListener
* testPartitionMarkedOfflineIfLogCantBeCreated
* testPartitionMetadataFileNotCreated
* testPartitionsWithLateTransactionsCount
* testPreferredReplicaAsFollower
* testPreferredReplicaAsLeader
* testPreferredReplicaAsLeaderWhenSameRackFollowerIsOutOfIsr
* testProducerIdCountMetrics

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-23 16:42:30 +08:00
Bolin Lin 3404f65cdb
KAFKA-19324 Make org.apache.kafka.common.test.TestUtils package-private to prevent cross-module access (#19884)
Description

* Replace `org.apache.kafka.common.test.TestUtils` with
`org.apache.kafka.test.TestUtils` in outer package modules to
standardize test utility usage
* Move `waitUntilLeaderIsElectedOrChangedWithAdmin` method from
`org.apache.kafka.test.TestUtils` to `ClusterInstance` and refactor for
better code organization
* Add `org.apache.kafka.test.TestUtils` dependency to
`transaction-coordinator` import control

Reviewers: PoAn Yang [payang@apache.org](mailto:payang@apache.org), Ken
 Huang  [s7133700@gmail.com](mailto:s7133700@gmail.com), Ken Huang
 [s7133700@gmail.com](mailto:s7133700@gmail.com), Chia-Ping Tsai
 [chia7712@gmail.com](mailto:chia7712@gmail.com)
2025-06-22 22:47:40 +08:00
S.Y. Wang 22bef988d4
KAFKA-18926 KafkaPrincipalBuilder should extend KafkaPrincipalSerde (#19987)
In KRaft, custom KafkaPrincipalBuilder instances must implement
KafkaPrincipalSerde to support the forward mechanism. Currently, this
requirement is not enforced and relies on the developer's attention.

With this patch, we can prevent incorrect implementations at compile
time.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-22 22:01:03 +08:00
Sanskar Jhajharia 992eaafb62
MINOR: Cleanup Core Module- Scala Modules (3/n) (#19804)
CI / build (push) Waiting to run Details
Now that Kafka Brokers support Java 17, this PR makes some changes in
core module. The changes in this PR are limited to only some Scala files
in the Core module's tests. The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()

To be clear, the directories being targeted in this PR from unit.kafka
module:
- admin
- cluster
- coordinator
- docker
- integration
- metrics

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-22 00:20:39 +08:00
Chirag Wadhwa 7c77519f59
MINOR: changed the test testShareFetchRequestSuccessfulSharingBetweenMultipleConsumers to remove ambiguity (#19997)
CI / build (push) Waiting to run Details
The test testShareFetchRequestSuccessfulSharingBetweenMultipleConsumers
was recently found to be flaky. Making the following small change that
could potentially resolve the issue. Earlier, 1000 records were being
produced and then 3 consecutive share fetch requests were being sent. At
the end, assertions were done to make sure each share consumer receives
some records, and that none of them consume the same record. Since the
motive for the test is to see if multiple consumers can share the same
subscription and not consume the same record, a better way would be to
produce a record, consume that and repeat it 3 times with the 3
consumers. This ensures that every consumer consume a record, and a
previously consume record is not consumed again by the subsequent share
fetches.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield
 <aschofield@confluent.io>
2025-06-21 08:36:31 +01:00
Sushant Mahajan d5e2ecae95
MINOR: Reduce logging in persister. (#19998)
CI / build (push) Waiting to run Details
* Few logs in `PersisterStateManager` were noisy and not adding much
value.
* For the sake of reducing pollution, they have been moved to debug
level.
* Additional debug log in `DefaultStatePersister` to track epochs.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield
 <aschofield@confluent.io>
2025-06-20 13:53:46 +01:00
Chang-Chi Hsu 46b474a9de
KAFKA-19239 Rewrite IntegrationTestUtils by java (#19776)
This PR rewrites the IntegrationTestUtils.java from Scala to Java.

## Changes:

- Converted all the existing Scala code in IntegrationTestUtils.scala
into Java in IntegrationTestUtils.java.
- Preserved the original logic and functionality to ensure backward
compatibility.
- Updated relevant imports and dependencies accordingly.

Motivation:

The rewrite aims to standardize the codebase in Java, which aligns
better with the rest of the project and facilitates easier maintenance
by the Java-centric team.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
 PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-20 01:46:29 +08:00
Xuan-Zhang Gong 79d2c3c62a
KAFKA-19406 Remove BrokerTopicStats#removeOldFollowerMetrics (#19962)
BTW: whether we should rename
`ReplicaManager#updateLeaderAndFollowerMetrics`

Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
 <payang@apache.org>, TengYao Chi <kitingiao@gmail.com>, Lan Ding
 <isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-19 17:57:22 +08:00
Lucas Brutschy 2a06335569
KAFKA-19413: Extended AuthorizerIntegrationTest to cover StreamsGroupDescribe (#19981)
CI / build (push) Waiting to run Details
Extending test coverage of authorization for streams group RPC
StreamsGroupDescribe. The RPC requires DESCRIBE GROUP and DESCRIBE TOPIC
permissions for all topics.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-06-18 10:19:34 +02:00
Ritika Reddy ba7b5c9b32
KAFKA-19367: Follow up bug fix (#19976)
CI / build (push) Waiting to run Details
This is a follow up to [https://github.com/apache/kafka/pull/19910](url)
`The coordinator failed to write an epoch fence transition for producer
jt142 to the transaction log with error COORDINATOR_NOT_AVAILABLE. The
epoch was increased to 2 but not returned to the client
(kafka.coordinator.transaction.TransactionCoordinator)` -- as we don't
bump the epoch with this change, we should also update the message to
not say "increased" and remove the
**epochAndMetadata.transactionMetadata.hasFailedEpochFence = true** line

In the test, the expected behavior is:
1) First append transaction to the log fails with
COORDINATOR_NOT_AVAILABLE (epoch 1)
2) We try init_pid again, this time the SINGLE epoch bump succeeds, and
the following things happen simultaneously (epoch 2)
     -> Transition to COMPLETE_ABORT
     -> Return CONCURRENT_TRANSACTION error to the client
3) The client retries, and there is another epoch bump; state
transitions to EMPTY (epoch 3)

Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
 <alivshits@confluent.io>
2025-06-17 15:10:27 -07:00
Ken Huang 91ce182ec5
KAFKA-19042 Move ProducerSendWhileDeletionTest to client-integration-tests module (#19971)
Use Java to rewrite ProducerSendWhileDeletionTest by new test infra and
move it to client-integration-tests module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-18 00:47:30 +08:00
Kuan-Po Tseng 86cd5d50f5
KAFKA-14895: [2/N] Move AddPartitionsToTxnManager files to java (#19933)
Move AddPartitionsToTxnManagerTest to java

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-17 22:56:19 +08:00
Lucas Brutschy 31a7e01769
KAFKA-19412: Extended AuthorizerIntegrationTest to cover StreamsGroupHeartbeat (#19978)
Extending test coverage of authorization for streams group RPC
StreamsGroupHeartbeat. The RPC requires READ GROUP and DESCRIBE TOPIC
permissions for all topics. For creating internal topics, we require
CREATE TOPIC permission. If internal topic  creation fails, the request
does not fail, but the status reflects this problem.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-06-17 16:41:49 +02:00
PoAn Yang a94f7caf76
MINOR: add lower case lister name integration test (#19932)
In [KIP-1143](https://cwiki.apache.org/confluence/x/LwqWF), it
deprecated Endpoint#listenerName and removed
org.apache.kafka.network.EndPoint. Certain parts of the code depend on
listener name normalization. We should add a test to make sure there is
no regression.

Followup: 
https: //github.com/apache/kafka/pull/19191#issuecomment-2939855317
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-17 22:41:31 +08:00
Nick Guo fd70290633
KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower from testFencedErrorCausedByBecomeLeader and other similar methods (#19966)
CI / build (push) Waiting to run Details
The included tests are as follows:

- testFencedErrorCausedByBecomeLeader
- testFetchBeyondHighWatermark
- testFetchFollowerNotAllowedForOlderClients
- testFetchFromFollowerShouldNotRunPreferLeaderSelect
- testFetchFromLeaderAlwaysAllowed
- testFetchMessagesWhenNotFollowerForOnePartition
- testFetchRequestRateMetrics
- testFetchShouldReturnImmediatelyWhenPreferredReadReplicaIsDefined
- testFollowerFetchWithDefaultSelectorNoForcedHwPropagation
- testFollowerStateNotUpdatedIfLogReadFails

I removed `testFetchMessagesWithInconsistentTopicId ` as it's no longer
needed, the "topicId" is now mandatory and cannot be null in our new
implementation.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Lan Ding
<isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-16 21:43:41 +08:00
Jhen-Yung Hsu 86419e9b8a
KAFKA-18486 Migrate ReplicaManagerTest#testTransactionAddPartitionRetry and other similar methods to use applyDelta (#19965)
CI / build (push) Waiting to run Details
Change becomeLeaderOrFollower to applyDelta in following test cases:

- testTransactionAddPartitionRetry
- testTransactionVerificationBlocksOutOfOrderSequence
- testTransactionVerificationDynamicDisablement
- testTransactionVerificationFlow
- testTransactionVerificationGuardOnMultiplePartitions
- testTransactionVerificationRejectsLowerProducerEpoch

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-16 03:48:36 +08:00
Lan Ding 081deaa1a9
KAFKA-18486: Migrate ReplicaManagerTest to use applyDelta (#19954)
CI / build (push) Waiting to run Details
Change becomeLeaderOrFollower to applyDelta in following test cases

- testReadCommittedFetchLimitedAtLSO
- testReceiveOutOfOrderSequenceExceptionWithLogStartOffset
- testRemoteFetchExpiresPerSecMetric
- testRemoteLogReaderMetrics

Reviewers: PoAn Yang <payang@apache.org>, Bolin Lin
<linbolin1230@gmail.com>, Yung <yungyung7654321@gmail.com>, Jimmy Wang
<48462172+JimmyWang6@users.noreply.github.com>, Ken Huang
<s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-06-14 11:40:51 +08:00
Bolin Lin 94807bcd15
KAFKA-18486 Migrate ReplicaManagerTest RemoteFetchExpiresPerSecMetric and RemoteLogReaderMetrics with applyDelta (#19960)
CI / build (push) Waiting to run Details
Replace `becomeLeaderOrFollower` with `applyDelta` in method
RemoteFetchExpiresPerSecMetric and RemoteLogReaderMetrics

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-13 15:55:48 +08:00
PoAn Yang 991a10c19f
KAFKA-18486 Migrate ReplicaManagerTest#testOffsetOutOfRangeExceptionWhenFetchMessages to use applyDelta (#19952)
Change `becomeLeaderOrFollower` to `applyDelta` in following test cases
* testOffsetOutOfRangeExceptionWhenFetchMessages
* testOffsetOutOfRangeExceptionWhenReadFromLog
* testOldFollowerLosesMetricsWhenReassignPartitions
* testOldLeaderLosesMetricsWhenReassignPartitions

Reviewers: Bolin Lin <linbolin1230@gmail.com>, Lan Ding
 <isDing_L@163.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-13 15:47:51 +08:00
Kuan-Po Tseng 12d8a1bbf8
KAFKA-19237: Add dynamic config remote.log.manager.follower.thread.pool.size (#19809)
Deprecate the `remote.log.manager.thread.pool.size` configuration and introduce a new dynamic configuration:
`remote.log.manager.follower.thread.pool.size`.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Luke Chen <showuon@gmail.com>
2025-06-13 09:33:45 +05:30
Kirk True 7d6e5edf8e
KAFKA-19153: Add OAuth integration tests (#19938)
Adds a test dependency on
[mock-oauth2-server](https://github.com/navikt/mock-oauth2-server/) for
integration tests for OAuth layer. Also includes fixes for some
regressions that were caught by the integration tests.

Reviewers: Manikumar Reddy <manikumar@confluent.io>, Lianet Magrans
 <lmagrans@confluent.io>
2025-06-12 15:48:14 -04:00
Ritika Reddy 0b2e410d61
KAFKA-19367: Fix InitProducerId with TV2 double-increments epoch if ongoing transaction is aborted (#19910)
CI / build (push) Waiting to run Details
When InitProducerId is handled on the transaction coordinator, the
producer epoch is incremented (so that we fence stale requests), then if
a transaction was ongoing during this time, it's aborted.  With
transaction version 2 (a.k.a. KIP-890 part 2), abort increments the
producer epoch again (it's the part of the new abort / commit protocol),
so the epoch ends up incremented twice.

In most cases, this is benign, but in the case where the epoch of the
ongoing transaction is 32766, it's incremented to 32767, which is the
maximum value for short. Then, when it's incremented for the second
time, it goes negative, causing an illegal argument exception.

To fix this we just avoid bumping the epoch a second time.

Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
 <alivshits@confluent.io>
2025-06-12 09:37:07 -07:00
Lianet Magrans 8f2ee4d7cd
KAFKA-18117; KAFKA-18729: Use assigned topic IDs to avoid full metadata requests on broker-side regex (#19814)
This PR uses topic IDs received in assignment (under new protocol) to
ensure that only these assigned topics are included in the consumer
metadata requests performed when the user subscribes to broker-side
regex (RE2J).

For handling the edge case of consumer needing metadata for topics IDs
(from RE2J) and topic names (from transient topics), the approach is to
send a request for the transient topics needed temporarily, and once
those resolved, the request for the topic IDs needed for RE2J will
follow. (this is because the broker doesn't accept requests for names
and IDs at the same time)

With the changes we also end up fixing another issue (KAFKA-18729) aimed
at avoiding iterating the full set of assigned partitions when checking
if a topic should be retained from the metadata response when using
RE2J.

Reviewers: David Jacot <djacot@confluent.io>
2025-06-12 10:50:31 -04:00
Dongnuo Lyu af012e1ec2
KAFKA-18961: Time-based refresh for server-side RE2J regex (#19904)
Consumers can subscribe to an RE2J SubscriptionPattern that will be
resolved and maintained on the server-side (KIP-848). Currently, those
regexes are refreshed on the coordinator when a consumer subscribes to a
new regex, or if there is a new topic metadata image (to ensure regex
resolution stays up-to-date with existing topics)

But with
[KAFKA-18813](https://issues.apache.org/jira/browse/KAFKA-18813), the
topics matching a regex are filtered based on ACLs. This generates a new
situation, as regexes resolution do not stay up-to-date as topics become
visible (ACLs added/delete).

This patch introduces time-based refresh for the subscribed regex by
- Adding internal `group.consumer.regex.batch.refresh.max.interval.ms`
config
that controls the refresh interval.
- Schedule a regex refresh when updating regex subscription if the
latest refresh is older than the max interval.

Reviewers: David Jacot <djacot@confluent.io>
2025-06-12 04:54:39 -07:00
Jing-Jia Hung 2a7457f2dd
KAFKA-18486 Migrate testPartitionMetadataFile to use applyDelta in place of deprecated becomeLeaderOrFollower (#19947)
CI / build (push) Waiting to run Details
Refactor testPartitionMetadataFile to use applyDelta and share
class-level partitions

- Replace deprecated becomeLeaderOrFollower with topicsCreateDelta +
applyDelta
- Test still asserts partition exists, local log exists, and verifies
partitionMetadataFile version (0) and topicId

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
 Chia-Ping Tsai <chia7712@gmail.com>
2025-06-12 09:20:47 +08:00
Ken Huang 7c715c02c0
KAFKA-18486 Update testClearPurgatoryOnBecomingFollower etc with KRaft mechanism in ReplicaManagerTest (#19924)
CI / build (push) Waiting to run Details
update the following test to avoid using `becomeLeaderOrFollower`
- testClearPurgatoryOnBecomingFollower
- testDelayedFetchIncludesAbortedTransactions
- testDisabledTransactionVerification
- testFailedBuildRemoteLogAuxStateMetrics

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-06-11 18:52:08 +08:00
Nick Guo ab42f00bbe
KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower in `testVerificationErrorConversions` (#19923)
Remove ReplicaManager#becomeLeaderOrFollower in
`testVerificationErrorConversionsTV1 ` and
`testVerificationErrorConversionsTV2 `.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-11 18:47:23 +08:00
Jhen-Yung Hsu 2e968560e0
MINOR: Cleanup simplify set initialization with Set.of (#19925)
Simplify Set initialization and reduce the overhead of creating extra
collections.

The changes mostly include:
- new HashSet<>(List.of(...))
- new HashSet<>(Arrays.asList(...)) / new HashSet<>(asList(...))
- new HashSet<>(Collections.singletonList()) / new
HashSet<>(singletonList())
- new HashSet<>(Collections.emptyList())
- new HashSet<>(Set.of())

This change takes the following into account, and we will not change to
Set.of in these scenarios:
- Require `mutability` (UnsupportedOperationException).
- Allow `duplicate` elements (IllegalArgumentException).
- Allow `null` elements (NullPointerException).
- Depend on `Ordering`. `Set.of` does not guarantee order, so it could
make tests flaky or break public interfaces.

Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
 <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-11 18:36:14 +08:00
Apoorv Mittal 997abe464f
KAFKA-19389: Fix memory consumption for completed share fetch requests (#19928)
For ShareFetch Requests, the fetch happens through DelayedShareFetch
operation. The operations which are already completed has reference to
data being sent as response. As the operation is watched over multiple
keys i.e. DelayedShareFetchGroupKey and DelayedShareFetchPartitionKey,
hence if the operation is already completed by either  watched keys  but
then again the reference to the operation is still present in other
watched  key. Which means the memory can only be free once purge
operation is  triggered by DelayedOperationPurgatory which removes the
watched key  operation from remaining keys, as the operation is already
completed.

The purge operation is dependent on the config
`ShareGroupConfig#SHARE_FETCH_PURGATORY_PURGE_INTERVAL_REQUESTS_CONFIG`
hence if the value is not smaller than the number of share fetch
requests which can consume complete memory of the broker then broker can
go out of memory. This can also be avoided by having lower fetch max
bytes for request but this value is client dependent hence can't rely to
prevent  the broker.

This PR triggers the completion on both watched keys hence the
DelayedShareFetch operation shall be removed from both keys which frees
the broker memory as soon the share fetch response is sent.

#### Testing

Tested with LocalTieredStorage where broker goes OOM after reading some
8040 messages before the fix, with default configurations as mentioned
in the
doc

[here](https://kafka.apache.org/documentation/#tiered_storage_config_ex).
But after the fix the consumption continues without any issue. And the
memory is released instantaneously.

Reviewers: Jun Rao <junrao@gmail.com>, Andrew Schofield
<aschofield@confluent.io>
2025-06-10 17:36:27 +01:00
Ming-Yen Chung 08aa469af7
KAFKA-19392 Fix metadata.log.segment.ms not being applied (#19936)
The original `props.setProperty(TopicConfig.SEGMENT_MS_CONFIG,
config.logSegmentMillis.toString)` in the `KafkaMetadataLog` constructor
was accidentally removed in #19371. Add a test to ensure this property
is properly  assigned.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-10 17:01:54 +08:00
Kuan-Po Tseng 3a0a1705a1
KAFKA-18486 Remove becomeLeaderOrFollower from readFromLogWithOffsetOutOfRange and other related methods. (#19929)
refactor out becomeLeaderOrFollower in below tests
- readFromLogWithOffsetOutOfRange
- testBecomeFollowerWhileNewClientFetchInPurgatory
- testBecomeFollowerWhileOldClientFetchInPurgatory
- testBuildRemoteLogAuxStateMetricsThrowsException

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ken Huang
 <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-06-10 12:39:32 +08:00
Ken Huang df73133f3b
MINOR: Follow up KAFKA-19080 MetadataLogConfig (#19842)
CI / build (push) Waiting to run Details
See Discussion:
https://github.com/apache/kafka/pull/19371#discussion_r2109549343

Do the following changes:
- Update the internal config name with metadata prefix
- add the warning message for setting
`INTERNAL_METADATA_LOG_SEGMENT_BYTES_CONFIG`

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-09 03:13:03 +08:00
Nick Guo e23c8cea07
KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower from `testReplicaAlterLogDirs` (#19922)
Use `applyDelta` replace `becomeLeaderOrFollower`

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-06-09 03:00:00 +08:00
Ken Huang 8fd0d33670
KAFKA-19042 Move PlaintextConsumerSubscriptionTest to client-integration-tests module (#19827)
CI / build (push) Waiting to run Details
Use Java to rewrite PlaintextConsumerSubscriptionTest by new test infra
and move it to client-integration-tests module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-08 05:06:03 +08:00
Jing-Jia Hung c5e06f6e7a
KAFKA-18486 Update testExceptionWhenUnverifiedTransactionHasMultipleProducerIds (#19883)
- Replace the deprecated `becomeLeaderOrFollower` with the
metadata-based `applyDelta` method.
- Add overloaded `topicsCreateDelta` to support custom topic name and
topicId.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Nick Guo <lansg0504@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-08 00:55:20 +08:00
Bolin Lin 8a7e4a1423
KAFKA-18486 Update activeProducerState wih KRaft mechanism in ReplicaManagerTest (#19890)
Description:
* replace RPC with KRaft mechanism to test activeProducerState in
ReplicaManagerTest

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-08 00:26:52 +08:00