Commit Graph

5950 Commits

Author SHA1 Message Date
Lan Ding 9572d19c59
KAFKA-19509: Improve error message when release version is wrong (#20185)
CI / build (push) Waiting to run Details
Improve the error message in the kafka-storage.sh when an incorrect
release-version is given. Specifically, following the behavior of
kafka-feature.sh, when an incorrect release-version is entered, it
returns the currently supported versions to the user.

Reviewers: TengYao Chi <frankvicky@apache.org>, Yung
 <yungyung7654321@gmail.com>
2025-07-18 11:39:55 +08:00
Elizabeth Bennett f81853ca88
KAFKA-19441: encapsulate MetadataImage in GroupCoordinator/ShareCoordinator (#20061)
CI / build (push) Waiting to run Details
The MetadataImage has a lot of stuff in it and it gets passed around in
many places in the new GroupCoordinator. This makes it difficult to
understand what metadata the group coordinator actually relies on and
makes it too easy to use metadata in ways it wasn't meant to be used. 

This change encapsulate the MetadataImage in an interface
(`CoordinatorMetadataImage`) that indicates and controls what metadata
the group coordinator actually uses. Now it is much easier at a glance
to see what dependencies the GroupCoordinator has on the metadata. Also,
now we have a level of indirection that allows more flexibility in how
the GroupCoordinator is provided the metadata it needs.
2025-07-18 08:16:54 +08:00
Gaurav Narula 12761c07ae
KAFKA-19458: resume cleaning on future replica dir change (#20082)
`ReplicaManager#alterReplicaLogDirs` does not resume log cleaner while
handling an `AlterReplicaLogDirs` request for a topic partition which
already has an `AlterReplicaLogDirs` in progress, leading to a resource
leak where the cleaning for topic partitions remains paused even after
the log directory has been altered.

This change ensures we invoke `LogManager#resumeCleaning` if the future
replica directory has changed.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-17 13:13:09 -07:00
Calvin Liu 9412051dc6
MINOR: Bump LATEST_PRODUCTION to 4.1IV1 and Use MV to enable ELR (#20137)
Removing the isEligibleLeaderReplicasV1Enabled to let ELR be enabled if
MV is at least 4.1IV1.  Also bump the Latest Prod MV to 4.1IV1

Reviewers: Paolo Patierno <ppatierno@live.com>, Jun Rao <junrao@gmail.com>
2025-07-17 11:53:10 -07:00
Logan Zhu d03878c7fb
MINOR: Migrate CoordinatorLoaderImpl from Scala to Java (#20089)
CI / build (push) Waiting to run Details
### Summary of Changes

- Rewrote both `CoordinatorLoaderImpl` and `CoordinatorLoaderImplTest`
in Java, replacing their original Scala implementations.
- Removed the direct dependency on `ReplicaManager` and replaced it with
functional interfaces for `partitionLogSupplier` and
`partitionLogEndOffsetSupplier`
- Preserved original logic and test coverage during migration.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
 TengYao Chi <frankvicky@apache.org>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-18 01:51:46 +08:00
Gaurav Narula 7e9df7d03d
KAFKA-19505: allow mocking UnifiedLog#topicId in ReplicaManagerTest (#20167)
The mocked value for `UnifiedLog#topicId` was incorrectly set up which
caused test failure.

Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Satish Duggana <satishd@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-17 10:40:00 +08:00
Sanskar Jhajharia 65a9337739
MINOR: Add ShareFetch quota session verification test (#20164)
CI / build (push) Waiting to run Details
### Background
As part of KIP-932 implementation, ShareFetch requests need to properly
integrate with Kafka's quota system. This requires that ShareFetch
requests extract and pass the correct session information (Principal,
client address, client ID) to quota managers, ensuring consistent quota
enforcement between ShareFetch and traditional Fetch requests.

### Changes
This PR adds `testHandleShareFetchRequestQuotaTagsVerification()`,
`testHandleShareAcknowledgeRequestQuotaTagsVerification` and
`testHandleShareFetchWithAcknowledgementQuotaTagsVerification` to
`KafkaApisTest`, which provides verification of quota tag extraction and
session handling for ShareFetch and ShareAcknowledge requests.
   - Ensures ShareFetch/ShareAck requests are properly constructed with
the correct client ID, principal, client address, and API key
   - Verifies the request context contains the expected session
information
   - Uses `ArgumentCaptor` to capture the exact `Session` and
`RequestChannel.Request` objects passed to quota managers
   - Verifies both `quotas.fetch.maybeRecordAndGetThrottleTimeMs()` and
`quotas.request.maybeRecordAndGetThrottleTimeMs()` are called with
correct parameters as and when needed.
   - Validates that the captured `RequestChannel.Request` object
maintains the correct request context information
   - Ensures the client ID passed to quota managers matches the
test-defined value
   - Verifies that in case of Acks being piggybacked on the fetch
requests, the quotas are applied only once and not twice.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-07-16 09:56:01 +01:00
Sanskar Jhajharia 9f092420f1
MINOR: Extend Quota Tests for ShareFetch requests (#20163)
### Summary
Extends RequestQuotaTest to include ShareFetch API quota testing,
ensuring compliance with KIP-932.

### Key Changes
- New test: testShareFetchUsesSameFetchSensor() - Verifies ShareFetch
and Fetch use the same FETCH quota sensor
- New test:
testResponseThrottleTimeWhenBothShareFetchAndRequestQuotasViolated() -
Tests ShareFetch throttling behaviour
- Request builder: Added ApiKeys.SHARE_FETCH case with proper ShareFetch
request construction
- Some minor cleanup wrt use of Collections

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-07-14 21:28:25 +01:00
Kevin Wu a64f5bf6ab
KAFKA-19254 Add generic feature level metrics (#20021)
This PR adds the following metrics for each of the supported production
features (`metadata.version`, `kraft.version`, `transaction.version`,
etc.):

`kafka.server:type=MetadataLoader,name=FinalizedLevel,featureName=X`

`kafka.server:type=node-metrics,name=maximum-supported-level,feature-name=X`

`kafka.server:type=node-metrics,name=minimum-supported-level,feature-name=X`

Reviewers: Josep Prat <josep.prat@aiven.io>, PoAn Yang
 <payang@apache.org>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Lan Ding
 <isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-15 04:27:04 +08:00
Rajani K a61a37f7dd
KAFKA-19452: Fix flaky test LogRecoveryTest.testHWCheckpointWithFailuresMultipleLogSegments (#20121)
CI / build (push) Waiting to run Details
The `testHWCheckpointWithFailuresMultipleLogSegments` test in
`LogRecoveryTest` was failing intermittently due to a race condition
during its failure simulation.

In successful runs, the follower broker would restart and rejoin the
In-Sync Replica (ISR) set before the old leader's failure was fully
processed. This allowed for a clean and timely leader election to the
now in-sync follower.

However, in the failing runs, the follower did not rejoin the ISR before
the leader election was triggered. With no replicas in the ISR and
unclean leader election disabled by default for the test, the controller
correctly refused to elect a new leader, causing the test to time out.

This commit fixes the flakiness by overriding the controller
configuration for this test to explicitly enable unclean leader
election. This allows the out-of-sync replica to be promoted to leader,
making the test deterministic and stable.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-14 09:42:00 -07:00
Luke Chen e1ff387605
KAFKA-14915: Allow reading from remote storage for multiple partitions in one fetchRequest (#20045)
This PR enables reading remote storage for multiple partitions in one
fetchRequest. The main changes are:
1. In `DelayedRemoteFetch`, we accept multiple remoteFetchTasks and
other metadata now.
2. In `DelayedRemoteFetch`, we'll wait until all remoteFetch done,
either succeeded or failed.
3. In `ReplicaManager#fetchMessage`, we'll create one
`DelayedRemoteFetch` and pass multiple remoteFetch metadata to it, and
watch all of them.
4. Added tests

Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, Satish Duggana <satishd@apache.org>
2025-07-14 19:42:08 +05:30
Apoorv Mittal 986322dc36
MINOR: Moving the rollback out of lock in share partition (#20153)
CI / build (push) Has been cancelled Details
Fixup PR Labels / fixup-pr-labels (needs-attention) (push) Has been cancelled Details
Fixup PR Labels / fixup-pr-labels (triage) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.7.2) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.8.1) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.9.1) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (4.0.0) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (latest) (push) Has been cancelled Details
Fixup PR Labels / needs-attention (push) Has been cancelled Details
Flaky Test Report / Flaky Test Report (push) Has been cancelled Details
Moving rollback out of lock, if persister returns a completed future for
write state then same data-plane-request-handler thread should not call
purgatory safeTryAndComplete while holding SharePartition's write lock.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit
 <adixit@confluent.io>
2025-07-11 15:22:03 +01:00
Jhen-Yung Hsu 007fe6e92a
KAFKA-19466 LogConcurrencyTest should close the log when the test completes (#20110)
- Fix testUncommittedDataNotConsumedFrequentSegmentRolls() and
testUncommittedDataNotConsumed(), which call createLog() but never close
the log when the tests complete.
- Move LogConcurrencyTest to the Storage module and rewrite it in Java.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-10 01:01:42 +08:00
Gaurav Narula 36b9bb94f1
KAFKA-19474 Move WARN log on log truncation below HWM (#20106)
CI / build (push) Waiting to run Details
#5608 introduced a regression where the check for `targetOffset <
log.highWatermark`
to emit a `WARN` log was made incorrectly after truncating the log.

This change moves the check for `targetOffset < log.highWatermark`  to
`UnifiedLog#truncateTo` and ensures we emit a `WARN` log on truncation
below  the replica's HWM by both the `ReplicaFetcherThread` and
`ReplicaAlterLogDirsThread`

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-09 09:55:02 +08:00
Jonah Hooper d86ba7f54a
KAFKA-18681: Created GetReplicaLogInfo RPCs (#19664)
CI / build (push) Waiting to run Details
Creates GetReplicaLogInfoRequest and GetReplicaLogInfoResponse RPCs
Information returned by these brokers will be used to aid
unclean-recovery by selecting longest logs.

Reviewers: Alyssa Huang <ahuang@confluent.io>, Calvin Liu <caliu@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, TaiJuWu <tjwu1217@gmail.com>
2025-07-08 10:41:01 -07:00
Jhen-Yung Hsu dde0b8cd92
MINOR: Prevent unnecessary test runs - KAFKA-19042 follow-up (#20122)
CI / build (push) Waiting to run Details
PlaintextConsumerTest should extend AbstractConsumerTest instead
BaseConsumerTest. Otherwise, those tests will be executed on both
`clients-integration-tests` and `core` (see
https://github.com/apache/kafka/pull/20081/files#r2190749592).

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-08 07:42:15 +08:00
Ken Huang a399852ced
KAFKA-19042 Move PlaintextConsumerTest to client-integration-tests module (#20081)
Use Java to rewrite PlaintextConsumerTest by new test infra and  move it
to client-integration-tests module.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-08 01:41:59 +08:00
Bolin Lin e8ee7fc210
KAFKA-19315 Move ControllerMutationQuotaManager to server module (#19807)
CI / build (push) Has been cancelled Details
Migrate ControllerMutationQuotaManager to Java implementation and move
to server module, including ClientQuotaManager and associated files.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-07 01:55:38 +08:00
Andrew Schofield 729f9ccf06
KAFKA-19440: Handle top-level errors in AlterShareGroupOffsets RPC (#20049)
While testing the code in https://github.com/apache/kafka/pull/19820, it
became clear that the error handling problems were due to the underlying
Admin API. This PR fixes the error handling for top-level errors in the
AlterShareGroupOffsets RPC.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Lan Ding
 <isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>
2025-07-03 11:00:56 +01:00
Luke Chen eb378da99c
KAFKA-19462: Count fetch size when remote fetch (#20088)
CI / build (push) Waiting to run Details
Estimate the fetch size for remote fetch to avoid to exceed the
`fetch.max.bytes` config. We don't want to query the remoteLogMetadata
during API handling, thus we assume the remote fetch can get
`max.partition.fetch.bytes` size. Tests added.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-07-03 10:45:59 +08:00
Abhinav Dixit 7cb370b786
KAFKA-19463: nextFetchOffset does not take ongoing state transition into account (#20080)
CI / build (push) Waiting to run Details
### About
`nextFetchOffset` function in `SharePartition` updates the fetch offsets
without considering batches/offsets which might be undergoing state
transition. This can cause problems in updating to the right fetch
offset.

### Testing
The new code added has been tested with the help of unit tests.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-07-02 18:09:43 +01:00
Yunchi Pang 42041f4772
MINOR: Refactor createResponseConfig to avoid collection copy and conversion (#19867)
issue: https://github.com/apache/kafka/pull/19687/files#r2094574178

Why:
- To improve performance by avoiding redundant temporary collections and
repeated method calls.
- To make the utility more flexible for inputs from both Java and Scala.

What:
- Refactored `createResponseConfig` in `ConfigHelper.scala` by
overloading the method to accept both Java maps and `AbstractConfig`.
- Extracted helper functions to `ConfigHelperUtils` in the server
module.

Reviewers: Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
2025-07-02 21:32:11 +08:00
Tsung-Han Ho (Miles Ho) ad934d3202
MINOR: Remove threadNamePrefix parameter from ReplicaManager and ReplicaFetcherManager (#20069)
CI / build (push) Waiting to run Details
- remove `threadNamePrefix` from `ReplicaManager` constructor
- update `BrokerServer` to use updated constructor
- remove `threadNamePrefix` from `ReplicaFetcherManager`

Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi
 <frankvicky@apache.org>
2025-07-01 20:36:50 +08:00
TaiJuWu bd14ed21b4
KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower (#20037)
The PR do following:

1. Remove  ReplicaManager#becomeLeaderOrFollower.
2. Remove `LeaderAndIsrRequest` and `LeaderAndIsrResponse`
3. Migrate `LeaderAndIsrRequest.PartitionState` to server-common module
and change to `PartitionState`
4. Remove `ControllerEpoch` from PartitionState
5. Remove `isShuttingDown` from BrokerServer and ReplicaManager

Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-30 01:20:49 +08:00
TaiJuWu a95522a5ba
KAFKA-19042 Rewrite ConsumerBounceTest by Java (#19822)
This PR does the following:
1) Rewrites consumerBounceTest in Java.
2) Moves the test to clients-integration-test.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-30 00:40:36 +08:00
Xuan-Zhang Gong 05b6e81688
KAFKA-19420 Don't export SocketServer from ClusterInstance (#20002)
CI / build (push) Waiting to run Details
Fixup PR Labels / fixup-pr-labels (needs-attention) (push) Has been cancelled Details
Fixup PR Labels / fixup-pr-labels (triage) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.7.2) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.8.1) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (3.9.1) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (4.0.0) (push) Has been cancelled Details
Docker Image CVE Scanner / scan_jvm (latest) (push) Has been cancelled Details
Fixup PR Labels / needs-attention (push) Has been cancelled Details
Refactor the code related to SocketServer  SocketServer is an internal
class, and normally the integration tests should not use it directly.
[KAFKA-19239](https://issues.apache.org/jira/browse/KAFKA-19239) will
add a new helper to expose the bound ports, and so the tests that need
to send raw request can leverage it without accessing the SocketServer.

Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
 <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-27 21:12:57 +08:00
Ken Huang b919836551
KAFKA-17662: config.providers configuration missing from the docs (#18930)
Ensure the config.providers configuration is documented for all
components supporting it

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Greg Harris
<gharris1727@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2025-06-27 14:13:55 +02:00
Apoorv Mittal 96ef1c520a
KAFKA-19436: Restrict cache update for ongoing batch/offset state (#20041)
CI / build (push) Waiting to run Details
In the stress testing it was noticed that on acquisition lock timeout,
some offsets were not found in the cache. The cache can be tried to be
updated in different acknowledgement calls hence if there is an ongoing
transition which is not yet finished but another parallel
acknowledgement triggers the cache update then the cache can be updated
incorrectly, while first transition is not yet finished.

Though the cache update happens for Archived and Acknowldeged records
hence this issue or existing implementation should not hamper the queues
functionality. But it might update the cache early when persister call
might fail or this issue triggers error logs with offset not found in
cache when acquisition lock timeouts (in some scenarios).

Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield
 <aschofield@confluent.io>
2025-06-26 15:08:15 +01:00
David Jacot f6a78c4c2b
KAFKA-19246; OffsetFetch API does not return group level errors correctly with version 1 (#19704)
The OffsetFetch API does not support top level errors in version 1.
Hence, the top level error must be returned at the partition level.

Side note: It is a tad annoying that we create error response in
multiple places (e.g. KafkaApis, Group CoordinatorService). There were a
reason for this but I cannot remember.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Sean Quah <squah@confluent.io>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-06-26 06:29:43 -07:00
Sanskar Jhajharia 56aeaa4c44
MINOR: Cleanup ShareFetchAcknowledgeRequestTest (#19852)
CI / build (push) Waiting to run Details
Now that Kafka supports Java 17, this PR cleans up the
ShareFetchAcknowledgeRequestTest.
The changes mostly include:
- Collections.singletonList() is replaced with List.of()
- Get rid of all asJava conversions

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-26 13:56:18 +08:00
Mahsa Seifikar 7aaba96cc1
KAFKA-19282: Update quotaTypesEnabled on quota removal in ClientQuotaManager (#19742)
CI / build (push) Waiting to run Details
In `kafka.server.ClientQuotaManager` class, `quotaTypesEnabled` is not updated when a quota is removed via `removeQuota` method in `DefaultQuotaCallback` class. This field is set when quotas are added in `updateQuota` but it's never changed or cleared. So in case all the quotas have been removed dynamically, the system may incorrectly assume the quotas are active, which leads to unnecessary metric creation or updates until the broker is restarted.

Reviewers: Jonah Hooper <jhooper@confluent.io>, Hailey Ni <hni@confluent.io>, Alyssa Huang <ahuang@confluent.io>, David Jacot <djacot@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2025-06-25 21:29:46 +01:00
Jing-Jia Hung 5e23df0c8d
KAFKA-18486 Migrate tests to use applyDelta instead of becomeLeaderOrFollower for testInconsistentIdReturnsError and others (#20014)
continues the migration effort for KAFKA-18486 by replacing usage of the
deprecated `becomeLeaderOrFollower` API with `applyDelta` in several
test cases.

#### Updated tests:
- `testInconsistentIdReturnsError`
- `testMaybeAddLogDirFetchers`
- `testMaybeAddLogDirFetchersPausingCleaning`
- `testSuccessfulBuildRemoteLogAuxStateMetrics`
- `testVerificationForTransactionalPartitionsOnly`
- `testBecomeFollowerWhenLeaderIsUnchangedButMissedLeaderUpdate`

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu
 <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-25 20:02:27 +08:00
Rajini Sivaram 33a1648c44
MINOR: Fix response for consumer group describe with empty group id (#20030)
ConsumerGroupDescribe with an empty group id returns a response containing `null` groupId in a non-nullable field. Since the response cannot be serialized, this results in UNKNOWN_SERVER_ERROR being returned to the client. This PR sets the group id in the response to an empty string instead and adds request tests for empty group id.

Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-25 10:33:44 +01:00
Colin Patrick McCabe 6b2013a001
KAFKA-19294: Fix BrokerLifecycleManager RPC timeouts (#19745)
CI / build (push) Waiting to run Details
Previously, we could wait for up to half of the broker session timeout
for an RPC to complete, and then delay by up to half of the broker
session timeout. When taken together, these two delays could lead to
brokers erroneously missing heartbeats.

This change removes exponential backoff for heartbeats sent from the
broker to the controller. The load caused by heartbeats is not heavy,
and controllers can easily time out heartbeats when the queue length is
too long. Additionally, we now set the maximum RPC time to the length of
the broker period. This minimizes the impact of heavy load.

Reviewers: José Armando García Sancio <jsancio@apache.org>, David Arthur <mumrah@gmail.com>
2025-06-24 16:23:25 -07:00
Ken Huang 023833fe1f
KAFKA-18778 Fix the inconsistent lastest supported version in StorageTool.scala and FutureCommand (#19157)
To maintain code consistency, `MetadataVersion#fromVersionString` uses
`latestTesting()` as the latest version. Therefore, in the tools, we
also need to maintain consistency by updating the outer logging to use
`latestTesting()`.

See the discussion:
https://github.com/apache/kafka/pull/18845#discussion_r1950706791

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-25 01:59:52 +08:00
Apoorv Mittal 1ca8779bee
MINOR: Correcting client error for fenced share partition (#20023)
Correct the error when SharePartition is fenced.

Reviewers: Abhinav Dixit <adixit@confluent.io>, Sushant Mahajan
 <smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-06-24 09:46:14 +01:00
Sushant Mahajan 3d4407ff9d
MINOR: Change exceptions for few error codes in SharePartition. (#20020)
CI / build (push) Waiting to run Details
* The `SharePartition` class wraps the errors received from
`PersisterStateManager` to be sent to the client.
* In this PR, we are categorizing the errors a bit better.
* Some exception messages in `PersisterStateManager` have been updated
to show the share partition key.
* Tests have been updated wherever needed.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal
 <apoorvmittal10@gmail.com>
2025-06-23 19:27:15 +01:00
Ming-Yen Chung b38573fcaa
KAFKA-18486 Remove becomeLeaderOrFollower from testPartition*, testPreferredReplicaAs* (#20009)
Replace `leaderAndIsrRequest` and `becomeLeaderOrFollower` with
`TopicsDelta`, `MetadataImage` and `ReplicaManager#applyDelta` for the
following tests:
* testPartitionListener
* testPartitionMarkedOfflineIfLogCantBeCreated
* testPartitionMetadataFileNotCreated
* testPartitionsWithLateTransactionsCount
* testPreferredReplicaAsFollower
* testPreferredReplicaAsLeader
* testPreferredReplicaAsLeaderWhenSameRackFollowerIsOutOfIsr
* testProducerIdCountMetrics

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-23 16:42:30 +08:00
Bolin Lin 3404f65cdb
KAFKA-19324 Make org.apache.kafka.common.test.TestUtils package-private to prevent cross-module access (#19884)
Description

* Replace `org.apache.kafka.common.test.TestUtils` with
`org.apache.kafka.test.TestUtils` in outer package modules to
standardize test utility usage
* Move `waitUntilLeaderIsElectedOrChangedWithAdmin` method from
`org.apache.kafka.test.TestUtils` to `ClusterInstance` and refactor for
better code organization
* Add `org.apache.kafka.test.TestUtils` dependency to
`transaction-coordinator` import control

Reviewers: PoAn Yang [payang@apache.org](mailto:payang@apache.org), Ken
 Huang  [s7133700@gmail.com](mailto:s7133700@gmail.com), Ken Huang
 [s7133700@gmail.com](mailto:s7133700@gmail.com), Chia-Ping Tsai
 [chia7712@gmail.com](mailto:chia7712@gmail.com)
2025-06-22 22:47:40 +08:00
S.Y. Wang 22bef988d4
KAFKA-18926 KafkaPrincipalBuilder should extend KafkaPrincipalSerde (#19987)
In KRaft, custom KafkaPrincipalBuilder instances must implement
KafkaPrincipalSerde to support the forward mechanism. Currently, this
requirement is not enforced and relies on the developer's attention.

With this patch, we can prevent incorrect implementations at compile
time.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-22 22:01:03 +08:00
Sanskar Jhajharia 992eaafb62
MINOR: Cleanup Core Module- Scala Modules (3/n) (#19804)
CI / build (push) Waiting to run Details
Now that Kafka Brokers support Java 17, this PR makes some changes in
core module. The changes in this PR are limited to only some Scala files
in the Core module's tests. The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()

To be clear, the directories being targeted in this PR from unit.kafka
module:
- admin
- cluster
- coordinator
- docker
- integration
- metrics

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-22 00:20:39 +08:00
Chirag Wadhwa 7c77519f59
MINOR: changed the test testShareFetchRequestSuccessfulSharingBetweenMultipleConsumers to remove ambiguity (#19997)
CI / build (push) Waiting to run Details
The test testShareFetchRequestSuccessfulSharingBetweenMultipleConsumers
was recently found to be flaky. Making the following small change that
could potentially resolve the issue. Earlier, 1000 records were being
produced and then 3 consecutive share fetch requests were being sent. At
the end, assertions were done to make sure each share consumer receives
some records, and that none of them consume the same record. Since the
motive for the test is to see if multiple consumers can share the same
subscription and not consume the same record, a better way would be to
produce a record, consume that and repeat it 3 times with the 3
consumers. This ensures that every consumer consume a record, and a
previously consume record is not consumed again by the subsequent share
fetches.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield
 <aschofield@confluent.io>
2025-06-21 08:36:31 +01:00
Sushant Mahajan d5e2ecae95
MINOR: Reduce logging in persister. (#19998)
CI / build (push) Waiting to run Details
* Few logs in `PersisterStateManager` were noisy and not adding much
value.
* For the sake of reducing pollution, they have been moved to debug
level.
* Additional debug log in `DefaultStatePersister` to track epochs.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield
 <aschofield@confluent.io>
2025-06-20 13:53:46 +01:00
Chang-Chi Hsu 46b474a9de
KAFKA-19239 Rewrite IntegrationTestUtils by java (#19776)
This PR rewrites the IntegrationTestUtils.java from Scala to Java.

## Changes:

- Converted all the existing Scala code in IntegrationTestUtils.scala
into Java in IntegrationTestUtils.java.
- Preserved the original logic and functionality to ensure backward
compatibility.
- Updated relevant imports and dependencies accordingly.

Motivation:

The rewrite aims to standardize the codebase in Java, which aligns
better with the rest of the project and facilitates easier maintenance
by the Java-centric team.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
 PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-20 01:46:29 +08:00
Xuan-Zhang Gong 79d2c3c62a
KAFKA-19406 Remove BrokerTopicStats#removeOldFollowerMetrics (#19962)
BTW: whether we should rename
`ReplicaManager#updateLeaderAndFollowerMetrics`

Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
 <payang@apache.org>, TengYao Chi <kitingiao@gmail.com>, Lan Ding
 <isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-19 17:57:22 +08:00
Lucas Brutschy 2a06335569
KAFKA-19413: Extended AuthorizerIntegrationTest to cover StreamsGroupDescribe (#19981)
CI / build (push) Waiting to run Details
Extending test coverage of authorization for streams group RPC
StreamsGroupDescribe. The RPC requires DESCRIBE GROUP and DESCRIBE TOPIC
permissions for all topics.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-06-18 10:19:34 +02:00
Ritika Reddy ba7b5c9b32
KAFKA-19367: Follow up bug fix (#19976)
CI / build (push) Waiting to run Details
This is a follow up to [https://github.com/apache/kafka/pull/19910](url)
`The coordinator failed to write an epoch fence transition for producer
jt142 to the transaction log with error COORDINATOR_NOT_AVAILABLE. The
epoch was increased to 2 but not returned to the client
(kafka.coordinator.transaction.TransactionCoordinator)` -- as we don't
bump the epoch with this change, we should also update the message to
not say "increased" and remove the
**epochAndMetadata.transactionMetadata.hasFailedEpochFence = true** line

In the test, the expected behavior is:
1) First append transaction to the log fails with
COORDINATOR_NOT_AVAILABLE (epoch 1)
2) We try init_pid again, this time the SINGLE epoch bump succeeds, and
the following things happen simultaneously (epoch 2)
     -> Transition to COMPLETE_ABORT
     -> Return CONCURRENT_TRANSACTION error to the client
3) The client retries, and there is another epoch bump; state
transitions to EMPTY (epoch 3)

Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
 <alivshits@confluent.io>
2025-06-17 15:10:27 -07:00
Ken Huang 91ce182ec5
KAFKA-19042 Move ProducerSendWhileDeletionTest to client-integration-tests module (#19971)
Use Java to rewrite ProducerSendWhileDeletionTest by new test infra and
move it to client-integration-tests module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-18 00:47:30 +08:00
Kuan-Po Tseng 86cd5d50f5
KAFKA-14895: [2/N] Move AddPartitionsToTxnManager files to java (#19933)
Move AddPartitionsToTxnManagerTest to java

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-17 22:56:19 +08:00
Lucas Brutschy 31a7e01769
KAFKA-19412: Extended AuthorizerIntegrationTest to cover StreamsGroupHeartbeat (#19978)
Extending test coverage of authorization for streams group RPC
StreamsGroupHeartbeat. The RPC requires READ GROUP and DESCRIBE TOPIC
permissions for all topics. For creating internal topics, we require
CREATE TOPIC permission. If internal topic  creation fails, the request
does not fail, but the status reflects this problem.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-06-17 16:41:49 +02:00