5969 Commits

Author SHA1 Message Date
Federico Valeri cd9dde11de
MINOR: Improve skip-record-metadata description (#20291)
This flag also skips control records, so the description needs to be
updated.

---------

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Vincent Potucek
2025-08-05 08:50:50 +08:00
Chang-Chi Hsu 888861d803
MINOR: Replace boundPort with brokerBoundPort (#20297)
## Changes:
- Replaced all references to boundPort with brokerBoundPort.

## Reasons
- boundPort and brokerBoundPort share the same definition and behavior.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
 Chia-Ping Tsai <chia7712@gmail.com>
2025-08-04 16:44:12 +08:00
Apoorv Mittal 05d71ad1a8
KAFKA-19476: Concurrent execution fixes for lock timeout and lso movement (#20286)
The PR fixes the following:

1. If a share partition reaches a state that should be treated as final
for a batch/offset (for example, LSO movement that causes the
offset/batch to be ARCHIVED permanently), the result of pending write
state RPCs for that offset/batch overrides the ARCHIVED state. Hence
track such updates and apply them when the transition is completed.

2. If an acquisition lock timeout occurs while an offset/batch is
undergoing a transition, followed by a write state RPC failure, then the
respective batch/offset can end up in the ACQUIRED state with no
acquisition lock timeout task.

3. If a timer task is cancelled but runs concurrently with an
acknowledgement, the timer task can still be processed after the
cancellation. Hence it can mark the offset/batch as available again
despite it already being acknowledged.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit
 <adixit@confluent.io>
2025-08-01 23:20:25 +01:00
Andrew Schofield b909544e99
MINOR: Improve consistency of acknowledge type terminology (#20282)
The code had a mixture of "acknowledgement type" and "acknowledge type".
The latter is preferred.

Reviewers: TengYao Chi <frankvicky@apache.org>, Lan Ding
 <isDing_L@163.com>
2025-08-01 21:17:22 +01:00
Now dda1b5a4e8
MINOR: Fix duplicate 'to' in ExactlyOnceMessageProcessor javadoc (#20228)
Fixed a simple typo in javadoc comment where "to to" appeared instead of
"to".

_No functional changes_

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-30 23:59:49 +08:00
Kevin Wu 1bcaa19c46
KAFKA-19489; Extra validation when formatting a node (#20136)
This PR adds a check to the storage tool's format command which throws a
TerseFailure when the controller.quorum.voters config is defined and the
node is formatted with the --standalone flag or the
--initial-controllers flag.

Without this check, it is possible to have two voter sets. For example,
in a three-node setup, the two nodes that were formatted with
--no-initial-controllers could form a quorum with each other since they
have the static voter set, and the --standalone node would ignore the
config and read its own voter set from its log, forming its own
quorum of 1.
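
The rule described above can be sketched as a small standalone check. This is only an illustration: the class and method names are hypothetical, and `IllegalArgumentException` stands in for the `TerseFailure` that the storage tool throws.

```java
// Hypothetical sketch of the validation rule; not the storage tool's code.
final class FormatValidation {
    // Rejects the combination of a static voter set (controller.quorum.voters)
    // with --standalone or --initial-controllers.
    static void validate(String controllerQuorumVoters,
                         boolean standalone,
                         boolean initialControllers) {
        boolean hasStaticVoters =
            controllerQuorumVoters != null && !controllerQuorumVoters.isBlank();
        if (hasStaticVoters && (standalone || initialControllers)) {
            // Stand-in for TerseFailure, which lives in the Kafka code base.
            throw new IllegalArgumentException(
                "controller.quorum.voters cannot be combined with "
                    + "--standalone or --initial-controllers");
        }
    }

    public static void main(String[] args) {
        // Static voters with --no-initial-controllers style formatting passes.
        validate("1@localhost:9093", false, false);
        try {
            validate("1@localhost:9093", true, false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```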

Reviewers: José Armando García Sancio <jsancio@apache.org>, TaiJuWu
 <tjwu1217@gmail.com>, Alyssa Huang <ahuang@confluent.io>
2025-07-30 10:58:08 -04:00
Mickael Maison 6973deab03
MINOR: Cleanups in storage module (#20087)
Cleanups including:
- Java 17 syntax, record and switch
- assertEquals() order
- javadoc

Reviewers: Andrew Schofield <aschofield@confluent.io>, Jhen-Yung Hsu
 <jhenyunghsu@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-30 16:02:01 +08:00
jimmy dd784e7d7a
KAFKA-16717 [3/N]: Add AdminClient.alterShareGroupOffsets (#19820)
[KAFKA-16717](https://issues.apache.org/jira/browse/KAFKA-16717) aims to
finish the AlterShareGroupOffsets part of ShareGroupCommand.

Reviewers: Lan Ding <isDing_L@163.com>, Chia-Ping Tsai
 <chia7712@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Andrew Schofield
 <aschofield@confluent.io>
2025-07-29 11:47:24 +01:00
Apoorv Mittal 875537f54b
KAFKA-19555: Restrict records acquisition post max in-flight limit (#20253)
The PR restricts the records being acquired post max-inflight limit.
Previously the max in-flight limit was only enforced while considering
the share partition for further fetches i.e. once the limit was reached
the share partition was not considered for further fetches. However,
when the records are actively released then there might be some records
being acquired post max-inflight limit. This is evident with higher
number of consumers reading from same share partition and releasing the
records.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lan Ding
<isDing_L@163.com>
2025-07-29 10:40:06 +01:00
Lan Ding abbb6b3c13
KAFKA-19471: Enable acknowledgement for a record which could not be deserialized (#20148)
This patch mainly includes two improvements:

1. Update currentFetch when `pollForFetches()` throws an exception.
2. Add an override `KafkaShareConsumer.acknowledge(String topic, int
partition, long offset, AcknowledgeType type)` .

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-27 22:35:04 +01:00
Apoorv Mittal d350f603a4
KAFKA-18265: Move inflight batch and state classes from SharePartition (2/N) (#20230)
Another refactor PR to move in-flight batch and state out of
SharePartition. This PR concludes the refactoring and subsequent PRs for
this ticket will involve code cleanups and better lock handling. However
the intent is to keep PRs small so they can be reviewed easily.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-23 23:01:23 +01:00
Apoorv Mittal a663ce3f45
KAFKA-18265: Move acquisition lock classes from share partition (1/N) (#20227)
While working on KAFKA-19476, I realized that we need to refactor
SharePartition for read/write lock handling. I have started some work in
the area. For the initial PR, I have moved AcquisitionLockTimeout class
outside of SharePartition.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-23 20:21:42 +01:00
Kamal Chandraprakash 93adaea599
KAFKA-19523: Gracefully handle error while building remoteLogAuxState (#20201)
Improve the error handling while building the remote-log-auxiliary state
when a follower node with an empty disk begins to synchronise with the
leader. If the topic has remote storage enabled, then the
ReplicaFetcherThread attempts to build the remote-log-auxiliary state.
Note that the remote-log-auxiliary state build is invoked only when the
leader-log-start-offset is non-zero and leader-log-start-offset is not
equal to leader-local-log-start-offset.

When the LeaderAndISR request is received, the
ReplicaManager#becomeLeaderOrFollower invokes 'makeFollowers' initially,
followed by the RemoteLogManager#onLeadershipChange call. As a result,
when ReplicaFetcherThread initiates the
RemoteLogManager#fetchRemoteLogSegmentMetadata, the partition may not
have been initialized at that time and a retriable exception is thrown.

Introduced RetriableRemoteStorageException to gracefully handle the
error.

After the patch:
```
[2025-07-19 19:28:20,934] INFO [ReplicaFetcher replicaId=3, leaderId=1,
fetcherId=0] Could not build remote log auxiliary state for orange-1 due
to error: RemoteLogManager is not ready for partition: orange-1
(kafka.server.ReplicaFetcherThread)
[2025-07-19 19:28:20,934] INFO [ReplicaFetcher replicaId=3, leaderId=2,
fetcherId=0] Could not build remote log auxiliary state for orange-0 due
to error: RemoteLogManager is not ready for partition: orange-0
(kafka.server.ReplicaFetcherThread)
```

Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2025-07-23 19:29:31 +05:30
Chang-Chi Hsu 8a5549ca9b
MINOR: Rename waitForTopic to waitTopicCreation (#20216)
Changes: Rename `waitForTopic` to `waitTopicCreation` for better clarity.

Reasons: To align with `waitTopicDeletion`.

Reference:
https://github.com/apache/kafka/pull/20108/files#r2221659660

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<frankvicky@apache.org>
2025-07-22 21:02:57 +08:00
Federico Valeri f5fcc4188f
KAFKA-19503: Deprecate MX4j support (#20208)
This feature adds maintenance burden and potential security concerns
while providing no apparent value to the Kafka community. See
[KIP-1193](https://cwiki.apache.org/confluence/x/dAxJFg) for more
details.

Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang
 <s7133700@gmail.com>

---------

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
2025-07-22 20:36:24 +08:00
Apoorv Mittal f52f2b99e5
KAFKA-19476: Removing AtomicBoolean for findNextFetchOffset (1/N) (#20207)
The PR changes the findNextFetchOffset variable from AtomicBoolean to a
plain boolean, as it is always accessed while holding a lock. This
also improves handling of the `writeShareGroupState` method response:
the lock is no longer required for the whole method, only for a
sub-section.
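
As a minimal sketch of the idea, a flag that is only ever read or written under a lock does not need to be an `AtomicBoolean`. The class below is hypothetical and is not the actual `SharePartition` code; only the field name is taken from the commit message.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder class for illustration only.
final class FetchOffsetFlag {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // A plain boolean is sufficient because every access below happens
    // while holding the read or write lock.
    private boolean findNextFetchOffset = false;

    void markFindNextFetchOffset() {
        lock.writeLock().lock();
        try {
            findNextFetchOffset = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean findNextFetchOffset() {
        lock.readLock().lock();
        try {
            return findNextFetchOffset;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        FetchOffsetFlag flag = new FetchOffsetFlag();
        flag.markFindNextFetchOffset();
        System.out.println(flag.findNextFetchOffset());
    }
}
```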

Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield
 <aschofield@confluent.io>
2025-07-21 13:12:13 +01:00
Lan Ding ef07b5fad1
KAFKA-19461: Add share group admin integration tests to PlaintextAdminIntegrationTest (#20103)
Add integration tests for `Admin.deleteShareGroupOffsets`,
`Admin.alterShareGroupOffsets` and `Admin.listShareGroupOffsets`  to
`PlaintextAdminIntegrationTest`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-21 09:08:26 +01:00
Dongnuo Lyu 50598191dc
MINOR: Add tests on TxnOffsetCommit and EndTxnMarker protection against invalid producer epoch when TV2 is used (#20024)
This patch adds an API level integration test for the producer epoch
verification when processing transactional offset commit and end txn
markers.

Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi
 <kitingiao@gmail.com>, Sean Quah <squah@confluent.io>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-21 06:34:29 +08:00
Lan Ding 9a2f202a1e
MINOR: Move ClientQuotasRequestTest to server module (#20053)
1. Move ClientQuotasRequestTest to server module.
2. Rewrite ClientQuotasRequestTest in Java.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-20 23:14:55 +08:00
Lan Ding 9572d19c59
KAFKA-19509: Improve error message when release version is wrong (#20185)
Improve the error message in the kafka-storage.sh when an incorrect
release-version is given. Specifically, following the behavior of
kafka-feature.sh, when an incorrect release-version is entered, it
returns the currently supported versions to the user.

Reviewers: TengYao Chi <frankvicky@apache.org>, Yung
 <yungyung7654321@gmail.com>
2025-07-18 11:39:55 +08:00
Elizabeth Bennett f81853ca88
KAFKA-19441: encapsulate MetadataImage in GroupCoordinator/ShareCoordinator (#20061)
The MetadataImage has a lot of stuff in it and it gets passed around in
many places in the new GroupCoordinator. This makes it difficult to
understand what metadata the group coordinator actually relies on and
makes it too easy to use metadata in ways it wasn't meant to be used. 

This change encapsulates the MetadataImage in an interface
(`CoordinatorMetadataImage`) that indicates and controls what metadata
the group coordinator actually uses. Now it is much easier at a glance
to see what dependencies the GroupCoordinator has on the metadata. Also,
now we have a level of indirection that allows more flexibility in how
the GroupCoordinator is provided the metadata it needs.
2025-07-18 08:16:54 +08:00
Gaurav Narula 12761c07ae
KAFKA-19458: resume cleaning on future replica dir change (#20082)
`ReplicaManager#alterReplicaLogDirs` does not resume log cleaner while
handling an `AlterReplicaLogDirs` request for a topic partition which
already has an `AlterReplicaLogDirs` in progress, leading to a resource
leak where the cleaning for topic partitions remains paused even after
the log directory has been altered.

This change ensures we invoke `LogManager#resumeCleaning` if the future
replica directory has changed.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-17 13:13:09 -07:00
Calvin Liu 9412051dc6
MINOR: Bump LATEST_PRODUCTION to 4.1IV1 and Use MV to enable ELR (#20137)
Remove isEligibleLeaderReplicasV1Enabled so that ELR is enabled if
MV is at least 4.1IV1. Also bump the latest production MV to 4.1IV1.

Reviewers: Paolo Patierno <ppatierno@live.com>, Jun Rao <junrao@gmail.com>
2025-07-17 11:53:10 -07:00
Logan Zhu d03878c7fb
MINOR: Migrate CoordinatorLoaderImpl from Scala to Java (#20089)
### Summary of Changes

- Rewrote both `CoordinatorLoaderImpl` and `CoordinatorLoaderImplTest`
in Java, replacing their original Scala implementations.
- Removed the direct dependency on `ReplicaManager` and replaced it with
functional interfaces for `partitionLogSupplier` and
`partitionLogEndOffsetSupplier`
- Preserved original logic and test coverage during migration.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
 TengYao Chi <frankvicky@apache.org>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-18 01:51:46 +08:00
Gaurav Narula 7e9df7d03d
KAFKA-19505: allow mocking UnifiedLog#topicId in ReplicaManagerTest (#20167)
The mocked value for `UnifiedLog#topicId` was set up incorrectly, which
caused a test failure.

Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Satish Duggana <satishd@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-17 10:40:00 +08:00
Sanskar Jhajharia 65a9337739
MINOR: Add ShareFetch quota session verification test (#20164)
### Background
As part of KIP-932 implementation, ShareFetch requests need to properly
integrate with Kafka's quota system. This requires that ShareFetch
requests extract and pass the correct session information (Principal,
client address, client ID) to quota managers, ensuring consistent quota
enforcement between ShareFetch and traditional Fetch requests.

### Changes
This PR adds `testHandleShareFetchRequestQuotaTagsVerification()`,
`testHandleShareAcknowledgeRequestQuotaTagsVerification` and
`testHandleShareFetchWithAcknowledgementQuotaTagsVerification` to
`KafkaApisTest`, which provides verification of quota tag extraction and
session handling for ShareFetch and ShareAcknowledge requests.
   - Ensures ShareFetch/ShareAck requests are properly constructed with
the correct client ID, principal, client address, and API key
   - Verifies the request context contains the expected session
information
   - Uses `ArgumentCaptor` to capture the exact `Session` and
`RequestChannel.Request` objects passed to quota managers
   - Verifies both `quotas.fetch.maybeRecordAndGetThrottleTimeMs()` and
`quotas.request.maybeRecordAndGetThrottleTimeMs()` are called with
correct parameters as and when needed.
   - Validates that the captured `RequestChannel.Request` object
maintains the correct request context information
   - Ensures the client ID passed to quota managers matches the
test-defined value
   - Verifies that in case of Acks being piggybacked on the fetch
requests, the quotas are applied only once and not twice.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-07-16 09:56:01 +01:00
Sanskar Jhajharia 9f092420f1
MINOR: Extend Quota Tests for ShareFetch requests (#20163)
### Summary
Extends RequestQuotaTest to include ShareFetch API quota testing,
ensuring compliance with KIP-932.

### Key Changes
- New test: testShareFetchUsesSameFetchSensor() - Verifies ShareFetch
and Fetch use the same FETCH quota sensor
- New test:
testResponseThrottleTimeWhenBothShareFetchAndRequestQuotasViolated() -
Tests ShareFetch throttling behaviour
- Request builder: Added ApiKeys.SHARE_FETCH case with proper ShareFetch
request construction
- Some minor cleanup wrt use of Collections

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-07-14 21:28:25 +01:00
Kevin Wu a64f5bf6ab
KAFKA-19254 Add generic feature level metrics (#20021)
This PR adds the following metrics for each of the supported production
features (`metadata.version`, `kraft.version`, `transaction.version`,
etc.):

`kafka.server:type=MetadataLoader,name=FinalizedLevel,featureName=X`

`kafka.server:type=node-metrics,name=maximum-supported-level,feature-name=X`

`kafka.server:type=node-metrics,name=minimum-supported-level,feature-name=X`
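
For readers who want to look these metrics up over JMX, the names above can be built with the standard `javax.management.ObjectName` class. This is a sketch only: the helper class is hypothetical, and `metadata.version` is used as the example feature because it is one of the features listed above.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper; only the metric name patterns come from the commit message.
final class FeatureMetricNames {
    static ObjectName finalizedLevel(String featureName) {
        return parse("kafka.server:type=MetadataLoader,name=FinalizedLevel,featureName="
            + featureName);
    }

    static ObjectName maximumSupportedLevel(String featureName) {
        return parse("kafka.server:type=node-metrics,name=maximum-supported-level,feature-name="
            + featureName);
    }

    static ObjectName minimumSupportedLevel(String featureName) {
        return parse("kafka.server:type=node-metrics,name=minimum-supported-level,feature-name="
            + featureName);
    }

    // Wraps the checked exception so callers do not need a throws clause.
    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid metric name: " + name, e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = finalizedLevel("metadata.version");
        System.out.println(name.getDomain());
        System.out.println(name.getKeyProperty("featureName"));
    }
}
```

Note that the first pattern uses the key `featureName` while the two node-metrics patterns use `feature-name`, so a lookup has to use the matching key for each metric.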

Reviewers: Josep Prat <josep.prat@aiven.io>, PoAn Yang
 <payang@apache.org>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Lan Ding
 <isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-15 04:27:04 +08:00
Rajani K a61a37f7dd
KAFKA-19452: Fix flaky test LogRecoveryTest.testHWCheckpointWithFailuresMultipleLogSegments (#20121)
The `testHWCheckpointWithFailuresMultipleLogSegments` test in
`LogRecoveryTest` was failing intermittently due to a race condition
during its failure simulation.

In successful runs, the follower broker would restart and rejoin the
In-Sync Replica (ISR) set before the old leader's failure was fully
processed. This allowed for a clean and timely leader election to the
now in-sync follower.

However, in the failing runs, the follower did not rejoin the ISR before
the leader election was triggered. With no replicas in the ISR and
unclean leader election disabled by default for the test, the controller
correctly refused to elect a new leader, causing the test to time out.

This commit fixes the flakiness by overriding the controller
configuration for this test to explicitly enable unclean leader
election. This allows the out-of-sync replica to be promoted to leader,
making the test deterministic and stable.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-14 09:42:00 -07:00
Luke Chen e1ff387605
KAFKA-14915: Allow reading from remote storage for multiple partitions in one fetchRequest (#20045)
This PR enables reading remote storage for multiple partitions in one
fetchRequest. The main changes are:
1. In `DelayedRemoteFetch`, we accept multiple remoteFetchTasks and
other metadata now.
2. In `DelayedRemoteFetch`, we'll wait until all remoteFetch tasks are
done, whether they succeeded or failed.
3. In `ReplicaManager#fetchMessage`, we'll create one
`DelayedRemoteFetch` and pass multiple remoteFetch metadata to it, and
watch all of them.
4. Added tests
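
The "wait until every remote fetch is done, succeeded or failed" behaviour in item 2 can be sketched with plain `CompletableFuture`s. This is not the `DelayedRemoteFetch` implementation; the class and method names are hypothetical and the futures stand in for the remote fetch tasks.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; the real logic lives in DelayedRemoteFetch.
final class WaitForAllFetches {
    // Returns a future that completes normally once every task has finished,
    // regardless of whether the individual task succeeded or failed.
    static <T> CompletableFuture<Void> whenAllDone(List<CompletableFuture<T>> tasks) {
        CompletableFuture<?>[] settled = new CompletableFuture<?>[tasks.size()];
        for (int i = 0; i < tasks.size(); i++) {
            // handle() maps both outcomes to a normal completion, so a failed
            // task does not make the combined future complete exceptionally.
            settled[i] = tasks.get(i).handle((value, error) -> null);
        }
        return CompletableFuture.allOf(settled);
    }

    public static void main(String[] args) {
        CompletableFuture<String> first = CompletableFuture.completedFuture("records");
        CompletableFuture<String> second = new CompletableFuture<>();
        CompletableFuture<Void> all = whenAllDone(List.of(first, second));
        System.out.println(all.isDone());
        second.completeExceptionally(new RuntimeException("remote read failed"));
        System.out.println(all.isDone());
    }
}
```

Callers can then inspect each original future to build a per-partition result or error.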

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, Satish Duggana <satishd@apache.org>
2025-07-14 19:42:08 +05:30
Apoorv Mittal 986322dc36
MINOR: Moving the rollback out of lock in share partition (#20153)
Move the rollback out of the lock. If the persister returns a completed
future for write state, the same data-plane-request-handler thread should
not call purgatory safeTryAndComplete while holding SharePartition's
write lock.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit
 <adixit@confluent.io>
2025-07-11 15:22:03 +01:00
Jhen-Yung Hsu 007fe6e92a
KAFKA-19466 LogConcurrencyTest should close the log when the test completes (#20110)
- Fix testUncommittedDataNotConsumedFrequentSegmentRolls() and
testUncommittedDataNotConsumed(), which call createLog() but never close
the log when the tests complete.
- Move LogConcurrencyTest to the Storage module and rewrite it in Java.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-10 01:01:42 +08:00
Gaurav Narula 36b9bb94f1
KAFKA-19474 Move WARN log on log truncation below HWM (#20106)
#5608 introduced a regression where the check for `targetOffset <
log.highWatermark` to emit a `WARN` log was made incorrectly after
truncating the log.

This change moves the check for `targetOffset < log.highWatermark` to
`UnifiedLog#truncateTo` and ensures we emit a `WARN` log on truncation
below the replica's HWM by both the `ReplicaFetcherThread` and
`ReplicaAlterLogDirsThread`.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-09 09:55:02 +08:00
Jonah Hooper d86ba7f54a
KAFKA-18681: Created GetReplicaLogInfo RPCs (#19664)
Creates GetReplicaLogInfoRequest and GetReplicaLogInfoResponse RPCs.
Information returned by these brokers will be used to aid
unclean-recovery by selecting the longest logs.

Reviewers: Alyssa Huang <ahuang@confluent.io>, Calvin Liu <caliu@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, TaiJuWu <tjwu1217@gmail.com>
2025-07-08 10:41:01 -07:00
Jhen-Yung Hsu dde0b8cd92
MINOR: Prevent unnecessary test runs - KAFKA-19042 follow-up (#20122)
PlaintextConsumerTest should extend AbstractConsumerTest instead of
BaseConsumerTest. Otherwise, those tests will be executed in both
`clients-integration-tests` and `core` (see
https://github.com/apache/kafka/pull/20081/files#r2190749592).

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-08 07:42:15 +08:00
Ken Huang a399852ced
KAFKA-19042 Move PlaintextConsumerTest to client-integration-tests module (#20081)
Rewrite PlaintextConsumerTest in Java using the new test infra and move
it to the clients-integration-tests module.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-08 01:41:59 +08:00
Bolin Lin e8ee7fc210
KAFKA-19315 Move ControllerMutationQuotaManager to server module (#19807)
Migrate ControllerMutationQuotaManager to Java implementation and move
to server module, including ClientQuotaManager and associated files.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-07 01:55:38 +08:00
Andrew Schofield 729f9ccf06
KAFKA-19440: Handle top-level errors in AlterShareGroupOffsets RPC (#20049)
While testing the code in https://github.com/apache/kafka/pull/19820, it
became clear that the error handling problems were due to the underlying
Admin API. This PR fixes the error handling for top-level errors in the
AlterShareGroupOffsets RPC.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Lan Ding
 <isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>
2025-07-03 11:00:56 +01:00
Luke Chen eb378da99c
KAFKA-19462: Count fetch size when remote fetch (#20088)
Estimate the fetch size for remote fetch to avoid exceeding the
`fetch.max.bytes` config. We don't want to query the remoteLogMetadata
during API handling, thus we assume the remote fetch can get
`max.partition.fetch.bytes` size. Tests added.
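
The estimate can be pictured as simple budget arithmetic: each partition served from remote storage is charged the full `max.partition.fetch.bytes` against the remaining `fetch.max.bytes` budget, without looking at the remote metadata. The sketch below is an assumption about the shape of that accounting, with hypothetical names and illustrative values, not the code in this patch.

```java
// Hypothetical sketch of the byte-budget estimate; names and values are illustrative.
final class RemoteFetchBudget {
    // Charges one remote partition against the remaining fetch.max.bytes budget,
    // assuming it may return up to max.partition.fetch.bytes.
    static int remainingAfterRemoteFetch(int remainingBytes, int maxPartitionFetchBytes) {
        return Math.max(0, remainingBytes - maxPartitionFetchBytes);
    }

    public static void main(String[] args) {
        int remaining = 3_000;               // illustrative fetch.max.bytes
        int maxPartitionFetchBytes = 1_000;  // illustrative max.partition.fetch.bytes
        for (int partition = 0; partition < 4; partition++) {
            remaining = remainingAfterRemoteFetch(remaining, maxPartitionFetchBytes);
            System.out.println("after partition " + partition + ": " + remaining);
        }
    }
}
```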

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-07-03 10:45:59 +08:00
Abhinav Dixit 7cb370b786
KAFKA-19463: nextFetchOffset does not take ongoing state transition into account (#20080)
### About
`nextFetchOffset` function in `SharePartition` updates the fetch offsets
without considering batches/offsets which might be undergoing state
transition. This can cause problems in updating to the right fetch
offset.

### Testing
The new code added has been tested with the help of unit tests.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-07-02 18:09:43 +01:00
Yunchi Pang 42041f4772
MINOR: Refactor createResponseConfig to avoid collection copy and conversion (#19867)
issue: https://github.com/apache/kafka/pull/19687/files#r2094574178

Why:
- To improve performance by avoiding redundant temporary collections and
repeated method calls.
- To make the utility more flexible for inputs from both Java and Scala.

What:
- Refactored `createResponseConfig` in `ConfigHelper.scala` by
overloading the method to accept both Java maps and `AbstractConfig`.
- Extracted helper functions to `ConfigHelperUtils` in the server
module.

Reviewers: Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
2025-07-02 21:32:11 +08:00
Tsung-Han Ho (Miles Ho) ad934d3202
MINOR: Remove threadNamePrefix parameter from ReplicaManager and ReplicaFetcherManager (#20069)
- remove `threadNamePrefix` from `ReplicaManager` constructor
- update `BrokerServer` to use updated constructor
- remove `threadNamePrefix` from `ReplicaFetcherManager`

Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi
 <frankvicky@apache.org>
2025-07-01 20:36:50 +08:00
TaiJuWu bd14ed21b4
KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower (#20037)
This PR does the following:

1. Remove ReplicaManager#becomeLeaderOrFollower.
2. Remove `LeaderAndIsrRequest` and `LeaderAndIsrResponse`
3. Migrate `LeaderAndIsrRequest.PartitionState` to server-common module
and change to `PartitionState`
4. Remove `ControllerEpoch` from PartitionState
5. Remove `isShuttingDown` from BrokerServer and ReplicaManager

Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-30 01:20:49 +08:00
TaiJuWu a95522a5ba
KAFKA-19042 Rewrite ConsumerBounceTest by Java (#19822)
This PR does the following:
1) Rewrites ConsumerBounceTest in Java.
2) Moves the test to clients-integration-tests.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-06-30 00:40:36 +08:00
Xuan-Zhang Gong 05b6e81688
KAFKA-19420 Don't export SocketServer from ClusterInstance (#20002)
Refactor the code related to SocketServer. SocketServer is an internal
class, and normally the integration tests should not use it directly.
[KAFKA-19239](https://issues.apache.org/jira/browse/KAFKA-19239) will
add a new helper to expose the bound ports, so the tests that need
to send raw requests can leverage it without accessing the SocketServer.

Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
 <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-27 21:12:57 +08:00
Ken Huang b919836551
KAFKA-17662: config.providers configuration missing from the docs (#18930)
Ensure the config.providers configuration is documented for all
components supporting it

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Greg Harris
<gharris1727@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2025-06-27 14:13:55 +02:00
Apoorv Mittal 96ef1c520a
KAFKA-19436: Restrict cache update for ongoing batch/offset state (#20041)
During stress testing it was noticed that, on acquisition lock timeout,
some offsets were not found in the cache. Different acknowledgement
calls can attempt to update the cache, so if a transition is still
ongoing and another parallel acknowledgement triggers the cache update,
the cache can be updated incorrectly while the first transition is not
yet finished.

The cache update only happens for Archived and Acknowledged records,
so neither this issue nor the existing implementation should hamper the
queues functionality. However, the cache might be updated early when the
persister call might fail, and this issue can trigger error logs with
offset not found in cache when the acquisition lock times out (in some
scenarios).

Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield
 <aschofield@confluent.io>
2025-06-26 15:08:15 +01:00
David Jacot f6a78c4c2b
KAFKA-19246; OffsetFetch API does not return group level errors correctly with version 1 (#19704)
The OffsetFetch API does not support top level errors in version 1.
Hence, the top level error must be returned at the partition level.
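
A minimal sketch of that rule follows. The types and names are hypothetical stand-ins and are not the Kafka request/response classes; `GROUP_LEVEL_ERROR` is a placeholder error name, and whether a version supports top-level errors is passed in as a flag rather than derived from a version number.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the OffsetFetchResponse builder in Kafka.
final class OffsetFetchErrors {
    static final String NONE = "NONE";

    record Response(String topLevelError, Map<String, String> partitionErrors) {}

    // When the request version has no top-level error field, the group-level
    // error is copied onto every requested partition instead.
    static Response build(boolean supportsTopLevelError,
                          String groupError,
                          List<String> partitions) {
        Map<String, String> perPartition = new LinkedHashMap<>();
        for (String partition : partitions) {
            perPartition.put(partition, supportsTopLevelError ? NONE : groupError);
        }
        return new Response(supportsTopLevelError ? groupError : NONE, perPartition);
    }

    public static void main(String[] args) {
        System.out.println(build(false, "GROUP_LEVEL_ERROR", List.of("topic-0", "topic-1")));
        System.out.println(build(true, "GROUP_LEVEL_ERROR", List.of("topic-0", "topic-1")));
    }
}
```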

Side note: It is a tad annoying that we create error responses in
multiple places (e.g. KafkaApis, GroupCoordinatorService). There was a
reason for this but I cannot remember it.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Sean Quah <squah@confluent.io>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-06-26 06:29:43 -07:00
Sanskar Jhajharia 56aeaa4c44
MINOR: Cleanup ShareFetchAcknowledgeRequestTest (#19852)
Now that Kafka supports Java 17, this PR cleans up the
ShareFetchAcknowledgeRequestTest.
The changes mostly include:
- Collections.singletonList() is replaced with List.of()
- Get rid of all asJava conversions

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-06-26 13:56:18 +08:00
Mahsa Seifikar 7aaba96cc1
KAFKA-19282: Update quotaTypesEnabled on quota removal in ClientQuotaManager (#19742)
In `kafka.server.ClientQuotaManager` class, `quotaTypesEnabled` is not updated when a quota is removed via `removeQuota` method in `DefaultQuotaCallback` class. This field is set when quotas are added in `updateQuota` but it's never changed or cleared. So in case all the quotas have been removed dynamically, the system may incorrectly assume the quotas are active, which leads to unnecessary metric creation or updates until the broker is restarted.
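
One way to picture the fix is to keep the "which quota types are active" state in step with both additions and removals. The sketch below is an assumption about the general approach, using a hypothetical tracker class and enum; it is not the `ClientQuotaManager` code.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical tracker; the enum values and method names are illustrative.
final class QuotaTypeTracker {
    enum QuotaType { USER, CLIENT_ID, USER_CLIENT_ID }

    // Number of currently configured quotas per type.
    private final Map<QuotaType, Integer> activeQuotas = new EnumMap<>(QuotaType.class);

    synchronized void onQuotaAdded(QuotaType type) {
        activeQuotas.merge(type, 1, Integer::sum);
    }

    synchronized void onQuotaRemoved(QuotaType type) {
        // Returning null removes the entry, so the type stops counting as enabled.
        activeQuotas.computeIfPresent(type, (key, count) -> count > 1 ? count - 1 : null);
    }

    synchronized boolean quotasEnabled() {
        return !activeQuotas.isEmpty();
    }

    public static void main(String[] args) {
        QuotaTypeTracker tracker = new QuotaTypeTracker();
        tracker.onQuotaAdded(QuotaType.USER);
        System.out.println(tracker.quotasEnabled());
        tracker.onQuotaRemoved(QuotaType.USER);
        System.out.println(tracker.quotasEnabled());
    }
}
```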

Reviewers: Jonah Hooper <jhooper@confluent.io>, Hailey Ni <hni@confluent.io>, Alyssa Huang <ahuang@confluent.io>, David Jacot <djacot@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2025-06-25 21:29:46 +01:00