Commit Graph

392 Commits

Author SHA1 Message Date
Ken Huang 7d098cfbbd
KAFKA-17876/ KAFKA-19150 Rename AssignmentsManager and RemoteStorageThreadPool metrics (#20265)
Rename org.apache.kafka.server:type=AssignmentsManager and
org.apache.kafka.storage.internals.log.RemoteStorageThreadPool metrics
for the consist, these metrics should be

- `kafka.log.remote:type=...`
- `kafka.server:type=...`

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-29 01:24:38 +08:00
Sanskar Jhajharia d2a699954d
MINOR: Cleanup `toString` methods in Storage Module (#20432)
Getting rid of a bunch of `toString` functions in record classes in
Storage Module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-28 23:15:28 +08:00
Ken Huang 41611b4bd2
MINOR: Followup KAFKA-19112 document updated (#20492)
Some sections are not very clear, and we need to update the
documentation.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Jun Rao
 <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-28 19:06:06 +08:00
Ritika Reddy 0a483618b9
KAFKA-19690-Add epoch check before verification guard check to prevent unexpected fatal error (#20534)
We are seeing cases where a Kafka Streams (KS) thread stalls for ~20
seconds. During this stall, the broker correctly aborts the open
transaction (triggered by the 10-second transaction timeout).   However,
when the KS thread resumes, instead of receiving the expected
InvalidProducerEpochException (which we already handle gracefully as
part of transaction abort), the client is instead hit with an
InvalidTxnStateException. KS currently treats this as a fatal error,
causing the application to fail.

To fix this, we've added an epoch check before the verification check to
send the recoverable  InvalidProducerEpochException instead of the fatal
InvalidTxnStateException. This helps safeguard both tv1 and tv2 clients

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-09-23 13:45:42 -07:00
Uladzislau Blok f16d1f3c9d
KAFKA-19299: Fix race condition in RemoteIndexCacheTest (#19927)
This MR should be couple of race conditions in RemoteIndexCacheTest.

1. There was a race condition between cache-cleanup-thread and test
thread, which wants to check that cache is gone. This was fixed with
TestUtils#waitForCondition
2. After each test we check that there is not thread leak. This check
wasn't working properly, because live of thread status is set by JVM
level, we can only set interrupted status (using private native void
interrupt0(); method under the hood), but we don't really know when JVM
will change the live status of thread. To fix this I've refactored
TestUtils#assertNoLeakedThreadsWithNameAndDaemonStatus method to use
TestUtils#waitForCondition. This fix should also affect few other tests,
which were flaky because of this check. See gradle run on

[develocity](https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Europe%2FLondon&tests.container=org.apache.kafka.storage.internals.log.RemoteIndexCacheTest&tests.sortField=FLAKY)

After fix test were run 10000 times with repeated test annotation:

`./gradlew clean storage:test --tests

org.apache.kafka.storage.internals.log.RemoteIndexCacheTest.testCacheEntryIsDeletedOnRemoval`
...  `Gradle Test Run :storage:test > Gradle Test Executor 20 >
RemoteIndexCacheTest > testCacheEntryIsDeletedOnRemoval() > repetition
9998 of 10000 PASSED`  `Gradle Test Run :storage:test > Gradle Test
Executor 20 > RemoteIndexCacheTest > testCacheEntryIsDeletedOnRemoval()
> repetition 9999 of 10000 PASSED`  `Gradle Test Run :storage:test >
Gradle Test Executor 20 > RemoteIndexCacheTest >
testCacheEntryIsDeletedOnRemoval() > repetition 10000 of 10000 PASSED`
`BUILD SUCCESSFUL in 20m 9s`  `148 actionable tasks: 148 executed`

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-22 11:20:14 -04:00
Chang-Chi Hsu 5919762009
MINOR: Remove exitMessage.set() call in TopicBasedRemoteLogMetadataManagerTest (#20563)
- **Reasons:** In this case, the `exit(int statusCode)` method invokes
`exit(statusCode, null)`, which means
the `message` argument is always `null` in this code path. As a result,
assigning
`exitMessage` has no effect and can be safely removed.

- **Changes:** Remove a redundant field assignment.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-20 18:04:10 +08:00
keemsisi 2fd54837f0
MINOR: Update on fixing tag description missing in javadoc (#20380)
* Added tag description to @throws in method javadoc
* Added explicit throws IndexOffsetOverflowException to method signature

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-15 10:13:49 +08:00
Ken Huang 0a12eaa80e
KAFKA-19112 Unifying LIST-Type Configuration Validation and Default Values (#20334)
We add the three main changes in this PR

- Disallowing null values for most LIST-type configurations makes sense,
since users cannot explicitly set a configuration to null in a
properties file. Therefore, only configurations with a default value of
null should be allowed to accept null.
- Disallowing duplicate values is reasonable, as there are currently no
known configurations in Kafka that require specifying the same value
multiple times. Allowing duplicates is both rare in practice and
potentially confusing to users.
- Disallowing empty list, even though many configurations currently
accept them. In practice, setting an empty list for several of these
configurations can lead to server startup failures or unexpected
behavior. Therefore, enforcing non-empty lists helps prevent
misconfiguration and improves system robustness.
These changes may introduce some backward incompatibility, but this
trade-off is justified by the significant improvements in safety,
consistency, and overall user experience.

Additionally, we introduce two minor adjustments:

- Reclassify some STRING-type configurations as LIST-type, particularly
those using comma-separated values to represent multiple entries. This
change reflects the actual semantics used in Kafka.
- Update the default values for some configurations to better align with
other configs.
These changes will not introduce any compatibility issues.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-06 01:25:55 +08:00
Matthias J. Sax 342a8e6773
MINOR: suppress build warning (#20424)
Suppress build warning.

Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-01 11:12:11 -07:00
Matthias J. Sax c7154b8bf8
MINOR: improve RLMQuotaMetricsTest (#20425)
Adds metrics description verification to RLMQuotaMetricsTest.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-08-29 17:35:48 +08:00
Abhijeet Kumar 8d93d1096c
KAFKA-17108: Add EarliestPendingUpload offset spec in ListOffsets API (#16584)
This is the first part of the implementation of

[KIP-1023](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset)

The purpose of this pull request is for the broker to start returning
the correct offset when it receives a -6 as a timestamp in a ListOffsets
API request.

Added unit tests for the new timestamp.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-08-27 08:34:31 +05:30
Chang-Chi Hsu eba9839776
MINOR: Remove fetchQuotaMetrics and copyQuotaMetrics on close (#20394)
- Changes: Remove  fetchQuotaMetrics and copyQuotaMetrics in RemoteLogManager on close

from: https://github.com/apache/kafka/pull/20342#discussion_r2290612736

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-08-23 10:04:58 +05:30
Kamal Chandraprakash a056672f7c
KAFKA-19599: Reduce the frequency of ReplicaNotAvailableException thrown to clients when RLMM is not ready (#20345)
During broker restarts, the topic-based RemoteLogMetadataManager (RLMM)
constructs the state by reading the internal `__remote_log_metadata`
topic. When the partition is not ready to perform remote storage
operations, then ReplicaNotAvailableException thrown back to the
consumer. The clients retries the request immediately.

This results in a lot of FETCH requests on the broker and utilizes the
request handler threads. Using the CountdownLatch to reduce the
frequency of ReplicaNotAvailableException thrown back to the clients.
This will improve the request handler thread usage on the broker.

Previously for one consumer, when RLMM is not ready for a partition,
then ~9K  FetchConsumer requests / sec are received on the broker. With
this  patch, the number of FETCH requests reduced by 95% to 600 / sec.

Reviewers: Lan Ding <isDing_L@163.com>, Satish Duggana
 <satishd@apache.org>
2025-08-20 09:48:57 +05:30
Kamal Chandraprakash f0c3d93104
KAFKA-19597: Stop the RSM after closing the remote-log reader threads to handle requests gracefully (#20342)
During shutdown, when the RSM closes first, then the ongoing requests
might throw an error. To handle the ongoing requests gracefully, closing
the RSM after closing the remote-log reader thread pools.

Reviewers: Satish Duggana <satishd@apache.org>
2025-08-19 21:56:27 +05:30
Lan Ding d0a9a04a02
MINOR: Cleanups in storage module (#20270)
Cleanups including:

- Rewrite `FetchCountAndOp`  as a record class
- Replace `Tuple` by `Map.Entry`

Reviewers: TengYao Chi <frankvicky@apache.org>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-31 20:55:49 +08:00
Mickael Maison 6973deab03
MINOR: Cleanups in storage module (#20087)
Cleanups including:
- Java 17 syntax, record and switch
- assertEquals() order
- javadoc

Reviewers: Andrew Schofield <aschofield@confluent.io>, Jhen-Yung Hsu
 <jhenyunghsu@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-30 16:02:01 +08:00
Ken Huang 96c8e86cdf
KAFKA-19530 RemoteLogManager should record lag stats when remote storage is offline (#20218)
CI / build (push) Waiting to run Details
When remote storage is offline, then the segmentLag and bytesLag metrics
are not recorded.   These metrics are useful to know the pending data to
upload when remote storage is down.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Kamal Chandraprakash
 <kamal.chandraprakash@gmail.com>
2025-07-29 20:08:06 +05:30
George Wu 7dba91d025
KAFKA-19484: Fix bug with tiered storage throttle metrics (#20129)
Fixes a bug with tiered storage quota metrics introduced in [KIP-956](https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas).
The metrics tracking how much time have been spent in a throttled state
can stop reporting if a cluster stops stops doing remote copy/fetch and
the sensors go inactive.

This change delegates the job of refreshing inactive sensors to
SensorAccess. There's pretty similar logic in RLMQuotaManager which is
actually responsible for tracking and enforcing quotas and also uses a
Sensor object.

```
remote-fetch-throttle-time-avg
remote-copy-throttle-time-avg
remote-fetch-throttle-time-max
remote-copy-throttle-time-max
```
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-07-29 19:37:41 +05:30
Lan Ding dfced692d2
KAFKA-19551: Remove the handling of FatalExitError in RemoteStorageThreadPool (#20245)
CI / build (push) Waiting to run Details
FatalExitError is not thrown after
[KAFKA-19425](https://issues.apache.org/jira/browse/KAFKA-19425). Clean
up the handling of FatalExitError in `RemoteStorageThreadPool`.
2025-07-28 17:49:45 +05:30
stroller e61c297b73
KAFKA-19425: Stop the server when fail to initialize to avoid local segment never got deleted. (#20007)
We found that one broker's local segment on disk never get removed
forever no matter how long it stored. The disk always keep increasing.


![image](https://github.com/user-attachments/assets/42129bb6-7d07-481b-923f-971da3ab12da)
note: Partition 2's node is the exception node.

After we trouble shooting. we find if one broker is very slow to startup
it will cause the
TopicBasedRemoteLogMetadataManager#initializeResources's fail sometime
(it meet expectation due to the server is not ready as fast). Thus it
won't stop the server so that the server still run just with some
exception log but not shutdown.  It won't upload to remote for the local
so that the local segment never to deleted.

So propose the change to shutdown the broker to avoid the silence
critical error which caused the disk keep increasing forever.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Luke
 Chen <showuon@gmail.com>
2025-07-28 17:47:09 +05:30
majialong a27d6e32b0
MINOR: Optimize RemoteLogManager#buildFilteredLeaderEpochMap (#20205)
Optimize `RemoteLogManager#buildFilteredLeaderEpochMap` .

Add a temporary unit test `testBuildFilteredLeaderEpochMapModify` in
`RemoteLogManagerTest` to verify the output consistency of the method
before and after optimization.

Randomly generate leaderEpochs and iterate 100000 times for
verification.

```
    @Test
    public void testBuildFilteredLeaderEpochMapModify() {
        int testIterations = 100000;

        for (int i = 0; i < testIterations; i++) { TreeMap<Integer,
Long> leaderEpochToStartOffset =
generateRandomLeaderEpochAndStartOffset();

            // before optimize              NavigableMap<Integer, Long>
optimizeBefore =
RemoteLogManager.buildFilteredLeaderEpochMap(leaderEpochToStartOffset);

            // after optimize              NavigableMap<Integer, Long>
optimizeAfter =
RemoteLogManager.buildFilteredLeaderEpochMap2(leaderEpochToStartOffset);

            assertEquals(optimizeBefore, optimizeAfter);          } }

    private static TreeMap<Integer, Long>
generateRandomLeaderEpochAndStartOffset() {          TreeMap<Integer,
Long> map = new TreeMap<>();          Random random = new Random();
int numEntries = random.nextInt(100000);          long lastStartOffset =
0;

        for (int i = 0; i < numEntries; i++) {              // generate
a leader epoch              int leaderEpoch = random.nextInt(100000);
long startOffset;

            // generate a random start offset , or use the last start
offset              if (i > 0 && random.nextDouble() < 0.2) {
startOffset = lastStartOffset;              } else { startOffset =
Math.abs(random.nextLong()) % 100000;              } lastStartOffset =
startOffset;

            map.put(leaderEpoch, startOffset);
        }
        return map;
    }
```

Command:
``` ./gradlew storage:test --tests RemoteLogManagerTest```

Result:  All unit tests passed.

<img width="1258" height="424" alt="image"
src="https://github.com/user-attachments/assets/7d9fc3b5-3bbc-440f-b1cf-3a2a5f97557a"
/>  <img width="411" height="66" alt="image"
src="https://github.com/user-attachments/assets/22a0b443-88e8-43d2-a3f2-51266935ed34"
/>

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>,
 Chia-Ping Tsai <chia7712@gmail.com>
2025-07-24 23:16:27 +08:00
Kamal Chandraprakash 93adaea599
KAFKA-19523: Gracefully handle error while building remoteLogAuxState (#20201)
CI / build (push) Waiting to run Details
Improve the error handling while building the remote-log-auxiliary state
when a follower node with an empty disk begin to synchronise with the
leader. If the topic has remote storage enabled, then the
ReplicaFetcherThread attempt to build the remote-log-auxiliary state.
Note that the remote-log-auxiliary state gets invoked only when the
leader-log-start-offset is non-zero and leader-log-start-offset is not
equal to leader-local-log-start-offset.

When the LeaderAndISR request is received, then the
ReplicaManager#becomeLeaderOrFollower invokes 'makeFollowers' initially,
followed by the RemoteLogManager#onLeadershipChange call. As a result,
when ReplicaFetcherThread initiates the
RemoteLogManager#fetchRemoteLogSegmentMetadata, the partition may not
have been initialized at that time and throws retriable exception.

Introduced RetriableRemoteStorageException to gracefully handle the
error.

After the patch:
```
[2025-07-19 19:28:20,934] INFO [ReplicaFetcher replicaId=3, leaderId=1,
fetcherId=0] Could not build remote log auxiliary state for orange-1 due
to error: RemoteLogManager is not ready for partition: orange-1
(kafka.server.ReplicaFetcherThread)
[2025-07-19 19:28:20,934] INFO [ReplicaFetcher replicaId=3, leaderId=2,
fetcherId=0] Could not build remote log auxiliary state for orange-0 due
to error: RemoteLogManager is not ready for partition: orange-0
(kafka.server.ReplicaFetcherThread)
```

Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2025-07-23 19:29:31 +05:30
Kamal Chandraprakash 16c079ed23
KAFKA-19525: Refactor TopicBasedRLMM implementation to remove unused code (#20204)
CI / build (push) Waiting to run Details
- startConsumerThread is always true so removed the variable.
- Replaced the repetitive lock handling logic with
`withReadLockAndEnsureInitialized` to reduce duplication and improve
readability.
- Consolidated the logic in `initializeResources` and. simplified method
arguments to better encapsulate configuration.
- Extracted common code and reduced the usage of global variables.
- Named the variables properly.

Tests:
- Existing UTs since this patch refactored the code.

Reviewers: PoAn Yang <payang@apache.org>
2025-07-23 12:19:13 +05:30
Chang-Chi Hsu 8a5549ca9b
MINOR: Rename waitForTopic to waitTopicCreation (#20216)
Changes: Rename `waitForTopic` to `waitTopicCreation` for better clarity
Reasons: To align with `waitTopicDeletion`  Reference:
https://github.com/apache/kafka/pull/20108/files#r2221659660

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<frankvicky@apache.org>
2025-07-22 21:02:57 +08:00
majialong 9b542b6ea2
MINOR: Correct RemoteLogManager.getLeaderEpochEntries comment (#20181)
CI / build (push) Waiting to run Details
The comment on the RemoteLogManager.getLeaderEpochEntries method has a
small error description,it should be start(inclusive)and end(exclusive).

Reviewers: Ken Huang <s7133700@gmail.com>, Lan Ding <isDing_L@163.com>,
 Chia-Ping Tsai <chia7712@gmail.com>
2025-07-19 00:04:03 +08:00
Masahiro Mori daece61a50
MINOR: Refactor LockUtils and improve comments (follow up to KAFKA-19390) (#20131)
CI / build (push) Waiting to run Details
This PR performs a refactoring of LockUtils and improves inline
comments, as a follow-up to https://github.com/apache/kafka/pull/19961.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
2025-07-15 10:07:01 -07:00
Lan Ding 6437135bc0
KAFKA-19451: fix flaky test RemoteIndexCacheTest.testCacheEntryIsDeletedOnRemoval() (#20085)
**Problem Description**
In the `RemoteIndexCache.cleanup()` method, the asynchronous invocation
of `index.deleteIfExists()` may cause a conflict. When the
`getIndexFileFromRemoteCacheDir()` method is executed, it utilizes
`Files.walk()` to traverse all files in the directory path. If
`index.deleteIfExists()` is triggered during this traversal, a
`NoSuchFileException` will be thrown.

**Solution**
To resolve this issue, ensure that `index.deleteIfExists()` has been
fully executed before invoking `getIndexFileFromRemoteCacheDir()`.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-14 12:01:50 -07:00
Luke Chen e1ff387605
KAFKA-14915: Allow reading from remote storage for multiple partitions in one fetchRequest (#20045)
This PR enables reading remote storage for multiple partitions in one
fetchRequest. The main changes are:
1. In `DelayedRemoteFetch`, we accept multiple remoteFetchTasks and
other metadata now.
2. In `DelayedRemoteFetch`, we'll wait until all remoteFetch done,
either succeeded or failed.
3. In `ReplicaManager#fetchMessage`, we'll create one
`DelayedRemoteFetch` and pass multiple remoteFetch metadata to it, and
watch all of them.
4. Added tests

Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, Satish Duggana <satishd@apache.org>
2025-07-14 19:42:08 +05:30
Jhen-Yung Hsu 007fe6e92a
KAFKA-19466 LogConcurrencyTest should close the log when the test completes (#20110)
- Fix testUncommittedDataNotConsumedFrequentSegmentRolls() and
testUncommittedDataNotConsumed(), which call createLog() but never close
the log when the tests complete.
- Move LogConcurrencyTest to the Storage module and rewrite it in Java.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-10 01:01:42 +08:00
Gaurav Narula 36b9bb94f1
KAFKA-19474 Move WARN log on log truncation below HWM (#20106)
CI / build (push) Waiting to run Details
#5608 introduced a regression where the check for `targetOffset <
log.highWatermark`
to emit a `WARN` log was made incorrectly after truncating the log.

This change moves the check for `targetOffset < log.highWatermark`  to
`UnifiedLog#truncateTo` and ensures we emit a `WARN` log on truncation
below  the replica's HWM by both the `ReplicaFetcherThread` and
`ReplicaAlterLogDirsThread`

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-09 09:55:02 +08:00
Masahiro Mori ea7b145860
KAFKA-19390: Call safeForceUnmap() in AbstractIndex.resize() on Linux to prevent stale mmap of index files (#19961)
https://issues.apache.org/jira/browse/KAFKA-19390

The AbstractIndex.resize() method does not release the old memory map
for both index and time index files.  In some cases, Mixed GC may not
run for a long time, which can cause the broker to crash when the
vm.max_map_count limit is reached.

The root cause is that safeForceUnmap() is not being called on Linux
within resize(), so we have changed the code to unmap old mmap on all
operating systems.

The same problem was reported in
[KAFKA-7442](https://issues.apache.org/jira/browse/KAFKA-7442), but the
PR submitted at that time did not acquire all necessary locks around the
mmap accesses and was closed without fixing the issue.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-08 09:15:32 -07:00
Ken Huang d31885d33c
MINOR: Use <code> block instead of backtick (#20107)
CI / build (push) Waiting to run Details
When writing HTML, it's recommended to use the <code> element instead of
backticks for inline code formatting.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi
<frankvicky@apache.org>
2025-07-06 14:49:51 +08:00
Jhen-Yung Hsu 2e3ddb22ae
MINOR: Fix the tests in LogValidatorTest (#20093)
CI / build (push) Waiting to run Details
Fix incorrect tests introduced in the refactor

5b9cbcf886

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
2025-07-03 19:04:43 +08:00
stroller 14ea11dc31
KAFKA-19371: Don't create the __remote_log_metadata topic when it already exists during broker restarts (#19899)
* The CREATE_TOPIC request gets issued only when it is clear that the
topic does not exist in the cluster.
* When the request to describe the topic gets timed-out or any exception
thrown other than UnknownTopicOrPartitionException, then the same gets
re-thrown and the describe/create topic request gets retried in the next
iteration until the initializationRetryMaxTimeoutMs gets breached.

Fixes: https://issues.apache.org/jira/browse/KAFKA-19371

Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash
<kamal.chandraprakash@gmail.com>

---------

Co-authored-by: stroller.fu <stroller.fu@zoom.us>
2025-07-02 11:22:26 +05:30
Okada Haruki 959021de59
KAFKA-19407 Fix potential IllegalStateException when appending to timeIndex (#19972)
## Summary
- Fix potential race condition in
LogSegment#readMaxTimestampAndOffsetSoFar(), which may result in
non-monotonic offsets and causes replication to stop.
- See https://issues.apache.org/jira/browse/KAFKA-19407 for the details
how it happen.

Reviewers: Vincent PÉRICART <mauhiz@gmail.com>, Jun Rao
 <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-25 00:35:53 +08:00
Bolin Lin 3404f65cdb
KAFKA-19324 Make org.apache.kafka.common.test.TestUtils package-private to prevent cross-module access (#19884)
Description

* Replace `org.apache.kafka.common.test.TestUtils` with
`org.apache.kafka.test.TestUtils` in outer package modules to
standardize test utility usage
* Move `waitUntilLeaderIsElectedOrChangedWithAdmin` method from
`org.apache.kafka.test.TestUtils` to `ClusterInstance` and refactor for
better code organization
* Add `org.apache.kafka.test.TestUtils` dependency to
`transaction-coordinator` import control

Reviewers: PoAn Yang [payang@apache.org](mailto:payang@apache.org), Ken
 Huang  [s7133700@gmail.com](mailto:s7133700@gmail.com), Ken Huang
 [s7133700@gmail.com](mailto:s7133700@gmail.com), Chia-Ping Tsai
 [chia7712@gmail.com](mailto:chia7712@gmail.com)
2025-06-22 22:47:40 +08:00
Xuan-Zhang Gong 79d2c3c62a
KAFKA-19406 Remove BrokerTopicStats#removeOldFollowerMetrics (#19962)
BTW: whether we should rename
`ReplicaManager#updateLeaderAndFollowerMetrics`

Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
 <payang@apache.org>, TengYao Chi <kitingiao@gmail.com>, Lan Ding
 <isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-19 17:57:22 +08:00
Kuan-Po Tseng 12d8a1bbf8
KAFKA-19237: Add dynamic config remote.log.manager.follower.thread.pool.size (#19809)
Deprecate the `remote.log.manager.thread.pool.size` configuration and introduce a new dynamic configuration:
`remote.log.manager.follower.thread.pool.size`.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Luke Chen <showuon@gmail.com>
2025-06-13 09:33:45 +05:30
Jhen-Yung Hsu 2e968560e0
MINOR: Cleanup simplify set initialization with Set.of (#19925)
Simplify Set initialization and reduce the overhead of creating extra
collections.

The changes mostly include:
- new HashSet<>(List.of(...))
- new HashSet<>(Arrays.asList(...)) / new HashSet<>(asList(...))
- new HashSet<>(Collections.singletonList()) / new
HashSet<>(singletonList())
- new HashSet<>(Collections.emptyList())
- new HashSet<>(Set.of())

This change takes the following into account, and we will not change to
Set.of in these scenarios:
- Require `mutability` (UnsupportedOperationException).
- Allow `duplicate` elements (IllegalArgumentException).
- Allow `null` elements (NullPointerException).
- Depend on `Ordering`. `Set.of` does not guarantee order, so it could
make tests flaky or break public interfaces.

Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
 <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-11 18:36:14 +08:00
Gaurav Narula edd0efdebf
KAFKA-19221 Propagate IOException on LogSegment#close (#19607)
Log segment closure results in right sizing the segment on disk along
with the associated index files.

This is specially important for TimeIndexes where a failure to right
size may eventually cause log roll failures leading to under replication
and log cleaner failures.

This change uses `Utils.closeAll` which propagates exceptions, resulting
in an "unclean" shutdown. That would then cause the broker to attempt to
recover the log segment and the index on next startup, thereby avoiding
the failures described above.

Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Jun Rao
 <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-11 01:09:52 +08:00
Dmitry Werner f69379cf6b
MINOR: Remove unused code from storage classes (#19853)
CI / build (push) Waiting to run Details
Remove unused code from storage classes.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>,
 TengYao Chi <kitingiao@gmail.com>, Kuan-Po Tseng <brandboat@gmail.com>,
 Chia-Ping Tsai <chia7712@gmail.com>
2025-06-11 00:22:50 +08:00
Ritika Reddy 3479ce793b
KAFKA-18202: Add rejection for non-zero sequences in TV2 (KIP-890) (#19902)
This change handles rejecting non-zero sequences when there is an empty
producerIDState with TV2.  The scenario will be covered with the
re-triable OutOfOrderSequence error.

For Transactions V2 with empty state:   Allow only sequence 0 is allowed for
new producers or after state cleanup (new validation added)   Don't allow any
non-zero sequence is rejected with our specific error message   Don't allow any epoch
bumps still require sequence 0 (existing validation remains)

For Transactions V1 with empty state:   Allow ANY sequence number is allowed
(0, 5, 100, etc.)   Don't allow epoch bumps still require sequence 0 (existing
validation)

Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
 <alivshits@confluent.io>
2025-06-06 09:23:10 -07:00
Ritika Reddy cc25d217da
KAFKA-18042: Reject the produce request with lower producer epoch early (KIP-890) (#19844)
CI / build (push) Waiting to run Details
With the transaction V2, replica manager checks whether the incoming
producer request produces to a partition belonging to a transaction.
ReplicaManager figures this out by checking the producer epoch stored in
the partition log. However, the current code does not reject the produce
request if its producer epoch is lower than the stored producer epoch.
It is an optimization to reject such requests earlier instead of sending
an AddPartitionToTxn request and getting rejected in the response.

Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
 <alivshits@confluent.io>
2025-06-04 13:21:53 -07:00
Ken Huang bcda92b5b9
KAFKA-19080 The constraint on segment.ms is not enforced at topic level (#19371)
CI / build (push) Waiting to run Details
The main issue was that we forgot to set
`TopicConfig.SEGMENT_BYTES_CONFIG` to at least `1024 * 1024`, which
caused problems in tests with small segment sizes.

To address this, we introduced a new internal config:
`LogConfig.INTERNAL_SEGMENT_BYTES_CONFIG`, allowing us to set smaller
segment bytes specifically for testing purposes.

We also updated the logic so that if a user configures the topic-level
segment bytes without explicitly setting the internal config, the
internal value will no longer be returned to the user.

In addition, we removed
`MetadataLogConfig#METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG` and added
three new internal configurations:
- `INTERNAL_MAX_BATCH_SIZE_IN_BYTES_CONFIG`
- `INTERNAL_MAX_FETCH_SIZE_IN_BYTES_CONFIG`
- `INTERNAL_DELETE_DELAY_MILLIS_CONFIG`

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-05-25 20:57:22 +08:00
Hong-Yi Chen 69a457d8a5
KAFKA-19034 [1/N] Rewrite RemoteTopicCrudTest by ClusterTest and move it to storage module (#19681)
CI / build (push) Waiting to run Details
This PR rewrites `RemoteTopicCrudTest` in Java using the `@ClusterTest`
framework and moves it to the `storage` module.

**Note:** Two test cases have not yet been migrated

- `testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic`
-

`testClusterWithoutTieredStorageStartsSuccessfullyIfTopicWithTieringDisabled`

These tests rely on modifying broker configs during the test lifecycle,
which `ClusterTest` currently does not support. They will be migrated in
a follow-up PR after
[#16808](https://github.com/apache/kafka/pull/16808) is merged, which
introduces support for config updates in `ClusterTest`.

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-05-25 14:50:16 +08:00
Yu-Syuan Jheng 1407b12e2f
KAFKA-19313 Replace LogOffsetMetadata#UNIFIED_LOG_UNKNOWN_OFFSET by UnifiedLog.UNKNOWN_OFFSET (#19767)
CI / build (push) Waiting to run Details
Replaces the UNIFIED_LOG_UNKNOWN_OFFSET constant in LogOffsetMetadata
with UnifiedLog.UNKNOWN_OFFSET.

Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, YuChia Ma <minecraftmiku831@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
2025-05-24 23:33:26 +08:00
Lucas Brutschy bff1602df3
KAFKA-19280: Fix NoSuchElementException in UnifiedLog (#19717)
In FETCH requests and TXN_OFFSET_COMMIT requests, on current trunk we
run into a race condition inside UnifiedLog, causing a
`NoSuchElementException` in
`UnifiedLog.fetchLastStableOffsetMetadata(UnifiedLog.java:651)`.

The cause is that the line a performing an `isPresent` check on a
volatile Optional before accessing it in `get`, leaving the door open to
a race condition when the optional changes between `isPresent` and
`get`. This change takes a copy of the volatile variable first.
2025-05-17 21:17:38 +02:00
Jhen-Yung Hsu ced56a320b
MINOR: Move logDirs config out of KafkaConfig (#19579)
CI / build (push) Waiting to run Details
Follow up https://github.com/apache/kafka/pull/19460/files#r2062664349

Reviewers: Ismael Juma <ismael@juma.me.uk>, PoAn Yang
<payang@apache.org>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-05-17 00:52:20 +08:00
Andrew Schofield 7ae9a26fc2
MINOR: Mark RemoteIndexCacheTest.testConcurrentRemoveReadForCache1 flaky (#19732)
Marking flaky test as a result of 5% failure rate.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-05-16 09:03:08 +01:00
YuChia Ma 05169aa201
MINOR: Add deprecation warning for `log.cleaner.enable` and `log.cleaner.threads` (#19674)
Add a warning message when using log.cleaner.enable to remind users that
this configuration is deprecated. Also, add a warning message for
log.cleaner.threads=0 because in version 5.0, the value must be greater
than zero.

Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, TengYao Chi <frankvicky@apache.org>, TaiJuWu
<tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-05-15 00:18:27 +08:00