Commit Graph

337 Commits

Author SHA1 Message Date
PoAn Yang 965743c35b
KAFKA-19131: Adjust remote storage reader thread maximum pool size to avoid illegal argument (#19532)
The remote storage reader thread pool uses the same count for both the
maximum and core sizes. If users raise the pool size above the original
value, it throws `IllegalArgumentException`. Update both values to fix
the issue.
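
A minimal sketch of the ordering problem, using a plain
`java.util.concurrent.ThreadPoolExecutor` rather than the actual Kafka
reader pool. The class and method names (`PoolResizeSketch`, `resize`)
are hypothetical. On recent JDKs `setCorePoolSize` rejects a value above
the current maximum, and `setMaximumPoolSize` rejects a value below the
current core size, so the order of the two calls depends on the
direction of the change:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolResizeSketch {

    // Resizes a fixed-size pool (core == max) without tripping the JDK's argument checks.
    // newSize is assumed to be positive.
    static void resize(ThreadPoolExecutor pool, int newSize) {
        if (newSize > pool.getMaximumPoolSize()) {
            // Growing: raise the ceiling first, then the core size.
            pool.setMaximumPoolSize(newSize);
            pool.setCorePoolSize(newSize);
        } else {
            // Shrinking (or no change): lower the core size first, then the ceiling.
            pool.setCorePoolSize(newSize);
            pool.setMaximumPoolSize(newSize);
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        resize(pool, 4);
        resize(pool, 1);
        System.out.println(pool.getCorePoolSize() + "/" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```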

---------

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-04-25 15:36:17 +05:30
Xuan-Zhang Gong 18584b11ec
MINOR: ignore null judgement in LogCleaner (#19524)
about https://github.com/apache/kafka/pull/19387/files#r2052025917

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai
 <chia7712@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-04-21 21:22:56 +08:00
Mickael Maison 7710d1c951
KAFKA-14487: Move LogManager static methods/fields to storage module (#19302)
Move the static fields/methods

Reviewers: Luke Chen <showuon@gmail.com>
2025-04-21 12:03:30 +02:00
TaiJuWu 6e4e0df057
KAFKA-18891: Add KIP-877 support to RemoteLogMetadataManager and RemoteStorageManager (#19286)
1. Remove `RemoteLogManager#startup` and
`RemoteLogManager#onEndpointCreated`
2. Move endpoint creation to `BrokerServer`
3. Move `RemoteLogMetadataManager#configure` and
`RemoteStorageManager#configure` to the `RemoteLogManager` constructor

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>
2025-04-18 15:04:37 +02:00
Kamal Chandraprakash 2cd733c9b3
KAFKA-17184: Fix the error thrown while accessing the RemoteIndexCache (#19462)
For segments that are uploaded to remote storage, RemoteIndexCache
caches the fetched offset, timestamp, and transaction index entries on
the first invocation; subsequent invocations are served from the local
copy.

The remote indexes that are cached locally are removed in two cases:

1. Remote segments are deleted due to a breach of retention size/time
or start-offset.
2. The cached indexes exceed the remote-log-index-cache size limit of
1 GB (default).

There are two layers of locks in the RemoteIndexCache: a first-layer
lock on the RemoteIndexCache and a second-layer lock on the
RemoteIndexCache#Entry.

**Issue**

1. The first-layer lock coordinates the remote-log reader and deleter
threads. To ensure that the reader and deleter threads do not block
each other, we only take `lock.readLock()` when accessing/deleting
the cached index entries.
2. The issue happens when both the reader and deleter threads take the
readLock and the deleter thread then marks the index as
`markedForCleanup`. The reader thread, which still holds the
`indexEntry`, gets an IllegalStateException when accessing it.
3. This is a concurrency issue: we mark the entry as
`markedForCleanup` before removing it from the cache. See the
RemoteIndexCache#remove and RemoteIndexCache#removeAll methods.
4. When an entry gets evicted from the cache due to a breach of the
1 GB maxSize, the cache removes that entry before calling the
evictionListener, and all the operations are performed atomically by the
Caffeine cache.

**Solution**

1. When the deleter thread marks an Entry for deletion, we rename
the underlying index files with a ".deleted" suffix and add a job to
the remote-log-index-cleaner thread, which performs the actual cleanup.
Previously, the indexes were not accessible once the entry was marked
for deletion. Now, we allow access to those renamed files (from an entry
that is about to be removed and is held by a reader thread) until the
files are removed from disk.
2. Similar to local-log index/segment deletion, once the files get
renamed with a ".deleted" suffix, the actual deletion of a file
happens after the `file.delete.delay.ms` delay of 1 minute. The renamed
index files get deleted after 30 seconds.
3. During this time, if the same index entry gets fetched again from
remote, it does not conflict with the deleted entry as the file names
are different.
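
The rename-then-delete-later idea from the solution above can be
sketched as follows. This is an illustration under assumptions, not the
RemoteIndexCache code: `DeferredIndexDeletionSketch`, `markForDeletion`
and `scheduleDelete` are hypothetical names, and a plain
`ScheduledExecutorService` stands in for the remote-log-index-cleaner
thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class DeferredIndexDeletionSketch {
    static final String DELETED_SUFFIX = ".deleted";

    // Renames the file in place so that a freshly fetched copy can reuse the original name.
    // A reader that already holds the renamed path can keep reading it until it is deleted.
    static Path markForDeletion(Path indexFile) throws IOException {
        Path renamed = indexFile.resolveSibling(indexFile.getFileName() + DELETED_SUFFIX);
        return Files.move(indexFile, renamed, StandardCopyOption.ATOMIC_MOVE);
    }

    // Hands the physical delete to a cleaner thread after the given delay.
    static ScheduledFuture<Boolean> scheduleDelete(
            ScheduledExecutorService cleaner, Path renamed, long delayMs) {
        Callable<Boolean> task = () -> Files.deleteIfExists(renamed);
        return cleaner.schedule(task, delayMs, TimeUnit.MILLISECONDS);
    }
}
```

Because the rename frees the original file name immediately, a re-fetch
of the same index can write a new file while the old one is still
waiting for the cleaner.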

Reviewers: Satish Duggana <satishd@apache.org>
2025-04-18 16:43:37 +05:30
Mickael Maison c73d97de0c
KAFKA-14523: Move kafka.log.remote classes to storage (#19474)
Pretty much a straightforward move of these classes. I just updated
`RemoteLogManagerTest` to not use `KafkaConfig`

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-17 11:05:14 +02:00
Ken Huang b4e75fbab1
HOTFIX: add SuppressWarnings to TieredStorageTestUtils (#19494)
We need to add a `@SuppressWarnings` annotation because
`log.cleaner.enable` is marked as deprecated.

Reviewers: PoAn Yang <payang@apache.org>, Kuan-Po Tseng
<brandboat@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-04-17 11:10:59 +08:00
TengYao Chi 73afcc9b69
KAFKA-13610: Deprecate log.cleaner.enable configuration (#19472)
JIRA: KAFKA-13610  This patch deprecates the `log.cleaner.enable`
configuration. It's part of
[KIP-1148](https://cwiki.apache.org/confluence/x/XAyWF).

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang
 <payang@apache.org>, Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu
 <jhenyunghsu@gmail.com>
2025-04-16 10:27:44 +08:00
Mickael Maison 321a380d0a
KAFKA-14523: Decouple RemoteLogManager and Partition (#19391)
Remove the last dependency in the core module.

Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <poan.yang@suse.com>
2025-04-15 09:56:27 +02:00
Mickael Maison d183cf9ac1
KAFKA-18172 Move RemoteIndexCacheTest to the storage module (#19469)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-15 15:53:41 +08:00
Dmitry Werner 7863b35064
KAFKA-14485: Move LogCleaner to storage module (#19387)
Move LogCleaner and related classes to storage module and rewrite in
Java.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Jun Rao <junrao@gmail.com>
2025-04-11 09:21:05 -07:00
Nick Guo e69a311068
KAFKA-19076 replace `String` by `Supplier<String>` for UnifiedLog#maybeHandleIOException (#19392)
jira: https://issues.apache.org/jira/browse/KAFKA-19076

The message is only used when the function encounters an error, so the
error message should be created lazily.
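
A small sketch of the idea, with a hypothetical helper that mirrors the
shape of `UnifiedLog#maybeHandleIOException` but is not the Kafka
implementation (`LazyErrorMessageSketch` and `IoAction` are made-up
names). The supplier is only invoked on the failure path, so the
success path never pays for building the message:

```java
import java.io.IOException;
import java.util.function.Supplier;

final class LazyErrorMessageSketch {

    // Hypothetical functional interface for an action that may fail with an IOException.
    @FunctionalInterface
    interface IoAction<T> {
        T run() throws IOException;
    }

    // Runs the action; the error message is built only if the action throws.
    static <T> T maybeHandleIOException(Supplier<String> errorMsg, IoAction<T> action) {
        try {
            return action.run();
        } catch (IOException e) {
            throw new IllegalStateException(errorMsg.get(), e);
        }
    }

    public static void main(String[] args) {
        String dir = "/tmp/logs";
        // String concatenation inside the lambda is skipped when the action succeeds.
        int result = maybeHandleIOException(() -> "Error while reading " + dir, () -> 42);
        System.out.println(result);
    }
}
```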

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-07 00:43:44 +08:00
Xuan-Zhang Gong ab4a0f0ec1
MINOR: small optimization by judgment (#19386)
An extra check helps avoid unnecessary `segments.sizeInBytes()` loops

from https://github.com/apache/kafka/pull/18393/files#r2029925512

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-04-06 22:08:05 +08:00
TengYao Chi 74acbd200d
KAFKA-16758: Extend Consumer#close with an option to leave the group or not (#17614)
JIRA: [KAFKA-16758](https://issues.apache.org/jira/browse/KAFKA-16758)
This PR aims to deliver
[KIP-1092](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=321719077),
please refer to KIP-1092 and KAFKA-16758 for further details.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Chia-Ping
Tsai <chia7712@gmail.com>, Kirk True <kirk@kirktrue.pro>
2025-04-05 22:02:45 -07:00
Mickael Maison 08a93fe12a
KAFKA-14523: Move DelayedRemoteListOffsets to the storage module (#19285)
Decouple RemoteLogManager and ReplicaManager.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-05 19:51:13 +08:00
Ken Huang ef73fb921b
MINOR: Remove DeleteSegmentsByRetentionTimeTest#executeTieredStorageTest flaky annotation (#19301)
This test was fixed by [this
commit](https://github.com/apache/kafka/pull/18861) and hasn't failed
for about two weeks. Therefore, we can remove the `@Flaky` annotation.

Gradle report:
https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=1743081652853&search.startTimeMin=1741795200000&search.tags=github%2Ctrunk&search.timeZoneId=Asia%2FTaipei&tests.container=org.apache.kafka.tiered.storage.integration.DeleteSegmentsByRetentionTimeTest

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-04-05 00:05:57 +08:00
Sanskar Jhajharia 03b1b720e9
MINOR: Cleanup Storage Module (#19072)
Given that we now support Java 17 on our brokers, this PR replaces the
use of the following in the storage module:

- Collections.singletonList() and Collections.emptyList() with List.of()
- Collections.singletonMap() and Collections.emptyMap() with Map.of()
- Collections.singleton() and Collections.emptySet() with Set.of()
- Arrays.asList() with List.of()
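
For illustration, the replacements listed above look like this in
isolation. The class and method names are hypothetical. One behavioural
difference worth keeping in mind: the `List.of`/`Map.of`/`Set.of`
factories reject `null`, whereas `Collections.singletonList` and
`Arrays.asList` accept it.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ImmutableFactorySketch {

    // was: Collections.emptyList()
    static List<String> noSegments() {
        return List.of();
    }

    // was: Collections.singletonList(name) or Arrays.asList(name)
    static List<String> oneSegment(String name) {
        return List.of(name);
    }

    // was: Collections.singletonMap(key, value)
    static Map<String, Integer> oneEntry(String key, int value) {
        return Map.of(key, value);
    }

    // was: Collections.singleton(id)
    static Set<Integer> oneId(int id) {
        return Set.of(id);
    }
}
```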

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-04-04 02:15:58 +08:00
TaiJuWu f1bb29b93a
MINOR: migrate BrokerCompressionTest to storage module (#19277)
There are two changes in this PR.

1. Move `BrokerCompressionTest` from core to storage
2. Rewrite `BrokerCompressionTest` from Scala to Java

Reviewers: TengYao Chi <kitingiao@gmail.com>, PoAn Yang
<payang@apache.org>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-04-03 22:43:42 +08:00
PoAn Yang be80e3cb8a
KAFKA-18923: resource leak in RSM fetchIndex inputStream (#19111)
Fix resource leak in RSM inputStream.

Reviewers: Luke Chen <showuon@gmail.com>
2025-04-03 15:18:05 +08:00
PoAn Yang 4a5ae144ea
KAFKA-19032 Remove TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames (#19270)
ZooKeeper mode was removed in 4.0, so the test cases don't need to
specify a quorum. The following variable and functions can be replaced:
- TestWithParameterizedQuorumAndGroupProtocolNames
- getTestQuorumAndGroupProtocolParametersClassicGroupProtocolOnly
- getTestQuorumAndGroupProtocolParametersConsumerGroupProtocolOnly
- getTestQuorumAndGroupProtocolParametersAll

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-30 02:11:07 +08:00
PoAn Yang c125cc7dd1
KAFKA-19036 Rewrite LogAppendTimeTest and move it to storage module (#19282)
Rewrite `LogAppendTimeTest` in Java using the new test infra and move it
to the storage module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-29 03:14:53 +08:00
PoAn Yang b9d5597b44
KAFKA-17830 Cover unit tests for TBRLMM init failure scenarios (#19076)
Add unit tests for TBRLMM when initializing clients.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-27 18:20:02 +08:00
Dmitry Werner 84b8fec089
KAFKA-14486 Move LogCleanerManager to storage module (#19216)
Move LogCleanerManager and related classes to storage module and rewrite
in Java.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Jun Rao
<junrao@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
2025-03-27 12:35:38 +08:00
Jorge Esteban Quilcate Otoya f24945b519
KAFKA-15931: Cancel RemoteLogReader gracefully (#19197)
Reverts commits 2723dbf3a0 and 269e8892ad.

Instead of reopening the transaction index, it cancels the
RemoteFetchTask without interrupting it, which avoids closing the
TransactionIndex channel.

As a result, the remote fetch runs to completion but its results are
ignored. Given that this is considered a rare case, we could live
with this. If it becomes a performance issue, it could be optimized.
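
A minimal sketch of cancelling without interrupting, using a plain
`java.util.concurrent.Future` instead of the actual RemoteFetchTask
(`GracefulCancelSketch` and its latches are hypothetical scaffolding).
`cancel(false)` marks the future as cancelled but leaves the worker
thread alone, so a blocking read in progress is not interrupted:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

final class GracefulCancelSketch {

    // Returns true if the future reports cancellation while the worker was never interrupted.
    static boolean cancelWithoutInterrupt() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        Future<?> fetch = pool.submit(() -> {
            started.countDown();
            try {
                release.await();            // stands in for a blocking remote read
            } catch (InterruptedException e) {
                interrupted.set(true);      // would happen with cancel(true)
            }
            finished.countDown();
        });

        started.await();
        fetch.cancel(false);                // mark as cancelled, do not interrupt the worker
        release.countDown();                // let the "read" finish on its own
        finished.await();
        pool.shutdown();
        return fetch.isCancelled() && !interrupted.get();
    }
}
```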

Reviewers: Jun Rao <junrao@gmail.com>
2025-03-20 10:20:44 -07:00
Ken Huang 31e1a57c41
KAFKA-18989 Optimize FileRecord#searchForOffsetWithSize (#19214)
The `lastOffset` includes the entire batch header, so we should check `baseOffset` instead.

To optimize this, we need to update the search logic. The previous
approach simply checked whether each batch's `lastOffset()` was greater
than or equal to the target offset. Once it found the first batch that
met this condition, it returned that batch immediately.

Now that we are using `baseOffset()`, we need to handle a special case:
if the `targetOffset` falls between the `lastOffset` of the previous
batch and the `baseOffset` of the matching batch, we should select the
matching batch. The updated logic is structured as follows:

1. First, if `baseOffset` exactly equals `targetOffset`, return
immediately.
2. On finding the first batch with `baseOffset` greater than
`targetOffset`:
    - If the previous batch contains the target, return the previous
      batch.
    - If there is no previous batch, or the previous batch doesn't
      contain the target, return the current batch.
3. After iterating through all batches, check if the last batch contains
the target offset.
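
The steps above can be sketched over an in-memory list. This is not the
`FileRecords` code: `Batch` is a hypothetical record standing in for a
file-backed batch, and returning `null` when nothing matches is an
assumption of this sketch.

```java
import java.util.List;

final class OffsetSearchSketch {

    // Hypothetical in-memory stand-in for a file-backed record batch.
    record Batch(long baseOffset, long lastOffset) {
        boolean contains(long offset) {
            return baseOffset <= offset && offset <= lastOffset;
        }
    }

    // Returns the first batch whose last offset is >= targetOffset, or null if there is none.
    static Batch search(List<Batch> batches, long targetOffset) {
        Batch prev = null;
        for (Batch batch : batches) {
            if (batch.baseOffset() == targetOffset) {
                return batch;                       // step 1: exact match on baseOffset
            }
            if (batch.baseOffset() > targetOffset) {
                // step 2: target is in the previous batch, or in the gap before this one
                if (prev != null && prev.contains(targetOffset)) {
                    return prev;
                }
                return batch;
            }
            prev = batch;
        }
        // step 3: only the final batch can still contain the target
        return (prev != null && prev.contains(targetOffset)) ? prev : null;
    }
}
```

In this shape `lastOffset` is consulted at most once per search, on the
single candidate batch, which matches the intent described in the
message.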

This code path is not thread-safe, so we need to prevent `EOFException`.
To avoid this exception, I am still using an early return. In this
scenario, `lastOffset` is still used within the loop, but it should be
executed at most once within the loop.

Therefore, in the new implementation, `lastOffset` will be executed at
most once. In most cases, this results in an optimization.

Test: Verifying Memory Usage Improvement  
To evaluate whether this optimization helps, I followed the steps below
to monitor memory usage:

1. Start a Standalone Kafka Server  
```sh
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties
bin/kafka-server-start.sh config/server.properties
```  

2. Use Performance Console Tools to Produce and Consume Records  

**Produce Records:**  
```sh
./bin/kafka-producer-perf-test.sh \
  --topic test-topic \
  --num-records 1000000000 \
  --record-size 100 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092
```  
**Consume Records:**  
```sh
./bin/kafka-consumer-perf-test.sh \
  --topic test-topic \
  --messages 1000000000 \
  --bootstrap-server localhost:9092
```  
It can be observed that memory usage has significantly decreased.
trunk:
![CleanShot 2025-03-16 at 11 53 31@2x](https://github.com/user-attachments/assets/eec26b1d-38ed-41c8-8c49-e5c68643761b)
this PR:
![CleanShot 2025-03-16 at 17 41 56@2x](https://github.com/user-attachments/assets/c8d4c234-18c2-4642-88ae-9f96cf54fccc)

Reviewers: Kirk True <kirk@kirktrue.pro>, TengYao Chi
<kitingiao@gmail.com>, David Arthur <mumrah@gmail.com>, Jun Rao
<junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-20 16:33:35 +08:00
Ken Huang 93499df6e9
KAFKA-18924 Running the storage module tests produces a storage/storage.log file (#19147)
Change log4j2 logging directory to `build/kafka-storage-test/`

Reviewers: Sanskar Jhajharia <sjhajharia@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-15 05:20:45 +08:00
Ken Huang f50a17fa8d
KAFKA-18606: Flaky test DeleteSegmentsByRetentionTimeTest#executeTieredStorageTest (#18861)
Jira: https://issues.apache.org/jira/browse/KAFKA-18606

This flaky test is caused by
23c459286b,
which changed the default `LOG_MESSAGE_TIMESTAMP_AFTER_MAX_MS_DEFAULT`
from `Long.MAX_VALUE` to `3600000` (1 hour) and removed the produce
record timestamp
23c459286b (diff-d157da9c40cb386be02d1f917db9e5f6293cdbc82e45a39115bf8629fc19d55cL59).
This test case verifies that, once the retention period (1 ms) has
expired, the related segments are deleted. The old test in
https://github.com/apache/kafka/pull/16932 added
`TimeUnit.DAYS.toMillis(1)`, which makes the third record expire, so
the test is frequently flaky because of this record.

Reviewers: Jun Rao <jun@confluent.io>
2025-03-14 08:10:36 -07:00
Mickael Maison 759fbbba8b
KAFKA-14484: Move UnifiedLog to storage module (#19030)
Rewrite UnifiedLog in Java

Reviewers: Jun Rao <jun@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-13 10:49:55 +01:00
David Arthur 0ebc3e83c5
MINOR Mar 12 Flaky tests (#19190)
Mark the following tests as flaky:

* StickyAssignorTest > testLargeAssignmentAndGroupWithUniformSubscription
* DeleteSegmentsByRetentionTimeTest
* QuorumControllerTest > testUncleanShutdownBrokerElrEnabled

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-03-12 13:47:35 -04:00
Apoorv Mittal f3da8f500e
KAFKA-18936: Fix share fetch when records are larger than max bytes (#19145)
The PR fixes the behaviour when records are fetched which are larger
than `fetch.max.bytes` config.

The usage of `hardMaxBytesLimit` is in ReplicaManager where it decides
whether to fetch a single record or not. The file records get sliced
based on the bytes requested. However, if `hardMaxBytesLimit` is false
then at least one record is fetched and bytes are adjusted accordingly in
`localLog`.

Reviewers: Jun Rao <junrao@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
2025-03-12 09:03:35 +00:00
Jorge Esteban Quilcate Otoya 2723dbf3a0
MINOR: fix add missing @Test lost while rebasing (#19149)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-10 22:19:01 +08:00
Lucas Brutschy fc2e3dfce9
MINOR: Disallow unused local variables (#18963)
Recently, we found a regression that could have been detected by static
analysis, since a local variable wasn't being passed to a method during
a refactoring, and was left unused. It was fixed in
[7a749b5](7a749b589f),
but almost slipped into 4.0. Unused variables are typically detected by
IDEs, but this is insufficient to prevent these kinds of bugs. This
change enables unused local variable detection in checkstyle for Kafka.

A few notes on the usage:
- There are two situations in which people actually want to have a local
variable but not use it. First, there are `for (Type ignored:
collection)` loops which have to loop `collection.length` number of
times, but that do not use `ignored` in the loop body. These are
typically still easier to read than a classical `for` loop. Second, some
IDEs detect it if a return value of a function such as `File.delete` is
not being used. In this case, people sometimes store the result in an
unused local variable to make ignoring the return value explicit and to
avoid the squiggly lines.
- In Java 22, unused local variables can be omitted by using a single
underscore `_`. This is supported by checkstyle. In pre-22 versions,
IntelliJ allows such variables to be named `ignored` to suppress the
unused local variable warning. This pattern is often (but not
consistently) used in the Kafka codebase. This is, however, not
supported by checkstyle.

Since we cannot switch to Java 22 yet, and we want to use automated
detection using checkstyle, we have to resort to prefixing the unused
local variables with `@SuppressWarnings("UnusedLocalVariable")`. We have
to apply this in 11 cases across the Kafka codebase. While not being
pretty, I'd argue it's worth it to prevent bugs like the one fixed in
[7a749b5](7a749b589f).
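
For illustration, the two situations from the notes above look roughly
like this once annotated. The class and method names are hypothetical;
only the annotation string comes from this change.

```java
import java.io.File;
import java.util.List;

final class UnusedLocalSketch {

    // Loop that needs one iteration per element but never reads the element.
    static int countElements(List<String> items) {
        int count = 0;
        for (@SuppressWarnings("UnusedLocalVariable") String ignored : items) {
            count++;
        }
        return count;
    }

    // Return value is stored only to make ignoring it explicit.
    static void deleteQuietly(File file) {
        @SuppressWarnings("UnusedLocalVariable")
        boolean ignored = file.delete();
    }
}
```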

Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur
<mumrah@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Bruno
Cadonna <cadonna@apache.org>, Kirk True <ktrue@confluent.io>
2025-03-10 09:37:35 +01:00
Jorge Esteban Quilcate Otoya 269e8892ad
KAFKA-15931: Reopen TransactionIndex if channel is closed (#15241)
Cached TransactionIndex may get closed if interrupted, causing following calls to always fail with ClosedChannelException, and forcing process to be restarted. In order to avoid this issue, a new method is exposed by TransactionIndex to validate state of channel; and index is reopened if closed.

Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Nikhil Ramakrishnan <ramakrishnan.nikhil@gmail.com>
2025-03-07 19:31:21 +08:00
Xuan-Zhang Gong 45f932819e
KAFKA-18864:remove the Evolving tag from stable public interfaces (#19036)
The purpose of this PR is to remove the `@InterfaceStability.Evolving` from classes that were created over a year ago.

Reviewers: Jun Rao <junrao@gmail.com>
2025-02-28 13:24:24 -08:00
Vadym Zhytkevych bc24cac412
KAFKA-18371 TopicBasedRemoteLogMetadataManagerConfig exposes sensitive configuration data in logs (#18349)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-28 06:35:33 +08:00
Ismael Juma 3dba3125e9
KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845)
3.3.0 was the first KRaft release that was deemed production-ready and also
when KIP-778 (KRaft to KRaft upgrades) landed. Given that, it's reasonable
for 4.x to only support upgrades from 3.3.0 or newer (the metadata version also
needs to be set to "3.3" or newer before upgrading).

Noteworthy changes:
1. `AlterPartition` no longer includes topic names, which makes it possible to
simplify `AlterPartitionManager` logic.
2. Metadata versions older than `IBP_3_3_IV3` have been removed and
`IBP_3_3_IV3` is now the minimum version.
3. `MINIMUM_BOOTSTRAP_VERSION` has been removed.
4. Removed `isLeaderRecoverySupported`, `isNoOpsRecordSupported`,
`isKRaftSupported`, `isBrokerRegistrationChangeRecordSupported` and
`isInControlledShutdownStateSupported` - these are always `true` now.
Also removed related conditional code.
5. Removed default metadata version or metadata version fallbacks in
multiple places - we now fail-fast instead of potentially using an incorrect
metadata version.
6. Update `MetadataBatchLoader.resetToImage` to set `hasSeenRecord`
based on whether image is empty - this was a previously existing issue that
became more apparent after the changes in this PR.
7. Remove `ibp` parameter from `BootstrapDirectory`
8. A number of tests were not useful anymore and have been removed.

I will update the upgrade notes via a separate PR as there are a few things that
need changing and it would be easier to do so that way.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluen.io>, Ken Huang <s7133700@gmail.com>
2025-02-19 05:35:42 -08:00
TaiJuWu 934b0159bb
KAFKA-18089: Upgrade Caffeine lib to 3.1.8 (#18004)
- Fixed the RemoteIndexCacheTest that fails with caffeine > 3.1.1

Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-02-18 21:51:38 +05:30
Mickael Maison 0a2fab9310
KAFKA-14484: Decouple UnifiedLog and RemoteLogManager (#18460)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2025-02-18 15:10:31 +01:00
Kamal Chandraprakash da3643c6b4
KAFKA-18787: RemoteIndexCache fails to delete invalid files on init (#18888)
The stale/invalid files that end with ".deleted" and ".tmp" should be cleaned up when the broker gets restarted.

- fix the remote-index-cache test to use the logDir instead of topicDir
- fix the flaky test
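
A minimal sketch of such a restart-time cleanup, assuming a flat cache
directory. `StaleIndexCleanupSketch` and `cleanStaleFiles` are
hypothetical names, and this is not the RemoteIndexCache init code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class StaleIndexCleanupSketch {

    // Deletes leftover ".deleted" and ".tmp" files in cacheDir; returns how many were removed.
    static int cleanStaleFiles(Path cacheDir) throws IOException {
        int removed = 0;
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(cacheDir, "*.{deleted,tmp}")) {
            for (Path file : files) {
                if (Files.deleteIfExists(file)) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```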

Reviewers: Luke Chen <showuon@gmail.com>
2025-02-18 12:56:03 +05:30
Mickael Maison ece91e9247
KAFKA-14484: Move UnifiedLog static methods to storage (#18039)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 09:55:32 +01:00
Kamal Chandraprakash cb9c6718fa
KAFKA-18722: Remove the unreferenced methods in TBRLMM and ConsumerManager (#18791)
Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>
2025-02-05 13:13:24 +00:00
Justine Olshan ab8ef87c7f
KAFKA-18654 [1/2]: Transaction Version 2 performance regression due to early return (#18720)
https://issues.apache.org/jira/browse/KAFKA-18575 solved a critical race condition by returning with CONCURRENT_TRANSACTIONS early when the transaction was still completing.
In testing, it was discovered that this early return could cause performance regressions.

Prior to KIP-890, the addpartitions call was a separate call from the producer. A previous change, https://issues.apache.org/jira/browse/KAFKA-5477, decreased the retry backoff to 20ms. With KIP-890, the call goes through the produce path, so we go back to the default retry backoff, which takes longer. Prior to KAFKA-18575 we introduced a slight delay when sending to the coordinator, so we were less likely to return quickly and get stuck in this backoff. However, based on results from produce benchmarks, we can still run into the default backoff in some scenarios.

This PR reverts KAFKA-18575: it no longer returns early and instead waits for the coordinator to check whether a transaction is ongoing. It also fixes the handling with the verification guard so we don't hit the edge condition.

Also cleans up some of the verification text that was unclear.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Artem Livshits <alivshits@confluent.io>
2025-02-03 15:24:34 -08:00
Kamal Chandraprakash 87b536d5ec
MINOR: Remove the noisy log in consumer manager (#18787)
The statement is logged at the INFO level and gets printed for every message produced to the __remote_log_metadata topic. Removed the log statement as it is needed only during debug sessions; another log at DEBUG level already captures this information.

Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>
2025-02-03 22:51:41 +05:30
Logan Zhu 04ea25b3c3
MINOR: Replace lambda expressions with method references in ProducerStateManager (#18753)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Divij Vaidya <diviv@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-03 10:16:22 +08:00
mingdaoy c23d4a0d73
KAFKA-18499 Clean up zookeeper from LogConfig (#18583)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-25 22:31:46 +08:00
Apoorv Mittal 70eab7778d
KAFKA-17894: Implemented broker topic metrics for Share Group 1/N (KIP-1103) (#18444)
The PR implements the BrokerTopicMetrics defined in KIP-1103.

The PR also corrects the share-acknowledgement-rate and share-acknowledgement-count metrics defined in KIP-932, as they are moved to BrokerTopicMetrics. The necessary changes to the KIP-932 broker metrics will be done once we complete KIP-1103.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
2025-01-24 09:34:54 -08:00
David Arthur 8c0a0e07ce
KAFKA-17587 Refactor test infrastructure (#18602)
This patch reorganizes our test infrastructure into three Gradle modules:

":test-common:test-common-internal-api" is now a minimal dependency which exposes interfaces and annotations only. It has one project dependency on server-common to expose commonly used data classes (MetadataVersion, Feature, etc). Since this pulls in server-common, this module is Java 17+. It cannot be used by ":clients" or other Java 11 modules.

":test-common:test-common-util" includes the auto-quarantined JUnit extension. The @Flaky annotation has been moved here. Since this module has no project dependencies, we can add it to the Java 11 list so that ":clients" and others can utilize the @Flaky annotation

":test-common:test-common-runtime" now includes all of the test infrastructure code (TestKitNodes, etc). This module carries heavy dependencies (core, etc) and so it should not normally be included as a compile-time dependency.

In addition to this reorganization, this patch leverages JUnit SPI service discovery so that modules can utilize the integration test framework without depending on ":core". This will allow us to start moving integration tests out of core and into the appropriate sub-module. This is done by adding ":test-common:test-common-runtime" as a testRuntimeOnly dependency rather than as a testImplementation dependency. A trivial example was added to QuorumControllerTest to illustrate this.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 09:03:43 -05:00
qingbozhang a783dc69b9
MINOR: Replace Stream#collect with Stream#toList api from JDK 16 (#18655)
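A short before/after sketch of this kind of replacement, with
hypothetical class and method names. Note that `Stream#toList` returns
an unmodifiable list, while `Collectors.toList()` makes no guarantee
about mutability, so the swap is only safe where callers do not modify
the result.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamToListSketch {

    // Before: explicit collector.
    static List<Integer> lengthsBefore(Stream<String> names) {
        return names.map(String::length).collect(Collectors.toList());
    }

    // After: Stream#toList (JDK 16+), result is unmodifiable.
    static List<Integer> lengthsAfter(Stream<String> names) {
        return names.map(String::length).toList();
    }
}
```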
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-23 12:15:41 +01:00
Pramithas Dhakal 27552e7b11
MINOR: Make `forceUnmap` method protected (#18638)
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-23 11:39:45 +01:00
Dmitry Werner eb1a1fc7c8
MINOR: Remove recordVersion param from javadoc (#18603)
In commit af233f6e6dbc2b6ffa the RecordVersion parameter has been
removed, but it remains in the javadoc.

Reviewers: Ismael Juma
2025-01-21 18:35:13 -08:00