Commit Graph

16396 Commits

Author SHA1 Message Date
Nikita Shupletsov 28e7803037
KAFKA-19744: Move restore time calculation to ChangelogMetadata (#20613)
CI / build (push) Waiting to run Details
- Move restore time calculation to ChangelogMetadata.
- Introduced a new interface to propagate the calculated value to the
stores to avoid modifications in the public interface.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-10-02 21:24:36 -07:00
Mahsa Seifikar 8468317dac
KAFKA-19467; Add a metric for controller thread idleness (#20422)
CI / build (push) Waiting to run Details
This change adds the metric ControllerEventManager::AvgIdleRatio which
measures the amount of time the controller spends blocked waiting for
events vs the amount of time spent processing events. A value of 1.0
means that the controller spent the entire interval blocked waiting for
events.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Kevin Wu
 <kevin.wu2412@gmail.com>, Alyssa Huang <ahuang@confluent.io>, TengYao
 Chi <frankvicky@apache.org>, Reviewers: Chia-Ping Tsai
 <chia7712@apache.org>
2025-10-02 14:02:47 -04:00
Chang-Yu Huang 33cd114375
KAFKA-9825 Kafka protocol BNF format should have some way to display tagged fields (#20536)
CI / build (push) Waiting to run Details
# Description
The [protocol guide](https://kafka.apache.org/protocol) 1) doesn't
display
tagged fields in BNF, and 2) includes empty tagged fields and redundant
nested tables in tables.

# Change
## BNF
Now tagged fields are displayed as FIELD_NAME<tag number>

Old:  <img width="1316" height="275" alt="Screenshot 2025-09-13 at 5 34
28 PM"

src="https://github.com/user-attachments/assets/c3e59382-7a6b-43f3-bc7a-893fb27d524d"
/>

New:  <img width="1386" height="328" alt="Screenshot 2025-09-24 at 12 50
34 PM"

src="https://github.com/user-attachments/assets/1ddbc95e-b0a7-4cd5-a5e0-e1303ffd2d06"
/>

Array Field: <img width="914" height="275" alt="Screenshot 2025-09-24 at
12 52 19 PM"

src="https://github.com/user-attachments/assets/cfe66a21-0d66-4f23-8e5d-1d5dac8e4c9b"
/>

## Table
Empty tagged fields are removed from the table.

Nested table for tagged fie  Old: <img width="805" height="506"
alt="Screenshot 2025-09-28 at 11 07 01 PM"

src="https://github.com/user-attachments/assets/0669c2f3-150c-479d-b6ff-1d2857540fef"
/>  lds are removed. Tag of the field is shown in the "Field" column.

New:   <img width="1371" height="727" alt="Screenshot 2025-09-28 at 11
10 30 PM"

src="https://github.com/user-attachments/assets/030abde6-60ec-4195-9778-da48ebd01084"
/>

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-10-02 11:01:53 +01:00
Andrew Schofield 7f65b1fa96
MINOR: Typo in ListShareGroupOffsetsResults javadoc (#20623)
CI / build (push) Waiting to run Details
Fixed a tiny javadoc typo.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-10-01 20:19:50 +01:00
Jhen-Yung Hsu 0ddc69da70
KAFKA-19721: Update streams documentation with KIP-1147 changes (#20606)
Update KIP-1147 changes (renaming --property to --formatter-property) in
the ops and streams documentation.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-10-01 20:17:47 +01:00
Genseric Ghiro 7426629ba4
MINOR: Moving quota test to core directory (#20582)
## Summary

Quota test isn't testing anything on the client side, but rather
enforcing server-side quotas, so moving it out of the clients directory
into the core directory.

Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-10-01 11:51:45 -04:00
Logan Zhu 423330ebe7
KAFKA-19692 improve the docs of "clusterId" for AddRaftVoterOptions and RemoveRaftVoterOptions (#20555)
CI / build (push) Has been cancelled Details
Improves the documentation of the clusterId field in AddRaftVoterOptions
and RemoveRaftVoterOptions.

The changes include:
1. Adding Javadoc to both addRaftVoter and removeRaftVoter methods to
explain the behavior of the optional clusterId.
2. Integration tests have been added to verify the correct behavior of
add and remove voter operations with and without clusterId, including
scenarios with inconsistent cluster ids.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-30 21:17:41 +08:00
lorcan d1a821226c
KAFKA-15873: Filter topics before sorting (#19304)
CI / build (push) Waiting to run Details
Partially addresses KAFKA-15873. When filtering and sorting, we should
be applying the filter before the sort of topics. Order that
unauthorizedForDescribeTopicMetadata is added to not relevant as it is a
HashSet.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Calvin Liu
 <caliu@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2025-09-29 09:02:32 -07:00
Lan Ding 1ebca7817b
KAFKA-19539: Kafka Streams should also purge internal topics based on user commit requests (#20234)
Repartition topic records should be purged up to the currently committed
offset once `repartition.purge.interval.ms` duration has passed.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-09-29 08:26:49 -07:00
Deep Golani 71c5a426b8
KAFKA-12506: Strengthen AdjustStreamThreadCountTest with stateful counting and higher throughput (#20540)
Add count store and output topic; produce 1,000 records across 50 keys
to better exercise concurrency.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-09-29 08:23:05 -07:00
YuChia Ma 92169b8f08
KAFKA-19357 AsyncConsumer#close hangs as commitAsync never completes when coordinator is missing (#19914)
Problem:  When AsyncConsumer is closing, CoordinatorRequestManager stops
looking for coordinator by returning EMPTY in poll() method when closing
flag is true. This prevents commitAsync() and other
coordinator-dependent operations from completing, causing close() to
hang until timeout.

Solution:
Modified the closing flag check in poll() method of
CommitRequestManager to be more targeted:
- When both coordinators are unknown and the consumer is closing, only
return EMPTY
- When this condition is met, proactively fail all pending commit
requests with CommitFailedException
- This allows coordinator lookup to continue when coordinator is
available during shutdown, while preventing indefinite hangs when
coordinator is unreachable

Reviewers: PoAn Yang <payang@apache.org>, Andrew Schofield
 <aschofield@confluent.io>, TengYao Chi <kitingiao@gmail.com>, Kirk True
 <kirk@kirktrue.pro>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Lan Ding
 <isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang
 <s7133700@gmail.com>, KuoChe <kuoche1712003@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-29 23:06:56 +08:00
Patrik Nagy 3c0843961b
KAFKA-19739 Upgrade commons-validator to 1.10.0 (#20601)
CI / build (push) Waiting to run Details
In [KAFKA-19359](https://issues.apache.org/jira/browse/KAFKA-19359), the
commons-beanutils transitive dependency was force bumped in the project
to avoid related CVEs. The commons-validator already has a new release,
which solves this problem:

https://github.com/apache/commons-validator/tags

The workaround could be deleted as part of the version bump.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-29 18:30:17 +08:00
Lan Ding c2aeec46a2
MINOR: Remove logContext arrtibute from StreamsGroup and CoordinatorRuntime (#20572)
CI / build (push) Waiting to run Details
The `logContext` attribute in `StreamsGroup` and `CoordinatorRuntime` is
not used anymore.  This patch removes it.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-29 01:37:58 +08:00
Lan Ding e27ea8d4db
KAFKA-19702 Move MetadataVersionConfigValidator and related test code to metadata module (#20526)
1. Move `MetadataVersionConfigValidator` to metadata module.
2. Move `MetadataVersionConfigValidatorTest` to metadata module.
3. Remove `KafkaConfig#validateWithMetadataVersion`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-29 01:32:27 +08:00
Ken Huang 7d098cfbbd
KAFKA-17876/ KAFKA-19150 Rename AssignmentsManager and RemoteStorageThreadPool metrics (#20265)
Rename org.apache.kafka.server:type=AssignmentsManager and
org.apache.kafka.storage.internals.log.RemoteStorageThreadPool metrics
for the consist, these metrics should be

- `kafka.log.remote:type=...`
- `kafka.server:type=...`

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-29 01:24:38 +08:00
Lan Ding 60ad638a35
KAFKA-19617: ConsumerPerformance#ConsumerPerfRebListener get corrupted value when the number of partitions is increased (#20388)
With changes to the consumer protocol, rebalance may not necessarily
result in a "stop the world".  Thus, the method for calculating pause
time in `ConsumerPerformance#ConsumerPerfRebListener` needs to be
modified.

Stop time is only recorded if `assignedPartitions` is empty.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-09-28 16:50:25 +01:00
Sanskar Jhajharia d2a699954d
MINOR: Cleanup `toString` methods in Storage Module (#20432)
Getting rid of a bunch of `toString` functions in record classes in
Storage Module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-28 23:15:28 +08:00
Ken Huang 41611b4bd2
MINOR: Followup KAFKA-19112 document updated (#20492)
Some sections are not very clear, and we need to update the
documentation.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Jun Rao
 <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-28 19:06:06 +08:00
Lucas Brutschy fb0518c34e
KAFKA-19730: StreamsGroupDescribe result is missing topology (#20574)
When toology not configured.

In the streams group heartbeat, we validate the topology set for the
group against the topic metadata, to generate the "configured topology"
which has a specific number of partitions for each topic.

In streams group describe, we use the configured topology to expose this
information to the user. However, we leave the topology information as
null in the streams group describe response, if the topology is not
configured. This triggers an IllegalStateException in the admin client
implementation.

Instead, we should expose the unconfigured topology when the configured
topology is not available, which will still expose useful information.

Reviewers: Matthias J. Sax <matthias@confluent.io>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-26 19:40:47 +02:00
Sanskar Jhajharia ac495f9ef7
MINOR: Clean Javadoc for BrokerReconfigurable interface (#20593)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-09-26 12:44:02 +02:00
Ken Huang 527467d053
KAFKA-18356: Explicitly set up instrumentation for inline mocking (Java 21+) (#18339)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2025-09-26 10:31:14 +02:00
Lianet Magrans 9e9d2a23ef
MINOR: fix flaky sys test for static membership (#20594)
Fixing flakiness seen on this test, where static consumers could not
join as expected after shutting down previous consumers with the same
instance ID, and logs showed `UnreleasedInstanceIdException`.

I expect the flakiness could happen if a consumer with instanceId1 is
closed but not effectively removed from the group due to leave group
fail/delayed (the leave group request is sent on a best effort, not
retried if fails or times out).

Fix by adding check to ensure the group is empty before attempting to
reuse the instance ID

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-09-25 15:29:35 -04:00
Kevin Wu 857b1e92cc
KAFKA-19719: --no-initial-controllers should not assume kraft.version=1 (#20551)
Just because a controller node sets --no-initial-controllers flag does
not mean it is necessarily running kraft.version=1. The more precise
meaning is that the controller node being formatted does not know what
kraft version the cluster should be in, and therefore it is only safe to
assume kraft.version=0. Only by setting
--standalone,--initial-controllers, or --no-initial-controllers
AND not specifying the controller.quorum.voters static config, is it
known kraft.version > 0.

For example, it is a valid configuration (although confusing) to run a
static   quorum defined by controller.quorum.voters but have all the
controllers   format with --no-initial-controllers. In this case,
specifying --no-initial-controllers alongside a metadata version that
does not  support kraft.version=1 causes formatting to fail, which is
a  regression.

Additionally, the formatter should not check the kraft.version against
the release version, since kraft.version does not actually depend on any
release version. It should only check the kraft.version against the
static voters config/format arguments.

This PR also cleans up the integration test framework to match the
semantics of formatting an actual cluster.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Kuan-Po Tseng
 <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, José
 Armando García Sancio <jsancio@apache.org>
2025-09-25 12:56:16 -04:00
Lan Ding f4e00e9cf0
MINOR: Remove unnecessary check in ReplicaManager (#20588)
`setTopics` is executed at before, so the check is unnecessary.

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-25 23:52:46 +08:00
Sanskar Jhajharia 97c8c6b595
KAFKA-19733 Fix arguments to assertEquals() in clients module (#20586)
The given PR mostly fixes the order of arguments in `assertEquals()` for
the Clients module. Some minor cleanups were included with the same too.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-25 23:37:02 +08:00
Jinhe Zhang 14917ae727
MINOR: Handle envelope response in AutoTopicCreationManager (#20569)
In the create topic request we send a CreateTopic request in an
Envelope, so we need to also unpack the response correctly

Reviewers: Lucas Brutschy <lucasbru@apache.org>
2025-09-25 11:06:22 +02:00
Mickael Maison 444ceeb325
MINOR: Tidy up the Connect docs (#20531)
Remove invalid mentions of default values for group.id,
config.storage.topic, offset.storage.topic, status.storage.topic

Reviewers: Luke Chen <showuon@gmail.com>, Ken Huang <s7133700@gmail.com>
2025-09-25 09:39:37 +02:00
Maros Orsak 563338c0e9
MINOR: Refactor on DelegationTokenManager follow up with KAFKA-18711 (#20579)
Follow-up PR of KAFKA-18711. The motivation and reason for this change
are outlined in [1].

[1] - https://github.com/apache/kafka/pull/20475#discussion_r2375608168

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-25 02:25:44 +08:00
Lan Ding ac63ce9789
KAFKA-19544 Improve `MetadataVersion.fromVersionString()` to take an enableUnstableFeature flag (#20248)
Improve `MetadataVersion.fromVersionString()` to take an
`enableUnstableFeature` flag,   and enable `FeatureCommand` and
`StorageTool` to leverage the exception message from
`fromVersionString`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-25 01:06:54 +08:00
Shivsundar R 348e64c57d
MINOR: Add unit tests for verifying --formatter-property in console tools. (#20560)
*What*
In the implementation of KIP-1147 for console tools -

https://github.com/apache/kafka/pull/20479/files#diff-85b87c675a4b933e8e0e05c654d35d60e9cfd36cebe3331af825191b2cc688ee,
we missed adding unit tests for verifying the new
"`--formatter-property`" option.
Thanks to @Yunyung for pointing this out.

PR adds unit tests to both `ConsoleConsumerOptionsTest` and
`ConsoleShareConsumerOptionsTest` to verify the same.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-25 00:15:05 +08:00
Dongnuo Lyu cbea4f69bd
KAFKA-19546: Rebalance should be triggered by subscription change during group protocol downgrade (#20417)
During online downgrade, when a static member using the consumer
protocol which is also the last member using the consumer protocol is
replaced by another static member using the classic protocol with the
same instance id, the latter will take the assignment of the former and
an online downgrade will be triggered.

In the current implementation, if the replacing static member has a
different subscription, no rebalance will be triggered when the
downgrade happens. The patch checks whether the static member has
changed subscription and triggers a rebalance when it does.

Reviewers: Sean Quah <squah@confluent.io>, David Jacot
 <djacot@confluent.io>
2025-09-24 15:40:45 +02:00
majialong 55020f909d
MINOR: Improve the documentation content for DistributedConfig (#20576)
1. Fix doc of `inter.worker.signature.algorithm` config in
`DistributedConfig`.
2. Improve the style of the `inter.worker.verification.algorithms` and
`worker.unsync.backoff.ms` config.
3. `INTER_WORKER_KEY_TTL_MS_MS_DOC` -> `INTER_WORKER_KEY_TTL_MS_DOC`.

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-09-24 14:56:13 +02:00
Abhi Tiwari 1e4b8a1a6b
KAFKA-6333: java.awt.headless should not be on commandline (#20044)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-09-24 14:52:27 +02:00
Maros Orsak 486b991f22
KAFKA-18711 Move DelegationTokenPublisher to metadata module (#20475)
Basically, one of the refactor tasks. In this PR, I have moved
`DelegationTokenPublisher` to the metadata module. Similar to the
`ScramPublisher` migration (commit feee50f476), I have moved
`DelegationTokenManager` to the server-common module, as it would
otherwise create a circular dependency. Moreover, I have made multiple
changes throughout the codebase to reference `DelegationTokenManager`
from server-common instead of the server module.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-24 20:19:08 +08:00
Ken Huang 8036e49a6e
KAFKA-17554 Flaky testFutureCompletionOutsidePoll in ConsumerNetworkClientTest (#18298)
Jira: https://issues.apache.org/jira/browse/KAFKA-17554

In the previous workflow, the test passes under two conditions:

1. The `t1` thread is waiting for the main thread's `client.wakeup()`.
If successful, `t1` will wake up `t2`, allowing `t2` to complete the
future.
2. If `t1` fails to receive the `client.wakeup()` from the main thread,
`t2` will be woken up by the main thread.

In the previous implementation, we used a `CountDownLatch` to control
the execution of three threads, but it often led to race conditions.
Currently, we have modified it to use two threads to test this scenario.

I run `I=0;  while ./gradlew :clients:test --tests
ConsumerNetworkClientTest.testFutureCompletionOutsidePoll --rerun
--fail-fast; do  (( I=$I+1 )); echo "Completed run: $I"; sleep 1; done`
and pass 3000+ times.

![image](https://github.com/user-attachments/assets/3b8d804e-fbe0-4030-8686-4960fc717d07)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-24 18:42:25 +08:00
Lucas Brutschy 1f7631c8c6
MINOR: Fix StreamsRebalanceListenerInvoker (#20575)
StreamsRebalanceListenerInvoker was implemented to match the behavior of
ConsumerRebalanceListenerInvoker, however StreamsRebalanceListener has a
subtly different interface than ConsumerRebalanceListener - it does not
throw exceptions, but returns it as an Optional.

In the interest of consistency, this change fixes this mismatch by
changing the StreamsRebalanceListener interface to behave more like the
ConsumerRebalanceListener - throwing exceptions directly.

In another minor fix, the StreamsRebalanceListenerInvoker is changed to
simply skip callback execution instead of throwing an
IllegalStateException when no streamRebalanceListener is defined. This
can happen when the consumer is closed before Consumer.subscribe is
called.

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Matthias J. Sax
 <matthias@confluent.io>
2025-09-24 09:03:07 +02:00
Ritika Reddy 0a483618b9
KAFKA-19690-Add epoch check before verification guard check to prevent unexpected fatal error (#20534)
We are seeing cases where a Kafka Streams (KS) thread stalls for ~20
seconds. During this stall, the broker correctly aborts the open
transaction (triggered by the 10-second transaction timeout).   However,
when the KS thread resumes, instead of receiving the expected
InvalidProducerEpochException (which we already handle gracefully as
part of transaction abort), the client is instead hit with an
InvalidTxnStateException. KS currently treats this as a fatal error,
causing the application to fail.

To fix this, we've added an epoch check before the verification check to
send the recoverable  InvalidProducerEpochException instead of the fatal
InvalidTxnStateException. This helps safeguard both tv1 and tv2 clients

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-09-23 13:45:42 -07:00
ally heev dbe9d34e47
KAFKA-19624: Improve consistency of command-line arguments for consumer performance tests (#20385)
resolves https://issues.apache.org/jira/browse/KAFKA-19624

Reviewers: @brandboat, @AndrewJSchofield, @m1a2st
2025-09-23 10:01:40 +01:00
Matthias J. Sax 71efb89290
MINOR: fix incorrect offset reset logging (#20558)
We need to only pass in the reset strategy, as the `logMessage`
parameter was removed.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Lucas Brutschy
 <lbrutschy@confluent.io>
2025-09-22 18:54:50 +02:00
Uladzislau Blok f16d1f3c9d
KAFKA-19299: Fix race condition in RemoteIndexCacheTest (#19927)
This MR should be couple of race conditions in RemoteIndexCacheTest.

1. There was a race condition between cache-cleanup-thread and test
thread, which wants to check that cache is gone. This was fixed with
TestUtils#waitForCondition
2. After each test we check that there is not thread leak. This check
wasn't working properly, because live of thread status is set by JVM
level, we can only set interrupted status (using private native void
interrupt0(); method under the hood), but we don't really know when JVM
will change the live status of thread. To fix this I've refactored
TestUtils#assertNoLeakedThreadsWithNameAndDaemonStatus method to use
TestUtils#waitForCondition. This fix should also affect few other tests,
which were flaky because of this check. See gradle run on

[develocity](https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Europe%2FLondon&tests.container=org.apache.kafka.storage.internals.log.RemoteIndexCacheTest&tests.sortField=FLAKY)

After fix test were run 10000 times with repeated test annotation:

`./gradlew clean storage:test --tests

org.apache.kafka.storage.internals.log.RemoteIndexCacheTest.testCacheEntryIsDeletedOnRemoval`
...  `Gradle Test Run :storage:test > Gradle Test Executor 20 >
RemoteIndexCacheTest > testCacheEntryIsDeletedOnRemoval() > repetition
9998 of 10000 PASSED`  `Gradle Test Run :storage:test > Gradle Test
Executor 20 > RemoteIndexCacheTest > testCacheEntryIsDeletedOnRemoval()
> repetition 9999 of 10000 PASSED`  `Gradle Test Run :storage:test >
Gradle Test Executor 20 > RemoteIndexCacheTest >
testCacheEntryIsDeletedOnRemoval() > repetition 10000 of 10000 PASSED`
`BUILD SUCCESSFUL in 20m 9s`  `148 actionable tasks: 148 executed`

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-22 11:20:14 -04:00
Ken Huang da6a562f6d
KAFKA-17834: Improvements to Dockerfile (#17554)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-22 17:04:42 +02:00
Ken Huang a0640f9517
KAFKA-18351: Remove top-level version field from docker-compose.yml files (#18322)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Sylwester Lachiewicz <slachiewicz@apache.org>, Dávid Szigecsán
2025-09-22 16:57:30 +02:00
Ken Huang 01fccd3513
KAFKA-15186 AppInfo metrics don't contain the client-id (#20493)
All Kafka component register AppInfo metrics to track the application
start time or commit-id etc. These metrics are useful for monitoring and
debugging. However, the AppInfo doesn't provide client-id, which is an
important information for custom metrics reporter.

The AppInfoParser class registers a JMX MBean with the provided
client-id, but when it adds metrics to the Metrics registry, the
client-id is not included. This KIP aims to add the client-id as a tag.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-21 16:28:03 +08:00
Ken Huang 07b786e5bf
KAFKA-19681 Improve MetadataShell tool by skipping missing children and removing zkMigrationState (#20504)
The current `metadata-shell` find command throws an exception due to
child node `zkMigrationState`.
This interrupts the output and makes the CLI less usable.

Additionally, `zkMigrationState` is no longer used in Kafka 4.x, but it
is still defined under image/features, which causes misleading error
messages.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-21 12:35:46 +08:00
Chang-Chi Hsu 5919762009
MINOR: Remove exitMessage.set() call in TopicBasedRemoteLogMetadataManagerTest (#20563)
- **Reasons:** In this case, the `exit(int statusCode)` method invokes
`exit(statusCode, null)`, which means
the `message` argument is always `null` in this code path. As a result,
assigning
`exitMessage` has no effect and can be safely removed.

- **Changes:** Remove a redundant field assignment.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-20 18:04:10 +08:00
Now c49ab6b4ae
MINOR: Optimize map lookup efficiency with getOrDefault (#20229)
Optimized `getRemainingRecords()` method by replacing inefficient
`containsKey() + get()` pattern with `getOrDefault()` to reduce map
lookups from 2 to 1 per partition.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-20 11:52:29 +08:00
Jhen-Yung Hsu 57e9f98e15
KAFKA-19644 Enhance the documentation for producer headers and integration tests (#20524)
- Improve the docs for Record Headers.
- Add integration tests to verify that the order of headers in a record
is preserved when producing and consuming.
- Add unit tests for RecordHeaders.java.

Reviewers: Ken Huang <s7133700@gmail.com>, Hong-Yi Chen
 <apalan60@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-20 11:46:19 +08:00
Lianet Magrans 848e3d0092
KAFKA-19722: Adding missing metric assigned-partitions for new consumer (#20557)
Adding the missing metric to track the number of partitions assigned.
This metric should be registered whenever the consumer is using a
groupId, and should track the number of partitions from the subscription
state, regardless of the subscription type (manual or automatic).

This PR registers the missing metric as part of the
ConsumerRebalanceMetricsManager setup. This manager is created if there
is a group ID, and reused by the consumer membershipMgr and the
streamsMemberhipMgr, so we ensure that the metric is registered for the
new consumer and streams.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, TengYao Chi
 <frankvicky@apache.org>
2025-09-19 12:42:43 -04:00
Lan Ding cfa0b416ef
MINOR: Remove metrics attribute from StreamsGroup (#20559)
The `metrics` attribute in `StreamsGroup` is not used anymore. This
patch removes it.

Reviewers: Ken Huang <s7133700@gmail.com>, Lucas Brutschy
 <lbrutschy@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-19 16:32:41 +08:00
Sean Quah d067c6c040
KAFKA-19716: Clear out coordinator snapshots periodically while loading (#20547)
When nested Timeline collections are created and discarded while loading
a coordinator partition, references to them accumulate in the current
snapshot. Allow the GC to reclaim them by starting a new snapshot and
discarding previous snapshots every 16,384 records.

Small intervals degrade loading times for non-transactional offset
commit workloads while large intervals degrade loading times for
transactional workloads. A default of 16,384 was chosen as a compromise.

Also add a benchmark for group coordinator loading.

Reviewers: David Jacot <djacot@confluent.io>
2025-09-19 09:44:07 +02:00