Commit Graph

495 Commits

Author SHA1 Message Date
Kevin Wu 012e4ca6d8
KAFKA-19719 --no-initial-controllers should not assume kraft.version=1 (#20604)
```
commit ec37eb538b (HEAD -> KAFKA-19719-cherry-pick-41, origin/KAFKA-19719-cherry-pick-41)
Author: Kevin Wu <kevin.wu2412@gmail.com>
Date:   Thu Sep 25 11:56:16 2025 -0500

    KAFKA-19719: --no-initial-controllers should not assume
    kraft.version=1 (#20551)

    Just because a controller node sets the --no-initial-controllers flag
    does not mean it is necessarily running kraft.version=1. The more
    precise meaning is that the controller node being formatted does not
    know what kraft version the cluster should be in, and therefore it is
    only safe to assume kraft.version=0. Only by setting --standalone,
    --initial-controllers, or --no-initial-controllers AND not specifying
    the controller.quorum.voters static config is it known that
    kraft.version > 0.

    For example, it is a valid configuration (although confusing) to run
    a static quorum defined by controller.quorum.voters but have all the
    controllers format with --no-initial-controllers. In this case,
    specifying --no-initial-controllers alongside a metadata version that
    does not support kraft.version=1 causes formatting to fail, which is
    a regression.

    Additionally, the formatter should not check the kraft.version
    against the release version, since kraft.version does not actually
    depend on any release version. It should only check the kraft.version
    against the static voters config/format arguments.

    This PR also cleans up the integration test framework to match the
    semantics of formatting an actual cluster.

    Reviewers: TengYao Chi <kitingiao@gmail.com>, Kuan-Po Tseng
    <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>,
    José Armando García Sancio <jsancio@apache.org>

    Conflicts:
        core/src/main/scala/kafka/tools/StorageTool.scala
            Minor conflicts. Keep changes from cherry-pick.
        core/src/test/java/kafka/server/ReconfigurableQuorumIntegrationTest.java
            Remove auto-join tests, since 4.1 does not support it.
        docs/ops.html
            Keep docs section from cherry-pick.
        metadata/src/test/java/org/apache/kafka/metadata/storage/FormatterTest.java
            Minor conflicts. Keep cherry-picked changes.
        test-common/test-common-runtime/src/main/java/org/apache/kafka/common/test/KafkaClusterTestKit.java
            Conflicts due to integration test framework changes. Keep new changes.

commit 02d58b176c (upstream/4.1)
```
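
A hedged sketch of the version-inference rule spelled out above, with hypothetical names rather than the actual StorageTool API: the formatter may only assume kraft.version > 0 when the voter set is conveyed dynamically by a format flag and no static controller.quorum.voters config is present.

```
// Illustrative sketch only (hypothetical names, not the real
// StorageTool code): the version-inference rule from the message above.
final class KraftVersionInference {
    static short safeAssumedKraftVersion(boolean standalone,
                                         boolean initialControllers,
                                         boolean noInitialControllers,
                                         boolean staticVotersConfigured) {
        // Only a dynamically conveyed quorum implies kraft.version > 0;
        // otherwise it is only safe to assume kraft.version=0.
        boolean dynamicQuorum =
            (standalone || initialControllers || noInitialControllers)
                && !staticVotersConfigured;
        return dynamicQuorum ? (short) 1 : (short) 0;
    }
}
```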

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-30 22:30:23 +08:00
Luke Chen cdc7a4e2b7 MINOR: improve the min.insync.replicas doc (#20237)
Along with the change https://github.com/apache/kafka/pull/17952
([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)),
the semantics of the `min.insync.replicas` config changed slightly and
gained some constraints. We should document them clearly.
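
For reference, a hedged sketch of adjusting `min.insync.replicas` at the topic level through the Admin API (the bootstrap address and topic name are placeholders):

```
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;

public class MinIsrExample {
    public static void main(String[] args) throws Exception {
        // "localhost:9092" and "orders" are placeholders.
        try (Admin admin = Admin.create(Map.<String, Object>of(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // Set min.insync.replicas=2 on this one topic.
            AlterConfigOp op = new AlterConfigOp(
                new ConfigEntry(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2"),
                AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
        }
    }
}
```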

Reviewers: Jun Rao <junrao@gmail.com>, Calvin Liu <caliu@confluent.io>,
 Mickael Maison <mickael.maison@gmail.com>, Paolo Patierno
 <ppatierno@live.com>, Federico Valeri <fedevaleri@gmail.com>, Chia-Ping
 Tsai <chia7712@gmail.com>
2025-08-05 00:27:04 +08:00
Calvin Liu e4e2dce2eb KAFKA-19522: avoid electing fenced lastKnownLeader (#20200)
This patch fixes the bug that allows the last known leader to be elected as a partition leader while still in a fenced state, before the next heartbeat removes the fence.
https://issues.apache.org/jira/browse/KAFKA-19522

Reviewers: Jun Rao <junrao@gmail.com>, TengYao Chi
<frankvicky@apache.org>
2025-07-20 16:55:45 +08:00
Calvin Liu 98cb8df7a5
MINOR: Bump LATEST_PRODUCTION to 4.1IV1 and Use MV to enable ELR (#20174)
Remove isEligibleLeaderReplicasV1Enabled so that ELR is enabled whenever
the MV is at least 4.1IV1. Also bump the latest production MV to 4.1IV1.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-15 20:23:53 -07:00
Calvin Liu b80aa15c17
KAFKA-19383: Handle the deleted topics when applying ClearElrRecord (#20033)
https://issues.apache.org/jira/browse/KAFKA-19383

When applying a ClearElrRecord, the controller may pick up the topicId
from the image without checking whether the topic has been deleted. This
can cause the creation of a new TopicRecord with an old topic ID.

Reviewers: Alyssa Huang <ahuang@confluent.io>, Artem Livshits <alivshits@confluent.io>, Colin P. McCabe <cmccabe@apache.org>

No conflicts.
2025-06-24 17:04:45 -07:00
Alyssa Huang 3f0ae7fd53 KAFKA-19411: Fix deleteAcls bug which allows more deletions than max records per user op (#19974)
If more deletion filters remain after we initially hit the
`MAX_RECORDS_PER_USER_OP` bound, we will add an additional deletion
record on top of that for each additional filter.

The current error message returned to the client is not useful either,
so this adds logic to ensure the client doesn't just get
`UNKNOWN_SERVER_EXCEPTION` with no details returned.
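
A self-contained sketch of the kind of bound check described above (all names and the cap value are hypothetical; the real fix lives in Kafka's ACL handling code):

```
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: check the cap *before* appending records for the
// next filter, and fail with a descriptive error instead of letting the
// client see a bare UNKNOWN_SERVER_EXCEPTION.
final class DeleteAclsBoundCheck {
    static final int MAX_RECORDS_PER_USER_OP = 10_000; // assumed value

    static <R> List<R> collectDeletionRecords(List<List<R>> recordsPerFilter) {
        List<R> records = new ArrayList<>();
        for (List<R> forFilter : recordsPerFilter) {
            if (records.size() + forFilter.size() > MAX_RECORDS_PER_USER_OP) {
                throw new IllegalArgumentException(
                    "Cannot delete more than " + MAX_RECORDS_PER_USER_OP +
                    " ACL records in a single user operation.");
            }
            records.addAll(forFilter);
        }
        return records;
    }
}
```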
2025-06-24 15:51:30 -07:00
José Armando García Sancio 88eced0c0f KAFKA-14145; Faster KRaft HWM replication (#19800)
This change compares the remote replica's HWM with the leader's HWM and
completes the FETCH request if the remote HWM is less than the leader's
HWM. When the leader's HWM is updated, any pending FETCH RPC is
completed.

Reviewers: Alyssa Huang <ahuang@confluent.io>, David Arthur
 <mumrah@gmail.com>, Andrew Schofield <aschofield@confluent.io>
(cherry picked from commit 742b327025)
2025-06-17 13:27:11 -04:00
Hong-Yi Chen aaed164be6
MINOR: Refactor brokerContactTimesMs and brokerRegistrationStates to use Long and Integer (#19888)
This PR simplifies two ConcurrentHashMap fields by removing their Atomic
wrappers:

- Change `brokerContactTimesMs` from `ConcurrentHashMap<Integer,
AtomicLong>` to `ConcurrentHashMap<Integer, Long>`.

- Change `brokerRegistrationStates` from `ConcurrentHashMap<Integer,
AtomicInteger>` to `ConcurrentHashMap<Integer, Integer>`.

This removes mutable holders without affecting thread safety (see
discussion in #19828).
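
A minimal sketch of the resulting pattern, assuming hypothetical surrounding code: atomicity comes from ConcurrentHashMap's per-key operations, so plain Long values suffice.

```
import java.util.concurrent.ConcurrentHashMap;

// Sketch: plain Long values instead of AtomicLong holders; the map's
// per-key operations keep individual updates atomic.
class BrokerContactTracker {
    private final ConcurrentHashMap<Integer, Long> brokerContactTimesMs =
        new ConcurrentHashMap<>();

    void updateBrokerContactTime(int brokerId, long nowMs) {
        // put() replaces any existing value atomically for this key.
        brokerContactTimesMs.put(brokerId, nowMs);
    }

    Long lastContactTimeMs(int brokerId) {
        return brokerContactTimesMs.get(brokerId);
    }
}
```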

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi
<frankvicky@apache.org>, Kevin Wu <kevin.wu2412@gmail.com>, Ken Huang
<s7133700@gmail.com>
2025-06-06 16:03:23 +08:00
David Arthur 70b672b808
KAFKA-19347 Deduplicate ACLs when creating (#19898)
In #19840, we broke de-duplication during ACL creation. This patch fixes
that and adds a test to cover this case.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2025-06-04 12:38:54 -04:00
Hong-Yi Chen 8b49130b92
KAFKA-19355 Remove interBrokerListenerName from ClusterControlManager (#19866)
Following the removal of the ZK-to-KRaft migration code in commit
85bfdf4, controller-to-broker communication is now handled by the
control-plane listener (`controller.listener.names`). The
`interBrokerListenerName` parameter in `ClusterControlManager` is no
longer referenced on the controller side and can be safely removed as
dead code.

Reviewers: Lan Ding <isDing_L@163.com>, Ken Huang <s7133700@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
2025-06-02 01:18:15 +08:00
Kevin Wu 8731c96122
MINOR: fixing updateBrokerContactTime (#19828)
Fix `updateBrokerContactTime` so that existing brokers still have their
contact time updated when they are already tracked. Also, update the
unit test to test this case.

Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Yung
 <yungyung7654321@gmail.com>, TengYao Chi <frankvicky@apache.org>, Ken
 Huang <s7133700@gmail.com>
2025-05-29 11:58:09 +08:00
David Arthur 9dd4cff2d7
KAFKA-19347 Don't update timeline data structures in createAcls (#19840)
This patch fixes a problem in AclControlManager where we are updating
the timeline data structures prematurely.

Reviewers: Alyssa Huang <ahuang@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, Andrew Schofield <aschofield@confluent.io>
2025-05-28 09:40:19 -07:00
Ken Huang bcda92b5b9
KAFKA-19080 The constraint on segment.ms is not enforced at topic level (#19371)
The main issue was that we forgot to set
`TopicConfig.SEGMENT_BYTES_CONFIG` to at least `1024 * 1024`, which
caused problems in tests with small segment sizes.

To address this, we introduced a new internal config:
`LogConfig.INTERNAL_SEGMENT_BYTES_CONFIG`, allowing us to set smaller
segment bytes specifically for testing purposes.

We also updated the logic so that if a user configures the topic-level
segment bytes without explicitly setting the internal config, the
internal value will no longer be returned to the user.

In addition, we removed
`MetadataLogConfig#METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG` and added
three new internal configurations:
- `INTERNAL_MAX_BATCH_SIZE_IN_BYTES_CONFIG`
- `INTERNAL_MAX_FETCH_SIZE_IN_BYTES_CONFIG`
- `INTERNAL_DELETE_DELAY_MILLIS_CONFIG`

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-05-25 20:57:22 +08:00
jimmy b44bfca408
KAFKA-16717 [2/N]: Add AdminClient.alterShareGroupOffsets (#18929)
[KAFKA-16720](https://issues.apache.org/jira/browse/KAFKA-16720) aims to
finish the AlterShareGroupOffsets RPC.

Reviewers: Andrew Schofield <aschofield@confluent.io>

---------

Co-authored-by: jimmy <wangzhiwang@qq.com>
2025-05-23 09:05:48 +01:00
Jing-Jia Hung bc797b077f
KAFKA-19314 Remove unnecessary code of closing snapshotWriter (#19763)
- Remove redundant close of `snapshotWriter`.
- `snapshotWriter` is already closed by `RaftSnapshotWriter#writer`.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-05-23 03:57:14 +08:00
Kevin Wu 37963256d1
KAFKA-18666: Controller-side monitoring for broker shutdown and startup (#19586)
This PR introduces the following per-broker metrics:

- `kafka.controller:type=KafkaController,name=BrokerRegistrationState,broker=X`

- `kafka.controller:type=KafkaController,name=TimeSinceLastHeartbeatReceivedMs,broker=X`

and this metric:

`kafka.controller:type=KafkaController,name=ControlledShutdownBrokerCount`

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-05-14 10:59:47 -07:00
Bolin Lin 6eafe407bd
MINOR: Fix unchecked type warnings in several test classes (#19679)
* In ConsoleShareConsumerTest, add `@SuppressWarnings("unchecked")`
annotation in method shouldUpgradeDeliveryCount
* In ListConsumerGroupOffsetsHandlerTest, add generic parameters to
HashSet constructors
* In TopicsImageTest, add explicit generic type to Collections.EMPTY_MAP
to fix raw type usage

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-05-13 14:59:22 +08:00
Nick Guo 707a44a6cb
KAFKA-19068 Eliminate the duplicate type check in creating ControlRecord (#19346)
JIRA: https://issues.apache.org/jira/browse/KAFKA-19068

`RecordsIterator#decodeControlRecord` does the type check, and then the
`ControlRecord` constructor does it again.

We should add a static method to `ControlRecord` that performs the type
check when creating a `ControlRecord`, and change the `ControlRecord`
constructor to private to ensure all instances are created by the static
method.
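
An illustrative sketch of the static-factory pattern being described (simplified types, not the actual Kafka classes):

```
// Illustrative only: validate once in a static factory and keep the
// constructor private so no caller can bypass the check.
final class ControlRecordExample {
    private final String type;
    private final Object message;

    private ControlRecordExample(String type, Object message) {
        this.type = type;
        this.message = message;
    }

    static ControlRecordExample of(String type, Object message) {
        // Single, centralized type check.
        if (!"LEADER_CHANGE".equals(type) && !"SNAPSHOT_HEADER".equals(type)) {
            throw new IllegalArgumentException(
                "Unknown control record type: " + type);
        }
        return new ControlRecordExample(type, message);
    }
}
```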

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-05-11 00:07:00 +08:00
Calvin Liu 3094ce2c20
KAFKA-19212: Correct the unclean leader election metric calculation (#19590)
The current ElectionWasClean check tests whether the new leader is in
the previous ISR. However, there is a corner case in partition
reassignment. A partition reassignment can change the partition
replicas. If the new preferred leader (the first one in the new
replicas) is the last one to join the ISR, this preferred leader will be
elected in the same partition change.

For example, in the previous state the partition is Leader: 0, Replicas
(2,1,0), ISR (1,0), Adding (2), Removing (0). Then replica 2 joins the
ISR, and the new partition becomes Leader: 2, Replicas (2,1), ISR (1,2).
The new leader 2 is not in the previous ISR (1,0), but it is still a
clean election.

Reviewers: Jun Rao <junrao@gmail.com>
2025-05-07 13:26:53 -07:00
Kevin Wu 6cb6aa2030
MINOR; Add `--standalone --ignore-formatted` formatter test (#19643)
This PR adds an additional test case to `FormatterTest` that checks that
formatting with `--standalone` and then formatting again with
`--standalone --ignore-formatted` is indeed a no-op.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2025-05-07 10:41:18 -04:00
José Armando García Sancio 2df14b1190
MINOR; Log message for unexpected buffer allocation (#19596)
Log a message when reading a batch that is larger than the currently
allocated buffer.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, PoAn Yang
 <payang@apache.org>
2025-05-06 12:01:49 -04:00
yunchi bff5ba4ad9
MINOR: replace .stream().forEach() with .forEach() (#19626)
Replace all applicable `.stream().forEach()` calls in the codebase with
just `.forEach()`.
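
The change in a nutshell, as a toy example:

```
import java.util.List;

class ForEachExample {
    static void print(List<String> names) {
        // Before: an unnecessary stream pipeline.
        names.stream().forEach(System.out::println);
        // After: Iterable.forEach does the same thing directly.
        names.forEach(System.out::println);
    }
}
```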

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-05-04 20:39:55 +08:00
YuChia Ma 979f49f967
KAFKA-19146 Merge OffsetAndEpoch from raft to server-common (#19475)
1. Remove org.apache.kafka.raft.OffsetAndEpoch.
2. Rewrite org.apache.kafka.server.common.OffsetAndEpoch using the
record keyword.
3. Rename OffsetAndEpoch#leaderEpoch to OffsetAndEpoch#epoch.
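
A sketch of what the record-based rewrite plausibly looks like, given the rename in item 3:

```
// Sketch: a record provides equals/hashCode/toString and accessors for
// free; the epoch() accessor replaces the old leaderEpoch name.
public record OffsetAndEpoch(long offset, int epoch) { }
```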

Reviewers: PoAn Yang <payang@apache.org>, Xuan-Zhang Gong
 <gongxuanzhangmelt@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-05-02 03:12:52 +08:00
Andrew Schofield 2022b4c480
KAFKA-16894 Correct definition of ShareVersion (#19606)
The ShareVersion feature does not make any metadata version changes. As
a result, `SV_1` does not depend on any MV level, and no MV needs to be
defined for the preview of KIP-932.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-05-02 01:30:00 +08:00
Matthias J. Sax b0a26bc2f4
KAFKA-19173: Add `Feature` for "streams" group (#19509)
Add the new StreamsGroupFeature, disabled by default, and add "streams"
as a default value to `group.coordinator.rebalance.protocols`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot
<david.jacot@gmail.com>, Lucas Brutschy <lbrutschy@confluent.io>,
Justine Olshan <jolshan@confluent.io>, Andrew Schofield
<aschofield@confluent.io>, Jun Rao <jun@confluent.io>
2025-04-29 22:51:10 -07:00
PoAn Yang 81881dee83
KAFKA-18760: Deprecate Optional<String> and return String from public Endpoint#listener (#19191)
* Deprecate org.apache.kafka.common.Endpoint#listenerName.
* Add org.apache.kafka.common.Endpoint#listener to replace
org.apache.kafka.common.Endpoint#listenerName.
* Replace org.apache.kafka.network.EndPoint with
org.apache.kafka.common.Endpoint.
* Deprecate org.apache.kafka.clients.admin.RaftVoterEndpoint#name
* Add org.apache.kafka.clients.admin.RaftVoterEndpoint#listener to
replace org.apache.kafka.clients.admin.RaftVoterEndpoint#name

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, TaiJuWu
 <tjwu1217@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao
 Chi <frankvicky@apache.org>, Ken Huang <s7133700@gmail.com>, Bagda
 Parth, Kuan-Po Tseng <brandboat@gmail.com>

---------

Signed-off-by: PoAn Yang <payang@apache.org>
2025-04-30 12:15:33 +08:00
Ken Huang 676e0f2ad6
KAFKA-19139 Plugin#wrapInstance should use LinkedHashMap instead of Map (#19519)
There will be an update to the PluginMetrics#metricName method: the type
of the tags parameter will be changed from Map to LinkedHashMap. This
change is necessary because the order of metric tags is important:
1. If the tag order is inconsistent, identical metrics may be treated as
distinct ones by the metrics backend.
2. KAFKA-18390 is updating metric naming to use LinkedHashMap. For
consistency, we should follow the same approach here.
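
A small illustration of why the parameter type matters (tag names are made up):

```
import java.util.LinkedHashMap;

class TagOrderExample {
    static LinkedHashMap<String, String> tags() {
        // LinkedHashMap preserves insertion order, so "topic" always
        // precedes "partition" in the resulting metric name; a plain
        // HashMap offers no such guarantee.
        LinkedHashMap<String, String> tags = new LinkedHashMap<>();
        tags.put("topic", "orders");
        tags.put("partition", "0");
        return tags;
    }
}
```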

Reviewers: TengYao Chi <frankvicky@apache.org>, Jhen-Yung Hsu
 <jhenyunghsu@gmail.com>, lllilllilllilili
2025-04-30 10:43:01 +08:00
Colin Patrick McCabe 22b89b6413
KAFKA-19192; Old bootstrap checkpoint files cause problems for updated servers (#19545)
Old bootstrap.metadata files cause problems with servers that include
KAFKA-18601. When the server tries to read the bootstrap.checkpoint
file, it will fail if the metadata.version is older than 3.3-IV3
(feature level 7). This causes problems when these clusters are
upgraded.

This PR makes it possible to represent older MVs in BootstrapMetadata
objects without causing an exception. An exception is thrown only if we
attempt to access the BootstrapMetadata. This ensures that only the code
path in which we start with an empty metadata log checks that the
metadata version is 7 or newer.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Ismael Juma
 <ismael@juma.me.uk>, PoAn Yang <payang@apache.org>, Liu Zeyu
 <zeyu.luke@gmail.com>, Alyssa Huang <ahuang@confluent.io>
2025-04-24 15:43:35 -04:00
José Armando García Sancio b97a130c08
KAFKA-16538; Enable upgrading kraft version for existing clusters (#19416)
This change implements upgrading the kraft version from 0 to 1 in existing clusters.
Previously, clusters were formatted with either version 0 or version 1, and could not
be moved between them.

The kraft version for the cluster metadata partition is recorded using the
KRaftVersion control record. If there is no KRaftVersion control record
the default kraft version is 0.

The kraft version is upgraded using the UpdateFeatures RPC. These RPCs
are handled by the QuorumController and FeatureControlManager. This
change adds special handling in the FeatureControlManager so that
upgrades to the kraft.version are directed to
RaftClient#upgradeKRaftVersion.

To allow the FeatureControlManager to call
RaftClient#upgradeKRaftVersion in a non-blocking fashion, the kraft
version upgrade uses optimistic locking. The call to
RaftClient#upgradeKRaftVersion validates the version change. If the
validations succeed, it generates the necessary control records and
adds them to the BatchAccumulator.

Before the kraft version can be upgraded to version 1, all of the
brokers and controllers in the cluster need to support kraft version 1.
The check that all brokers support kraft version 1 is done by the
FeatureControlManager. The check that all of the controllers support
kraft version 1 is done by KafkaRaftClient and LeaderState.

When the kraft version is 0, the kraft leader starts by assuming that
none of the voters support kraft version 1. The leader discovers which
voters support kraft version 1 through the UpdateRaftVoter RPC. The
KRaft leader handles UpdateRaftVoter RPCs by storing the updated
information in-memory until the kraft version is upgraded to version 1.
This state is stored in LeaderState and contains the latest directory
id, endpoints and supported kraft version for each voter.

Only when the KRaft leader has received an UpdateRaftVoter RPC from all
of the voters will it allow the upgrade from kraft.version 0 to 1.
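
Since the upgrade goes through the UpdateFeatures RPC, it can be driven from the Admin client; a hedged sketch (connection details are placeholders):

```
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.FeatureUpdate;
import org.apache.kafka.clients.admin.UpdateFeaturesOptions;

import java.util.Map;
import java.util.Properties;

public class KRaftVersionUpgradeExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // Upgrade kraft.version from 0 to 1; the controller only
            // allows this once every broker and controller supports
            // version 1, as described above.
            FeatureUpdate update = new FeatureUpdate(
                (short) 1, FeatureUpdate.UpgradeType.UPGRADE);
            admin.updateFeatures(Map.of("kraft.version", update),
                new UpdateFeaturesOptions()).all().get();
        }
    }
}
```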

Reviewers: Alyssa Huang <ahuang@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
2025-04-22 16:02:51 -07:00
Colin Patrick McCabe c465abc458
KAFKA-19130: Do not add fenced brokers to BrokerRegistrationTracker on startup (#19454)
When the controller starts up (or becomes active after being inactive), we add all of the registered brokers to BrokerRegistrationTracker so that they will not be accidentally fenced the next time we are looking for a broker to fence. We do this because the state in BrokerRegistrationTracker is "soft state" (it doesn't appear in the metadata log), and the newly active controller starts off with no soft state. (Its soft state will be populated by the brokers sending heartbeat requests to it over time.)

In the case of fenced brokers, we are not worried about accidentally fencing the broker due to it being missing from
BrokerRegistrationTracker for a while (it's already fenced). Therefore, it should be reasonable to just not add fenced brokers to the tracker initially.

One case where this change will have a positive impact is for people running single-node demonstration clusters in combined KRaft mode. In that case, when the single-node cluster is taken down and restarted, it currently will have to wait about 9 seconds for the broker to come up and re-register. With this change, the broker should be able to re-register immediately (assuming the previous shutdown happened cleanly through controller shutdown.)

One possible negative impact is that if there is a controller failover, it will open a small window where a broker with the same ID as a fenced broker could re-register. However, our detection of duplicate broker IDs is best-effort (and duplicate broker IDs are an administrative mistake), so this downside seems acceptable.

Reviewers: Alyssa Huang <ahuang@confluent.io>, José Armando García Sancio <jsancio@apache.org>
2025-04-16 11:57:35 -07:00
Mickael Maison fb2ce76b49
KAFKA-18888: Add KIP-877 support to Authorizer (#19050)
This also adds metrics to StandardAuthorizer.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu
 <tjwu1217@gmail.com>
2025-04-15 19:40:24 +02:00
PoAn Yang 42771b6144
KAFKA-18845 Remove flaky tag on QuorumControllerTest#testUncleanShutdownBrokerElrEnabled (#19403)
It has been around two weeks since the PR fixing
QuorumControllerTest#testUncleanShutdownBrokerElrEnabled
(https://github.com/apache/kafka/pull/19240) was merged. There have been
no flaky results since 2025/03/21, which is enough evidence that the
flakiness is fixed, so it's safe to remove the flaky tag.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-15 23:53:08 +08:00
Xuan-Zhang Gong 17ab374bb5
KAFKA-15371 MetadataShell is stuck when bootstrapping (#19419)
Issue link: https://issues.apache.org/jira/browse/KAFKA-15371

## Conclusion

This issue isn’t caused by differences between the `log` file and the
`checkpoint` file, but rather by the order in which asynchronous events
occur.

## Reliably reproduce
In the current version, you can reliably reproduce this issue by adding
a small sleep in `SnapshotFileReader#handleNextBatch`, like this:
```
private void handleNextBatch() {
    if (!batchIterator.hasNext()) {
        try {
            // The added sleep that reliably reproduces the race.
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        beginShutdown("done");
        return;
    }
    FileChannelRecordBatch batch = batchIterator.next();
    if (batch.isControlBatch()) {
        handleControlBatch(batch);
    } else {
        handleMetadataBatch(batch);
    }
    scheduleHandleNextBatch();
    lastOffset = batch.lastOffset();
}
```

You can download a test file: [test checkpoint file](https://github.com/user-attachments/files/19659636/00000000000000007169-0000000001.checkpoint.log)

⚠️: Please remove the .log extension after downloading, since GitHub
doesn't allow uploading checkpoint files directly.

After changing the code and running a Gradle build, you can run
`bin/kafka-metadata-shell.sh --snapshot ${your file path}`.

You will only see a loading message in the console like this:

![image](https://github.com/user-attachments/assets/fe4b4eba-7a6a-4cee-9b56-c82a5fa02c89)

## Cause of the Bug
After the `SnapshotFileReader` starts up, it enqueues the iterator's
events onto its own kafkaQueue. The important method is
`SnapshotFileReader#scheduleHandleNextBatch`.

When processing each batch of the iterator, it adds metadata events for
the batch to the MetadataLoader's kafkaQueue (which is different from
the SnapshotFileReader's). The important methods are
`SnapshotFileReader#handleMetadataBatch` and
`MetadataLoader#handleCommit`.

When the MetadataLoader processes a MetadataDelta, it checks whether the
high watermark has been updated. If not, it skips processing. The
important methods are `MetadataLoader#maybePublishMetadata` and
`maybePublishMetadata#stillNeedToCatchUp`.

The crucial high watermark update happens after the SnapshotFileReader's
iterator finishes reading, using the cleanup task of its kafkaQueue.

So, if the MetadataLoader finishes processing all batches before the
high watermark is updated, the main thread will keep waiting.

![image](https://github.com/user-attachments/assets/03daa288-ff39-49a3-bbc7-e7b5831a858b)

![image](https://github.com/user-attachments/assets/fc0770dd-de54-4f69-b669-ab4e696bd2a7)

## Solution
If we've reached the last batch in the iteration, we update the high
watermark first, before adding events to the MetadataLoader, ensuring
that the MetadataLoader runs at least once after the watermark is
updated.

After modifying the code, you'll see the normal shell execution
behavior.

![image](https://github.com/user-attachments/assets/2791d03c-81ae-4762-a015-4d6d9e526455)

Reviewers: PoAn Yang <payang@apache.org>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-04-14 14:13:50 +08:00
Andrew Schofield 21a080f08c
KAFKA-16894: Define feature to enable share groups (#19293)
This PR proposes a switch to enable share groups for 4.1 (preview) and
4.2 (GA).

* `share.version=1` to indicate that share groups are enabled. This is
used as the switch for turning share groups on and off.

In 4.1, the default will be `share.version=0`. A user wanting to
evaluate the preview of KIP-932 would then use `bin/kafka-features.sh
--bootstrap-server xxxx upgrade --feature share.version=1`.

In 4.2, the default will be `share.version=1`.

Reviewers: Jun Rao <junrao@gmail.com>
2025-04-11 12:14:38 +01:00
Alyssa Huang 6e446f0b05
KAFKA-19047: Allow quickly re-registering brokers that are in controlled shutdown (#19296)
Allow re-registration of brokers with active sessions if the previous broker registration was in controlled shutdown.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>, David Mao <dmao@confluent.io>
2025-04-08 13:39:04 -07:00
PoAn Yang dcf6f9d4c9
MINOR: remove unused function BrokerRegistration#isMigratingZkBroker (#19330)
`BrokerRegistration#isMigratingZkBroker` is not used by any production
code. Remove it.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-09 01:56:46 +08:00
Dmitry Werner 4144290335
MINOR: Cleanup metadata module (#18937)
Removed unused code and fixed IDEA warnings.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-31 17:46:21 +08:00
José Armando García Sancio 82de719fff
MINOR; Improve error message for the storage format command (#19210)
Minor improvement to the error output of the storage format command: it suggests the changes needed for a valid storage format command.

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-03-23 19:30:57 -04:00
David Arthur 8fa3856473
MINOR Mar 19 flaky tests (#19248)
CoordinatorRequestManagerTest#testMarkCoordinatorUnknownLoggingAccuracy
has become flaky again. The last-30-days report shows a sudden
re-occurrence:


https://develocity.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=github,trunk,not:flaky,not:new&search.tasks=test&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.clients.consumer.internals.CoordinatorRequestManagerTest&tests.sortField=FLAKY#

Also mark QuorumControllerTest.testMinIsrUpdateWithElr as flaky.

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-03-21 09:26:13 -04:00
Calvin Liu 1c582a4a35
KAFKA-18954: Add ELR election rate metric (#19180)
Add a metric to track the number of elections done using ELR.
https://issues.apache.org/jira/browse/KAFKA-18954

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Justine Olshan
<jolshan@confluent.io>
2025-03-20 15:37:49 -07:00
PoAn Yang 71875ec58e
KAFKA-18845: Fix flaky QuorumControllerTest#testUncleanShutdownBrokerElrEnabled (#19240)
There are two root causes:
1. When we uncleanly shut down `brokerToBeTheLeader`, we didn't wait for
the result. That means that when we send a heartbeat to unfence the
broker, it may use a stale broker epoch in the request. [0]
2. We used a different replica directory when doing the unclean shutdown
of the broker. Even if the broker is unfenced, it cannot get an online
directory, so `brokerToBeTheLeader` cannot be elected as the new
leader. [1]


[0]
a5325e029e/metadata/src/test/java/org/apache/kafka/controller/QuorumControllerTest.java (L484-L497)

[1]
a5325e029e/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java (L2470-L2477)

Reviewers: David Arthur <mumrah@gmail.com>
2025-03-20 13:37:09 -04:00
PoAn Yang da46cf6e79
KAFKA-17565 Move MetadataCache interface to metadata module (#18801)
### Changes

* Move the MetadataCache interface to the metadata module and convert
the Scala functions to Java.
* Remove the functions `getTopicPartitions`, `getAliveBrokers`,
`topicNamesToIds`, `topicIdInfo`, and `getClusterMetadata` from the
MetadataCache interface, because these functions are only used in test
code.

### Performance

* ReplicaFetcherThreadBenchmark
  ```
./jmh-benchmarks/jmh.sh -f 1 -i 2 -wi 2
org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark
  ```
  * trunk
  ```
Benchmark (partitionCount) Mode Cnt Score Error Units
ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 2 4775.490 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 2 25730.790 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 2 55334.206 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 2 488427.547 ns/op
  ```
  * branch
  ```
Benchmark (partitionCount) Mode Cnt Score Error Units
ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 2 4825.219 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 2 25985.662 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 2 56056.005 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 2 497138.573 ns/op
  ```

* KRaftMetadataRequestBenchmark
  ```
./jmh-benchmarks/jmh.sh -f 1 -i 2 -wi 2
org.apache.kafka.jmh.metadata.KRaftMetadataRequestBenchmark
  ```
  * trunk
  ```
Benchmark (partitionCount) (topicCount) Mode Cnt Score Error Units
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 10 500 avgt 2 884933.558 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 10 1000 avgt 2 1910054.621 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 10 5000 avgt 2 21778869.337 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 20 500 avgt 2 1537550.670 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 20 1000 avgt 2 3168237.805 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 20 5000 avgt 2 29699652.466 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 50 500 avgt 2 3501483.852 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 50 1000 avgt 2 7405481.182 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 50 5000 avgt 2 55839670.124 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 10 500 avgt 2 333.667 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 10 1000 avgt 2 339.685 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 10 5000 avgt 2 334.293 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 20 500 avgt 2 329.899 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 20 1000 avgt 2 347.537 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 20 5000 avgt 2 332.781 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 50 500 avgt 2 327.085 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 50 1000 avgt 2 325.206 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 50 5000 avgt 2 316.758 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 10 500 avgt 2 7.569 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 10 1000 avgt 2 7.565 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 10 5000 avgt 2 7.574 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 20 500 avgt 2 7.568 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 20 1000 avgt 2 7.557 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 20 5000 avgt 2 7.585 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 50 500 avgt 2 7.560 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 50 1000 avgt 2 7.554 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 50 5000 avgt 2 7.574 ns/op
  ```
  * branch
  ```
Benchmark (partitionCount) (topicCount) Mode Cnt Score Error Units
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 10 500 avgt 2 910337.770 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 10 1000 avgt 2 1902351.360 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 10 5000 avgt 2 22215893.338 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 20 500 avgt 2 1572683.875 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 20 1000 avgt 2 3188560.081 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 20 5000 avgt 2 29984751.632 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 50 500 avgt 2 3413567.549 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 50 1000 avgt 2 7303174.254 ns/op
KRaftMetadataRequestBenchmark.testMetadataRequestForAllTopics 50 5000 avgt 2 54293721.640 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 10 500 avgt 2 318.335 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 10 1000 avgt 2 331.386 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 10 5000 avgt 2 332.944 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 20 500 avgt 2 340.322 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 20 1000 avgt 2 330.294 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 20 5000 avgt 2 342.154 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 50 500 avgt 2 341.053 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 50 1000 avgt 2 335.458 ns/op
KRaftMetadataRequestBenchmark.testRequestToJson 50 5000 avgt 2 322.050 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 10 500 avgt 2 7.538 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 10 1000 avgt 2 7.548 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 10 5000 avgt 2 7.545 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 20 500 avgt 2 7.597 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 20 1000 avgt 2 7.567 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 20 5000 avgt 2 7.558 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 50 500 avgt 2 7.559 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 50 1000 avgt 2 7.615 ns/op
KRaftMetadataRequestBenchmark.testTopicIdInfo 50 5000 avgt 2 7.562 ns/op
  ```

* PartitionMakeFollowerBenchmark
  ```
./jmh-benchmarks/jmh.sh -f 1 -i 2 -wi 2
org.apache.kafka.jmh.partition.PartitionMakeFollowerBenchmark
  ```
  * trunk
  ```
Benchmark Mode Cnt Score Error Units
PartitionMakeFollowerBenchmark.testMakeFollower avgt 2 158.816 ns/op
  ```
  * branch
  ```
Benchmark Mode Cnt Score Error Units
PartitionMakeFollowerBenchmark.testMakeFollower avgt 2 160.533 ns/op
  ```

* UpdateFollowerFetchStateBenchmark
  ```
./jmh-benchmarks/jmh.sh -f 1 -i 2 -wi 2
org.apache.kafka.jmh.partition.UpdateFollowerFetchStateBenchmark
  ```
  * trunk
  ```
Benchmark Mode Cnt Score Error Units
UpdateFollowerFetchStateBenchmark.updateFollowerFetchStateBench avgt 2 4975.261 ns/op
UpdateFollowerFetchStateBenchmark.updateFollowerFetchStateBenchNoChange avgt 2 4880.880 ns/op
  ```
  * branch
  ```
Benchmark Mode Cnt Score Error Units
UpdateFollowerFetchStateBenchmark.updateFollowerFetchStateBench avgt 2 5020.722 ns/op
UpdateFollowerFetchStateBenchmark.updateFollowerFetchStateBenchNoChange avgt 2 4878.855 ns/op
  ```


* CheckpointBench
  ```
./jmh-benchmarks/jmh.sh -f 1 -i 2 -wi 2
org.apache.kafka.jmh.server.CheckpointBench
  ```
  * trunk
  ```
Benchmark (numPartitions) (numTopics) Mode Cnt Score Error Units
CheckpointBench.measureCheckpointHighWatermarks 3 100 thrpt 2 0.997 ops/ms
CheckpointBench.measureCheckpointHighWatermarks 3 1000 thrpt 2 0.703 ops/ms
CheckpointBench.measureCheckpointHighWatermarks 3 2000 thrpt 2 0.486 ops/ms
CheckpointBench.measureCheckpointLogStartOffsets 3 100 thrpt 2 1.038 ops/ms
CheckpointBench.measureCheckpointLogStartOffsets 3 1000 thrpt 2 0.734 ops/ms
CheckpointBench.measureCheckpointLogStartOffsets 3 2000 thrpt 2 0.637 ops/ms
  ```
  * branch
  ```
Benchmark (numPartitions) (numTopics) Mode Cnt Score Error Units
CheckpointBench.measureCheckpointHighWatermarks 3 100 thrpt 2 0.990 ops/ms
CheckpointBench.measureCheckpointHighWatermarks 3 1000 thrpt 2 0.659 ops/ms
CheckpointBench.measureCheckpointHighWatermarks 3 2000 thrpt 2 0.508 ops/ms
CheckpointBench.measureCheckpointLogStartOffsets 3 100 thrpt 2 0.923 ops/ms
CheckpointBench.measureCheckpointLogStartOffsets 3 1000 thrpt 2 0.736 ops/ms
CheckpointBench.measureCheckpointLogStartOffsets 3 2000 thrpt 2 0.637 ops/ms
  ```

* PartitionCreationBench
  ```
./jmh-benchmarks/jmh.sh -f 1 -i 2 -wi 2
org.apache.kafka.jmh.server.PartitionCreationBench
  ```
  * trunk
  ```
Benchmark (numPartitions) (useTopicIds) Mode Cnt Score Error Units
PartitionCreationBench.makeFollower 20 false avgt 2 5.997 ms/op
PartitionCreationBench.makeFollower 20 true avgt 2 6.961 ms/op
  ```
  * branch
  ```
Benchmark (numPartitions) (useTopicIds) Mode Cnt Score Error Units
PartitionCreationBench.makeFollower 20 false avgt 2 6.212 ms/op
PartitionCreationBench.makeFollower 20 true avgt 2 7.005 ms/op
  ```

Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-17 23:59:11 +08:00
Sanskar Jhajharia 766caaa551
MINOR: Clean up metadata module (#19069)
Given that we now support Java 17 on our brokers, this PR replaces the
use of the following in the metadata module:

* Collections.singletonList() and Collections.emptyList() with List.of()
* Collections.singletonMap() and Collections.emptyMap() with Map.of()
* Collections.singleton() and Collections.emptySet() with Set.of()

Reviewers: David Arthur <mumrah@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-16 03:09:40 +08:00
José Armando García Sancio d04efca493
KAFKA-18979; Report correct kraft.version in ApiVersions (#19205)
Skip kraft.version when applying FeatureLevelRecord records. The kraft.version is stored as control records, not as metadata records. This solution has the benefit of removing from snapshots any FeatureLevelRecord for kraft.version that was incorrectly written to the log, and of allowing ApiVersions to report the correct finalized kraft.version.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-03-13 18:39:24 -04:00
PoAn Yang d171ff08a7
KAFKA-18858 Refactor FeatureControlManager to avoid using uninitialized MV (#19040)
The `FeatureControlManager` used `MetadataVersion#LATEST_PRODUCTION` as the uninitialized MV. This means other components may get a stale MV. In production code, the `FeatureControlManager` sets the MV when replaying a `FeatureLevelRecord`, so we can use `Optional.empty()` as the uninitialized MV. If another component gets an empty result, an exception is thrown, as in `FeaturesImage`.

Unit test:
* FeatureControlManagerTest#testMetadataVersion: test getting the
MetadataVersion before and after replaying a FeatureLevelRecord.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-13 23:37:41 +08:00
David Arthur 0ebc3e83c5
MINOR Mar 12 Flaky tests (#19190)
Mark the following tests as flaky:

* StickyAssignorTest > testLargeAssignmentAndGroupWithUniformSubscription
* DeleteSegmentsByRetentionTimeTest
* QuorumControllerTest > testUncleanShutdownBrokerElrEnabled

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-03-12 13:47:35 -04:00
Lucas Brutschy fc2e3dfce9
MINOR: Disallow unused local variables (#18963)
Recently, we found a regression that could have been detected by static
analysis, since a local variable wasn't being passed to a method during
a refactoring, and was left unused. It was fixed in
[7a749b5](7a749b589f),
but almost slipped into 4.0. Unused variables are typically detected by
IDEs, but this is insufficient to prevent these kinds of bugs. This
change enables unused local variable detection in checkstyle for Kafka.

A few notes on the usage:
- There are two situations in which people actually want to have a local
variable but not use it. First, there are `for (Type ignored:
collection)` loops which have to loop `collection.length` number of
times, but that do not use `ignored` in the loop body. These are
typically still easier to read than a classical `for` loop. Second, some
IDEs detect it if a return value of a function such as `File.delete` is
not being used. In this case, people sometimes store the result in an
unused local variable to make ignoring the return value explicit and to
avoid the squiggly lines.
- In Java 22, unused local variables can be denoted by a single
underscore `_`. This is supported by checkstyle. In pre-22 versions,
IntelliJ allows such variables to be named `ignored` to suppress the
unused local variable warning. This pattern is often (but not
consistently) used in the Kafka codebase. It is, however, not
supported by checkstyle.

Since we cannot switch to Java 22 yet, and we want automated detection
using checkstyle, we have to resort to prefixing the unused local
variables with `@SuppressWarnings("UnusedLocalVariable")`. We have to
apply this in 11 cases across the Kafka codebase. While not being
pretty, I'd argue it's worth it to prevent bugs like the one fixed in
[7a749b5](7a749b589f).
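
A minimal sketch of the two suppression patterns described above (assuming checkstyle's SuppressWarnings filter is enabled, as the message implies):

```
import java.io.File;
import java.util.List;

class UnusedLocalExamples {
    int loopFixedNumberOfTimes(List<String> collection) {
        int count = 0;
        // Loop once per element without using the element itself.
        for (@SuppressWarnings("UnusedLocalVariable") String ignored : collection) {
            count++;
        }
        return count;
    }

    void deleteExplicitly(File file) {
        // Store the return value to make ignoring it explicit.
        @SuppressWarnings("UnusedLocalVariable")
        boolean ignored = file.delete();
    }
}
```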

Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur
<mumrah@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Bruno
Cadonna <cadonna@apache.org>, Kirk True <ktrue@confluent.io>
2025-03-10 09:37:35 +01:00
PoAn Yang a5e5e2dcd5
KAFKA-18706 Move AclPublisher to metadata module (#18802)
Move AclPublisher to org.apache.kafka.metadata.publisher package.

Reviewers: Christo Lolov <lolovc@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-09 21:00:33 +08:00
Colin Patrick McCabe 343bc995f4
KAFKA-18920: The kcontrollers must set kraft.version in ApiVersionsResponse (#19127)
The Kafka controllers need to set kraft.version in their
ApiVersionsResponse messages according to the current kraft.version
reported by the Raft layer. Currently, however, they always set it to 0.

Also remove FeatureControlManager.latestFinalizedFeatures. It is not
needed, and it does a lot of copying.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-07 13:46:46 -08:00
Calvin Liu db38bef076
KAFKA-18940: fix electionWasClean (#19156)
The electionWasClean should also consider if the election is done
through ELR. Otherwise, the metric uncleanLeaderElection will wrongly
count the ELR election
https://issues.apache.org/jira/browse/KAFKA-18940

Reviewers: Jun Rao <junrao@gmail.com>
2025-03-07 11:04:06 -08:00