Simplify the last known elr update logic. This way can make a more
robust logic.
Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
This change adds the metric ControllerEventManager::AvgIdleRatio which
measures the amount of time the controller spends blocked waiting for
events vs the amount of time spent processing events. A value of 1.0
means that the controller spent the entire interval blocked waiting for
events.
Reviewers: José Armando García Sancio <jsancio@apache.org>, Kevin Wu
<kevin.wu2412@gmail.com>, Alyssa Huang <ahuang@confluent.io>, TengYao
Chi <frankvicky@apache.org>, Reviewers: Chia-Ping Tsai
<chia7712@apache.org>
Just because a controller node sets --no-initial-controllers flag does
not mean it is necessarily running kraft.version=1. The more precise
meaning is that the controller node being formatted does not know what
kraft version the cluster should be in, and therefore it is only safe to
assume kraft.version=0. Only by setting
--standalone,--initial-controllers, or --no-initial-controllers
AND not specifying the controller.quorum.voters static config, is it
known kraft.version > 0.
For example, it is a valid configuration (although confusing) to run a
static quorum defined by controller.quorum.voters but have all the
controllers format with --no-initial-controllers. In this case,
specifying --no-initial-controllers alongside a metadata version that
does not support kraft.version=1 causes formatting to fail, which is
a regression.
Additionally, the formatter should not check the kraft.version against
the release version, since kraft.version does not actually depend on any
release version. It should only check the kraft.version against the
static voters config/format arguments.
This PR also cleans up the integration test framework to match the
semantics of formatting an actual cluster.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Kuan-Po Tseng
<brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, José
Armando García Sancio <jsancio@apache.org>
Improve `MetadataVersion.fromVersionString()` to take an
`enableUnstableFeature` flag, and enable `FeatureCommand` and
`StorageTool` to leverage the exception message from
`fromVersionString`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Basically, one of the refactor tasks. In this PR, I have moved
`DelegationTokenPublisher` to the metadata module. Similar to the
`ScramPublisher` migration (commit feee50f476), I have moved
`DelegationTokenManager` to the server-common module, as it would
otherwise create a circular dependency. Moreover, I have made multiple
changes throughout the codebase to reference `DelegationTokenManager`
from server-common instead of the server module.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
The current `metadata-shell` find command throws an exception due to
child node `zkMigrationState`.
This interrupts the output and makes the CLI less usable.
Additionally, `zkMigrationState` is no longer used in Kafka 4.x, but it
is still defined under image/features, which causes misleading error
messages.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
- Move the `RaftManager` interface to raft module, and remove the
`register` and `leaderAndEpoch` methods since they are already part of
the RaftClient APIs.
- Rename RaftManager.scala to KafkaRaftManager.scala.
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This PR is a follow-up from https://github.com/apache/kafka/pull/20468.
Basically makes two things:
1. Moves the variable to the catch block as it is used only there.
2. Refactor FeaturesPublisher to handle exceptions the same as
ScramPublisher or other publishers :)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
---------
Signed-off-by: see-quick <maros.orsak159@gmail.com>
The original name is confusing which could cause engineers to make a
mistake and confuse the `batchSize` with some other unit like number of
records.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
There is a small logic bug in topic replay. If a topic is created and
then removed before the TopicsDelta is applied, we end up with the
deleted topic in createdTopics on the delta. Tis issue is fixed by
removing the topicName from createTopics when replaying
RemoveTopicRecord to make sure the deleted topics won't appear in
createTopics.
Reviewers: José Armando García Sancio <jsancio@apache.org>, Kevin Wu
<kevin.wu2412@gmail.com>, Alyssa Huang <ahuang@confluent.io>
https://issues.apache.org/jira/browse/KAFKA-18699
This PR aims at cleaning up the `metadata` module further by getting rid
of some extra code which can be replaced by record
Reviewers: Ken Huang <s7133700@gmail.com>, Ming-Yen Chung
<mingyen066@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Follow-up to
[KAFKA-18486](https://issues.apache.org/jira/browse/KAFKA-18486)
* Replace PartitionState with PartitionRegistration in
makeFollower/makeLeader
* Remove PartitionState.java since it is no longer referenced
Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
### Summary
Adds comprehensive test coverage for the StorageTool format command
feature validation, including tests for valid feature overrides, invalid
feature detection, and multiple feature specifications. Also adds debug
output to help with troubleshooting format operations.
### Changes Made
#### Test Coverage Improvements
- **`testFormatWithReleaseVersionAndFeatureOverride()`**: Tests that
feature overrides work correctly when specified with `--feature` flag
- **`testFormatWithInvalidFeatureThrowsError()`**: Tests error handling
for unsupported features
- **`testFormatWithMultipleFeatures()`**: Tests multiple feature
specifications in a single format command
#### Debug Output Enhancement
- **Formatter.java**: Added debug output to print bootstrap metadata
during format operations
- Helps with troubleshooting format issues by showing the complete
bootstrap metadata being written
- Improves visibility into what features and configurations are being
applied
#### Test Updates
- **FormatterTest.java**: Updated existing tests to account for the new
debug output\
### Related
- KIP-1022: [Formatting and Updating Features
](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1022%3A+Formatting+and+Updating+Features)
Reviewers: Kevin Wu <kevin.wu2412@gmail.com>, Justine Olshan
<jolshan@confluent.io>
Along with the change: https://github.com/apache/kafka/pull/17952
([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)),
the semantics of `min.insync.replicas` config has small change, and add
some constraints. We should document them clearly.
Reviewers: Jun Rao <junrao@gmail.com>, Calvin Liu <caliu@confluent.io>,
Mickael Maison <mickael.maison@gmail.com>, Paolo Patierno
<ppatierno@live.com>, Federico Valeri <fedevaleri@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
This patch fixes the bug that allows the last known leader to be elected as a partition leader while still in a fenced state, before the next heartbeat removes the fence.
https://issues.apache.org/jira/browse/KAFKA-19522
Reviewers: Jun Rao <junrao@gmail.com>, TengYao Chi
<frankvicky@apache.org>
The MetadataImage has a lot of stuff in it and it gets passed around in
many places in the new GroupCoordinator. This makes it difficult to
understand what metadata the group coordinator actually relies on and
makes it too easy to use metadata in ways it wasn't meant to be used.
This change encapsulate the MetadataImage in an interface
(`CoordinatorMetadataImage`) that indicates and controls what metadata
the group coordinator actually uses. Now it is much easier at a glance
to see what dependencies the GroupCoordinator has on the metadata. Also,
now we have a level of indirection that allows more flexibility in how
the GroupCoordinator is provided the metadata it needs.
Removing the isEligibleLeaderReplicasV1Enabled to let ELR be enabled if
MV is at least 4.1IV1. Also bump the Latest Prod MV to 4.1IV1
Reviewers: Paolo Patierno <ppatierno@live.com>, Jun Rao <junrao@gmail.com>
- Updated `ClientQuotaImage` and `TopicImage` by using
`Collections.unmodifiableMap` or `ImmutableMap` to prevent accidental or
intentional mutations after construction.
Reviewers: Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>
This PR adds the following metrics for each of the supported production
features (`metadata.version`, `kraft.version`, `transaction.version`,
etc.):
`kafka.server:type=MetadataLoader,name=FinalizedLevel,featureName=X`
`kafka.server:type=node-metrics,name=maximum-supported-level,feature-name=X`
`kafka.server:type=node-metrics,name=minimum-supported-level,feature-name=X`
Reviewers: Josep Prat <josep.prat@aiven.io>, PoAn Yang
<payang@apache.org>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Lan Ding
<isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
The PR do following:
1. Remove ReplicaManager#becomeLeaderOrFollower.
2. Remove `LeaderAndIsrRequest` and `LeaderAndIsrResponse`
3. Migrate `LeaderAndIsrRequest.PartitionState` to server-common module
and change to `PartitionState`
4. Remove `ControllerEpoch` from PartitionState
5. Remove `isShuttingDown` from BrokerServer and ReplicaManager
Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
https://issues.apache.org/jira/browse/KAFKA-19383 When applying the
ClearElrRecord, it may pick up the topicId in the image without checking
if the topic has been deleted. This can cause the creation of a new
TopicRecord with an old topic ID.
Reviewers: Alyssa Huang <ahuang@confluent.io>, Artem Livshits <alivshits@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
If there are more deletion filters after we initially hit the
`MAX_RECORDS_PER_USER_OP` bound, we will add an additional deletion
record ontop of that for each additional filter.
The current error message returned to the client is not useful either,
adding logic so client doesn't just get `UNKNOWN_SERVER_EXCEPTION` with
no details returned.
This change compares the remote replica's HWM with the leader's HWM and
completes the FETCH request if the remote HWM is less than the leader's
HWM. When the leader's HWM is updated any pending FETCH RPC is
completed.
Reviewers: Alyssa Huang <ahuang@confluent.io>, David Arthur
<mumrah@gmail.com>, Andrew Schofield <aschofield@confluent.io>
`CreateTopicPolicy#validate` may throw unexpected exception other than
`PolicyViolationException`. We should handle this case as well.
Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
Simplify Set initialization and reduce the overhead of creating extra
collections.
The changes mostly include:
- new HashSet<>(List.of(...))
- new HashSet<>(Arrays.asList(...)) / new HashSet<>(asList(...))
- new HashSet<>(Collections.singletonList()) / new
HashSet<>(singletonList())
- new HashSet<>(Collections.emptyList())
- new HashSet<>(Set.of())
This change takes the following into account, and we will not change to
Set.of in these scenarios:
- Require `mutability` (UnsupportedOperationException).
- Allow `duplicate` elements (IllegalArgumentException).
- Allow `null` elements (NullPointerException).
- Depend on `Ordering`. `Set.of` does not guarantee order, so it could
make tests flaky or break public interfaces.
Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
This PR simplifies two ConcurrentHashMap fields by removing their Atomic
wrappers:
- Change `brokerContactTimesMs` from `ConcurrentHashMap<Integer,
AtomicLong>` to `ConcurrentHashMap<Integer, Long>`.
- Change `brokerRegistrationStates` from `ConcurrentHashMap<Integer,
AtomicInteger>` to `ConcurrentHashMap<Integer, Integer>`.
This removes mutable holders without affecting thread safety (see
discussion in #19828).
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi
<frankvicky@apache.org>, Kevin Wu <kevin.wu2412@gmail.com>, Ken Huang
<s7133700@gmail.com>
In #19840, we broke de-duplication during ACL creation. This patch fixes
that and adds a test to cover this case.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Following the removal of the ZK-to-KRaft migration code in commit
85bfdf4, controller-to-broker communication is now handled by the
control-plane listener (`controller.listener.names`). The
`interBrokerListenerName` parameter in `ClusterControlManager` is no
longer referenced on the controller side and can be safely removed as
dead code.
Reviewers: Lan Ding <isDing_L@163.com>, Ken Huang <s7133700@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
Fix `updateBrokerContactTime` so that existing brokers still have their
contact time updated when they are already tracked. Also, update the
unit test to test this case.
Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Yung
<yungyung7654321@gmail.com>, TengYao Chi <frankvicky@apache.org>, Ken
Huang <s7133700@gmail.com>
This patch fixes a problem in AclControlManager where we are updating
the timeline data structures prematurely.
Reviewers: Alyssa Huang <ahuang@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, Andrew Schofield <aschofield@confluent.io>,
The main issue was that we forgot to set
`TopicConfig.SEGMENT_BYTES_CONFIG` to at least `1024 * 1024`, which
caused problems in tests with small segment sizes.
To address this, we introduced a new internal config:
`LogConfig.INTERNAL_SEGMENT_BYTES_CONFIG`, allowing us to set smaller
segment bytes specifically for testing purposes.
We also updated the logic so that if a user configures the topic-level
segment bytes without explicitly setting the internal config, the
internal value will no longer be returned to the user.
In addition, we removed
`MetadataLogConfig#METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG` and added
three new internal configurations:
- `INTERNAL_MAX_BATCH_SIZE_IN_BYTES_CONFIG`
- `INTERNAL_MAX_FETCH_SIZE_IN_BYTES_CONFIG`
- `INTERNAL_DELETE_DELAY_MILLIS_CONFIG`
Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
[KAFKA-16720](https://issues.apache.org/jira/browse/KAFKA-16720) aims to
finish the AlterShareGroupOffsets RPC.
Reviewers: Andrew Schofield <aschofield@confluent.io>
---------
Co-authored-by: jimmy <wangzhiwang@qq.com>
- Remove redundant close of `snapshotWriter`.
- `snapshotWriter` is already closed by `RaftSnapshotWriter#writer`.
Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This PR introduces the following per-broker metrics:
-`kafka.controller:type=KafkaController,name=BrokerRegistrationState,broker=X`
-`kafka.controller:type=KafkaController,name=TimeSinceLastHeartbeatReceivedMs,broker=X`
and this metric:
`kafka.controller:type=KafkaController,name=ControlledShutdownBrokerCount`
Reviewers: Colin P. McCabe <cmccabe@apache.org>
* In ConsoleShareConsumerTest, add `@SuppressWarnings("unchecked")`
annotation in method shouldUpgradeDeliveryCount
* In ListConsumerGroupOffsetsHandlerTest, add generic parameters to
HashSet constructors
* In TopicsImageTest, add explicit generic type to Collections.EMPTY_MAP
to fix raw type usage
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
jira: https://issues.apache.org/jira/browse/KAFKA-19068
`RecordsIterator#decodeControlRecord` do the type check and then
`ControlRecord` constructor does that again.
we should add a static method to ControlRecord to create `ControlRecord`
with type check, and then `ControlRecord` constructor should be changed
to private to ensure all instance is created by the static method.
Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
The current ElectionWasClean checks if the new leader is in the previous
ISR. However, there is a corner case in the partition reassignment. The
partition reassignment can change the partition replicas. If the new
preferred leader (the first one in the new replicas) is the last one to
join ISR, this preferred leader will be elected in the same partition
change.
For example: In the previous state, the partition is Leader: 0,
Replicas (2,1,0), ISR (1,0), Adding(2), removing(0). Then replica 2
joins the ISR. The new partition would be like: Leader: 2, Replicas
(2,1), ISR(1,2). The new leader 2 is not in the previous ISR (1,0) but
it is still a clean election.
Reviewers: Jun Rao <junrao@gmail.com>
This PR adds an additional test case to `FormatterTest` that checks that
formatting with `--standalone` and then formatting again with
`--standalone --ignore-formatted` is indeed a no-op.
Reviewers: José Armando García Sancio <jsancio@apache.org>
Log a message when reading a batch that is larger than the currently
allocated batch.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, PoAn Yang
<payang@apache.org>
replace all applicable `.stream().forEach()` in codebase with just
`.forEach()`.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
1. remove org.apache.kafka.raft.OffsetAndEpoch
2. rewrite org.apache.kafka.server.common.OffsetAndEpoch by record
keyword
3. rename OffsetAndEpoch#leaderEpoch to OffsetAndEpoch#epoch
Reviewers: PoAn Yang <payang@apache.org>, Xuan-Zhang Gong
<gongxuanzhangmelt@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
The ShareVersion feature does not make any metadata version changes. As
a result, `SV_1` does not depend on any MV level, and no MV needs to be
defined for the preview of KIP-932.
Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
Add new StreamsGroupFeature, disabled by default, and add "streams" as
default value to `group.coordinator.rebalance.protocols`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot
<david.jacot@gmail.com>, Lucas Brutschy <lbrutschy@confluent.io>,
Justine Olshan <jolshan@confluent.io>, Andrew Schofield
<aschofield@confluent.io>, Jun Rao <jun@confluent.io>