440 Commits

Author SHA1 Message Date
kevin-wu24 167e2f71f0
KAFKA-17713: Don't generate snapshot when published metadata is not batch aligned (#17398)
When MetadataBatchLoader handles a BeginTransactionRecord, it will publish the metadata that it has seen so far and not publish again until the transaction is ended or aborted. This means a partial record batch can be published. If a snapshot is generated during this time, the currently published metadata may not align with the end of a record batch. This causes problems with Raft replication, which expects a snapshot's offset to exactly precede a record batch boundary.

This patch enhances SnapshotGenerator to refuse to generate a snapshot if the metadata is not batch aligned.
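
The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual SnapshotGenerator code: the class name `BatchAlignmentGate`, the method `skipReason`, and both parameters are hypothetical.

```java
import java.util.Optional;

// Hypothetical sketch: these names are illustrative and are not Kafka's SnapshotGenerator API.
final class BatchAlignmentGate {
    private BatchAlignmentGate() { }

    /**
     * Returns a reason to skip snapshot generation, or empty if a snapshot may be taken.
     *
     * @param lastPublishedOffset offset of the last record in the published metadata
     * @param batchEndOffset      offset of the last record in the batch that contains it
     */
    static Optional<String> skipReason(long lastPublishedOffset, long batchEndOffset) {
        if (lastPublishedOffset != batchEndOffset) {
            // The published metadata stops in the middle of a batch, so refuse.
            return Optional.of("published offset " + lastPublishedOffset
                + " is not batch aligned (batch ends at " + batchEndOffset + ")");
        }
        return Optional.empty();
    }
}
```

Returning a reason rather than a boolean lets the caller log why a snapshot was skipped.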

Reviewers: David Arthur <mumrah@gmail.com>
2024-10-10 13:23:14 -04:00
TengYao Chi 924c1081dc
KAFKA-17415 Avoid overflow of expired timestamp (#17026)
Neither ZK nor KRaft mode handles overflow, so setting a large max lifetime results in a negative expired timestamp and a negative max timestamp, which is unexpected behavior.

In this PR, we are only fixing the KRaft code since ZK will be removed soon.
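
As a rough illustration of the overflow, the sketch below adds a lifetime to the current time and clamps the result instead of letting it wrap. The class and method names are hypothetical, and clamping to `Long.MAX_VALUE` is an assumption about one reasonable fix, not necessarily what the patch does.

```java
// Hypothetical sketch: not the actual DelegationTokenControlManager code.
final class ExpiryMath {
    private ExpiryMath() { }

    /** Adds a lifetime to "now", saturating instead of wrapping around on overflow. */
    static long expiryTimestampMs(long nowMs, long maxLifetimeMs) {
        try {
            // Math.addExact throws ArithmeticException when the sum overflows a long.
            return Math.addExact(nowMs, maxLifetimeMs);
        } catch (ArithmeticException overflow) {
            // Assumption: clamp toward the direction of the overflow.
            return maxLifetimeMs > 0 ? Long.MAX_VALUE : Long.MIN_VALUE;
        }
    }
}
```

With plain `+`, a current time of 1700000000000 plus a lifetime of `Long.MAX_VALUE` wraps to a negative number, which is the symptom described above.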

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-07 01:43:43 +08:00
Colin Patrick McCabe 85bfdf4127
KAFKA-17613: Remove ZK migration code (#17293)
Remove the controller machinery for doing ZK migration in Kafka 4.0.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2024-10-03 12:01:14 -07:00
Justine Olshan 49d7ea6c6a
KAFKA-16308 [3/N]: Introduce feature dependency validation to UpdateFeatures command (#16443)
This change includes:

1. Dependency checking when updating the feature (all request versions)
2. Returning top level error and no feature level errors if any feature failed to update and using this error for all the features in the response. (all request versions)
3. Returning only top level none error for v2 and beyond

Reviewers: Jun Rao <jun@confluent.io>
2024-10-01 14:21:38 -07:00
Chung, Ming-Yen e136d7611c
KAFKA-17656 Replace string concatenation with parameterized logging for PartitionChangeBuilder (#17334)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-02 01:53:39 +08:00
Alyssa Huang 68b9770506
KAFKA-17608, KAFKA-17604, KAFKA-16963; KRaft controller crashes when active controller is removed (#17146)
This change fixes a few issues.

KAFKA-17608; KRaft controller crashes when active controller is removed
When a control batch is committed, the quorum controller currently increases the last stable offset but fails to create a snapshot for that offset. This causes an issue if the quorum controller renounces and needs to revert to that offset (which has no snapshot present). Since the control batches are no-ops for the quorum controller, it does not need to update its offsets for control records. We skip handle commit logic for control batches.

KAFKA-17604; Describe quorum output missing added voters endpoints
Describe quorum output will miss endpoints of voters which were added via AddRaftVoter. This is due to a bug in LeaderState's updateVoterAndObserverStates which will pull replica state from observer states map (which does not include endpoints). The fix is to populate endpoints from the lastVoterSet passed into the method.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Colin P. McCabe <cmccabe@apache.org>, Chia-Ping Tsai <chia7712@apache.org>
2024-09-26 13:56:19 -04:00
Colin Patrick McCabe 7c429f3514
KAFKA-17612 Remove some tests that only apply to ZK mode or migration (#17276)
Reviewers: David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-09-26 20:41:29 +08:00
Colin Patrick McCabe d3936365bf
KAFKA-16468: verify that migrating brokers provide their inter.broker.listener (#17159)
When brokers undergoing ZK migration register with the controller, it should verify that they have
provided a way to contact them via their inter.broker.listener. Otherwise the migration will fail
later on with a more confusing error message.

Reviewers: David Arthur <mumrah@gmail.com>
2024-09-13 09:18:24 -07:00
David Arthur 0e30209f01
KAFKA-17506 KRaftMigrationDriver initialization race (#17147)
There is a race condition between KRaftMigrationDriver running its first poll() and being notified by Raft about a leader change. If onControllerChange is called before RecoverMigrationStateFromZKEvent is run, we will end up getting stuck in the INACTIVE state.

This patch fixes the race by enqueuing a RecoverMigrationStateFromZKEvent from onControllerChange if the driver has not yet initialized. If another RecoverMigrationStateFromZKEvent was already enqueued, the second one to run will just be ignored.

Reviewers: Luke Chen <showuon@gmail.com>
2024-09-11 10:41:49 -04:00
David Arthur 1fd1646eb9
KAFKA-15648 Update leader volatile before handleLeaderChange in LocalLogManager (#17118)
Update the leader before calling handleLeaderChange and use the given epoch in LocalLogManager#prepareAppend. This should hopefully fix several flaky QuorumControllerTest tests.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-09-06 13:54:03 -04:00
David Jacot c977bfdd3c
KAFKA-17413; Re-introduce `group.version` feature flag (#17013)
This patch re-introduces the `group.version` feature flag and gates the new consumer rebalance protocol with it. The `group.version` feature flag is attached to the metadata version `4.0-IV0` and it is marked as production ready. This allows system tests to pick it up directly by default without having to set `unstable.feature.versions.enable` in all of them. This is fine because we don't plan to make any incompatible changes before 4.0.

Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-29 01:22:54 -07:00
Colin Patrick McCabe ca0cc355f6
KAFKA-12670: Support configuring unclean leader election in KRaft (#16866)
Previously in KRaft mode, we could request an unclean leader election for a specific topic using
the electLeaders API. This PR adds an additional way to trigger unclean leader election when in
KRaft mode via the static controller configuration and various dynamic configurations.

In order to support all possible configuration methods, we have to do a multi-step configuration
lookup process:

1. check the dynamic topic configuration for the topic.
2. check the dynamic node configuration.
3. check the dynamic cluster configuration.
4. check the controller's static configuration.

Fortunately, we already have the logic to do this multi-step lookup in KafkaConfigSchema.java.
This PR reuses that logic. It also makes setting a configuration schema in
ConfigurationControlManager mandatory. Previously, it was optional for unit tests.
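
The four-step lookup order listed above can be sketched as a first-match search over layered maps. This is only an illustration of the ordering. It is not the KafkaConfigSchema.java logic, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the lookup order; not Kafka's KafkaConfigSchema implementation.
final class LayeredConfigLookup {
    private LayeredConfigLookup() { }

    /** Returns the value from the highest-priority layer that defines the key. */
    static String resolve(String key,
                          Map<String, String> dynamicTopic,
                          Map<String, String> dynamicNode,
                          Map<String, String> dynamicCluster,
                          Map<String, String> controllerStatic,
                          String defaultValue) {
        List<Map<String, String>> layers = new ArrayList<>();
        layers.add(dynamicTopic);     // 1. dynamic topic configuration
        layers.add(dynamicNode);      // 2. dynamic node configuration
        layers.add(dynamicCluster);   // 3. dynamic cluster configuration
        layers.add(controllerStatic); // 4. controller's static configuration
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return value;
            }
        }
        return defaultValue;
    }
}
```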

Of course, the dynamic configuration can change over time, or the active controller can change
to a different one with a different configuration. These changes can make unclean leader
elections possible for partitions that they were not previously possible for. In order to address
this, I added a periodic background task which scans leaderless partitions to check if they are
eligible for an unclean leader election.

Finally, this PR adds the UncleanLeaderElectionsPerSec metric.

Co-authored-by: Luke Chen <showuon@gmail.com>

Reviewers: Igor Soarez <soarez@apple.com>, Luke Chen <showuon@gmail.com>
2024-08-28 14:13:20 -07:00
TengYao Chi 4a485ddb71
KAFKA-17315 Fix the behavior of delegation tokens that expire immediately upon creation in KRaft mode (#16858)
In KRaft mode, expiring a delegation token (`expiryTimePeriodMs` < 0) behaves differently from ZK mode in the following ways:

1. `ExpiryTimestampMs` is set to "expiryTimePeriodMs" [0] rather than "now" [1].
2. It throws an exception directly if the token has already expired [2]. By contrast, ZK mode does not [3].

[0] 49fc14f611/metadata/src/main/java/org/apache/kafka/controller/DelegationTokenControlManager.java (L316)
[1] 49fc14f611/core/src/main/scala/kafka/server/DelegationTokenManagerZk.scala (L292)
[2] 49fc14f611/metadata/src/main/java/org/apache/kafka/controller/DelegationTokenControlManager.java (L305)
[3] 49fc14f611/core/src/main/scala/kafka/server/DelegationTokenManagerZk.scala (L293)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-25 07:29:58 +08:00
Dmitry Werner 6cad2c0d67
KAFKA-17370 Move LeaderAndIsr to metadata module (#16943)
isrWithBrokerEpoch = addBrokerEpochToIsr(isrToSend.toL
2024-08-22 15:47:09 +08:00
Alyssa Huang 0bb2aee838
KAFKA-17305; Check broker registrations for missing features (#16848)
When a broker tries to register with the controller quorum, its registration should be rejected if it doesn't support a feature that is currently enabled. (A feature is enabled if it is set to a non-zero feature level.) This is important for the newly added kraft.version feature flag.
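
A minimal sketch of this rule, assuming the controller holds a map of finalized feature levels and the broker sends a supported range per feature. All names here are hypothetical and do not mirror the ClusterControlManager code.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the registration check; not Kafka's actual implementation.
final class RegistrationFeatureCheck {
    private RegistrationFeatureCheck() { }

    /** Inclusive range of feature levels that a broker claims to support. */
    static final class SupportedRange {
        final short min;
        final short max;

        SupportedRange(short min, short max) {
            this.min = min;
            this.max = max;
        }

        boolean contains(short level) {
            return level >= min && level <= max;
        }
    }

    /** Returns the name of an enabled feature the broker cannot run, if there is one. */
    static Optional<String> firstUnsupportedFeature(Map<String, Short> finalizedLevels,
                                                    Map<String, SupportedRange> brokerSupported) {
        for (Map.Entry<String, Short> entry : finalizedLevels.entrySet()) {
            short level = entry.getValue();
            if (level == 0) {
                continue; // level 0 means the feature is not enabled
            }
            SupportedRange range = brokerSupported.get(entry.getKey());
            if (range == null || !range.contains(level)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```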

Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>
2024-08-21 11:14:56 -07:00
TengYao Chi 81f0b13a70
KAFKA-17238 Move VoterSet and ReplicaKey from raft.internals to raft (#16775)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-16 00:24:51 +08:00
José Armando García Sancio 0f7cd4dcde
KAFKA-17304; Make RaftClient API for writing to log explicit (#16862)
The RaftClient API is changed to separate batch accumulation (RaftClient#prepareAppend) from scheduling the append of accumulated batches (RaftClient#schedulePreparedAppend) to the KRaft log. This change is needed to better match the controller's flow of replaying the generated records before replicating them. When the controller replays records, it needs to know the offset associated with each record. To compute a table offset the KafkaClient needs to be aware of the records and their log position.

The controller uses this new API by generating the cluster metadata records, computing their offsets using RaftClient#prepareAppend, replaying the records in the state machine, and finally allowing KRaft to append the records with RaftClient#schedulePreparedAppend.

To implement this API the BatchAccumulator is changed to also support this access pattern. This is done by adding a drainOffset to the implementation. The batch accumulator is allowed to return any record and batch that is less than the drain offset.
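
The drain-offset idea can be illustrated with a toy accumulator. This is a simplified sketch under assumed semantics (one offset per record, records as strings). It is not the real BatchAccumulator, and the class name and method signatures are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy model of a drain offset; not Kafka's BatchAccumulator.
final class DrainOffsetAccumulator {
    private final ArrayDeque<String> pending = new ArrayDeque<>();
    private long firstPendingOffset; // offset of the oldest record still held
    private long nextOffset;         // offset the next prepared record will get
    private long drainOffset;        // only records below this offset may be drained

    DrainOffsetAccumulator(long startOffset) {
        this.firstPendingOffset = startOffset;
        this.nextOffset = startOffset;
        this.drainOffset = startOffset;
    }

    /** Accumulates records and returns the offset of the last one. Nothing becomes drainable yet. */
    long prepareAppend(List<String> records) {
        pending.addAll(records);
        nextOffset += records.size();
        return nextOffset - 1;
    }

    /** Moves the drain offset forward so that everything prepared so far may be appended. */
    void schedulePreparedAppend() {
        drainOffset = nextOffset;
    }

    /** Returns, in order, the records whose offsets are below the drain offset. */
    List<String> drain() {
        List<String> drained = new ArrayList<>();
        while (firstPendingOffset < drainOffset) {
            drained.add(pending.removeFirst());
            firstPendingOffset++;
        }
        return drained;
    }
}
```

Between `prepareAppend` and `schedulePreparedAppend` the caller knows the record offsets and can replay the records, while `drain` still returns nothing.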

Lastly, this change also removes some functionality that is no longer needed like non-atomic appends and validation of the base offset.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>
2024-08-14 15:42:04 -04:00
DL1231 3a0efa2845
KAFKA-14510; Extend DescribeConfigs API to support group configs (#16859)
This patch extends the DescribeConfigs API to support group configs.

Reviewers: Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>
2024-08-14 06:37:57 -07:00
Colin Patrick McCabe 132e0970fb
KAFKA-17018: update MetadataVersion for the Kafka release 3.9 (#16841)
- Mark 3.9-IV0 as stable. Metadata version 3.9-IV0 should return Fetch version 17.

- Move ELR to 4.0-IV0. Remove 3.9-IV1 since it's no longer needed.

- Create a new 4.0-IV1 MV for KIP-848.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-08-12 16:30:43 -07:00
Colin Patrick McCabe e1b2adea07
KAFKA-17190: AssignmentsManager gets stuck retrying on deleted topics (#16672)
In MetadataVersion 3.7-IV2 and above, the broker's AssignmentsManager sends an RPC to the
controller informing it about which directory we have chosen to place each new replica on.
Unfortunately, the code does not check to see if the topic still exists in the MetadataImage before
sending the RPC. It will also retry infinitely. Therefore, after a topic is created and deleted in
rapid succession, we can get stuck including the now-defunct replica in our subsequent
AssignReplicasToDirsRequests forever.

In order to prevent this problem, the AssignmentsManager should check if a topic still exists (and
is still present on the broker in question) before sending the RPC. In order to prevent log spam,
we should not log any error messages until several minutes have gone past without success.
Finally, rather than creating a new EventQueue event for each assignment request, we should simply
modify a shared data structure and schedule a deferred event to send the accumulated RPCs. This
will improve efficiency.

Reviewers: Igor Soarez <i@soarez.me>, Ron Dagostino <rndgstn@gmail.com>
2024-08-10 12:31:45 +01:00
Josep Prat 4e862c0903
KAFKA-15875: Stops leak Snapshot in public methods (#16807)
* KAFKA-15875: Stops leak Snapshot in public methods

The Snapshot class is package protected but it's returned in
several public methods in SnapshotRegistry.
To prevent this accidental leakage, these methods are made
package protected as well. For getOrCreateSnapshot a new
method called IdempotentCreateSnapshot is created that returns void.
* Make builder package protected, replace <br> with <p>

Reviewers: Greg Harris <greg.harris@aiven.io>
2024-08-08 20:05:47 +02:00
TengYao Chi 8438c4339e
KAFKA-17245: Revert TopicRecord changes. (#16780)
Revert KAFKA-16257 changes because KIP-950 doesn't need it anymore.

Reviewers: Luke Chen <showuon@gmail.com>
2024-08-03 20:15:51 +08:00
Colin Patrick McCabe 02f541d4ea
KAFKA-16518: Implement KIP-853 flags for storage-tool.sh (#16669)
As part of KIP-853, storage-tool.sh now has two new flags: --standalone, and --initial-voters. This PR implements these two flags in storage-tool.sh.

There are currently two valid ways to format a cluster:

The pre-KIP-853 way, where you use a statically configured controller quorum. In this case, neither --standalone nor --initial-voters may be specified, and kraft.version must be set to 0.

The KIP-853 way, where one of --standalone and --initial-voters must be specified with the initial value of the dynamic controller quorum. In this case, kraft.version must be set to 1.
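
The two valid combinations above can be written as a small validation routine. This is a sketch only. The class and method names are hypothetical and this is not the Formatter.java logic. Rejecting the case where both flags are given is an assumption based on reading "one of" as "exactly one of".

```java
import java.util.Optional;

// Hypothetical sketch of the formatting rules; not the actual Formatter.java code.
final class FormatFlagCheck {
    private FormatFlagCheck() { }

    /** Returns an error message, or empty if the flag combination is valid. */
    static Optional<String> validate(boolean standalone, boolean initialVoters, int kraftVersion) {
        if (standalone && initialVoters) {
            // Assumption: the two flags are mutually exclusive.
            return Optional.of("--standalone and --initial-voters cannot be combined");
        }
        boolean dynamicQuorum = standalone || initialVoters;
        if (dynamicQuorum && kraftVersion != 1) {
            return Optional.of("kraft.version must be 1 with a dynamic controller quorum");
        }
        if (!dynamicQuorum && kraftVersion != 0) {
            return Optional.of("kraft.version must be 0 with a static controller quorum");
        }
        return Optional.empty();
    }
}
```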

This PR moves the formatting logic out of StorageTool.scala and into Formatter.java. The tool file was never intended to get so huge, or to implement complex logic like generating metadata records. Those things should be done by code in the metadata or raft gradle modules. This is also useful for junit tests, which often need to do formatting. (The 'info' and 'random-uuid' commands remain in StorageTool.scala, for now.)

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-08-02 15:47:45 -07:00
Chung, Ming-Yen 7c0a96d08d
KAFKA-17185 Declare Loggers as static to prevent multiple logger instances (#16680)
As discussed in #16657 (comment), loggers should be static to avoid creating multiple logger instances.
I used the regex private.*Logger.*LoggerFactory to search, and checked each result to see whether the logger needs to be static.

There are some exceptions where loggers don't need to be static:
1) The logger is in an inner class, since Java 8 doesn't support static fields in inner classes.
        https://github.com/apache/kafka/blob/trunk/clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetchRequestManagerTest.java#L3676

2) Custom loggers for each instance (non-static + non-final). In this case, multiple logger instances are actually needed.
        https://github.com/apache/kafka/blob/trunk/storage/src/test/java/org/apache/kafka/server/log/remote/storage/LocalTieredStorage.java#L166

3) The logger is initialized in the constructor by LogContext. Many non-static loggers with the final modifier are in this category, which is why I use .*LoggerFactory to check only the loggers that are assigned an initial value at declaration.

4) protected final Logger log = Logger.getLogger(getClass())
    This lets a subclass log with the subclass name instead of the superclass name.
    But if the log access modifier is private, this purpose cannot be achieved, since the subclass cannot access the log defined in the superclass. So if the access modifier is private, we can replace getClass() with <className>.class
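
The difference between the static default and exception 4 can be illustrated as follows. To keep the sketch self-contained it uses `java.util.logging` rather than the logging libraries used in Kafka, so `Logger.getLogger` takes a name string here. All class names are hypothetical.

```java
import java.util.logging.Logger;

// Sketch using java.util.logging as a stand-in for the logging library in the patch.
class StaticLoggerExample {
    // Static: one logger shared by every instance of the class.
    private static final Logger LOG = Logger.getLogger(StaticLoggerExample.class.getName());

    Logger logger() {
        return LOG;
    }
}

class BaseWithInstanceLogger {
    // Exception 4: non-static and protected, so that getClass() resolves to the
    // runtime subclass and subclasses log under their own name.
    protected final Logger log = Logger.getLogger(getClass().getName());
}

final class SubclassLogger extends BaseWithInstanceLogger {
    String loggerName() {
        return log.getName();
    }
}
```

If `log` were private in `BaseWithInstanceLogger`, the subclass could not use it, so `getClass()` would bring no benefit over a class literal.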

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-31 02:37:36 +08:00
Luke Chen 1b11fef5bb
KAFKA-17205: Allow topic config validation in controller level in KRaft mode (#16693)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Christo Lolov <lolovc@amazon.com>
2024-07-30 17:07:09 +01:00
Justine Olshan a0f6e6f816
KAFKA-16192: Introduce transaction.version and usage of flexible records to coordinators (#16183)
This change includes adding transaction.version (part of KIP-1022)

New transaction version 1 is introduced to support writing flexible fields in transaction state log messages.

Transaction version 2 is created in anticipation for further KIP-890 changes.

Neither is made production ready. Tests for the new transaction version and new MV are created.

Also include change to not report a feature as supported if the range is 0-0.

Reviewers: Jun Rao <junrao@apache.org>, David Jacot <djacot@confluent.io>, Artem Livshits <alivshits@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
2024-07-26 11:38:44 -07:00
Logan Zhu 3589f45656
MINOR: Replace lambda expressions with method references for ReplicationControlManager (#16547)
Reviewers: Xuan-Zhang Gong <gongxuanzhangmelt@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-24 19:44:54 +08:00
Mickael Maison 90b779b7bb
MINOR: Various cleanups in metadata (#16610)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-22 10:26:09 +08:00
Chung, Ming-Yen 66655ab49a
KAFKA-17095 Fix the typo from "CreateableTopicConfig" to "CreatableTopicConfig" (#16623)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-19 11:09:08 +08:00
Colin Patrick McCabe 4d3e366bc2
KAFKA-16772: Introduce kraft.version to support KIP-853 (#16230)
Introduce the KRaftVersion enum to describe the current value of kraft.version. Change a bunch of places in the code that were using raw shorts over to using this new enum.

In BrokerServer.scala, fix a bug that could cause null pointer exceptions during shutdown if we tried to shut down before fully coming up.

Do not send finalized features that are finalized as level 0, since it is a no-op.

Reviewers: dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2024-07-16 09:31:10 -07:00
David Arthur 8aee314a46
KAFKA-16667 Avoid stale read in KRaftMigrationDriver (#15918)
When becoming the active KRaftMigrationDriver, there is another race condition similar to KAFKA-16171. This time, the race is due to a stale read from ZK. After writing to /controller and /controller_epoch, it is possible that a read on /migration is not linearized with the writes that were just made. In other words, we get a stale read on /migration. This leads to an inability to sync metadata to ZK due to incorrect zkVersion on the migration ZNode.

The non-linearizability of reads is in fact documented behavior for ZK, so we need to handle it.

To fix the stale read, this patch adds a write to /migration after updating /controller and /controller_epoch. This allows us to learn the correct zkVersion for the migration ZNode before leaving the BECOME_CONTROLLER state.

This patch also adds a check on the current leader epoch when running certain events in KRaftMigrationDriver. Historically, we did not include this check because it is not necessary for correctness. Writes to ZK are gated on the /controller_epoch zkVersion, and RPCs sent to brokers are gated on the controller epoch. However, during a time of rapid failover, there is a lot of processing happening on the controller (i.e., full metadata sync to ZK and full UMRs sent to brokers), so it is best to avoid running events we know will fail.

There is also a small fix in here to improve the logging of ZK operations. The log messages are changed to past tense to reflect the fact that they have already happened by the time the log message is created.

Reviewers: Igor Soarez <soarez@apple.com>
2024-07-15 09:32:06 -04:00
Colin Patrick McCabe ebaa108967
KAFKA-16968: Introduce 3.8-IV0, 3.9-IV0, 3.9-IV1
Create 3 new metadata versions:

- 3.8-IV0, for the upcoming 3.8 release.
- 3.9-IV0, to add support for KIP-1005.
- 3.9-IV1, as the new release vehicle for KIP-966.

Create ListOffsetRequest v9, which will be used in 3.9-IV0 to support KIP-1005. v9 is currently an unstable API version.

Reviewers: Jun Rao <junrao@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-06-27 14:03:03 -07:00
Murali Basani 87f8147ed0
KAFKA-16855 : Part 1 - New fields tieredEpoch and tieredState (#16257)
Add field tieredEpoch to RemoteLogSegmentMetadata
Update relevant tests
Add two fields tieredEpoch and tieredState to TopicRecord.json

Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>
2024-06-25 15:00:12 +01:00
TingIāu "Ting" Kì b2758f4ac6
KAFKA-16989 Use StringBuilder instead of String concatenation (#16385)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-19 09:19:32 +08:00
Colin Patrick McCabe 2fd00ce536
KAFKA-16952: Do not bump broker epoch when re-registering the same incarnation (#16333)
* KAFKA-16952: Do not bump broker epoch when re-registering the same incarnation

As part of KIP-858 (Handle JBOD broker disk failure in KRaft), we added some code that caused the
broker to re-register itself when transitioning from a MetadataVersion that did not support broker
directory IDs, to one that did. This code was necessary because otherwise the controller would not
be aware of what directories the broker held.

However, prior to this PR, the re-registration process acted exactly like a full registration. That
is, it bumped the broker epoch (which is meant to only be bumped on broker restart). This PR fixes
the code to keep the broker epoch the same if the incarnation ID is the same.

There are some other minor improvements here:

- The previous logic relied on a complicated combination of request version and previous broker
  epoch to understand if the request came from the same broker or not. This is not needed: either
  the incarnation ID is the same and it's the same process, or it is not and it isn't.

- We now log whether we're amending a registration, registering a previously unknown broker, or
  replacing a previous registration.

- Move changes to the HeartbeatManager to the end of the function, so that we will not do them if
  any validation step fails. Log4j messages are also generated at the end, for the same reason.
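
The rule that the broker epoch only changes when the incarnation ID changes can be sketched as below. This is a simplified model with hypothetical names. The real ClusterControlManager tracks much more state and derives the epoch differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the epoch rule; not Kafka's ClusterControlManager.
final class BrokerEpochSketch {
    private static final class Registration {
        final UUID incarnationId;
        final long epoch;

        Registration(UUID incarnationId, long epoch) {
            this.incarnationId = incarnationId;
            this.epoch = epoch;
        }
    }

    private final Map<Integer, Registration> registrations = new HashMap<>();

    /**
     * Registers a broker and returns its epoch. The candidate epoch is used only
     * for an unknown broker or a new incarnation (that is, a restarted process).
     */
    long register(int brokerId, UUID incarnationId, long candidateEpoch) {
        Registration existing = registrations.get(brokerId);
        if (existing != null && existing.incarnationId.equals(incarnationId)) {
            // Same process re-registering: amend the registration, keep the epoch.
            return existing.epoch;
        }
        registrations.put(brokerId, new Registration(incarnationId, candidateEpoch));
        return candidateEpoch;
    }
}
```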

Reviewers: Ron Dagostino <rndgstn@gmail.com>
2024-06-18 07:03:15 -07:00
ChickenchickenLove 1a7ba667ad
MINOR improve startup log in QuorumController (#15926)
Reviewers: David Arthur <mumrah@gmail.com>
2024-06-17 11:04:12 -04:00
TingIāu "Ting" Kì 92d8d4bd1f
KAFKA-16970 Fix hash implementation of `ScramCredentialValue`, `ScramCredentialData`, and `ContextualRecord` (#16359)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-17 22:29:22 +08:00
gongxuanzhang 4e846038a6
KAFKA-10787 Apply spotless to `metadata` and `server` and `storage` module (#16297)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-16 05:28:50 +08:00
Kuan-Po (Cooper) Tseng 2e5cd0b476
MINOR: Refine javadoc in TopicsDelta TopicDelta LocalReplicaChanges (#16195)
Add more description to TopicsDelta TopicDelta LocalReplicaChanges

Reviewers: Luke Chen <showuon@gmail.com>
2024-06-14 11:22:19 +08:00
gongxuanzhang 596b945072
KAFKA-16643 Add ModifierOrder checkstyle rule (#15890)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 15:39:32 +08:00
Sanskar Jhajharia 226f3c57e3
MINOR: Code cleanup in metadata module (#16065)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-06-06 15:18:23 +02:00
TingIāu "Ting" Kì d652f5cf54
MINOR: Add topicIds and directoryIds to the return value of the toString method. (#16189)
Add topicIds and directoryIds to the return value of the toString method.

Reviewers: Luke Chen <showuon@gmail.com>
2024-06-05 07:52:06 +08:00
Igor Soarez 7e0caad96e
MINOR: Cleanup unused references in core (#16192)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-05 05:12:33 +08:00
Igor Soarez 16359e70d3
KAFKA-16583: Handle PartitionChangeRecord without directory IDs (#16118)
When PartitionRegistration#merge() reads a PartitionChangeRecord
from an older MetadataVersion, with a replica assignment change
and without #directories() set, it produces a directory assignment
of DirectoryId.UNASSIGNED. This is problematic because the MetadataVersion
may not yet support directory assignments, leading to a
UnwritableMetadataException in PartitionRegistration#toRecord.

Since the Controller always sets directories on PartitionChangeRecord
if the MetadataVersion supports it, via PartitionChangeBuilder,
there's no need for PartitionRegistration#merge() to populate
directories upon a replica assignment change.

Reviewers: Luke Chen <showuon@gmail.com>
2024-06-04 15:37:20 +01:00
David Jacot 53d592e369
MINOR: Fix type in MetadataVersion.IBP_4_0_IV0 (#16181)
This patch fixes a typo in MetadataVersion.IBP_4_0_IV0. It should be 0 not O.

Reviewers: Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>,  Chia-Ping Tsai <chia7712@gmail.com>
2024-06-03 20:48:04 -07:00
Colin Patrick McCabe 8ace33b47f
KAFKA-16757: Fix broker re-registration issues around MV 3.7-IV2 (#15945)
When upgrading from a MetadataVersion older than 3.7-IV2, we need to resend the broker registration, so that the controller can record the storage directories. The current code for doing this has several problems, however. One is that it tends to trigger even in cases where we don't actually need it. Another is that when re-registering the broker, the broker is marked as fenced.

This PR moves the handling of the re-registration case out of BrokerMetadataPublisher and into BrokerRegistrationTracker. The re-registration code there will only trigger in the case where the broker sees an existing registration for itself with no directories set. This is much more targeted than the original code.

Additionally, in ClusterControlManager, when re-registering the same broker, we now preserve its fencing and shutdown state, rather than clearing those. (There isn't any good reason re-registering the same broker should clear these things... this was purely an oversight.) Note that we can tell the broker is "the same" because it has the same IncarnationId.

Reviewers: Gaurav Narula <gaurav_narula2@apple.com>, Igor Soarez <soarez@apple.com>
2024-06-01 23:51:39 +01:00
David Jacot ba61ff0cd9
KAFKA-16860; [1/2] Introduce group.version feature flag (#16120)
This patch introduces the `group.version` feature flag with one version:
1) Version 1 enables the new consumer group rebalance protocol (KIP-848).

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-05-31 12:48:55 -07:00
Justine Olshan 7c1bb1585f
KAFKA-16308 [2/N]: Allow unstable feature versions and rename unstable metadata config (#16130)
As per KIP-1022, we will rename the unstable metadata versions enabled config to support all feature versions.

Features is also updated to return latest production and latest testing versions of each feature.

A feature is production ready when the corresponding metadata version (bootstrapMetadataVersion) is production ready.

Adds tests for the feature usage of the unstableFeatureVersionsEnabled config

Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
2024-05-30 14:52:50 -07:00
Justine Olshan 5e3df22095
KAFKA-16308 [1/N]: Create FeatureVersion interface and add `--feature` flag and handling to StorageTool (#15685)
As part of KIP-1022, I have created an interface for all the new features to be used when parsing the command line arguments, doing validations, getting default versions, etc.

I've also added the --feature flag to the storage tool to show how it will be used.

Created a TestFeatureVersion to show an implementation of the interface (besides MetadataVersion which is unique) and added tests using this new test feature.

I will add the unstable config and tests in a followup.

Reviewers: David Mao <dmao@confluent.io>, David Jacot <djacot@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jun Rao <junrao@apache.org>
2024-05-29 16:36:06 -07:00
Mickael Maison affe8da54c
KAFKA-7632: Support Compression Levels (KIP-390) (#15516)
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
2024-05-21 17:58:49 +02:00
Ken Huang 81e6098021
KAFKA-16797 A bit cleanup of FeatureControlManager (#15997)
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-20 17:19:01 +08:00
Gaurav Narula 412b05df00
KAFKA-16789 Fix thread leak detection for event handler threads (#15984)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-19 18:21:56 +08:00
José Armando García Sancio bfe81d6229
KAFKA-16207; KRaft's internal log listener to update voter set (#15671)
Adds support for the KafkaRaftClient to read the control records KRaftVersionRecord and VotersRecord in the snapshot and log. As the control records in the KRaft partition are read, the replica's known set of voters are updated. This change also contains the necessary changes to include the control records when a snapshot is generated by the KRaft state machine.

It is important to note that this commit changes the code and the in-memory state to track the sets of voters but it doesn't change any data that is externally exposed. It doesn't change the RPCs, data stored on disk or configuration.

When the KRaft replica starts, the PartitionListener reads the latest snapshot and then the log segments up to the LEO, updating the in-memory state as it reads KRaftVersionRecord and VotersRecord. When the replica (leader and follower) appends to the log, the PartitionListener catches up to the new LEO. When the replica truncates the log because of a diverging epoch, the PartitionListener also truncates the in-memory state to the new LEO. When the state machine generates a new snapshot, the PartitionListener trims any prefix entries that are not needed. This is all done to minimize the amount of data tracked in-memory and to make sure that it matches the state on disk.

To implement the functionality described above this commit also makes the following changes:

Adds control records for KRaftVersionRecord and VotersRecord. KRaftVersionRecord describes the finalized kraft.version supported by all of the replicas. VotersRecord describes the set of voters at a specific offset.

Changes Kafka's feature version to support 0 as the smallest valid value. This is needed because the default value for kraft.version is 0.

Refactors FileRawSnapshotWriter so that it doesn't directly call the onSnapshotFrozen callback. It adds NotifyingRawSnapshotWriter for calling such callbacks. This reorganization is needed because in this change both the KafkaMetadataLog and the KafkaRaftClient need to react to snapshots getting frozen.

Cleans up KafkaRaftClient's initialization. Removes initialize from RaftClient - this is an implementation detail that doesn't need to be exposed in the interface. Removes RaftConfig.AddressSpec and simplifies the bootstrapping of the static voter's address. The bootstrapping of the address is delayed because of tests. We should be able to simplify this further in future commits.

Update the DumpLogSegment CLI to support the new control records KRaftVersionRecord and VotersRecord.

Fix the RecordsSnapshotReader implementations so that the iterator includes control records. RecordsIterator is extended to support reading the new control records.

Improve the BatchAccumulator implementation to allow multiple control records in one control batch. This is needed so that KRaft can make sure that VotersRecord is included in the same batch as the control record (KRaftVersionRecord) that upgrades the kraft.version to 1.

Add a History interface and default implementation TreeMapHistory. This is used to track all of the sets of voters between the latest snapshot and the LEO. This is needed so that KafkaRaftClient can query for the latest set of voters and so that KafkaRaftClient can include the correct set of voters when the state machine generates a new snapshot at a given offset.

Add a builder pattern for RecordsSnapshotWriter. The new builder pattern also implements including the KRaftVersionRecord and VotersRecord control records in the snapshot as necessary. A KRaftVersionRecord should be appended if the kraft.version is greater than 0 at the snapshot's offset. Similarly, a VotersRecord should be appended to the snapshot with the latest value up to the snapshot's offset.

Reviewers: Jason Gustafson <jason@confluent.io>
2024-05-04 12:43:16 -07:00
Colin Patrick McCabe a3f2414990
KAFKA-16624: Don't generate useless PartitionChangeRecord on older MV (#15810)
Fix a case where we could generate useless PartitionChangeRecords on metadata versions older than
3.6-IV0. This could happen in the case where we had an ISR with only one broker in it, and we were
trying to go down to a fully empty ISR. In this case, PartitionChangeBuilder would block the
transition to a fully empty ISR (since that is not valid in these pre-KIP-966 metadata
versions), but it would still emit the record, even though the record had no effect.

Reviewers: Igor Soarez <soarez@apple.com>
2024-05-02 09:23:25 -07:00
mannoopj 31355ef8f9
KAFKA-16475: add more tests to TopicImageNodeTest (#15735)
Add more test cases to TopicImageNodeTest.java.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-04-30 14:59:00 -07:00
Luke Chen ec151c8278
KAFKA-16563: retry pollEvent in KRaftMigrationDriver for retriable errors (#15732)
When migrating from ZK to KRaft, we encountered an issue where the migration hung and the ZkMigrationState could not move to the MIGRATION state. The cause was that pollEvent did not retry on the retriable MigrationClientException (ZK client retriable errors) as it should have. Because of this, the poll event would stop polling, which caused the KRaftMigrationDriver to hang. This PR fixes the issue and adds a test.

Reviewers: Luke Chen <showuon@gmail.com>, Igor Soarez <soarez@apple.com>, Akhilesh C <akhileshchg@users.noreply.github.com>
2024-04-29 17:44:47 +08:00
Mickael Maison df4ef5a621
MINOR: Various cleanups in metadata (#15806)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-26 05:50:40 +08:00
TingIāu "Ting" Kì 864744ffd4
KAFKA-16610 Replace "Map#entrySet#forEach" by "Map#forEach" (#15795)
Reviewers: Apoorv Mittal <amittal@confluent.io>, Igor Soarez <soarez@apple.com>
2024-04-25 01:52:24 +01:00
Omnia Ibrahim cfe5ab5cf2
KAFKA-15853 Move quota configs into server-common package (#15774)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-24 13:05:18 +08:00
ilyazr b254e787cb
KAFKA-16466: Log exception message for non-fault errors in QuorumController (#15701)
The generic error handler of QuorumController didn't log the exception message for non-fault errors, which includes very useful debugging info.

Reviewers: Igor Soarez <soarez@apple.com>
2024-04-23 02:01:36 +01:00
Johnny Hsu 5193eb9323
KAFKA-16475: add test for TopicImageNodeTest (#15720)
Add a unit test for TopicImageNodeTest.

Co-authored-by: Johnny Hsu <johnnyhsu@fb.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-04-16 10:20:34 -07:00
Kuan-Po (Cooper) Tseng 315cd83048
MINOR: remove redundant argument in log (#15699)
remove redundant argument in log

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-12 08:36:33 +08:00
Igor Soarez f6c9feea76
KAFKA-16297: Race condition while promoting future replica (#15557)
If a future replica doesn't get promoted, any directory reassignment sent to the controller should be reversed.

The current logic is already addressing the case when a replica hasn't yet been promoted and the controller hasn't yet acknowledged the directory reassignment. However, it doesn't cover the case where the replica does not get promoted due to a directory failure after the controller has acknowledged the reassignment but before the future replica catches up again and is promoted to main replica.

Reviewers: Luke Chen <showuon@gmail.com>
2024-04-10 17:57:05 +08:00
Calvin Liu 6de58d2731
MINOR; Missing minISR config should log a debug message (#15529)
Log a debug message when the min isr configuration is missing for a topic.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-04-06 17:41:32 -07:00
Calvin Liu 376e9e20db
KAFKA-15586: Clean shutdown detection - server side (#14706)
If the broker registers with the same broker epoch as the previous session, it is recognized as a clean shutdown. Otherwise, it is treated as an unclean shutdown, and the replica is removed from any ELR.
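
The rule above can be sketched as a simple epoch comparison. This is only an illustration: the class, method, and parameter names are hypothetical, and the real controller code tracks this state through broker registration records rather than plain sets:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the rule described above; not the controller's actual code.
final class CleanShutdownSketch {
    // A registration that carries the previous session's broker epoch indicates a clean shutdown.
    static boolean isCleanShutdown(long previousSessionEpoch, long epochInRegistration) {
        return previousSessionEpoch == epochInRegistration;
    }

    // Returns the ELR after the broker re-registers: unchanged on a clean shutdown,
    // and without the broker on an unclean one.
    static Set<Integer> elrAfterRegistration(
            Set<Integer> elr, int brokerId, long previousSessionEpoch, long epochInRegistration) {
        Set<Integer> result = new HashSet<>(elr);
        if (!isCleanShutdown(previousSessionEpoch, epochInRegistration)) {
            result.remove(brokerId);
        }
        return result;
    }
}
```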

Reviewers: Artem Livshits <alivshits@confluent.io>, David Arthur <mumrah@gmail.com>
2024-04-04 09:12:05 -04:00
Alyssa Huang 4ccbf1634a
MINOR: Metadata image test improvements (#15373)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-03-28 11:22:02 +01:00
PoAn Yang 6f8d4fe26b
KAFKA-15949: Unify metadata.version format in log and error message (#15505)
Logs and error messages referred to metadata.version inconsistently, for example as "metadata version" or "metadataVersion". This patch unifies the format as metadata.version.

Reviewers: Luke Chen <showuon@gmail.com>
2024-03-26 20:09:29 +08:00
Alyssa Huang 03f7b5aa3a
KAFKA-16206: Fix unnecessary topic config deletion during ZK migration (#14206)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ron Dagostino <rndgstn@gmail.com>
2024-03-21 15:38:42 +01:00
Kuan-Po (Cooper) Tseng 12a1d85362
KAFKA-12187 replace assertTrue(obj instanceof X) with assertInstanceOf (#15512)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-20 10:36:25 +08:00
Colin Patrick McCabe 2c1943d836
MINOR: remove test constructor for PartitionAssignment (#15435)
Remove the test constructor for PartitionAssignment and remove the TODO.
Also add KRaftClusterTest.testCreatePartitions to get more coverage for
createPartitions.

Reviewers: David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-05 12:02:19 -08:00
Luke Chen 98fb3bd304
MINOR: log error when initialLoadFuture is not done in authorizer (#14953)
Currently, when initializing StandardAuthorizer, it waits until all ACLs are loaded and then completes the initialLoadFuture. So, checking the logs, we see:

2023-12-06 14:07:50,325 INFO [StandardAuthorizer 1] Initialized with 6 acl(s). (org.apache.kafka.metadata.authorizer.StandardAuthorizerData) [kafka-1-metadata-loader-event-handler]
2023-12-06 14:07:50,325 INFO [StandardAuthorizer 1] Completed initial ACL load process. (org.apache.kafka.metadata.authorizer.StandardAuthorizerData) [kafka-1-metadata-loader-event-handler]

But then, when shutting down the node, we will also see this error:

2023-12-06 14:12:32,752 ERROR [StandardAuthorizer 1] Failed to complete initial ACL load process. (org.apache.kafka.metadata.authorizer.StandardAuthorizerData) [kafka-1-metadata-loader-event-handler]
java.util.concurrent.TimeoutException
	at kafka.server.metadata.AclPublisher.close(AclPublisher.scala:98)
	at org.apache.kafka.image.loader.MetadataLoader.closePublisher(MetadataLoader.java:568)
	at org.apache.kafka.image.loader.MetadataLoader.lambda$removeAndClosePublisher$7(MetadataLoader.java:528)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
	at java.base/java.lang.Thread.run(Thread.java:840)

This is confusing. It happens because, on close, we try to complete the authorizer's initial load and complete the initialLoadFuture if it is not already done, but we log the error whether or not the future was already completed. This patch improves the logging.

Reviewers: Josep Prat <josep.prat@aiven.io>
2024-02-17 13:58:11 +08:00
Calvin Liu 756f44a3e5
KAFKA-15665: Enforce partition reassignment should complete when all target replicas are in ISR (#15359)
When completing the partition reassignment, the new ISR should have all the target replicas.
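
As a rough sketch of this condition (the class and method names are hypothetical, and the real check lives in the controller's partition change logic), a reassignment may complete only when the ISR contains every target replica:

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the actual controller code.
final class ReassignmentSketch {
    // True only if every target replica is already in the ISR.
    static boolean canCompleteReassignment(List<Integer> targetReplicas, Set<Integer> isr) {
        return isr.containsAll(targetReplicas);
    }
}
```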

Reviewers: Justine Olshan <jolshan@confluent.io>, David Mao <dmao@confluent.io>
2024-02-16 10:27:43 -08:00
David Arthur c000b1fae2
MINOR: Fix some MetadataDelta handling issues during ZK migration (#15327)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-02-07 12:54:59 -08:00
David Arthur 12ce9c7f98
KAFKA-16216: Reduce batch size for initial metadata load during ZK migration
During migration from ZK mode to KRaft mode, there is a step where the kcontrollers load all of the
data from ZK into the metadata log. Previously, we were using a batch size of 1000 for this, but
200 seems better. This PR also adds an internal configuration to control this batch size, for
testing purposes.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-02-01 15:48:52 -08:00
David Arthur 16ed7357b1
KAFKA-16171: Fix ZK migration controller race #15238
This patch causes the active KRaftMigrationDriver to reload the /migration ZK state after electing
itself as the leader in ZK. This closes a race condition where the previous active controller could
make an update to /migration after the new leader was elected. The update race was not actually a
problem regarding the data since both controllers would be syncing the same state from KRaft to ZK,
but the change to the znode causes the new controller to fail on the zk version check on
/migration.

This patch also fixes an as-yet-unseen bug where an active controller that failed to elect itself via
claimControllerLeadership would not retry.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-01-29 13:51:45 -08:00
Calvin Liu 7e5ef9b509
KAFKA-15585: Implement DescribeTopicPartitions RPC on broker (#14612)
This patch implements the new DescribeTopicPartitions RPC as defined in KIP-966 (ELR). Additionally, this patch adds a broker config "max.request.partition.size.limit" which limits the number of partitions returned by the new RPC.

Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>, David Arthur <mumrah@gmail.com>
2024-01-24 15:16:09 -05:00
Justine Olshan e00d36b9c0
KAFKA-15468 [1/2]: Prevent transaction coordinator reloads on already loaded leaders (#15139)
This originally was #14489 which covered 2 aspects -- reloading on partition epoch changes where leader epoch did not change and reloading when leader epoch changed but we were already the leader.

I've cut out the second part of the change since the first part is much simpler.

This patch redefines the TopicDelta fields to better distinguish when a leader is elected (leader epoch bump) from when a leader has ISR/replica changes (partition epoch bump). There are some cases where we bump the partition epoch but not the leader epoch. In those cases, we do not need to perform operations that only care about a leader epoch bump (i.e., onElect callbacks).

Reviewers: Artem Livshits <alivshits@confluent.io>, José Armando García Sancio <jsancio@apache.org>
2024-01-23 14:58:53 -08:00
David Arthur 7bf7fd99a5
KAFKA-16078: Be more consistent about getting the latest MetadataVersion
This PR creates MetadataVersion.latestTesting to represent the highest metadata version (which may be unstable) and MetadataVersion.latestProduction to represent the latest version that should be used in production. It fixes a few cases where the broker was advertising that it supported the testing versions even when unstable metadata versions had not been configured.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2024-01-17 14:59:22 -08:00
Divij Vaidya 65424ab484
MINOR: New year code cleanup - include final keyword (#15072)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Sagar Rao <sagarmeansocean@gmail.com>
2024-01-11 17:53:35 +01:00
Igor Soarez f385ef468b
KAFKA-15364: Replay BrokerRegistrationChangeRecord.logDirs (#14998)
Any directory changes must be considered when replaying
BrokerRegistrationChangeRecord. This is necessary
to persist directory failures in the cluster metadata,
which #14902 missed.

Reviewers: Omnia G.H Ibrahim <o.g.h.ibrahim@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
2023-12-18 15:43:28 -05:00
David Arthur 7f763d327f
KAFKA-16007 Merge batch records during ZK migration (#15007)
To avoid creating lots of small KRaft batches during the ZK migration, this patch adds a mechanism to merge batches into sizes of at least 1000. This has the effect of reducing the number of batches sent to Raft which reduces the amount of time spent blocking.

Since migrations use metadata transactions, the batch boundaries for migrated records are not significant. Even in light of that, this implementation does not break up existing batches. It will only combine them into a larger batch to meet the minimum size.
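
The merging strategy can be sketched as follows. This is an illustration under assumptions, not the patch's code: BatchMergeSketch and mergeBatches are hypothetical names, and plain lists stand in for record batches. Incoming batches are appended whole to a pending batch, which is emitted once it reaches the minimum size, so existing batches are never split:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; lists stand in for record batches.
final class BatchMergeSketch {
    // Combine consecutive batches until each emitted batch has at least minSize records.
    // Only the final batch may be smaller, since there is nothing left to merge into it.
    static <T> List<List<T>> mergeBatches(List<List<T>> batches, int minSize) {
        List<List<T>> merged = new ArrayList<>();
        List<T> pending = new ArrayList<>();
        for (List<T> batch : batches) {
            pending.addAll(batch); // always add the whole batch; never split it
            if (pending.size() >= minSize) {
                merged.add(pending);
                pending = new ArrayList<>();
            }
        }
        if (!pending.isEmpty()) {
            merged.add(pending); // leftover records that did not reach minSize
        }
        return merged;
    }
}
```

For example, input batches of 400, 400, 400, and 100 records with a minimum size of 1000 produce two output batches of 1200 and 100 records.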

Reviewers: José Armando García Sancio <jsancio@apache.org>
2023-12-15 19:33:15 -05:00
Proven Provenzano b0e99b5593
KAFKA-15922: Bump MetadataVersion to support JBOD with KRaft (#14984)
Moves ELR from MetadataVersion IBP_3_7_IV3 into the new IBP_3_8_IV0 because the ELR feature was not completed before 3.7 reached feature freeze.  Leaves IBP_3_7_IV3 empty -- it is a no-op and is not reused for anything.  Adds the new MetadataVersion IBP_3_7_IV4 for the FETCH request changes from KIP-951, which were mistakenly never associated with a MetadataVersion.  Updates the LATEST_PRODUCTION MetadataVersion to IBP_3_7_IV4 to declare both KRaft JBOD and the KIP-951 changes ready for production use.

Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@apache.org>, Justine Olshan <jolshan@confluent.io>
2023-12-14 10:08:54 -05:00
Omnia Ibrahim 07490b929b
KAFKA-15365: Broker-side replica management changes (#14881)
Reviewers: Igor Soarez <soarez@apple.com>, Ron Dagostino <rndgstn@gmail.com>, Proven Provenzano <pprovenzano@confluent.io>
2023-12-11 09:34:22 -05:00
Colin Patrick McCabe 32fdb8d173
KAFKA-15956: MetadataShell must take the log directory lock when reading (#14899)
MetadataShell should take an advisory lock on the .lock file of the directory it is reading from.
Add an integration test of this functionality in MetadataShellIntegrationTest.java.

Note: in build.gradle, I had to add some dependencies on server-common's test files in order to use
MockFaultHandler, etc.

MetadataBatchLoader.java: fix a case where a log message was incorrect.  The intention was to print
the number equivalent to (offset + index).  Instead it was printing the offset, followed by the
index. So if the offset was 100 and the index was 1, 1001 would be printed rather than 101.
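
This class of bug is easy to reproduce in Java, because string concatenation is evaluated left to right. The snippet below is a standalone illustration with hypothetical names, not the actual MetadataBatchLoader code:

```java
// Hypothetical illustration of the logging bug; not the MetadataBatchLoader code.
final class OffsetLogMessageSketch {
    // Buggy: the string is built first, so offset and index are appended as text.
    static String buggy(long offset, int index) {
        return "at offset " + offset + index;
    }

    // Fixed: parentheses force the numeric addition before concatenation.
    static String fixed(long offset, int index) {
        return "at offset " + (offset + index);
    }
}
```

With an offset of 100 and an index of 1, buggy returns "at offset 1001" while fixed returns "at offset 101".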

Co-authored-by: Igor Soarez <i@soarez.me>
Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2023-12-10 19:18:34 -08:00
Igor Soarez 93b6df6173
KAFKA-15364: Handle log directory failure in the Controller (#14902)
When log directories fail, the broker will send a heartbeat listing the failed directories. This
PR implements processing offline directories in the controller's broker heartbeat handling. We
update broker registrations and generate leadership/ISR changes as necessary.

Co-authored-by: Colin P. McCabe <cmccabe@apache.org>
Reviewers: Ron Dagostino <rndgstn@gmail.com>
2023-12-08 14:44:14 -08:00
Igor Soarez c515bf51f8
KAFKA-15426: Process and persist directory assignments
Handle AssignReplicasToDirs requests, persist metadata changes
with new directory assignments and possible leader elections.

Reviewers: Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rndgstn@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-12-07 11:44:45 -08:00
Igor Soarez 32576f61ce
MINOR: always register before touch in BrokerHeartbeatManager (#14934)
BrokerHeartbeatManager should require a call to register(brokerId) before touch(brokerId)

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rndgstn@gmail.com>
2023-12-07 10:13:39 -08:00
Colin Patrick McCabe 969bc7749c
KAFKA-15980: Add the CurrentControllerId metric (#14749)
Add the CurrentControllerId metric as described in KIP-1001. This gives us an easy way to identify the current controller by looking at the metrics of any Kafka node (broker or controller).

Reviewers: David Arthur <mumrah@gmail.com>
2023-12-06 21:03:33 -08:00
Igor Soarez f467f6bb4f
KAFKA-15361: Process and persist dir info with broker registration (#14838)
Part of JBOD KIP-858, https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rdagostino@confluent.io>
2023-12-06 16:40:43 -05:00
Colin Patrick McCabe ebae7b26b5
MINOR: fix bug where we weren't registering SnapshotEmitterMetrics (#14918)
Fix a bug where we weren't properly exposing SnapshotEmitterMetrics. Add a test.

Reviewers: David Arthur <mumrah@gmail.com>
2023-12-04 21:32:12 -08:00
Igor Soarez 6b87c85291
KAFKA-15886: Always specify directories for new partition registrations
When creating partition registrations directories must always be defined.

If creating a partition from a PartitionRecord or PartitionChangeRecord from an older version that
does not support directory assignments, then DirectoryId.MIGRATING is assumed.

If creating a new partition, or triggering a change in assignment, DirectoryId.UNASSIGNED should be
specified, unless the target broker has a single online directory registered, in which case the
replica should be assigned directly to that single directory.
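
The rule for new partitions and assignment changes can be sketched as below. The names are hypothetical, and strings stand in for directory IDs; in particular, the UNASSIGNED constant here is only a placeholder for DirectoryId.UNASSIGNED:

```java
import java.util.List;

// Hypothetical illustration; strings stand in for directory IDs.
final class DirectoryChoiceSketch {
    // Placeholder standing in for DirectoryId.UNASSIGNED.
    static final String UNASSIGNED = "UNASSIGNED";

    // If the target broker has exactly one online directory, assign the replica to it
    // directly; otherwise leave the directory unassigned.
    static String directoryForNewReplica(List<String> onlineDirectories) {
        return onlineDirectories.size() == 1 ? onlineDirectories.get(0) : UNASSIGNED;
    }
}
```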

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-11-30 14:10:47 -08:00
Colin Patrick McCabe a94bc8d6d5
KAFKA-15922: Add a MetadataVersion for JBOD (#14860)
Assign MetadataVersion.IBP_3_7_IV2 to JBOD.

Move KIP-966 support to MetadataVersion.IBP_3_7_IV3.

Create MetadataVersion.LATEST_PRODUCTION as the latest metadata version that can be used when formatting a
new cluster, or upgrading a cluster using kafka-features.sh. This will allow us to clearly distinguish between stable
and unstable metadata versions for the first time.

Reviewers: Igor Soarez <soarez@apple.com>, Ron Dagostino <rndgstn@gmail.com>, Calvin Liu <caliu@confluent.io>, Proven Provenzano <pprovenzano@confluent.io>
2023-11-30 10:35:13 -08:00
Jason Gustafson a35e021925
MINOR: Fix flaky `MetadataLoaderTest.testNoPublishEmptyImage` (#14875)
There is a race in the assertion on `capturedImages`. Since the future is signaled first, it is still possible to see an empty list. By adding to the collection first, we can ensure the assertion will succeed.
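
The ordering fix can be illustrated with a small standalone sketch. The class and field names are hypothetical and are not the test's actual publisher: the captured value is recorded before the future is completed, so any thread that waits on the future is guaranteed to see it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration of the ordering fix; not the actual test code.
final class PublishOrderSketch {
    final List<String> capturedImages = new CopyOnWriteArrayList<>();
    final CompletableFuture<Void> firstPublish = new CompletableFuture<>();

    // Record the image first, then signal. Completing the future first would let a
    // waiting thread wake up and observe an empty list.
    void onPublish(String image) {
        capturedImages.add(image);
        firstPublish.complete(null);
    }
}
```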

Reviewers: David Jacot <djacot@confluent.io>
2023-11-30 09:50:19 -08:00
yuyli 937578be65
MINOR: Rename method sendBrokerHeartbeat #14658
Rename sendBrokerHeartbeat to sendBrokerHeartbeatToUnfenceBrokers.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-11-27 13:30:40 -08:00
Colin Patrick McCabe 209f268d6c
KAFKA-15860: ControllerRegistration must be written out to the metadata image (#14807)
The ControllerRegistration records added in KIP-919 should be written out to the metadata
image, not just the log.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2023-11-22 21:25:21 -08:00
Igor Soarez e90692246a
KAFKA-15362: Resolve offline replicas in metadata cache (#14737)
The metadata cache now considers registered log directories
and directory assignments when determining offline replicas.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>
2023-11-21 09:40:04 -08:00
Igor Soarez a03a71d7b5
KAFKA-15357: Aggregate and propagate assignments
A new AssignmentsManager accumulates, batches, and sends KIP-858
assignment events to the Controller. Assignments are sent via
AssignReplicasToDirs requests.

Move QuorumTestHarness.formatDirectories into TestUtils so it can be
used in other test contexts.

Fix a bug in ControllerRegistration.java where the wrong version of the
record was being generated in ControllerRegistration.toRecord.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>
2023-11-16 16:19:49 -08:00
Mickael Maison 832627fc78
MINOR: Various cleanups in metadata (#14734)
- Remove unused code, suppression
- Simplify/fix test assertions
- Javadoc cleanups

Reviewers: Josep Prat <josep.prat@aiven.io>
2023-11-14 09:25:09 +01:00
Proven Provenzano fa472d26a5
MINOR: Update BrokerRegistration to use a Builder
Update BrokerRegistration to use a Builder. This fixes the proliferation of different constructors,
and makes it clear what arguments are being used where.

Reviewers: Colin P. McCabe <cmccabe@confluent.io>
2023-11-09 10:08:31 -08:00
Colin Patrick McCabe 7060c08d6f
MINOR: Rewrite the meta.properties handling code in Java and fix some issues (#14628)
meta.properties files are used by Kafka to identify log directories within the filesystem.
Previously, the code for handling them was in BrokerMetadataCheckpoint.scala. This PR rewrites the
code for handling them in Java and moves it to the org.apache.kafka.metadata.properties package. It
also gets rid of the separate types for v0 and v1 meta.properties objects. Having separate types
wasn't so bad back when we had a strict rule that zk clusters used v0 and kraft clusters used v1.
But ZK migration has blurred the lines. Now, a zk cluster may have either v0 or v1, if it is
migrating, and a kraft cluster may have either v0 or v1, at any time.

The new code distinguishes between an individual meta.properties file, which is represented by
MetaProperties, and a collection of meta.properties files, which is represented by
MetaPropertiesEnsemble. It is useful to have this distinction, because in JBOD mode, even if some
log directories are inaccessible, we can still use the ensemble to extract needed information like
the cluster ID. (Of course, even when not in JBOD mode, KRaft servers have always been able to
configure a metadata log directory separate from the main log directory.)

Since we recently added a unique directory.id to each meta.properties file, the previous convention
of passing a "canonical" MetaProperties object for the cluster around to various places in the code
needs to be revisited. After all, we can no longer assume all of the meta.properties files are the
same. This PR fixes these parts of the code. For example, it fixes the constructors of
ControllerApis and RaftManager to just take a cluster ID, rather than a MetaProperties object. It
fixes some other parts of the code, like the constructor of SharedServer, to take a
MetaPropertiesEnsemble object.

Another goal of this PR was to centralize meta.properties validation a bit more and make it
unit-testable. For this purpose, the PR adds MetaPropertiesEnsemble.verify, and a few other
verification methods. These enforce invariants like "the metadata directory must be readable," and
so on.

Reviewers: Igor Soarez <soarez@apple.com>, David Arthur <mumrah@gmail.com>, Divij Vaidya <diviv@amazon.com>, Proven Provenzano <pprovenzano@confluent.io>
2023-11-09 09:32:35 -08:00