KIP-853 adds support for dynamic KRaft quorums. This means that the quorum topology is
no longer statically determined by the controller.quorum.voters configuration. Instead, it
is contained in the storage directories of each controller and broker.
Users of dynamic quorums must format at least one controller storage directory with either
the --initial-controllers or the --standalone flag. If they fail to do this, no quorum can be
established. This PR changes the storage tool to warn about the case where a KIP-853 flag has
not been supplied to format a KIP-853 controller. (Note that broker storage directories
can continue to be formatted without a KIP-853 flag.)
There are cases where we don't want to specify initial voters when formatting a controller. One
example is where we format a single controller with --standalone, and then dynamically add 4
more controllers with no initial topology. In this case, we want the 4 later controllers to grab
the quorum topology from the initial one. To support this case, this PR adds the
--no-initial-controllers flag.
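The flag choice described above can be sketched as a small helper that assembles the arguments for the storage tool's format command. This is only an illustration, not the storage tool's actual code: the function name, its parameters, and the warning text are hypothetical, and the only flags used are the three named in this message plus the format command's --cluster-id and --config options.

```python
import warnings
from typing import List, Optional

# The three KIP-853 choices named in this commit message.
KIP_853_MODES = ("standalone", "initial-controllers", "no-initial-controllers")


def storage_format_args(cluster_id: str,
                        config_path: str,
                        is_controller: bool,
                        mode: Optional[str] = None,
                        initial_controllers: Optional[str] = None) -> List[str]:
    """Build the argument list for a 'format' invocation (hypothetical helper)."""
    args = ["format", "--cluster-id", cluster_id, "--config", config_path]

    if mode is None:
        # Brokers may be formatted without a KIP-853 flag; controllers
        # should get a warning, mirroring the behaviour described above.
        if is_controller:
            warnings.warn("controller formatted without a KIP-853 flag")
        return args

    if mode not in KIP_853_MODES:
        raise ValueError(f"unknown mode: {mode}")

    if mode == "initial-controllers":
        # This flag takes a value describing the initial voter set.
        if not initial_controllers:
            raise ValueError("--initial-controllers requires a value")
        return args + ["--initial-controllers", initial_controllers]

    # --standalone and --no-initial-controllers take no value.
    return args + ["--" + mode]


# First controller bootstraps the quorum; later ones learn the topology from it.
first = storage_format_args("my-cluster-id", "controller.properties", True, "standalone")
later = storage_format_args("my-cluster-id", "controller.properties", True,
                            "no-initial-controllers")
```

In the example from this message, the first controller would be formatted with the arguments in `first`, and each of the 4 controllers added later with the arguments in `later`.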
Reviewers: José Armando García Sancio <jsancio@apache.org>, Federico Valeri <fvaleri@redhat.com>
Change the configurations under config/kraft to use controller.quorum.bootstrap.servers instead of controller.quorum.voters. Add comments explaining how to use the older static quorum configuration where appropriate.
In docs/ops.html, remove the reference to "tentative timelines for ZooKeeper removal" and "Tiered storage is considered as an early access feature" since they are no longer up-to-date. Add KIP-853 information.
In docs/quickstart.html, move the ZK instructions to be after the KRaft instructions. Update the KRaft instructions to use KIP-853.
In docs/security.html, add an explanation of --bootstrap-controller and document controller.quorum.bootstrap.servers instead of controller.quorum.voters.
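As a rough illustration of the configuration change, the sketch below parses a properties snippet and reports which quorum style it points to. The helper names, the host and port values, and the classification rule are assumptions made for this example (a deliberately simplified heuristic keyed on which property is present); only the two property names come from this change.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def quorum_style(props: Dict[str, str]) -> str:
    """Simplified heuristic: a static voter list takes precedence if present."""
    if "controller.quorum.voters" in props:
        return "static"
    if "controller.quorum.bootstrap.servers" in props:
        return "dynamic"
    return "unknown"


# Illustrative values only; the host and port are assumptions.
NEW_STYLE = """
# KIP-853 style: point at any controller endpoint
controller.quorum.bootstrap.servers=localhost:9093
"""

OLD_STYLE = """
# Older static quorum: id@host:port for every voter
controller.quorum.voters=1@localhost:9093
"""

new_result = quorum_style(parse_properties(NEW_STYLE))
old_result = quorum_style(parse_properties(OLD_STYLE))
```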
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Alyssa Huang <ahuang@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
When reverting the ZK migration, we must also remove the /migration ZNode in order to allow the migration to be re-attempted in the future.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
All public interface changes should be briefly mentioned in the
upgrade guide.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Anna Sophie Blee-Goldman <sophie@responsive.dev>, Nick Telford <nick.telford@gmail.com>
This trivial PR makes clear when it's the right time to switch from AclAuthorizer to StandardAuthorizer during the migration process.
Reviewers: Luke Chen <showuon@gmail.com>
During the ZK to KRaft migration, before entering dual-write mode, the KRaft controller sends RPCs (i.e. UpdateMetadataRequest, LeaderAndIsrRequest, and StopReplicaRequest) to the brokers. Currently, the controller uses the inter-broker listener to send these RPCs to the brokers. However, the docs did not tell users about this, because a normal KRaft controller does not use inter.broker.listener.name.
This PR adds the missing config to the ZK to KRaft migration doc.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Paolo Patierno <ppatierno@live.com>
This PR fixes a couple of things related to the #15193 PR.
After completing "Enter Migration Mode on the brokers" (the "Enabling the migration on the brokers" step in the migration guide), the broker does not have node.id yet and still uses broker.id, so this PR removes the statement saying to replace one with the other.
Also, during rollback it is not enough to delete the /controller znode quickly after shutting down the controllers, because the controller election does not start until at least one broker has been rolled back with the right configuration. While the controllers are down and before any broker has been rolled, the brokers just log something like this, even if you deleted the znode "quickly":
[2024-01-30 09:27:52,394] DEBUG [zk-broker-0-to-controller-quorum-channel-manager]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2024-01-30 09:27:52,394] INFO [zk-broker-0-to-controller-quorum-channel-manager]: Recorded new controller, from now on will use node localhost:9093 (id: 1 rack: null) (kafka.server.BrokerToControllerRequestThread)
You have to reduce the amount of time between deleting the znode and rolling at least one broker, so that an election can start.
Reviewers: Luke Chen <showuon@gmail.com>
This PR fixes some bugs in the KRaft migration documentation and reorganizes it to be easier to read. (Specifically, there were some steps that were previously out of order.)
In order to keep it all straight, the revert documentation is now in the form of a table which maps the latest migration state to the actions which the system administrator should perform.
Reviewers: Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>, Liu Zeyu <zeyu.luke@gmail.com>, Paolo Patierno <ppatierno@live.com>
- Only use https links
- Fix broken HTML tags
- Replace usage of <tt> which is deprecated with <code>
- Replace hardcoded version numbers
Reviewers: Chris Egerton <fearthecellos@gmail.com>, Greg Harris <gharris1727@gmail.com>
- Remove the outdated statement that delegation tokens aren't supported by KRaft.
- Add an invitation to report migration bugs on JIRA.
- Define terminology such as "zk migration phases".
- Mention MV can't be changed during migration.
- Explain how to revert to ZK mode.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, David Arthur <mumrah@gmail.com>
The task-level commit metrics were removed without deprecation in KIP-447 and the corresponding PR #8218. However, we forgot to update the docs and to remove the code that creates the task-level commit sensor.
This PR removes the task-level commit metrics from the docs and removes the code that creates the task-level commit sensor. The code was effectively dead since it was only used in tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Some users complained that they have no way to determine whether a problem lies in the RSM plug-in they implemented or in Kafka itself. Also, for users who just want to try out the tiered storage feature before implementing anything, it would be good to have an RSM implementation available by default.
Per the discussion in the KIP, there will be no default RSM implementation in Kafka, but we can use the LocalTieredStorage implemented for integration tests to resolve the issues above.
Reviewers: Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>
I've added details for VerificationFailureRate and VerificationTimeMs.
I considered adding the documentation for the AddPartitionsToTxnVerification metrics, but I noticed that all the request metrics simply listed Produce|FetchConsumer|FetchFollower. If we don't already report the AddPartitionsToTxn request metrics in this file (or those of all the other APIs we report), it doesn't make sense to add the verification variant.
Filed a followup jira if we want to redo that whole section.
Reviewers: Divij Vaidya <diviv@amazon.com>
In the "ZooKeeper to KRaft Migration" documentation, we are still reporting 3.4 as the metadata version. This PR reworks that phrase to make it clearer and to avoid the need to update it in the future.
Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Reviewers: Luke Chen <showuon@gmail.com>
Reviewers: Ismael Juma <ismael@confluent.io>, Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>, Matthias J. Sax <matthias@confluent.io>