When reverting the ZK migration, we must also remove the /migration ZNode in order to allow the migration to be re-attempted in the future.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
All public interface changes should be briefly mentioned in the
upgrade guide.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Anna Sophie Blee-Goldman <sophie@responsive.dev>, Nick Telford <nick.telford@gmail.com>
This trivial PR makes clear when it's the right time to switch from AclAuthorizer to StandardAuthorizer during the migration process.
Reviewers: Luke Chen <showuon@gmail.com>
During ZK migrating to KRaft, before entering dual-write mode, the KRaft controller will send RPCs (i.e. UpdateMetadataRequest, LeaderAndIsrRequest, and StopReplicaRequest) to the brokers. Currently, we use the inter broker listener to send the RPC to brokers from the controller. But in the doc, we didn't provide this info to users because the normal KRaft controller won't use inter.broker.listener.names.
This PR adds the missing config in the ZK migrating to KRaft doc.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Paolo Patierno <ppatierno@live.com>
This PR fixes a couple of things related to the #15193 PR.
When you complete "Enter Migration Mode on the brokers", we are actually in "Enabling the migration on the brokers" referring to the migration guide and the broker doesn't really have node.id yet but still broker.id, so the PR removes a statement saying to replace the one with the other.
Also, during rollback it's not enough just deleting the /controller znode quickly after shutting down controllers because the controller election doesn't start yet until at least one broker is rolled back with the right configuration. Until the rolling and when controllers are down, the brokers just log something like this even if you deleted the znode "quickly":
[2024-01-30 09:27:52,394] DEBUG [zk-broker-0-to-controller-quorum-channel-manager]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2024-01-30 09:27:52,394] INFO [zk-broker-0-to-controller-quorum-channel-manager]: Recorded new controller, from now on will use node localhost:9093 (id: 1 rack: null) (kafka.server.BrokerToControllerRequestThread)
You have to reduce the amount of time between deleting the znode and rolling at least one broker, so that an election can start.
Reviewers: Luke Chen <showuon@gmail.com>
This PR fixes some bugs in the KRaft migration documentation and reorganizes it to be easier to read. (Specifically, there were some steps that were previously out of order.)
In order to keep it all straight, the revert documentation is now in the form of a table which maps the latest migration state to the actions which the system administrator should perform.
Reviewers: Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>, Liu Zeyu <zeyu.luke@gmail.com>, Paolo Patierno <ppatierno@live.com>
- Only use https links
- Fix broken HTML tags
- Replace usage of <tt> which is deprecated with <code>
- Replace hardcoded version numbers
Reviewers: Chris Egerton <fearthecellos@gmail.com>, Greg Harris <gharris1727@gmail.com>
- Remove the outdated statement that delegation tokens aren't supported by KRaft.
- Add an invitation to report migration bugs on JIRA.
- Define terminology such as "zk migration phases".
- Mention MV can't be changed during migration.
- Explain how to revert to ZK mode.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, David Arthur <mumrah@gmail.com>
The task-level commit metrics were removed without deprecation in KIP-447 and the corresponding PR #8218. However, we forgot to update the docs and to remove the code that creates the task-level commit sensor.
This PR removes the task-level commit metrics from the docs and removes the code that creates the task-level commit sensor. The code was effectively dead since it was only used in tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>
Some users complained they don't have a way to determine if there is something wrong in the RSM plug-in they implemented, or there's something wrong in Kafka itself. Also, if there are users who just want to try the tiered storage feature out before implementing anything, it would be good we have an RSM implementation by default.
Per the discussion in the KIP, there will be no default RSM implementation in Kafka, but we can use the LocalTieredStorage implemented for integration test, to resolve the issues above.
Reviewers: Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>
I've added details for VerificationFailureRate and VerificationTimeMs.
I considered adding the documentation for the AddPartitionsToTxnVerification metrics, but I noticed that all the request metrics simply listed Produce|FetchConsumer|FetchFollower. If we don't already report the AddPartitionsToTxn request metrics in this file, it doesn't make sense to add the verification variant. (As well as all the other APIs we report)
Filed a followup jira if we want to redo that whole section.
Reviewers: Reviewers: Divij Vaidya <diviv@amazon.com>
In "ZooKeeper to KRaft Migration" documentation, we are still reporting 3.4 as metadata version. Reworking that phrase to make it more clear and avoid the need to update it in the future.
Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Reviewers: Luke Chen <showuon@gmail.com>
Reviewers: Ismael Juma <ismael@confluent.io>, Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>, Matthias J. Sax <matthias@confluent.io>
Most of the contents in the README.md was already covered in the docs therefore only had to add the section for Exactly Once support.
Reviewers: Luke Chen <showuon@gmail.com>
Currently, the current-state KRaft related metric reports follower state for a broker while technically it should be reported as an observer as the kafka-metadata-quorum tool does.
Reviewers: Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>
This is the PR for the implementation of KIP-847: https://cwiki.apache.org/confluence/display/KAFKA/KIP-847%3A+Add+ProducerIdCount+metrics
Add ProducerIdCount metric at the broker level:
kafka.server:type=ReplicaManager,name=ProducerIdCount
Added unit tests below to ensure the metric reported the count correctly.
---------
Co-authored-by: Artem Livshits <84364232+artemlivshits@users.noreply.github.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Divij Vaidya <diviv@amazon.com>, Christo Lolov <christo_lolov@yahoo.com>, Alexandre Dupriez <alexandre.dupriez@gmail.com>, Justine Olshan <jolshan@confluent.io>
This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).
Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
Document process for recovering and formatting the metadata log directory for the KRaft controller.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
Implementation of KIP-846: Source/sink node metrics for Consumed/Produced throughput in Streams
Adds the following INFO topic-level metrics for the total bytes/records consumed and produced:
bytes-consumed-total
records-consumed-total
bytes-produced-total
records-produced-total
Reviewers: Kvicii <Karonazaba@gmail.com>, Guozhang Wang <guozhang@apache.org>, Bruno Cadonna <cadonna@apache.org>
This PR adds some RocksDB metrics that could not be added in KIP-471 due to RocksDB version. The new metrics are extracted using Histogram data provided by RocksDB API, and the old ones were extracted using Tickers. The new metrics added are memtable-flush-time-(avg|min|max) and compaction-time-(avg|min|max).
Reviewer: Bruno Cadonnna <cadonna@apache.org>
KAFKA-13243: KIP-773 Differentiate metric latency measured in ms and ns
Implementation of KIP-773
Deprecates inconsistent metrics bufferpool-wait-time-total,
io-waittime-total, and iotime-total.
Introduces new metrics bufferpool-wait-time-ns-total,
io-wait-time-ns-total, and io-time-ns-total with the same semantics as
before.
Adds metrics (old and new) in ops.html.
Adds upgrade guide for these metrics.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Tom Bentley <tbentley@redhat.com>
KAFKA-12393: Document multi-tenancy considerations
Addressed review feedback by @dajac and @rajinisivaram
Ported from apache/kafka-site#334
Reviewers: Bill Bejeck <bbejeck@apache.org>
Add docs for KIP-663.
Reviewers: Jim Galasyn <jim.galasyn@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This adds a new user-facing documentation "Geo-replication (Cross-Cluster Data Mirroring)" section to the Kafka Operations documentation that covers MirrorMaker v2.
Was already merged to kafka-site via apache/kafka-site#324.
Reviewers: Bill Bejeck <bbejeck@apache.org>
Document the new properties-based metrics for RocksDB
Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
During the AK website upgrade, changes made to kafka-site weren't migrated back to kafka-docs.
This PR is an initial attempt at porting the changes to kafka/docs, but it does not include the streams changes. Those will come in a separate PR.
For the most part, the bulk of the changes in the PR are cosmetic. Only the introduction.html has substantial changes, but it's a direct port from the live documentation.
For testing:
I reviewed the PR diffs
Rendered the changes locally
Reviewers: Matthias J. Sax <mjsax@apache.org>
Java 11 has been recommended for a while, but the ops section had not been updated.
Also added `-XX:+ExplicitGCInvokesConcurrent` which has been in `kafka-run-class`
for a while. Finally, tweaked the text slightly to read better.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
1. fix typo: `atleast` -> `at least`
2. add missing `--` from `--bootstrap-servers` argument to be consistent
3. rephrase a sentence, to make it more clear:
before: `LinkedIn is currently running JDK 1.8 u5 (looking to upgrade to a newer version) with the G1 collector`
It will misguide the users to use JDK 1.8 u5, while the JDK 1.8 u251 is already released, which will include many important bug fixes. I did some rephrase as below:
after: `At the time this is written, LinkedIn is running JDK 1.8 u5 (looking to upgrade to a newer version) with the G1 collector`
Reviewers: Konstantine Karantasis <konstantine@confluent.io>
The href attribute missed the starting double quote, so the hyperlink is interpreted as https://docs.oracle.com/.../agent.html", with a redundant tailing double quote.
Add the missing starting double quote back to fix this issue.
Reviewers: Konstantine Karantasis <konstantine@confluent.io>
This patch modifies the loading time metric to account for time spent waiting for the loading time task to be scheduled.
Reviewers: Jason Gustafson <jason@confluent.io>
This PR is the counterpart of apache/kafka-site#253.
cc/ omkreddy
Author: Lee Dongjin <dongjin@apache.org>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#8149 from dongjinleekr/feature/KAFKA-9586
Author: Bruno Cadonna <bruno@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#7490 from cadonna/AK8942-docs-rocksdb_metrics
Minor comments
This is the implementation for [KIP-503](https://cwiki.apache.org/confluence/display/KAFKA/KIP-503%3A+Add+metric+for+number+of+topics+marked+for+deletion)
When deleting a large number of topics, the Controller can get quite bogged down. One problem with this is the lack of visibility into the progress of the Controller. We can look into the ZK path for topics marked for deletion, but in a production environment this is inconvenient. This PR adds a JMX metric `kafka.controller:type=KafkaController,name=TopicsToDeleteCount` to make it easier to see how many topics are being deleted.
Reviewers: Stanislav Kozlovski <stanislav@confluent.io>, Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
[JIRA](https://issues.apache.org/jira/browse/KAFKA-6263)
- Add metrics to provide visibility for how long group metadata and transaction metadata take to load in order to understand some inactivity seen in the consumer groups
- Tests include mocking load times by creating a delay after each are loaded and ensuring the measured JMX metric is as it should be
Author: anatasiavela <anastasiavela@berkeley.edu>
Reviewers: Gwen Shapira, Jason Gustafson
Closes#7045 from anatasiavela/KAFKA-6263
this PR will fix a typo related to docs:
http://kafka.apache.org/21/documentation.html#rep-throttle
```bash
$ bin/kafka-reassign-partitions.sh --zookeeper myhost:2181--execute --reassignment-json-file bigger-cluster.json —throttle 50000000
```
I think `myhost:2181` should be `localhost:2181` and followed by a `space`
Author: opera443399 <pc@pcswo.com>
Reviewers: Gwen Shapira
Closes#6704 from opera443399/docs-ops-typo