kafka

Commit Graph

Author	SHA1	Message	Date
Colin P. McCabe	5a1aa1a670	MINOR: Standardize controller log4j output for replaying records Standardize controller log4j output for replaying important records. The log message should include the word "replayed" to make it clear that this is a record replay. Log the replay of records for ACLs, client quotas, and producer IDs, which were previously not logged. Also fix a case where we weren't logging changes to broker registrations. AclControlManager, ClientQuotaControlManager, and ProducerIdControlManager didn't previously have a log4j logger object, so this PR adds one. It also converts them to using Builder objects. This makes junit tests more readable because we don't need to specify parameters where the test can use the default (like LogContexts). Throw an exception in replay if we get another TopicRecord for a topic which already exists.	2023-07-13 10:18:34 -07:00
Ron Dagostino	edd64fa251	MINOR: more KRaft Metadata Image tests (#13724 ) Adds additional testing for the various KRaft *Image classes. For every image that we create we already test that we can get there by applying all the records corresponding to that image written out as a list of records. This patch adds more tests to confirm that we can get to each such final image with intermediate stops at all possible intermediate images along the way. Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>	2023-07-10 10:01:10 -04:00
David Arthur	726d277c0a	MINOR: Move some things around in KRaftMigrationDriver (#13978 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-07-10 09:05:46 -04:00
andymg3	1223b79973	KAFKA-15149: Fix handling of new partitions in dual-write mode (#13968 ) Fixes a bug where we don't send UMR and LISR requests in dual-write mode when new partitions are created. Prior to this patch, KRaftMigrationZkWriter was mutating the internal data-structures of TopicDelta which prevented MigrationPropagator from sending UMR and LISR for the changed partitions. Reviewers: David Arthur <mumrah@gmail.com>	2023-07-07 10:16:51 -04:00
David Arthur	fc7d912e8b	KAFKA-15109 Ensure the leader epoch bump occurs for older MetadataVersions (#13910 ) This fixes a regression introduced by the previous KAFKA-15109 commit (`d0457f7360` on trunk). Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>	2023-06-27 11:49:20 -04:00
David Arthur	1bf7039999	KAFKA-15098 Allow authorizers to be configured in ZK migration (#13895 ) Reviewers: Ron Dagostino <rdagostino@confluent.io>	2023-06-22 09:34:49 -04:00
David Arthur	d0457f7360	KAFKA-15109 Don't skip leader epoch bump while in migration mode (#13890 ) While in migration mode, the KRaft controller must always bump the leader epoch when shrinking an ISR. This is required to maintain compatibility with the ZK brokers. Without the epoch bump, the ZK brokers will ignore the partition state change present in the LeaderAndIsrRequest since it would not contain a new leader epoch. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-06-21 13:09:05 -04:00
minjian.cai	ba5e1acdfb	MINOR: fix typos for metadata (#13889 ) Reviewers: Divij Vaidya <diviv@amazon.com>, Deqi Hu <deqi.hu@shopee.com>	2023-06-21 15:09:15 +02:00
Colin P. McCabe	cd3c0ab1a3	KAFKA-15060: fix the ApiVersionManager interface This PR expands the scope of ApiVersionManager a bit to include returning the current MetadataVersion and features that are in effect. This is useful in general because that information needs to be returned in an ApiVersionsResponse. It also allows us to fix the ApiVersionManager interface so that all subclasses implement all methods of the interface. Having subclasses that don't implement some methods is dangerous because they could cause exceptions at runtime in unexpected scenarios. On the KRaft controller, we were previously performing a read operation in the QuorumController thread to get the current metadata version and features. With this PR, we now read a volatile variable maintained by a separate MetadataVersionContextPublisher object. This will improve performance and simplify the code. It should not change the guarantees we are providing; in both the old and new scenarios, we need to be robust against version skew scenarios during updates. Add a Features class which just has a 3-tuple of metadata version, features, and feature epoch. Remove MetadataCache.FinalizedFeaturesAndEpoch, since it just duplicates the Features class. (There are some additional feature-related classes that can be consolidated in a follow-on PR.) Create a java class, EndpointReadyFutures, for managing the futures associated with individual authorizer endpoints. This avoids code duplication between ControllerServer and BrokerServer and makes this code unit-testable. Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>	2023-06-19 16:46:44 -07:00
Luke Chen	d3e0b27b24	KAFKA-15040: trigger onLeadershipChange under KRaft mode (#13807 ) When a LeaderAndIsr request is received, we notify the remoteLogManager about the leadership change to trigger the following workflow. But LeaderAndIsr requests are not sent in KRaft mode; instead, the topicDelta is received. This PR fixes this issue by getting leader changes and follower changes from the topicDelta, and triggering rlm.onLeadershipChange to notify the remote log manager. Adding tests for remote storage enabled cases. Reviewers: Satish Duggana <satishd@apache.org>	2023-06-09 09:53:46 +08:00
José Armando García Sancio	8ad0ed3e61	KAFKA-15021; Skip leader epoch bump on ISR shrink (#13765 ) When the KRaft controller removes a replica from the ISR because of the controlled shutdown there is no need for the leader epoch to be increased by the KRaft controller. This is accurate as long as the topic partition leader doesn't add the removed replica back to the ISR. This change also fixes a bug when computing the HWM. When computing the HWM, replicas that are not eligible to join the ISR but are caught up should not be included in the computation. Otherwise, the HWM will never increase for replica.lag.time.max.ms because the shutting down replica is not sending FETCH request. Without this additional fix PRODUCE requests would timeout if the request timeout is greater than replica.lag.time.max.ms. Because of the bug above the KRaft controller needs to check the MV to guarantee that all brokers support this bug fix before skipping the leader epoch bump. Reviewers: David Mao <47232755+splett2@users.noreply.github.com>, Divij Vaidya <diviv@amazon.com>, David Jacot <djacot@confluent.io>	2023-06-07 07:20:40 -07:00
andymg3	db9d845702	KAFKA-14791; Create a builder for PartitionRegistration (#13788 ) This creates a builder for PartitionRegistration. The motivation for the builder is that the constructor of PartitionRegistration has four arguments all of type int[] which makes it easy to make a mistake when using it. Reviewers: José Armando García Sancio <jsancio@apache.org>	2023-06-06 07:58:23 -07:00
Dimitar Dimitrov	0d5cf4c385	KAFKA-15052 Fix the flaky QuorumControllerTest.testBalancePartitionLeaders (#13804 ) In this test broker session timeout is configured aggressively low (to 1 second) so that fencing can happen without much waiting. Then in the final portion of the test when brokers should not be fenced heartbeats are sent roughly 2 times in a session timeout window. However the first time that's done there's other code between sending the heartbeat and taking the timestamp, and in local tests that code can take up to 0.5 seconds (1/2 of the session timeout). That then can result in all brokers being fenced again which would fail the test. This change sends a heartbeat just when a timestamp is taken, which in local tests reduces flaky failures from 4 out of 50 to 0 out of 50. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-06-04 11:02:27 -07:00
Colin Patrick McCabe	146a6976ae	KAFKA-15048: Improve handling of unexpected quorum controller errors (#13799 ) When the active quorum controller encounters an "unexpected" error, such as a NullPointerException, it currently resigns its leadership. This PR fixes it so that in addition to doing that, it also increments the metadata error count metric. This will allow us to better track down these errors. This PR also fixes a minor bug where performing read operations on a standby controller would result in an unexpected RuntimeException. The bug happened because the standby controller does not take in-memory snapshots, and read operations were attempting to read from the epoch of the latest committed offset. The fix is for the standby controller to simply read the latest value of each data structure. This is always safe, because standby controllers don't contain uncommitted data. Also, fix a bug where listPartitionReassignments was reading the latest data, rather than data from the last committed offset. Reviewers: dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>	2023-06-02 12:51:15 -07:00
David Arthur	f499662923	KAFKA-15003: Fix ZK sync logic for partition assignments (#13735 ) Fixed the metadata change events in the Migration component to check correctly for the diff in existing topic changes and replicate the metadata to Zookeeper. Also, made the diff check exhaustive enough to handle partial writes in Zookeeper when we try to replicate changes using a snapshot in the event of a Controller failover. Add migration client and integration tests to verify the change. Co-authored-by: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>	2023-06-01 15:43:41 -07:00
David Arthur	d27ba5bfba	KAFKA-15010 ZK migration failover support (#13758 ) This patch adds snapshot reconciliation during ZK to KRaft migration. This reconciliation happens whenever a snapshot is loaded by KRaft, or during a controller failover. Prior to this patch, it was possible to miss metadata updates coming from KRaft when dual-writing to ZK. Internally this adds a new state SYNC_KRAFT_TO_ZK to the KRaftMigrationDriver state machine. The controller passes through this state after the initial ZK migration and each time a controller becomes active. Logging during dual-write was enhanced to include a count of write operations happening. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-06-01 10:25:46 -04:00
Ron Dagostino	e74e5e7ac5	KAFKA-15039: Reduce logging level to trace in PartitionChangeBuilder.… (#13780 ) …tryElection() A CPU profile in a large cluster showed PartitionChangeBuilder.tryElection() taking significant CPU due to logging. We adjust the logging statements in that method for clean elections from DEBUG level to TRACE to mitigate the impact of this logging under normal operations. Unclean elections are now logged at the INFO level rather than DEBUG. Reviewers: Jason Gustafson <jason@confluent.io>, Colin P. McCabe <cmccabe@apache.org>	2023-05-31 16:26:01 -04:00
Proven Provenzano	731c8c967e	KAFKA-15017 Fix snapshot load in dual write mode for ClientQuotas and SCRAM (#13757 ) This patch fixes the case where a ClientQuota or SCRAM credential was added in KRaft, but not written back to ZK. This missed write only occurred when handling a KRaft snapshot. If the changed quota was processed in a metadata delta (which is the typical case), it would be written to ZK. Reviewers: David Arthur <mumrah@gmail.com>	2023-05-31 15:42:00 -04:00
Colin Patrick McCabe	9b3db6d50a	KAFKA-15019: Improve handling of broker heartbeat timeouts (#13759 ) When the active KRaft controller is overloaded, it will not be able to process broker heartbeat requests. Instead, they will be timed out. When using the default configuration, this will happen if the time needed to process a broker heartbeat climbs above a second for a sustained period. This, in turn, could lead to brokers being improperly fenced when they are still alive. With this PR, timed out heartbeats will still update the lastContactNs and metadataOffset of the broker in the BrokerHeartbeatManager. While we don't generate any records, this should still be adequate to prevent spurious fencing. We also log a message at ERROR level so that this condition will be more obvious. Other small changes in this PR: fix grammar issue in log4j of BrokerHeartbeatManager. Add JavaDoc for ClusterControlManager#zkMigrationEnabled field. Add builder for ReplicationControlTestContext to avoid having tons of constructors. Update ClusterControlManager.DEFAULT_SESSION_TIMEOUT_NS to match the default in KafkaConfig. Reviewers: Ismael Juma <ijuma@apache.org>, Ron Dagostino <rdagostino@confluent.io>	2023-05-31 10:49:05 -07:00
David Arthur	7a679af687	KAFKA-15004: Fix configuration dual-write during migration (#13767 ) This patch fixes several small bugs with configuration dual-write during migration. * Topic configs are not written back to ZK while handling snapshot. * New broker/topic configs in KRaft that did not exist in ZK will not be written to ZK. * The sensitive configs are not encoded while writing them to Zookeeper. * Handle topic configs in ConfigMigrationClient and KRaftMigrationZkWriter#handleConfigsSnapshot Added tests to ensure we no longer have the above mentioned issues. Co-authored-by: Akhilesh Chaganti <akhileshchg@users.noreply.github.com> Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-05-27 17:20:44 -04:00
Colin Patrick McCabe	b74204fa0a	KAFKA-14996: Handle overly large user operations on the kcontroller (#13742 ) Previously, if a user tried to perform an overly large batch operation on the KRaft controller (such as creating a million topics), we would create a very large number of records in memory. Our attempt to write these records to the Raft layer would fail, because there were too many to fit in an atomic batch. This failure, in turn, would trigger a controller failover. (Note: I am assuming here that no topic creation policy was in place that would prevent the creation of a million topics. I am also assuming that the user operation must be done atomically, which is true for all current user operations, since we have not implemented KIP-868 yet.) With this PR, we fail immediately when the number of records we have generated exceeds the threshold that we can apply. This failure does not generate a controller failover. We also now fail with a PolicyViolationException rather than an UnknownServerException. In order to implement this in a simple way, this PR adds the BoundedList class, which wraps any list and adds a maximum length. Attempts to grow the list beyond this length cause an exception to be thrown. Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ijuma@apache.org>, Divij Vaidya <diviv@amazon.com>	2023-05-26 13:16:17 -07:00
Manyanda Chitimbo	a27c98ca61	MINOR: remove unused variable from QuorumMetaLogListener#handleCommit method (#13611 ) The local variable processedRecordsSize was just left over from another commit and can be safely removed. Reviewers: Divij Vaidya <diviv@amazon.com>, José Armando García Sancio <jsancio@apache.org>	2023-05-26 08:21:40 -07:00
Proven Provenzano	79351ec88e	KAFKA-14970: Fix SCRAM during migration dual-write (#13729 ) Fixed a bug during dual write mode where if a user is updating SCRAM records and has no quotas, the SCRAM records will not be written to ZK. Add tests explicitly for this scenario. Reviewers: David Arthur <mumrah@gmail.com>	2023-05-24 17:01:39 -04:00
Colin P. McCabe	12130cfcec	MINOR: Create the MetadataNode classes to introspect MetadataImage Metadata image classes such as MetadataImage, ClusterImage, FeaturesImage, and so forth contain numerous sub-images. This PR adds a structured way of traversing those sub-images. This is useful for the metadata shell, and also for implementing toString functions. In both cases, the previous solution was suboptimal. The metadata shell was previously implemented in an ad-hoc way by mutating text-based tree nodes when records were replayed. This was difficult to keep in sync with changes to the record types (for example, we forgot to do this for SCRAM). It was also pretty low-level, being done at a level below that of the image classes. For toString, it was difficult to keep the implementations consistent previously, and also support both redacted and non-redacted output. The metadata shell directory was getting crowded since we never had submodules for it. This PR creates glob/, command/, node/, and state/ directories to keep things better organized. Reviewers: David Arthur <mumrah@gmail.com>, Ron Dagostino <rdagostino@confluent.io>	2023-05-23 10:11:26 -07:00
Akhilesh C	ea6ce3bf82	KAFKA-15009: Handle new ACLs in KRaft snapshot during migration (#13741 ) When loading a snapshot during dual-write mode, we were missing the logic to detect new ACLs that had been added on the KRaft side. This patch adds support for finding those new ACLs as well as tests to verify the correct behavior. Reviewers: David Arthur <mumrah@gmail.com>	2023-05-23 10:43:02 -04:00
Akhilesh C	6b95581867	KAFKA-15007: Use the correct MetadataVersion in MetadataPropagator (#13732 ) Use the MetadataVersion from the MetadataImage passed to MetadataPropagator. This ensures the propagator sends the right versions of UMR, LISR and StopReplica requests when the migration is in DUAL_WRITE mode. Reviewers: David Arthur <mumrah@gmail.com>	2023-05-22 14:46:53 -04:00
David Mao	d944ef1efb	MINOR: Rename handleSnapshot to handleLoadSnapshot (#13727 ) Rename handleSnapshot to handleLoadSnapshot to make it explicit that it is handling snapshot load, not generation. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>	2023-05-17 09:57:24 -07:00
Divij Vaidya	bb10ae4273	KAFKA-14962: Trim whitespace from ACL configuration (#13670 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Christo Lolov <lolovc@amazon.com>	2023-05-12 23:51:00 +05:30
hudeqi	440bed2391	MINOR: code optimization in QuorumController (#13697 ) 1. Add a hint to the "BROKER_LOGGER" switch case in ConfigResourceExistenceChecker; otherwise, it will be classified as default break and deleted directly. I don't know if adding a hint is better than deleting directly. 2. Delete some unused variables and methods. 3. Add the missing "@Test" annotation to a unit test method. Reviewers: dengziming <dengziming1993@gmail.com>	2023-05-12 14:03:17 +08:00
dengziming	a7c9842f70	KAFKA-14291: KRaft controller should return right finalized features in ApiVersionResponse (#13679 ) The KRaft controller returns empty finalized features in `ApiVersionResponse`. The brokers are not affected by this, so this problem doesn't have any impact currently, but it's worth fixing to avoid unexpected problems. There are also a bunch of confusing methods in `ApiVersionResponse` which are only used in test code. I moved them to TestUtils to make the code clearer, and to force everyone to pass in the correct parameters instead of the default zero parameters, for example, empty supported features and empty finalized features. Reviewers: Luke Chen <showuon@gmail.com>	2023-05-12 13:46:06 +08:00
David Arthur	0822ce0ed1	KAFKA-14840: Support for snapshots during ZK migration (#13461 ) This patch adds support for handling metadata snapshots while in dual-write mode. Prior to this change, if the active controller loaded a snapshot, it would get out of sync with the ZK state. In order to reconcile the snapshot state with ZK, several methods were added to scan through the metadata in ZK to compute differences with the MetadataImage. Since this introduced a lot of code, I opted to split out a lot of methods from ZkMigrationClient into their own client interfaces, such as TopicMigrationClient, ConfigMigrationClient, and AclMigrationClient. Each of these has some iterator method that lets the caller examine the ZK state in a single pass and without using too much memory. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Luke Chen <showuon@gmail.com>	2023-05-05 01:35:26 -07:00
Colin P. McCabe	97c36f3f31	HOTFIX: fix file deletions left out of MINOR: improve QuorumController logging #13540	2023-05-04 12:20:33 -07:00
Colin P. McCabe	63f9f23ec0	MINOR: improve QuorumController logging #13540 When creating the QuorumController, log whether ZK migration is enabled. When applying a feature level record which sets the metadata version, log the metadata version enum rather than the numeric feature level. Improve the logging when we replay snapshots in QuorumController. Log both the beginning and the end of replay. When TRACE is enabled, log every record that is replayed in QuorumController. Since some records may contain sensitive information, create RecordRedactor to assist in logging only what is safe to put in the log4j file. Add logging to ControllerPurgatory. Successful completions are logged at DEBUG; failures are logged at INFO, and additions are logged at TRACE. Remove SnapshotReason.java, SnapshotReasonTest.java, and QuorumController#generateSnapshotScheduled. They are deadcode now that snapshot generation moved to org.apache.kafka.image.publisher.SnapshotGenerator. Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>	2023-05-04 11:18:03 -07:00
Proven Provenzano	e29942347a	KAFKA-14859: SCRAM ZK to KRaft migration with dual write (#13628 ) Handle migrating SCRAM records in ZK when migrating from ZK to KRaft. This includes handling writing back SCRAM records to ZK while in dual write mode where metadata updates are written to both the KRaft metadata log and to ZK. This allows for rollback of migration to include SCRAM metadata changes. Reviewers: David Arthur <mumrah@gmail.com>	2023-05-01 09:56:04 -04:00
Luke Chen	d796480fe8	KAFKA-14909: check zkMigrationReady tag before migration (#13631 ) 1. add ZkMigrationReady in apiVersionsResponse 2. check all nodes if ZkMigrationReady are ready before moving to next migration state Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>	2023-04-28 14:35:12 +08:00
Colin Patrick McCabe	c708f7ba5f	MINOR: remove spurious call to fatalFaultHandler (#13651 ) Remove a spurious call to fatalFaultHandler accidentally introduced by KAFKA-14805. We should only invoke the fatal fault handler if we are unable to generate the activation records. If we are unable to write the activation records, a controller failover should be sufficient to remedy the situation. Co-authored-by: Luke Chen showuon@gmail.com Reviewers: Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>	2023-04-28 10:15:26 +08:00
Colin P. McCabe	7049333617	KAFKA-14943: Fix ClientQuotaControlManager validation Don't allow setting negative or zero values for quotas. Don't allow SCRAM mechanism names to be used as client quota names. SCRAM mechanisms are not client quotas. (The confusion arose because of internal ZK representation details that treated them both as "client configs.") Add unit tests for ClientQuotaControlManager.isValidIpEntity and ClientQuotaControlManager.configKeysForEntityType. This change doesn't affect metadata record application, only input validation. If there are bad client quotas that are set currently, this change will not alter the current behavior (of throwing an exception and ignoring the bad quota).	2023-04-27 10:42:32 -07:00
David Arthur	c1b5c75d92	KAFKA-14805 KRaft controller supports pre-migration mode (#13407 ) This patch adds the concept of pre-migration mode to the KRaft controller. While in this mode, the controller will only allow certain write operations. The purpose of this is to disallow metadata changes when the controller is waiting for the ZK migration records to be committed. The following ControllerWriteEvent operations are permitted in pre-migration mode * completeActivation * maybeFenceReplicas * writeNoOpRecord * processBrokerHeartbeat * registerBroker (only for migrating ZK brokers) * unregisterBroker Raft events and other controller events do not follow the same code path as ControllerWriteEvent, so they are not affected by this new behavior. This patch also adds a new metric as defined in KIP-868: kafka.controller:type=KafkaController,name=ZkMigrationState In order to support upgrades from 3.4.0, this patch also redefines the enum value of value 1 to mean MIGRATION rather than PRE_MIGRATION. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2023-04-26 10:20:30 -04:00
Manyanda Chitimbo	dd63d88ac3	MINOR: fix noticed typo in raft and metadata projects (#13612 ) Reviewers: Josep Prat <jlprat@apache.org>	2023-04-21 15:02:06 +02:00
David Jacot	2d0b816150	MINOR: Move `ControllerPurgatory` to `server-common` (#13555 ) This patch renames `ControllerPurgatory` to `DeferredEventQueue` and moves it from the `metadata` module to the `server-common` module. Reviewers: Alexandre Dupriez <alexandre.dupriez@gmail.com>, Ziming Deng <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@apache.org>	2023-04-21 11:19:04 +02:00
Purshotam Chauhan	df13775254	KAFKA-14828: Remove R/W locks using persistent data structures (#13437 ) Currently, StandardAuthorizer uses a R/W lock for maintaining the consistency of data. For the clusters with very high traffic, we will typically see an increase in latencies whenever a write operation comes. The intent of this PR is to get rid of the R/W lock with the help of immutable or persistent collections. Basically, new object references are used to hold the intermediate state of the write operation. After the completion of the operation, the main reference to the cache is changed to point to the new object. Also, for the read operation, the code is changed such that all accesses to the cache for a single read operation are done to a particular cache object only. In the PR description, you can find the performance of various libraries at the time of both read and write. Read performance is checked with the existing AuthorizerBenchmark. For write performance, a new AuthorizerUpdateBenchmark has been added which evaluates the performance of the addAcl operation. Reviewers: Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Divij Vaidya <diviv@amazon.com>	2023-04-21 14:08:23 +05:30
Proven Provenzano	abca86511e	KAFKA-14881: Rework UserScramCredentialRecord (#13513 ) Rework UserScramCredentialRecord to store serverKey and StoredKey rather than saltedPassword. This is necessary to support migration from ZK, since those are the fields we stored in ZK. Update latest MetadataVersion to IBP_3_5_IV2 and make SCRAM support conditional on this version. Moved ScramCredentialData.java from org.apache.kafka.image to org.apache.kafka.metadata, which seems more appropriate. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-04-18 09:41:38 -07:00
Manyanda Chitimbo	b36a170aa3	MINOR: fix typos in MigrationClient, StandardAuthorizer, StandardAuthorizerData and KafkaConfigSchema files (#13593 ) Reviewers: Luke Chen <showuon@gmail.com>	2023-04-18 19:36:56 +08:00
Ron Dagostino	e27926f92b	KAFKA-14735: Improve KRaft metadata image change performance at high … (#13280 ) topic counts. Introduces the use of persistent data structures in the KRaft metadata image to avoid copying the entire TopicsImage upon every change. Performance that was O(<number of topics in the cluster>) is now O(<number of topics changing>), which has dramatic time and GC improvements for the most common topic-related metadata events. We abstract away the chosen underlying persistent collection library via ImmutableMap<> and ImmutableSet<> interfaces and static factory methods. Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>, Purshotam Chauhan <pchauhan@confluent.io>	2023-04-17 17:52:28 -04:00
andymg3	c4ad09e47d	MINOR: Add more KRaft reassignment tests (#13521 ) Although KAFKA-14808 did not affect KRaft mode, it is important to ensure that we have regression tests in KRaft mode to prevent a similar bug from appearing there in the future. This PR adds two tests. First, it adds a test that makes sure we handle what happens when a reassignment completes and none of the new replicas can be made leader. It's important that we don't keep an old replica as leader. Second, it adds a test that makes sure we handle new reassignments that don't include a previous assignment replica that was leader. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-04-12 12:00:35 -07:00
Colin Patrick McCabe	f1f35ef1a8	KAFKA-14894: MetadataLoader must call finishSnapshot after loading a snapshot (#13541 ) The MetadataLoader must call finishSnapshot after loading a snapshot. This function removes whatever was in the old snapshot that is not in the new snapshot that was just loaded. While this is not significant when the old snapshot was the empty snapshot, it is important to do when we are loading a snapshot on top of an existing non-empty image. In initializeNewPublishers, the newly installed publishers should be given a MetadataDelta based on MetadataImage.EMPTY, reflecting the fact that they are seeing everything for the first time. Reviewers: David Arthur <mumrah@gmail.com>	2023-04-11 15:02:33 -07:00
José Armando García Sancio	672dd3ab6a	KAFKA-13020; Implement reading Snapshot log append timestamp (#13345 ) The SnapshotReader exposes the "last contained log time". This is mainly used during snapshot cleanup. The previous implementation used the append time of the snapshot record. This is not accurate as this is the time when the snapshot was created and not the log append time of the last record included in the snapshot. The log append time of the last record included in the snapshot is stored in the header control record of the snapshot. The header control record is the first record of the snapshot. To be able to read this record, this change extends the RecordsIterator to decode and expose the control records in the Records type. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>	2023-04-07 09:25:54 -07:00
Calvin Liu	8c88cdb718	KAFKA-14617: Update AlterPartitionRequest and enable Kraft controller to reject stale request. (#13408 ) Second part of the [KIP-903](https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR), it updates the AlterPartitionRequest: - Deprecate the NewIsr field - Create a new field BrokerState with BrokerId and BrokerEpoch - Bump the AlterPartition version to 3 With this change, the Quorum Controller is enabled to reject stale AlterPartition request. Reviewers: Jun Rao <junrao@gmail.com>, David Jacot <djacot@confluent.io>	2023-03-31 11:27:42 +02:00
andymg3	887d05559f	MINOR: Create only one FeatureControlManager instance in ReplicationControlManagerTest (#13468 ) This is a small patch to make it so we only create one FeatureControlManager instance in ReplicationControlManagerTest. Currently we create two, which isn't needed. It's also a bit confusing because the ReplicationControlTestContext object ends up having a different FeatureControlManager reference than the one its own ReplicationControlManager instance has a reference to. Reviewers: José Armando García Sancio <jsancio@apache.org>, dengziming <dengziming1993@gmail.com>	2023-03-29 19:10:03 -07:00
Colin Patrick McCabe	09e59bc776	KAFKA-14857: Fix some MetadataLoader bugs (#13462 ) The MetadataLoader is not supposed to publish metadata updates until we have loaded up to the high water mark. Previously, this logic was broken, and we published updates immediately. This PR fixes that and adds a junit test. Another issue is that the MetadataLoader previously assumed that we would periodically get callbacks from the Raft layer even if nothing had happened. We relied on this to install new publishers in a timely fashion, for example. However, in older MetadataVersions that don't include NoOpRecord, this is not a safe assumption. Aside from the above changes, also fix a deadlock in SnapshotGeneratorTest, fix the log prefix for BrokerLifecycleManager, and remove metadata publishers on brokerserver shutdown (like we do for controllers). Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>	2023-03-29 12:30:12 -07:00
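
Note on commit b74204fa0a (KAFKA-14996): the message describes a BoundedList class that "wraps any list and adds a maximum length", with attempts to grow past that length causing an exception. The following is a minimal sketch of that idea only, not the class from the Kafka source tree. The class name BoundedListSketch, the constructor shape, and the use of IllegalStateException are assumptions made for illustration.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of a length-bounded list wrapper.
 * Hypothetical; names and exception type are NOT taken from Kafka.
 */
public class BoundedListSketch<E> extends AbstractList<E> {
    private final int maxLength;
    private final List<E> underlying;

    public BoundedListSketch(int maxLength, List<E> underlying) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        if (underlying.size() > maxLength) {
            throw new IllegalStateException("wrapped list already exceeds " + maxLength);
        }
        this.maxLength = maxLength;
        this.underlying = underlying;
    }

    @Override
    public E get(int index) {
        return underlying.get(index);
    }

    @Override
    public int size() {
        return underlying.size();
    }

    @Override
    public E set(int index, E element) {
        return underlying.set(index, element);
    }

    // AbstractList routes add(E) and addAll(...) through this method,
    // so a single check here covers every way of growing the list.
    @Override
    public void add(int index, E element) {
        if (underlying.size() >= maxLength) {
            throw new IllegalStateException("cannot grow list beyond " + maxLength + " elements");
        }
        underlying.add(index, element);
    }

    @Override
    public E remove(int index) {
        return underlying.remove(index);
    }

    public static void main(String[] args) {
        List<String> records = new BoundedListSketch<>(2, new ArrayList<>());
        records.add("record-1");
        records.add("record-2");
        try {
            records.add("record-3"); // third element exceeds the bound
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the operation fails as soon as the bound is hit, before an oversized batch is ever assembled, which is the behavior the commit message describes. Elements added before the failure stay in the wrapped list, so a caller would discard the whole list on error.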

255 Commits