If ELR is enabled, we need to set a cluster-level min.insync.replicas and remove all broker-level overrides. The reason is that if brokers disagree about which partitions are under min ISR, the KIP-966 replication invariants break. To enforce this, when the eligible.leader.replicas.version feature is turned on, we automatically remove all broker-level min.insync.replicas overrides and create the required cluster-level override if needed. Similarly, if the cluster was created with eligible.leader.replicas.version enabled, we create the same kind of cluster-level record. In both cases, we don't allow setting overrides for individual brokers afterwards, or removing the cluster-level override.
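For illustration, the cluster-level override can be set through the Admin client. This is a minimal sketch, assuming a reachable broker at localhost:9092; an empty ConfigResource name denotes the cluster-level broker default:
```
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ClusterMinIsrExample {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(Map.<String, Object>of(
                "bootstrap.servers", "localhost:9092"))) {
            // An empty resource name addresses the cluster-level (default) broker config.
            ConfigResource cluster = new ConfigResource(ConfigResource.Type.BROKER, "");
            Map<ConfigResource, Collection<AlterConfigOp>> updates = Map.of(cluster,
                List.of(new AlterConfigOp(new ConfigEntry("min.insync.replicas", "2"),
                    AlterConfigOp.OpType.SET)));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}
```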
Split ActivationRecordsGeneratorTest up into multiple test cases rather than having it be one giant test case.
Fix a bug in QuorumControllerTestEnv where we would replay records manually on objects, racing with the active controller thread. Instead, we should simply ensure that the initial bootstrap records contain what we want.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
There are times when the controller has a high event processing time, such as during startup or when creating a topic with many partitions. We can see these processing times in the p99 metric (kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs), but it's difficult to see exactly which event is causing the high processing time.
With DEBUG logs, we see every event along with its processing time. Even with this, it's a bit tedious to find the event with a high processing time.
This PR logs all events which take longer than 2 seconds at ERROR level. This will help identify events that are taking far too long, and which could be disruptive to the operation of the controller. The slow event logging looks like this:
```
[2024-12-20 15:03:39,754] ERROR [QuorumController id=1] Exceptionally slow controller event createTopics took 5240 ms. (org.apache.kafka.controller.EventPerformanceMonitor)
```
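In outline, the slow-event check is just a threshold comparison on each event's measured duration. A toy sketch (illustrative names, not the real EventPerformanceMonitor internals):
```
import java.util.concurrent.TimeUnit;

class SlowEventCheck {
    // The 2-second threshold mentioned above.
    private static final long THRESHOLD_NS = TimeUnit.SECONDS.toNanos(2);

    // Returns the message to log at ERROR level, or null if the event was fast enough.
    static String observe(String eventName, long durationNs) {
        if (durationNs < THRESHOLD_NS) return null;
        return String.format("Exceptionally slow controller event %s took %d ms.",
            eventName, TimeUnit.NANOSECONDS.toMillis(durationNs));
    }

    public static void main(String[] args) {
        System.out.println(observe("createTopics", TimeUnit.SECONDS.toNanos(5)));
    }
}
```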
Also, every 60 seconds, it logs some event time statistics, including average time, maximum time, and the name of the event which took the longest. This periodic message looks like this:
```
[2024-12-20 15:35:04,798] INFO [QuorumController id=1] In the last 60000 ms period, 333 events were completed, which took an average of 12.34 ms each. The slowest event was handleCommit[baseOffset=0], which took 41.90 ms. (org.apache.kafka.controller.EventPerformanceMonitor)
```
An operator can disable these logs by adding the following to their log4j config:
```
org.apache.kafka.controller.EventPerformanceMonitor=OFF
```
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Relevant methods:
1. `List.of`, `Set.of`, `Map.of` and similar (introduced in Java 9)
2. Optional: `isEmpty` (introduced in Java 11), `stream` (introduced in Java 9).
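For reference, a small self-contained example of these methods, runnable on Java 11 or newer:
```
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class NewerJdkIdioms {
    public static void main(String[] args) {
        List<String> names = List.of("a", "b");           // immutable list (Java 9)
        Set<Integer> ids = Set.of(1, 2, 3);               // immutable set (Java 9)
        Map<String, Integer> m = Map.of("a", 1);          // immutable map (Java 9)

        Optional<String> maybe = Optional.empty();
        System.out.println(maybe.isEmpty());              // true (Java 11)
        long present = Optional.of("x").stream().count(); // Optional -> Stream (Java 9)
        System.out.println(names + " " + ids + " " + m + " " + present);
    }
}
```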
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
Added transaction version 2 to some of the system tests. Also marked TV2 as production ready.
Also fixed the defaultVersion test.
Reviewers: Jun Rao <jun@confluent.io>
This pull request replaces Log4j with Log4j2 across the entire project, including dependencies, configurations, and code. The notable changes are listed below:
1. Introduce Log4j2 Instead of Log4j
2. Change Configuration File Format from Properties to YAML (see the sketch after this list)
3. Add warnings to notify users if they are still using Log4j properties, encouraging them to transition to Log4j2 configurations
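As an illustration of item 2, a minimal configuration in the new format might look like the following; this is an illustrative sketch of Log4j2's YAML syntax, not the exact file shipped with Kafka:
```
Configuration:
  Appenders:
    Console:
      name: STDOUT
      PatternLayout:
        Pattern: "[%d] %p %m (%c)%n"
  Loggers:
    Root:
      level: INFO
      AppenderRef:
        ref: STDOUT
```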
Co-authored-by: Lee Dongjin <dongjin@apache.org>
Reviewers: Luke Chen <showuon@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Implementation of KIP-1073: Return fenced brokers in DescribeCluster response.
Add new unit and integration tests for describeCluster.
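For illustration, the new response can be requested via the Admin client. A minimal sketch, assuming the includeFencedBrokers option name proposed in KIP-1073:
```
import java.util.Map;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.DescribeClusterOptions;

public class DescribeClusterExample {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(Map.<String, Object>of(
                "bootstrap.servers", "localhost:9092"))) {
            // Ask for fenced brokers too (option name per KIP-1073).
            DescribeClusterOptions options =
                new DescribeClusterOptions().includeFencedBrokers(true);
            admin.describeCluster(options).nodes().get()
                .forEach(node -> System.out.println(node));
        }
    }
}
```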
Reviewers: Luke Chen <showuon@gmail.com>
Generate LeaderAndISR change records when a broker re-registers and the quorum controller detects an unclean shutdown.
This is necessary to ensure that we perform the expected partition state transitions, such as bumping leader epochs.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Currently, the validation of feature upgrades relies on the supported version range generated during registration. For a given feature, its max supported feature version in production is set to the default version value (the latest feature version whose bootstrap metadata value is less than or equal to the latest production metadata value).
This patch introduces a LATEST_PRODUCTION value independent from the metadata version to each feature so that the highest supported feature version can be customized by the feature owner.
The change only applies to dynamic feature upgrade. During formatting, we still use the default value associated with the metadata version.
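A hedged sketch of the idea; the names here are illustrative, not Kafka's actual feature enum:
```
// Each feature carries its own latest-production level, decoupled from
// the metadata version, so the feature owner can customize it.
enum ExampleFeature {
    FOO((short) 2, (short) 1); // latest known level 2, production-ready up to 1

    private final short latestVersion;
    private final short latestProduction;

    ExampleFeature(short latestVersion, short latestProduction) {
        this.latestVersion = latestVersion;
        this.latestProduction = latestProduction;
    }

    // Dynamic upgrades are capped at latestProduction unless unstable
    // versions are explicitly allowed.
    short maxAllowed(boolean allowUnstable) {
        return allowUnstable ? latestVersion : latestProduction;
    }
}
```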
Reviewers: Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>
The controller must add all extant brokers to BrokerHeartbeatTracker when activating. Otherwise, we
could end up in a situation where a broker fails exactly as a controller failover occurs, and we
never fence it.
Also, fix a bug where the slf4j logger object in PeriodicTaskControlManager was initialized as
though it belonged to OffsetControlManager.
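The logger fix is the classic copy-paste bug; in miniature (illustrative class, not the real code):
```
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PeriodicTaskControlManagerExample {
    // Buggy version (copied from another class):
    //   LoggerFactory.getLogger(OffsetControlManager.class)
    // made every log line appear to come from OffsetControlManager.
    // The fix names the class the logger actually lives in:
    private static final Logger log =
        LoggerFactory.getLogger(PeriodicTaskControlManagerExample.class);
}
```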
Reviewers: David Mao <dmao@confluent.io>, David Arthur <mumrah@gmail.com>
BrokerHeartbeatManager.java: fix an outdated comment.
Move an inefficient test method that is O(num_brokers) from ClusterControlManager.java into ReplicationControlManagerTest.java, so that it doesn't accidentally get used in production code.
Remove QuorumController.ImbalanceSchedule, etc. since it is no longer used.
Move the initialization of OffsetControlManager later in the QuorumController constructor and add a comment explaining why it should come last. This doesn't fix any bugs currently, but it's a good practice for the future.
Reviewers: Mickael Maison <mickael.maison@gmail.com>
We fail the entire CreateTopicsRequest if it would create more than 10k partitions in total. The
usual pattern for this API is to try to succeed with at least some of the topics. However, since
the 10k limit applies across all the topics in the request, no topic should be created when the
request as a whole exceeds it.
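A minimal sketch of the request-level guard; the constant and class names are illustrative, not the actual controller code:
```
import java.util.List;

class CreateTopicsGuard {
    static final int MAX_PARTITIONS_PER_REQUEST = 10_000;

    static void validate(List<Integer> partitionCountsPerTopic) {
        int total = partitionCountsPerTopic.stream().mapToInt(Integer::intValue).sum();
        if (total > MAX_PARTITIONS_PER_REQUEST) {
            // Reject the whole request: none of its topics are created.
            throw new IllegalArgumentException("Cannot create " + total
                + " partitions in one request; the limit is " + MAX_PARTITIONS_PER_REQUEST);
        }
    }
}
```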
Reviewers: Colin P. McCabe <cmccabe@apache.org>
As described in KIP-500, the Kafka controller monitors the liveness of each broker in the cluster. It gathers this information from heartbeats sent from the brokers themselves.
In some rare cases, the main controller thread may get blocked for several seconds at a time. In the current code, this leaves the controller unable to update the brokers' last contact times for that entire period.
This PR changes the controller heartbeat handling to be partially lockless. Specifically, the last contact time for each broker will be updated locklessly prior to the rest of the heartbeat handling. This will ensure that heartbeats always get through.
Additionally, this PR adds a PeriodicTaskControlManager to better manage periodic tasks. This should help handle the very common pattern where we want to schedule a background task at some frequency. We also want the background task to be immediately rescheduled if there is too much work to be done in one event.
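The lockless part can be sketched with a concurrent map of timestamps; this is a minimal illustration, not Kafka's actual heartbeat tracker:
```
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class HeartbeatTracker {
    private final ConcurrentHashMap<Integer, AtomicLong> lastContactNs =
        new ConcurrentHashMap<>();

    // Called from the request path; never blocks on the controller lock,
    // so heartbeats always get through even if the main thread is stuck.
    void touch(int brokerId, long nowNs) {
        lastContactNs.computeIfAbsent(brokerId, id -> new AtomicLong()).set(nowNs);
    }

    // Read later by the (possibly delayed) controller thread.
    boolean hasExpired(int brokerId, long nowNs, long sessionTimeoutNs) {
        AtomicLong t = lastContactNs.get(brokerId);
        return t == null || nowNs - t.get() > sessionTimeoutNs;
    }
}
```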
Reviewers: Liu Zeyu <zeyu.luke@gmail.com>, David Arthur <mumrah@gmail.com>
KIP-853 adds support for dynamic KRaft quorums. This means that the quorum topology is
no longer statically determined by the controller.quorum.voters configuration. Instead, it
is contained in the storage directories of each controller and broker.
Users of dynamic quorums must format at least one controller storage directory with either
the --initial-controllers or --standalone flags. If they fail to do this, no quorum can be
established. This PR changes the storage tool to warn about the case where a KIP-853 flag has
not been supplied to format a KIP-853 controller. (Note that broker storage directories
can continue to be formatted without a KIP-853 flag.)
There are cases where we don't want to specify initial voters when formatting a controller. One
example is where we format a single controller with --standalone, and then dynamically add 4
more controllers with no initial topology. In this case, we want the 4 later controllers to grab
the quorum topology from the initial one. To support this case, this PR adds the
--no-initial-controllers flag.
Reviewers: José Armando García Sancio <jsancio@apache.org>, Federico Valeri <fvaleri@redhat.com>
When MetadataBatchLoader handles a BeginTransactionRecord, it publishes the metadata it has seen so far and does not publish again until the transaction is ended or aborted. This means a partial record batch can be published. If a snapshot is generated during this time, the currently published metadata may not align with the end of a record batch. This causes problems for Raft replication, which expects a snapshot's offset to exactly precede a record batch boundary.
This patch enhances SnapshotGenerator to refuse to generate a snapshot if the metadata is not batch aligned.
Reviewers: David Arthur <mumrah@gmail.com>
Neither ZK mode nor KRaft mode handles overflow, so setting a large max lifetime results in a negative expiration timestamp and a negative max timestamp, which is unexpected behavior.
In this PR, we are only fixing the KRaft code since ZK will be removed soon.
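The fix amounts to detecting the wrap-around and clamping; a minimal sketch, assuming a non-negative lifetime (names are illustrative):
```
public class ExpirationMath {
    // now + lifetime can wrap negative; clamping to Long.MAX_VALUE gives a
    // sensible "far future" timestamp instead.
    static long expirationMs(long nowMs, long maxLifetimeMs) {
        long sum = nowMs + maxLifetimeMs;
        return sum < nowMs ? Long.MAX_VALUE : sum; // clamp on overflow
    }

    public static void main(String[] args) {
        System.out.println(expirationMs(System.currentTimeMillis(), Long.MAX_VALUE));
    }
}
```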
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This change includes:
1. Check dependencies when updating a feature (all request versions)
2. Return a top-level error and no feature-level errors if any feature fails to update, using this error for all the features in the response (all request versions)
3. Return only the top-level NONE error for v2 and beyond
Reviewers: Jun Rao <jun@confluent.io>
This change fixes a few issues.
KAFKA-17608; KRaft controller crashes when active controller is removed
When a control batch is committed, the quorum controller currently advances the last stable offset but fails to create a snapshot for that offset. This causes an issue if the quorum controller renounces leadership and needs to revert to that offset, for which no snapshot is present. Since control batches are no-ops for the quorum controller, it does not need to update its offsets for control records. We now skip the handleCommit logic for control batches.
KAFKA-17604; Describe quorum output missing added voters endpoints
Describe quorum output will miss the endpoints of voters which were added via AddRaftVoter. This is due to a bug in LeaderState's updateVoterAndObserverStates, which pulls replica state from the observer states map (which does not include endpoints). The fix is to populate the endpoints from the lastVoterSet passed into the method.
Reviewers: José Armando García Sancio <jsancio@apache.org>, Colin P. McCabe <cmccabe@apache.org>, Chia-Ping Tsai <chia7712@apache.org>
When brokers undergoing ZK migration register with the controller, the controller should verify
that they have provided a way to contact them via their inter.broker.listener. Otherwise, the
migration will fail later on with a more confusing error message.
Reviewers: David Arthur <mumrah@gmail.com>
There is a race condition between KRaftMigrationDriver running its first poll() and being notified by Raft about a leader change. If onControllerChange is called before RecoverMigrationStateFromZKEvent is run, we will end up getting stuck in the INACTIVE state.
This patch fixes the race by enqueuing a RecoverMigrationStateFromZKEvent from onControllerChange if the driver has not yet initialized. If another RecoverMigrationStateFromZKEvent was already enqueued, the second one to run will just be ignored.
Reviewers: Luke Chen <showuon@gmail.com>
Update the leader before calling handleLeaderChange and use the given epoch in LocalLogManager#prepareAppend. This should hopefully fix several flaky QuorumControllerTest tests.
Reviewers: José Armando García Sancio <jsancio@apache.org>
This patch re-introduces the `group.version` feature flag and gates the new consumer rebalance protocol with it. The `group.version` feature flag is attached to the metadata version `4.0-IV0` and is marked as production ready. This allows system tests to pick it up directly by default without having to set `unstable.feature.versions.enable` in all of them. This is fine because we don't plan to make any incompatible changes before 4.0.
Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Previously in KRaft mode, we could request an unclean leader election for a specific topic using
the electLeaders API. This PR adds an additional way to trigger unclean leader election when in
KRaft mode via the static controller configuration and various dynamic configurations.
In order to support all possible configuration methods, we have to do a multi-step configuration
lookup process:
1. check the dynamic topic configuration for the topic.
2. check the dynamic node configuration.
3. check the dynamic cluster configuration.
4. check the controller's static configuration.
Fortunately, we already have the logic to do this multi-step lookup in KafkaConfigSchema.java.
This PR reuses that logic. It also makes setting a configuration schema in
ConfigurationControlManager mandatory. Previously, it was optional for unit tests.
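As a sketch, the fall-through looks like this; plain maps stand in for the real config stores, and the actual resolution lives in KafkaConfigSchema:
```
import java.util.List;
import java.util.Map;

class UncleanElectionLookup {
    static String resolve(Map<String, String> dynamicTopic,
                          Map<String, String> dynamicNode,
                          Map<String, String> dynamicCluster,
                          Map<String, String> staticController) {
        String key = "unclean.leader.election.enable";
        for (Map<String, String> level : List.of(
                dynamicTopic, dynamicNode, dynamicCluster, staticController)) {
            String value = level.get(key);
            if (value != null) {
                return value; // the first level that defines the key wins
            }
        }
        return "false"; // default: unclean leader election disabled
    }
}
```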
Of course, the dynamic configuration can change over time, or the active controller can change
to a different one with a different configuration. These changes can make unclean leader
elections possible for partitions where they were not previously possible. To address this, I
added a periodic background task which scans leaderless partitions to check whether they are
eligible for an unclean leader election.
Finally, this PR adds the UncleanLeaderElectionsPerSec metric.
Co-authored-by: Luke Chen <showuon@gmail.com>
Reviewers: Igor Soarez <soarez@apple.com>, Luke Chen <showuon@gmail.com>
When a broker tries to register with the controller quorum, its registration should be rejected if it doesn't support a feature that is currently enabled. (A feature is enabled if it is set to a non-zero feature level.) This is important for the newly added kraft.version feature flag.
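The check itself reduces to a range test per feature; a minimal sketch (illustrative names):
```
class FeatureSupportCheck {
    // A registration is acceptable only if, for every enabled feature
    // (non-zero level), the broker's supported range contains that level.
    static boolean canRegister(short enabledLevel, short brokerMin, short brokerMax) {
        return enabledLevel == 0 // a disabled feature imposes no constraint
            || (enabledLevel >= brokerMin && enabledLevel <= brokerMax);
    }
}
```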
Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>
The RaftClient API is changed to separate batch accumulation (RaftClient#prepareAppend) from scheduling the append of accumulated batches to the KRaft log (RaftClient#schedulePreparedAppend). This change is needed to better match the controller's flow of replaying generated records before replicating them. When the controller replays a record, it needs to know the offset associated with that record; to compute the offset, the RaftClient needs to be aware of the records and their log position.
The controller uses this new API by generating the cluster metadata records, computing their offsets using RaftClient#prepareAppend, replaying the records in the state machine, and finally allowing KRaft to append the records with RaftClient#schedulePreparedAppend.
To implement this API, the BatchAccumulator is changed to also support this access pattern. This is done by adding a drainOffset to the implementation; the batch accumulator is allowed to return any records and batches below the drain offset.
Lastly, this change also removes some functionality that is no longer needed, like non-atomic appends and validation of the base offset.
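The resulting controller flow can be sketched as follows; this toy interface mirrors only the two calls named above, with simplified signatures:
```
import java.util.List;

interface ToyRaftClient<T> {
    long prepareAppend(int epoch, List<T> records); // accumulate and learn the offset
    void schedulePreparedAppend();                  // release accumulated batches to the log
}

class ControllerAppendFlow {
    static <T> void commit(ToyRaftClient<T> raft, int epoch, List<T> records) {
        long lastOffset = raft.prepareAppend(epoch, records); // 1. compute offsets
        replay(records, lastOffset);                          // 2. replay before replication
        raft.schedulePreparedAppend();                        // 3. let KRaft append
    }

    static <T> void replay(List<T> records, long lastOffset) {
        // Apply each record to the in-memory state machine at its known offset.
    }
}
```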
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>
This patch extends the DescribeConfigs API to support group configs.
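For example, a group's configs can then be described via the Admin client; a minimal sketch assuming a group id of my-group:
```
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.config.ConfigResource;

public class DescribeGroupConfigsExample {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(Map.<String, Object>of(
                "bootstrap.servers", "localhost:9092"))) {
            // GROUP resources can now be described like topics and brokers.
            ConfigResource group = new ConfigResource(ConfigResource.Type.GROUP, "my-group");
            admin.describeConfigs(List.of(group)).all().get()
                .forEach((resource, config) -> System.out.println(config.entries()));
        }
    }
}
```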
Reviewers: Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>
- Mark 3.9-IV0 as stable. Metadata version 3.9-IV0 should return Fetch version 17.
- Move ELR to 4.0-IV0. Remove 3.9-IV1 since it's no longer needed.
- Create a new 4.0-IV1 MV for KIP-848.
Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Justine Olshan <jolshan@confluent.io>
In MetadataVersion 3.7-IV2 and above, the broker's AssignmentsManager sends an RPC to the
controller informing it about which directory we have chosen to place each new replica on.
Unfortunately, the code does not check to see if the topic still exists in the MetadataImage before
sending the RPC. It will also retry infinitely. Therefore, after a topic is created and deleted in
rapid succession, we can get stuck including the now-defunct replica in our subsequent
AssignReplicasToDirsRequests forever.
In order to prevent this problem, the AssignmentsManager should check if a topic still exists (and
is still present on the broker in question) before sending the RPC. In order to prevent log spam,
we should not log any error messages until several minutes have gone past without success.
Finally, rather than creating a new EventQueue event for each assignment request, we should simply
modify a shared data structure and schedule a deferred event to send the accumulated RPCs. This
will improve efficiency.
Reviewers: Igor Soarez <i@soarez.me>, Ron Dagostino <rndgstn@gmail.com>
* KAFKA-15875: Stop leaking Snapshot in public methods
The Snapshot class is package protected but it's returned in
several public methods in SnapshotRegistry.
To prevent this accidental leakage, these methods are made
package protected as well. For getOrCreateSnapshot, a new
method called idempotentCreateSnapshot is created that returns void.
* Make builder package protected, replace <br> with <p>
Reviewers: Greg Harris <greg.harris@aiven.io>