The usual flow for updating the upgrade.html docs is to first make the change in apache/kafka/trunk, then cherry-pick it to the relevant release branch, and then copy it into the kafka-site repo.
It seems this was not done for a few commits updating the 3.6.1, 3.5.2 and 3.5.1 docs, resulting in kafka-site's latest upgrade.html containing content that isn't here. This was caught while we were adding the 3.7 upgrade docs.
This patch reconciles both files by taking the extra changes from kafka-site and placing them here. This was done by simply diffing the two files and taking the changes that apply.
This trivial PR makes it clear when it is the right time to switch from AclAuthorizer to StandardAuthorizer during the migration process.
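For context, a minimal sketch of the two settings involved (hypothetical helper class; the class names are the standard Kafka authorizer implementations, while the timing of the switch is exactly what the updated doc clarifies):

```java
// Illustrative sketch, not part of this PR: the authorizer.class.name values on each
// side of the migration. When to flip between them is what the doc change spells out.
import java.util.Properties;

public class AuthorizerConfigSketch {
    // ZK mode: the ZooKeeper-based authorizer.
    public static Properties zkModeAuthorizer() {
        Properties props = new Properties();
        props.put("authorizer.class.name", "kafka.security.authorizer.AclAuthorizer");
        return props;
    }

    // KRaft mode: the metadata-log-based authorizer.
    public static Properties kraftModeAuthorizer() {
        Properties props = new Properties();
        props.put("authorizer.class.name",
            "org.apache.kafka.metadata.authorizer.StandardAuthorizer");
        return props;
    }
}
```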
Reviewers: Luke Chen <showuon@gmail.com>
During the ZK to KRaft migration, before entering dual-write mode, the KRaft controller will send RPCs (i.e. UpdateMetadataRequest, LeaderAndIsrRequest, and StopReplicaRequest) to the brokers. Currently, the controller uses the inter-broker listener to send these RPCs to the brokers. But the doc didn't provide this info to users, because a normal KRaft controller won't use inter.broker.listener.name.
This PR adds the missing config to the ZK to KRaft migration doc.
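For illustration, a minimal sketch of a migration-phase controller configuration including the documented setting (listener names, ports, node IDs, and the ZK connect string are placeholders, not values taken from this PR):

```java
// Illustrative sketch of a KRaft controller config during the ZK migration.
// All addresses, IDs, and listener names below are placeholders.
import java.util.Properties;

public class MigrationControllerConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        props.put("process.roles", "controller");
        props.put("node.id", "3000");
        props.put("controller.quorum.voters", "3000@localhost:9093");
        props.put("controller.listener.names", "CONTROLLER");
        props.put("listeners", "CONTROLLER://:9093");
        // Enable the migration and point the controller at the existing ZK ensemble.
        props.put("zookeeper.metadata.migration.enable", "true");
        props.put("zookeeper.connect", "localhost:2181");
        // The config this PR documents: which broker listener the controller uses when
        // sending UpdateMetadataRequest/LeaderAndIsrRequest/StopReplicaRequest to
        // ZK-mode brokers during the migration.
        props.put("inter.broker.listener.name", "PLAINTEXT");
        return props;
    }
}
```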
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Paolo Patierno <ppatierno@live.com>
The base eclipse-temurin:21-jre-alpine image got modified and had `bash` removed from it. This broke our build, since downstream steps using bash scripts depended on it. This patch explicitly installs bash.
Kafka Streams should not crash if a task is closed dirty. This is a
hotfix to catch/swallow an IllegalStateException from
`producer.abortTransaction()` on the dirty-close clean-up path.
A proper fix would be to not call `abortTransaction()` for this
particular case.
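For illustration, a minimal sketch of the swallow described above (hypothetical helper, not the actual Streams code):

```java
// Illustrative sketch: on the dirty-close path, aborting may race with a fenced or
// already-closed producer, so the IllegalStateException is logged and ignored instead
// of crashing the stream thread.
import org.apache.kafka.clients.producer.Producer;
import org.slf4j.Logger;

final class DirtyCloseSketch {
    static void abortBestEffort(Producer<byte[], byte[]> producer, Logger log) {
        try {
            producer.abortTransaction();
        } catch (IllegalStateException swallowed) {
            log.warn("Ignoring IllegalStateException from abortTransaction() during dirty close",
                swallowed);
        }
    }
}
```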
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
In KRaft mode, the broker fails to handle topic recreation correctly when a disk is broken. This is because ReplicaManager tracks HostedPartitions that are on an offline disk, but it doesn't associate TopicId information with them.
This change updates HostedPartition.Offline to associate topic id information. We also update the log creation logic in Partition::createLogInAssignedDirectoryId to not just rely on targetLogDirectoryId == DirectoryId.UNASSIGNED to determine if the log to be created is "new".
Please refer to the comments in https://issues.apache.org/jira/browse/KAFKA-16157 for more information.
Reviewers: Luke Chen <showuon@gmail.com>, Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Gaurav Narula <gaurav_narula2@apple.com>
While migrating from ZK mode to KRaft mode, the broker passes through a "hybrid" phase, in which it
receives LeaderAndIsrRequest and UpdateMetadataRequest RPCs from the KRaft controller. For the most
part, these RPCs can be handled just like their traditional equivalents from a ZK-based controller.
However, there is one thing that is different: the way topic deletions are handled.
In ZK mode, there is a "deleting" state which topics enter prior to being completely removed.
Partitions stay in this state until they are removed from the disks of all replicas. And partitions
associated with these deleting topics show up in the UMR and LAIR as having a leader of -2 (which
is not a valid broker ID, of course, because it's negative). When brokers receive these RPCs, they
know to remove the associated partitions from their metadata caches and disks. When a full UMR or
LAIR is sent, deleting partitions are included as well.
In hybrid mode, in contrast, there is no "deleting" state. Topic deletion happens immediately. We
can do this because we know that we have topic IDs that are never reused. This means that we can
always tell the difference between a broker that had an old version of some topic, and a broker
that has a new version that was re-created with the same name. To make this work, when handling a
full UMR or LAIR, hybrid brokers must compare the full state that was sent over the wire to their
own local state, and adjust accordingly.
Prior to this PR, the code for handling those adjustments had several major flaws. The biggest flaw
is that it did not correctly handle the "re-creation" case where a topic named FOO appears in the
RPC, but with a different ID than the broker's local FOO. Another flaw is that a problem with a
single partition would prevent handling the whole request.
In ZkMetadataCache.scala, we handle full UMR requests from KRaft controllers by rewriting the UMR
so that it contains the implied deletions. I fixed this code so that deletions always appear at the
start of the list of topic states. This is important for the re-creation case since it means that a
single request can both delete the old FOO and add a new FOO to the cache. Also, rather than
modifying the request in place, as the previous code did, I build a whole new request with the
desired list of topic states. This is much safer because it avoids unforeseen interactions with
other parts of the code that deal with requests (like request logging). While this new copy may
sound expensive, it should actually not be. We are doing a "shallow copy" which references the
previous list of topic state entries.
I also reworked ZkMetadataCache.updateMetadata so that if a partition is re-created, it does not
appear in the returned set of deleted TopicPartitions. Since this set is used only by the group
manager, this seemed appropriate. (If I was in the consumer group for the previous iteration of
FOO, I should still be in the consumer group for the new iteration.)
On the ReplicaManager.scala side, we handle full LAIR requests by treating anything which does not
appear in them as a "stray replica." (But we do not rewrite the request objects as we do with UMR.)
I moved the logic for finding stray replicas from ReplicaManager into LogManager. It makes more
sense there, since the information about what is on-disk is managed in LogManager. Also, the stray
replica detection logic for KRaft mode is there, so it makes sense to put the stray replica
detection logic for hybrid mode there as well.
Since the stray replica detection is now in LogManager, I moved the unit tests there as well.
Previously some of those tests had been in BrokerMetadataPublisherTest for historical reasons.
The main advantage of the new LAIR logic is that it takes topic ID into account. A replica can be a
stray even if the LAIR contains a topic of the given name, but a different ID. I also moved the
stray replica handling earlier in the becomeLeaderOrFollower function, so that we could correctly
handle the "delete and re-create FOO" case.
Reviewers: David Arthur <mumrah@gmail.com>
In KRaft mode, or on ZK brokers that are migrating to KRaft, we have a local __cluster_metadata
log. This log is stored in a single log directory which is configured via metadata.log.dir. If
there is no metadata.log.dir given, it defaults to the first entry in log.dirs. In the future we
may support multiple metadata log directories, but we don't yet. For now, we must abort the
process when this log directory fails.
In ZK mode, it is not necessary to abort the process when this directory fails, since there is no
__cluster_metadata log there. This PR changes the logic so that we check whether we're in ZK
mode and do not abort in that scenario (unless we lost the final remaining log directory, of
course).
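For illustration, a small sketch of the metadata.log.dir defaulting described above (hypothetical helper; assumes log.dirs is set, which is the usual case):

```java
// Illustrative sketch, not the actual config code: metadata.log.dir wins if set;
// otherwise the first entry of log.dirs is used for the __cluster_metadata log.
import java.util.Properties;

final class MetadataLogDirSketch {
    static String metadataLogDir(Properties props) {
        String explicit = props.getProperty("metadata.log.dir");
        if (explicit != null && !explicit.isEmpty()) {
            return explicit;
        }
        // Fall back to the first configured data log directory (assumes log.dirs is set).
        return props.getProperty("log.dirs").split(",")[0].trim();
    }
}
```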
Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Proven Provenzano <pprovenzano@confluent.io>
During migration from ZK mode to KRaft mode, there is a step where the kcontrollers load all of the
data from ZK into the metadata log. Previously, we were using a batch size of 1000 for this, but
200 seems better. This PR also adds an internal configuration to control this batch size, for
testing purposes.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This PR fixes a couple of things related to the #15193 PR.
When you complete the "Enter Migration Mode on the brokers" step, you are actually in the "Enabling the migration on the brokers" phase of the migration guide, and at that point the broker doesn't have a node.id yet but still a broker.id, so this PR removes a statement saying to replace the one with the other.
Also, during rollback it's not enough to just delete the /controller znode quickly after shutting down the controllers, because a controller election doesn't start until at least one broker is rolled back with the right configuration. Until that roll happens, and while the controllers are down, the brokers just log something like this even if you deleted the znode "quickly":
[2024-01-30 09:27:52,394] DEBUG [zk-broker-0-to-controller-quorum-channel-manager]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2024-01-30 09:27:52,394] INFO [zk-broker-0-to-controller-quorum-channel-manager]: Recorded new controller, from now on will use node localhost:9093 (id: 1 rack: null) (kafka.server.BrokerToControllerRequestThread)
You have to reduce the amount of time between deleting the znode and rolling at least one broker, so that an election can start.
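For illustration, a hypothetical way to perform that znode deletion with the plain ZooKeeper client (connection string and session timeout are placeholders); per the note above, roll at least one broker back right afterwards so an election can start:

```java
// Hypothetical helper (not a Kafka tool) using the plain ZooKeeper client to delete
// the /controller znode during a rollback. Connection string and timeout are placeholders.
import org.apache.zookeeper.ZooKeeper;

public class DeleteControllerZnode {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
        try {
            // version -1 means "delete regardless of the znode's current version".
            zk.delete("/controller", -1);
        } finally {
            zk.close();
        }
    }
}
```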
Reviewers: Luke Chen <showuon@gmail.com>
We update the metadata update handler to resend the broker registration when the
metadata version has been updated to >= 3.7-IV2, so that the controller becomes
aware of the log directories in the broker.
We also update DirectoryId::isOnline to return true on an empty list of
log directories while the controller awaits broker registration.
Co-authored-by: Proven Provenzano <pprovenzano@confluent.io>
Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
When a broker is down, and a topic is deleted, this will result in that broker seeing "stray
replicas" the next time it starts up. These replicas contain data that used to be important, but
which now needs to be deleted. Stray replica deletion is handled during the initial metadata
publishing step on the broker.
Previously, we deleted these stray replicas after starting up BOTH LogManager and ReplicaManager.
However, this wasn't quite correct. The presence of the stray replicas confused ReplicaManager.
Instead, we should delete the stray replicas BEFORE starting ReplicaManager.
This bug triggered when a topic was deleted and re-created while a broker was down, and some of the
replicas of the re-created topic landed on that broker. The impact was that the stray replicas were
deleted, but the new replicas for the next iteration of the topic never got created. This, in turn,
led to persistent under-replication until the next time the broker was restarted.
Reviewers: Luke Chen <showuon@gmail.com>, Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Gaurav Narula <gaurav_narula2@apple.com>
This PR fixes some bugs in the KRaft migration documentation and reorganizes it to be easier to read. (Specifically, there were some steps that were previously out of order.)
In order to keep it all straight, the revert documentation is now in the form of a table which maps the latest migration state to the actions which the system administrator should perform.
Reviewers: Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>, Liu Zeyu <zeyu.luke@gmail.com>, Paolo Patierno <ppatierno@live.com>
This patch causes the active KRaftMigrationDriver to reload the /migration ZK state after electing
itself as the leader in ZK. This closes a race condition where the previous active controller could
make an update to /migration after the new leader was elected. The update race was not actually a
problem regarding the data since both controllers would be syncing the same state from KRaft to ZK,
but the change to the znode causes the new controller to fail on the zk version check on
/migration.
This patch also fixes an as-yet-unseen bug where an active controller failing to elect itself via
claimControllerLeadership would not retry.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
In the BrokerTopicMetrics group, we provide not only per-topic metrics but also all-topic aggregated metric values. The bean names look like this:
kafka.server:type=BrokerTopicMetrics,name=RemoteCopyLagSegments
kafka.server:type=BrokerTopicMetrics,name=RemoteCopyLagSegments,topic=Leader
This PR adds the missing all-topic aggregated metric values for tiered storage, specifically for gauge-type metrics.
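For illustration, a hedged sketch of reading the new all-topic gauge over JMX (assumes a broker with JMX enabled on localhost:9999 and tiered storage in use; not part of this PR):

```java
// Illustrative sketch: query the all-topic aggregate bean shown above via JMX.
// The host/port are placeholders for wherever the broker exposes JMX.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteCopyLagProbe {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName allTopics = new ObjectName(
                "kafka.server:type=BrokerTopicMetrics,name=RemoteCopyLagSegments");
            // Yammer gauges exported over JMX expose their reading as the "Value" attribute.
            System.out.println("All-topic RemoteCopyLagSegments: "
                + mbs.getAttribute(allTopics, "Value"));
        } finally {
            connector.close();
        }
    }
}
```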
Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Christo Lolov <lolovc@amazon.com>
Change `AbstractFetcher`/`Fetcher` to _not_ clear the `sessionHandlers` cache during `prepareCloseFetchSessionRequests()`.
During `close()`, `Fetcher` calls `maybeCloseFetchSessions()` which, in turn, calls `prepareCloseFetchSessionRequests()` and then calls `NetworkClient.poll()` to complete the requests. Since `prepareCloseFetchSessionRequests()` (erroneously) clears the `sessionHandlers` cache, the sessions are missing when the response is processed, and a warning is logged.
Reviewers: Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>
We've released the fix, so I updated the note. We can backport to the 3.6 and 3.7 branches as well.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Divij Vaidya <diviv@amazon.com>
When there is only one voter, there will be no fetch requests from other voters. In this case, we should still not expire the checkQuorum timer, since the single voter on its own already constitutes a quorum.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, José Armando García Sancio <jsancio@apache.org>
KAFKA-15629 added the `TimestampedByteStore` interface to
`KeyValueToTimestampedKeyValueByteStoreAdapter`, which broke the restore
code path and thus some system tests.
This PR reverts this change for now.
Reviewers: Almog Gavra <almog.gavra@gmail.com>, Walker Carlson <wcarlson@confluent.io>
Removed a debug log, because the next-time-to-update check runs in the poll loop and the log line was emitted excessively.
Reviewers: Qichao Chu <qichao@uber.com>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR creates MetadataVersion.latestTesting to represent the highest metadata version (which may be unstable) and MetadataVersion.latestProduction to represent the latest version that should be used in production. It fixes a few cases where the broker was advertising that it supported the testing versions even when unstable metadata versions had not been configured.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
Some kcontroller dynamic configurations may fail to apply at startup. This happens because there is
a race between registering the reconfigurables to the DynamicBrokerConfig class, and receiving the
first update from the metadata publisher. We can fix this by registering the reconfigurables first.
This seems to have been introduced by the "MINOR: Install ControllerServer metadata publishers
sooner" change.
Reviewers: Ron Dagostino <rdagostino@confluent.io>
When a replica is deleted, the unloading procedure of the coordinator is called with an empty leader epoch. However, the current implementation of the new group coordinator throws an exception in this case. My bad. This patch updates the logic to handle it correctly.
We discovered the bug in our testing environment. We will add a system test or an integration test in a subsequent patch to better exercise this path.
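For illustration, a hedged sketch of the intended handling (hypothetical method, not the coordinator's actual API):

```java
// Illustrative sketch: an empty leader epoch comes from replica deletion and should
// trigger the unload unconditionally instead of being treated as an error.
import java.util.OptionalInt;

final class UnloadSketch {
    static boolean shouldUnload(OptionalInt requestLeaderEpoch, int currentEpoch) {
        // Empty epoch => the replica was deleted; always unload.
        if (requestLeaderEpoch.isEmpty()) {
            return true;
        }
        // Otherwise only unload for an epoch at least as new as the current one.
        return requestLeaderEpoch.getAsInt() >= currentEpoch;
    }
}
```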
Reviewers: Justine Olshan <jolshan@confluent.io>
This fixes an issue with the time boundaries used for the auto-commit performed when partitions are revoked.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
When we are migrating from ZK mode to KRaft mode, the brokers pass through a phase where they are
running in ZK mode, but the controller is in KRaft mode (aka a kcontroller). This is called "hybrid
mode." In hybrid mode, the KRaft controllers send old-style controller RPCs to the remaining ZK
mode brokers. (StopReplicaRequest, LeaderAndIsrRequest, UpdateMetadataRequest, etc.)
To complete partition reassignment, the kcontroller must send a StopReplicaRequest to any brokers
that no longer host the partition in question. Previously, it was sending this StopReplicaRequest
with delete = false. This led to stray partitions, because the partition data was never removed as
it should have been. This PR fixes it to set delete = true. This fixes KAFKA-16120.
There is one additional problem with partition reassignment in hybrid mode, tracked as KAFKA-16121.
The issue is that in ZK mode, brokers ignore any LeaderAndIsr request where the partition leader
epoch is less than or equal to the current partition leader epoch. However, when in hybrid mode,
just as in KRaft mode, we do not bump the leader epoch when starting a new reassignment, see:
`triggerLeaderEpochBumpIfNeeded`. This PR resolves this problem by adding a special case on the
broker side when isKRaftController = true.
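For illustration, a hedged sketch of that special case (hypothetical helper, not the broker's actual code):

```java
// Illustrative sketch: when the request comes from a KRaft controller in hybrid mode,
// accept an equal leader epoch, since KRaft does not bump the epoch when a reassignment
// starts; in plain ZK mode the epoch must be strictly larger.
final class LeaderEpochCheckSketch {
    static boolean acceptLeaderAndIsr(int requestLeaderEpoch,
                                      int currentLeaderEpoch,
                                      boolean isKRaftController) {
        return isKRaftController
            ? requestLeaderEpoch >= currentLeaderEpoch
            : requestLeaderEpoch > currentLeaderEpoch;
    }
}
```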
Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Colin P. McCabe <cmccabe@apache.org>
Following @dajac's finding in #15063, I found we also create new RemoteLogManager instances in ReplicaManagerTest but don't close them.
While investigating ReplicaManagerTest, I also found other threads leaking (see the generic cleanup sketch after this list):
1. Remote fetch reaper threads: we create a real reaper thread in the test, which is not expected. We should create a mocked one like the other purgatory instances.
2. Throttle threads: we created a quotaManager to feed into the replicaManager but didn't close it. We already create a global quotaManager instance and close it in AfterEach, so we should re-use that one.
3. replicaManager and logManager were not closed after the test.
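As mentioned above, a generic sketch of the cleanup pattern (hypothetical field names, not the actual ReplicaManagerTest code):

```java
// Illustrative sketch: every component that spawns threads in a test must be closed in
// @AfterEach so the thread-leak check does not trip.
import java.io.Closeable;
import org.junit.jupiter.api.AfterEach;

class ThreadLeakFreeTestSketch {
    private Closeable replicaManagerLike;   // stands in for the replicaManager under test
    private Closeable logManagerLike;       // stands in for the logManager under test
    private Closeable sharedQuotaManagers;  // one shared instance, closed exactly once

    @AfterEach
    void tearDown() throws Exception {
        // Null checks in case a test failed before finishing setup.
        if (replicaManagerLike != null) replicaManagerLike.close();
        if (logManagerLike != null) logManagerLike.close();
        if (sharedQuotaManagers != null) sharedQuotaManagers.close();
    }
}
```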
Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Satish Duggana <satishd@apache.org>, Justine Olshan <jolshan@confluent.io>
3.7 brokers must be able to register with 3.6 and earlier controllers. Currently, this is broken
because we will unconditionally try to set logDirs, but this field cannot be sent with
BrokerRegistrationRequest versions older than v2. This PR marks the logDirs field as "ignorable."
Marking the field as "ignorable" means that we will still be able to send the
BrokerRegistrationRequest even if the schema doesn't support logDirs.
Reviewers: Ron Dagostino <rdagostino@confluent.io>
Remove "group-rebalance-rate" and "group-rebalance-count" metrics from the new coordinator as this is not part of KIP-848.
Reviewers: David Jacot <djacot@confluent.io>
Migrates functionality provided by the utility into Kafka core. This wrapper will be used to generate property files and format storage when invoked from the docker container.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
The PR fixes the publishing of the kafka-clients artifact to remote Maven. The kafka-clients jar was recently shadowed, which would publish the artifacts to the local Maven repo successfully but would throw an error when publishing to remote Maven (as part of the release process).
The issue triggers only with publishMavenJavaPublicationToMavenRepository due to signing. Generating signed asc files errors out for shadowed release artifacts because the module name (clients) differs from the artifact name (kafka-clients).
The fix is basically to explicitly pass the shadowJar artifact to the signing and publish plugins. project.shadow.component(mavenJava) previously output the name as client-<version>-all.jar, even though the classifier and archivesBaseName are already defined correctly in :clients and in the shadowJar configuration.
* KAFKA-16077: Streams fails to close task after restoration when input partitions are updated
There is a race condition in the state updater that can cause the following:
1. We have an active task in the state updater
2. We get fenced. We recreate the producer, transactions now uninitialized. We ask the state updater to give back the task, add a pending action to close the task clean once it’s handed back
3. We get a new assignment with updated input partitions. The task is still owned by the state updater, so we ask the state updater again to hand it back and add a pending action to update its input partition
4. The task is handed back by the state updater. We update its input partitions but forget to close it clean (pending action was overwritten)
5. Now the task is in an initialized state, but the underlying producer does not have transactions initialized
This can cause an IllegalStateException: `Invalid transition attempted from state UNINITIALIZED to state IN_TRANSACTION` when running in EOSv2.
To fix this, we introduce a new pending action CloseReviveAndUpdateInputPartitions that is added when we handle a new assignment with updated input partitions, but we still need to close the task before reopening it.
We should not remove the task twice, otherwise we'll end up in this situation:
1. We have an active task in the state updater
2. We get fenced. We recreate the producer, transactions now uninitialized. We ask the state updater to give back the task, add a pending action to close the task clean once it’s handed back
3. The state updater moves the task from the updating tasks to the removed tasks
4. We get a new assignment with updated input partitions. The task is still owned by the state updater, so we ask the state updater again to hand it back (adding a task+remove into the task and action queue) and add a pending action to close, revive and update input partitions
5. The task is handed back by the state updater. We close revive and update input partitions, and add the task back to the state updater
6. The state updater executes the "task+remove" action that is still in its task + action queue, and hands the task immediately back to the main thread
7. The main thread discovers a removed task that was not restored and has no pending action attached to it, which results in an IllegalStateException.
Reviewers: Bruno Cadonna <cadonna@apache.org>