When the group coordinator does not host any __consumer_offsets partitions, the existing ListGroup implementation won't schedule any operation, thus a `new CompletableFuture<>()` is returned directly and never gets completed. This patch fixes the issue.
Reviewers: David Jacot <djacot@confluent.io>
https://github.com/apache/kafka/pull/14369 introduced a compilation error in ReplicaManagerConcurrencyTest for Scala 2.12.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
A new AssignmentsManager accumulates, batches, and sends KIP-858
assignment events to the Controller. Assignments are sent via
AssignReplicasToDirs requests.
Move QuorumTestHarness.formatDirectories into TestUtils so it can be
used in other test contexts.
Fix a bug in ControllerRegistration.java where the wrong version of the
record was being generated in ControllerRegistration.toRecord.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>
This KIP-951 commit was reverted to investigate the org.apache.kafka.tiered.storage.integration.ReassignReplicaShrinkTest test failure (#14738).
A fix for that was merged in #14757, hence unreverting this change.
This reverts commit a98bd7d.
Reviewers: Justine Olshan <jolshan@confluent.io>, Mayank Shekhar Narula <mayanks.narula@gmail.com>
This patch adds the concept of a "Full" UpdateMetadataRequest, similar to what is used in
LeaderAndIsr. A new tagged field is added to UpdateMetadataRequest at version 8 which allows the
KRaft controller to indicate if a UMR contains all the metadata or not. Since UMR is implicitly
treated as incremental by the ZK broker, we needed a way to detect topic deletions when the KRaft
broker sends a metadata snapshot to the ZK broker. By sending a "Full" flag, the broker can now
compare existing topic IDs to incoming topic IDs and calculate which topics should be removed from
the MetadataCache.
This patch only removes deleted topics from the MetadataCache. Partition/log management was
implemented in KAFKA-15605.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
A few variables which aren't being used anymore but still exist. This commit removes those unused variables.
Co-authored-by: Sagar Rao <sagarrao@Sagars-MacBook-Pro.local>
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3.
Reviewers: David Jacot <djacot@confluent.io>
The consumer refactoring project introduced another `Consumer` implementation, creating two different, coexisting implementations of the `Consumer` interface:
* `KafkaConsumer` (AKA "existing", "legacy" consumer)
* `PrototypeAsyncConsumer` (AKA "new", "refactored" consumer)
The goal of this task is to refactor the code via the delegation pattern so that we can keep a top-level `KafkaConsumer` but then delegate to another implementation under the covers. There will be two delegates at first:
* `LegacyKafkaConsumer`
* `AsyncKafkaConsumer`
`LegacyKafkaConsumer` is essentially a renamed `KafkaConsumer`. That implementation handles the existing group protocol. `AsyncKafkaConsumer` is renamed from `PrototypeAsyncConsumer` and will implement the new consumer group protocol from KIP-848. Both of those implementations will live in the `internals` sub-package to discourage their use.
This task is part of the work to implement support for the new KIP-848 consumer group protocol.
Reviewers: Philip Nee <pnee@confluent.io>, Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>
In 91fa196, I accidentally removed the action queue paramater that was added in 7d147cf. I also renamed the actionQueue as to not confuse this in the future.
I don't think this broke anything since we don't use verification for group coordinator commits, but I should fix it to be as it was before.
Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>
See the motivation in jira description https://issues.apache.org/jira/browse/KAFKA-15824
This was discovered as ReassignReplicaShrinkTest started to fail with KIP-951 changes. KIP-951 changes since then have been reverted(PR), would be put back once this is in.
Reviewers: Walker Carlson <wcarlson@apache.org>, Andrew Schofield <aschofield@confluent.io>
This test fails because `localhost` resolved to more than one IP on the Ci. This patch updates the ClusterConnectionStatesTest.testSingleIP test to use a static resolver.
Reviewers: Luke Chen <showuon@gmail.com>
This patch fixes the compilation error for JDK 21 introduced in https://github.com/apache/kafka/pull/14708.
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Jacot <djacot@confluent.io>
This reverts commit f38b0d8.
Trying to find the root cause of org.apache.kafka.tiered.storage.integration.ReassignReplicaShrinkTest failing in CI.
Reviewers: Justine Olshan <jolshan@confluent.io>
This is a follow up from #14659 that ports the windowed classes to use the StoreFactory abstraction as well. There's a side benefit of not duplicating the materialization code twice for each StreamImpl/CogroupedStreamImpl class as well.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias Sax <mjsax@apache.org>
This patch fixes a compilation error introduced by https://github.com/apache/kafka/pull/14392 for Scala 2.12.
```
> Task :core:compileScala
[Error] /home/jenkins/workspace/Kafka_kafka-pr_PR-14392/core/src/main/scala/kafka/server/BrokerLifecycleManager.scala:305:49: value incl is not a member of scala.collection.immutable.Set[org.apache.kafka.common.Uuid]
```
Reviewers: Luke Chen <showuon@gmail.com>
The latest metadata version is now 3.7. Fix the KRaft upgrade
test to upgrade to that version instead of 3.6.
Change the vagrant setup and gradle dependencies to use 3.3.2 instead of 3.3.1.
Reviewers: David Arthur <mumrah@gmail.com>
BrokerLifecycleManager should send the offline log directories in the BrokerHeartbeatRequests it
sends. Also, when handling BrokerHeartbeatResponses, do so by enqueing a BrokerLifecycleManager
event, rather than trying to do the handling directly in the callback.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>
Update BrokerRegistration to use a Builder. This fixes the proliferation of different constructors,
and makes it clear what arguments are being used where.
Reviewers: Colin P. McCabe <cmccabe@confluent.io>
meta.properties files are used by Kafka to identify log directories within the filesystem.
Previously, the code for handling them was in BrokerMetadataCheckpoint.scala. This PR rewrites the
code for handling them as Java and moves it to the apache.kafka.metadata.properties namespace. It
also gets rid of the separate types for v0 and v1 meta.properties objects. Having separate types
wasn't so bad back when we had a strict rule that zk clusters used v0 and kraft clusters used v1.
But ZK migration has blurred the lines. Now, a zk cluster may have either v0 or v1, if it is
migrating, and a kraft cluster may have either v0 or v1, at any time.
The new code distinguishes between an individual meta.properties file, which is represented by
MetaProperties, and a collection of meta.properties files, which is represented by
MetaPropertiesEnsemble. It is useful to have this distinction, because in JBOD mode, even if some
log directories are inaccessible, we can still use the ensemble to extract needed information like
the cluster ID. (Of course, even when not in JBOD mode, KRaft servers have always been able to
configure a metadata log directory separate from the main log directory.)
Since we recently added a unique directory.id to each meta.properties file, the previous convention
of passing a "canonical" MetaProperties object for the cluster around to various places in the code
needs to be revisited. After all, we can no longer assume all of the meta.properties files are the
same. This PR fixes these parts of the code. For example, it fixes the constructors of
ControllerApis and RaftManager to just take a cluster ID, rather than a MetaProperties object. It
fixes some other parts of the code, like the constructor of SharedServer, to take a
MetaPropertiesEnsemble object.
Another goal of this PR was to centralize meta.properties validation a bit more and make it
unit-testable. For this purpose, the PR adds MetaPropertiesEnsemble.verify, and a few other
verification methods. These enforce invariants like "the metadata directory must be readable," and
so on.
Reviewers: Igor Soarez <soarez@apple.com>, David Arthur <mumrah@gmail.com>, Divij Vaidya <diviv@amazon.com>, Proven Provenzano <pprovenzano@confluent.io>
The task-level commit metrics were removed without deprecation in KIP-447 and the corresponding PR #8218. However, we forgot to update the docs and to remove the code that creates the task-level commit sensor.
This PR removes the task-level commit metrics from the docs and removes the code that creates the task-level commit sensor. The code was effectively dead since it was only used in tests.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>
When a task is corrupted, uncorrupted tasks are committed. That is also true for standby tasks. Committing standby tasks actually means that they are checkpointed.
When the state updater is enabled, standbys are owned by the state updater. The stream thread should not checkpoint them when handling corrupted tasks. That is not a big limitation since the state updater periodically checkpoints standbys anyway. Additionally, when handling corrupted tasks the important thing is to commit active running tasks to abort the transaction. Committing standby tasks do not abort any transaction.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
With the new callback mechanism we were accidentally passing context with the wrong request local. Now include a RequestLocal as an explicit argument to the callback.
Also make the arguments passed through the callback clearer by separating the method out.
Added a test to ensure we use the request handler's request local and not the one passed in when the callback is executed via the request handler.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Divij Vaidya <diviv@amazon.com>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jun Rao <junrao@gmail.com>,
Only advance the HWM for a partition if the ISR set is equal to or above the min ISR config. This patch also sets an upper bound on the min ISR config so it cannot exceed the number of replicas.
Reviewers: David Arthur <mumrah@gmail.com>
Avoid calling cluster.partitionsForTopic(topic) if cluster.availablePartitionsForTopic(topic) is not empty
Reviewers: Divij Vaidya <diviv@amazon.com>, hudeqi <1217150961@qq.com>
A commit fixes a bug in ProduceRequest#partitionSizes() that may cause this method to incorrectly returning an empty or incomplete response for a thread when another thread is in the process of initialising it.
Reviewers: Divij Vaidya <diviv@amazon.com>, hudeqi <1217150961@qq.com>, vamossagar12 <sagarmeansocean@gmail.com>
--------------------------------
Co-authored-by: fangxiaobing <fangxiaobing@kuaishou.com>
This PR sets up the necessary prerequisites to respect configurations such as dsl.default.store.type and the dsl.store.suppliers.class (introduced in #14648) without requiring them to be passed in to StreamBuilder#new(TopologyConfig) (passing them only into new KafkaStreams(...).
It essentially makes StoreBuilder an external-only API and internally it uses the StoreFactory equivalent. It replaces KeyValueStoreMaterializer with an implementation of StoreFactory that creates the store builder only after configure() is called (in a Future PR we will create the missing equivalents for all of the places where the same thing happens for window stores, such as in all the *WindowKStreamImpl classes)
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
TestUtils.createTopicWithAdmin calls waitForAllPartitionsMetadata which waits for partition(s) to be present in each brokers' metadata cache. This is a sufficient check in ZK mode because the controller sends an LISR request before sending an UpdateMetadataRequest which means that the partition in the ReplicaManager will be updated before the metadata cache.
In KRaft mode, the metadata cache is updated first, so the check may return before partitions and other metadata listeners are fully initialized.
Testing:
Insert a Thread.sleep(100) in BrokerMetadataPublisher.onMetadataUpdate after
// Publish the new metadata image to the metadata cache.
metadataCache.setImage(newImage)
and run EdgeCaseRequestTest.testProduceRequestWithNullClientId and the test will fail locally nearly deterministically. After the change(s), the test no longer fails.
Reviewers: Justine Olshan <jolshan@confluent.io>
The patch includes the following changes as part of KIP-966
* Allow ISR shrink to empty
* Allow leader election with ELR members
* Allow electing the last known leader
Reviewers: Artem Livshits <alivshits@confluent.io>, David Arthur <mumrah@gmail.com>
This PR adds resources to store and handle client metrics needed for KIP-714.
Changes include:
Adding CLIENT_METRICS to resource type
Corresponding DYNAMIC client configurations in resources.
Changes to support dynamic loading of configuration on changes.
Changes to support API calls to fetch data stored against the new resource.
Test cases for the changes.
Reviewers: Andrew Schofield <andrew_schofield@uk.ibm.com>, Philip Nee <pnee@confluent.io>, Jun Rao <junrao@gmail.com>
DirectoryId.random doesn't need to instantiate the first 100 IDs to check if an ID is one of them.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Justine Olshan <jolshan@confluent.io>, Proven Provenzano <93720617+pprovenzano@users.noreply.github.com>
Since we have added directory.id to MetaProperties, it is no longer safe
to assume that all directories on a node contain the same MetaProperties.
Therefore, we should get rid of places where we are using a single
MetaProperties object to represent the settings of an entire cluster.
This PR removes a few such cases. In each case, it is sufficient just to
pass cluster ID.
The second part of this change refactors KafkaClusterTestKit so that we
convert paths to absolute before creating BrokerNode and ControllerNode
objects, rather than after. This prepares the way for storing an
ensemble of MetaProperties objects in BrokerNode and ControllerNode,
which we will do in a follow-up change.
Reviewers: Ron Dagostino <rndgstn@gmail.com>
Adding the note from the kafka-site repo to the main repo. I also included the fixed link.
apache/kafka-site@9fa596capache/kafka-site@4eb2409
Reviewers: Divij Vaidya <diviv@amazon.com>, Ismael Juma <ismael@juma.me.uk>
Reviewers: Christo Lolov <lolovc@amazon.com>, Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rdagostino@confluent.io>
This patch adds the `group.protocol` and `group.remote.assignor` configurations as described in KIP-848.
Reviewers: Kirk True <ktrue@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>