When migrating from ZK mode to KRaft mode, the brokers pass through a phase where they are
running in ZK mode, but the controller is in KRaft mode (aka a kcontroller). This is called "hybrid
mode." In hybrid mode, the KRaft controllers send old-style controller RPCs (StopReplicaRequest,
LeaderAndIsrRequest, UpdateMetadataRequest, etc.) to the remaining ZK mode brokers.
To complete a partition reassignment, the kcontroller must send a StopReplicaRequest to any brokers
that no longer host the partition in question. Previously, it sent this StopReplicaRequest
with delete = false. This led to stray partitions, because the partition data was never removed as
it should have been. This PR sets delete = true, which fixes KAFKA-16120.
There is one additional problem with partition reassignment in hybrid mode, tracked as KAFKA-16121.
The issue is that in ZK mode, brokers ignore any LeaderAndIsr request where the partition leader
epoch is less than or equal to the current partition leader epoch. However, in hybrid mode,
just as in KRaft mode, we do not bump the leader epoch when starting a new reassignment; see
`triggerLeaderEpochBumpIfNeeded`. This PR resolves the problem by adding a special case on the
broker side when isKRaftController = true, as sketched below.
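To make the special case concrete, here is a minimal, hypothetical sketch of the epoch comparison (the real check lives in the Scala broker code; these names are illustrative only):
```
public final class LeaderEpochCheckSketch {
    // In pure ZK mode, a LeaderAndIsr request with an epoch <= the current epoch
    // is ignored. When the request comes from a KRaft controller (hybrid mode),
    // an equal epoch must still be processed, because KRaft does not bump the
    // leader epoch when starting a new reassignment.
    static boolean shouldApply(int requestLeaderEpoch, int currentLeaderEpoch, boolean isKRaftController) {
        if (isKRaftController) {
            return requestLeaderEpoch >= currentLeaderEpoch;
        }
        return requestLeaderEpoch > currentLeaderEpoch;
    }

    public static void main(String[] args) {
        System.out.println(shouldApply(5, 5, false)); // false: ZK controller, equal epoch ignored
        System.out.println(shouldApply(5, 5, true));  // true: kcontroller, equal epoch applied
    }
}
```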
Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Colin P. McCabe <cmccabe@apache.org>
The remote log segment fetch implementation does not handle topics whose retention policy was previously compact and was later changed to delete. It always assumes that a record batch will exist in the target segment for the requested offset. However, the requested offset may be the last offset of the segment and may have been removed by log compaction. In that case, the fetch needs to iterate over the next higher segment for further data, as is already done for local segment fetch requests.
This change partially addresses the problem by iterating through the remote log segments to find the segment that serves the target offset, as sketched below.
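A hedged sketch of the lookup, using simplified stand-ins for the remote segment metadata types (these names are illustrative, not the actual RemoteLogManager code):
```
import java.util.List;

public final class RemoteSegmentLookupSketch {
    record Segment(long baseOffset, long endOffset) {}

    // Returns the first segment (in base-offset order) whose end offset reaches
    // the target offset. If the batch at the target offset was compacted away,
    // this naturally falls through to the next higher segment.
    static Segment findSegmentFor(List<Segment> segmentsSortedByBaseOffset, long targetOffset) {
        for (Segment segment : segmentsSortedByBaseOffset) {
            if (segment.endOffset() >= targetOffset) {
                return segment;
            }
        }
        return null; // targetOffset is beyond the last remote segment
    }
}
```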
Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>
This patch wires the transaction verification in the new group coordinator. It basically calls the verification path before scheduling the write operation. If the verification fails, the error is returned to the caller.
Note that the patch uses `appendForGroup`. I suppose that we will move away from using it when https://github.com/apache/kafka/pull/15087 is merged.
Reviewers: Justine Olshan <jolshan@confluent.io>
Following @dajac's finding in #15063, I found that we also create new RemoteLogManager instances in ReplicaManagerTest but never close them.
While investigating ReplicaManagerTest, I also found other leaking threads:
1. The remote fetch reaper thread. We create a real reaper thread in the test, which is not expected; we should create a mocked one, as we do for the other purgatory instances.
2. Throttle threads. We created a quotaManager to feed into the replicaManager but never closed it. We already create a global quotaManager instance and close it in AfterEach, so we should reuse it.
3. replicaManager and logManager were not closed after the test.
Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Satish Duggana <satishd@apache.org>, Justine Olshan <jolshan@confluent.io>
Migrates functionality provided by the utility into Kafka core. This wrapper will be used to generate property files and format storage when invoked from a docker container.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
This patch moves the `RaftIOThread` implementation into Java. I changed the name to `KafkaRaftClientDriver` since the main thing it does is drive the calls to `poll()`. There shouldn't be any changes to the logic.
Reviewers: José Armando García Sancio <jsancio@apache.org>
Throw UnknownTopicIdException instead of InvalidTopicException when no name is found for the topic ID.
Similar to #6124 for describeTopics using a topic name. MockAdminClient already makes use of UnknownTopicIdException for this case.
Reviewers: Justine Olshan <jolshan@confluent.io>, Ashwin Pankaj <apankaj@confluent.io>
Updated GroupCoordinatorIntegrationTest.testGroupCoordinatorPropagatesOffsetsTopicCompressionCodec to support KRaft.
Reviewers: Justine Olshan <jolshan@confluent.io>
Mockito keeps the invocation history for the whole test suite, which causes huge heap usage. Since the mock replicaManager is only used to bypass the replicaManager constructor, without verifying or stubbing anything, we create a real dummy replicaManager instead to avoid keeping Mockito invocation history in memory. The trade-off is illustrated below.
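An illustration of the trade-off; `ReplicaManagerLike` and `DummyReplicaManager` are hypothetical stand-ins, not the actual test types:
```
import static org.mockito.Mockito.mock;

interface ReplicaManagerLike {
    void doWork();
}

class DummyReplicaManager implements ReplicaManagerLike {
    @Override
    public void doWork() {
        // no-op: the instance only needs to satisfy a constructor parameter
    }
}

public class DummyVsMockExample {
    public static void main(String[] args) {
        // A Mockito mock records every invocation for later verification,
        // which accumulates on the heap over a large test suite.
        ReplicaManagerLike mocked = mock(ReplicaManagerLike.class);
        // A real no-op instance records nothing.
        ReplicaManagerLike dummy = new DummyReplicaManager();
        mocked.doWork();
        dummy.doWork();
    }
}
```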
Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>
Co-authored-by: Luke Chen <showuon@gmail.com>
It seems that this PR (https://github.com/apache/kafka/pull/8768) duplicated the implementation into QuotaUtils but did not remove the original implementation and the private methods it uses.
Reviewers: Justine Olshan <jolshan@confluent.io>
The controllerApi creates some resources, including reaper threads. In ControllerApisTest, we create it in many test cases but never close it. This commit does not change any business logic in the tests; it just adds try/finally blocks to close the controllerApi instance, as sketched below.
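The shape of the fix, as a minimal sketch with a hypothetical factory method standing in for the per-test construction:
```
public class CloseInFinallyExample {
    // Stand-in for the controllerApis instance created in each test case.
    static AutoCloseable createControllerApis() {
        return () -> System.out.println("closed: reaper threads shut down");
    }

    public static void main(String[] args) throws Exception {
        AutoCloseable controllerApis = createControllerApis();
        try {
            // ... test body: assertions against controllerApis ...
        } finally {
            controllerApis.close(); // always runs, even if an assertion throws
        }
    }
}
```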
Reviewers: Divij Vaidya <diviv@amazon.com>
This commit closes the KDC server properly in `CustomQuotaCallbackTest` and `AclAuthorizerWithZkSaslTest`.
Reviewers: Justine Olshan <jolshan@confluent.io>
Related to KAFKA-15818.
There is a bug in the AsyncKafkaConsumer poll loop: it does not send an event to the network thread to acknowledge the user's poll. This causes a few issues:
- Autocommit won't work without the user setting the timer.
- The member will leave the group after the rebalance timeout and will never be able to rejoin.
In this PR, a few subtle changes are made to address this issue:
- Hook up a poll event to AsyncKafkaConsumer#poll. It is fired only once per invocation.
- Upon entering the stale state, we need to reset the HeartbeatState; otherwise we will send an invalid request.
- We clear the current assignment and remove all assigned partitions once the heartbeat is sent. See the changes in onHeartbeatRequestSent.
Reviewers: David Jacot <djacot@confluent.io>, Bruno Cadonna <cadonna@apache.org>, Andrew Schofield <aschofield@confluent.io>
I was investigating a build which failed with "exit 1". In the broker logs, I saw that the first call to exit was caught; however, a second one was not. See the logs below. The issue seems to be that we must first shut down the cluster before resetting the exit catcher. Otherwise, there is still a chance for the broker to call exit.
```
[2023-12-21 13:52:59,310] ERROR Shutdown broker because all log dirs in /tmp/kafka-2594137463116889965 have failed (kafka.log.LogManager:143)
[2023-12-21 13:52:59,312] ERROR test error (kafka.server.epoch.EpochDrivenReplicationProtocolAcceptanceWithIbp26Test:76)
java.lang.RuntimeException: halt(1, null) called!
at kafka.server.QuorumTestHarness.$anonfun$setUp$4(QuorumTestHarness.scala:273)
at org.apache.kafka.common.utils.Exit.halt(Exit.java:63)
at kafka.utils.Exit$.halt(Exit.scala:33)
at kafka.log.LogManager.handleLogDirFailure(LogManager.scala:224)
at kafka.server.ReplicaManager.handleLogDirFailure(ReplicaManager.scala:2600)
at kafka.server.ReplicaManager$LogDirFailureHandler.doWork(ReplicaManager.scala:324)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131)
```
```
[2023-12-21 13:53:05,797] ERROR Shutdown broker because all log dirs in /tmp/kafka-7355495604650755405 have failed (kafka.log.LogManager:143)
```
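A hedged sketch of the resulting tear-down ordering (the real harness is Scala in QuorumTestHarness; this is illustrative only):
```
import org.apache.kafka.common.utils.Exit;

public abstract class TearDownOrderingSketch {
    abstract void shutdownCluster();

    void tearDown() {
        // The broker can still call Exit.halt() while shutting down (e.g. on a
        // log-dir failure), so the test catcher must stay installed until the
        // cluster is fully stopped.
        shutdownCluster();
        Exit.resetHaltProcedure();
        Exit.resetExitProcedure();
    }
}
```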
Reviewers: Luke Chen <showuon@gmail.com>
This patch ensures that the RemoteLogManager is closed in RemoteLogManagerTest.
Reviewers: Divij Vaidya <diviv@amazon.com>, Lucas Brutschy <lbrutschy@confluent.io>
These tests are removed in this commit because they are flaky.
After investigation, the causes are:
1. remoteLogSizeComputationTime: it failed with "Expected to find 1000 for RemoteLogSizeComputationTime metric value, but found 0". The reason is that if the verification thread is too slow and the 2nd run of the RLMTask has started, it resets the value back to 0. Fix it by adding a latch to wait for verification (a sketch of the pattern follows this list).
2. remoteFetchExpiresPerSec: it failed with "The ExpiresPerSec value is not incremented. Current value is: 0". The reason is that remoteFetchExpiresPerSec is a static metric, and we remove all metrics in the tearDown method after each test completes. Once remoteFetchExpiresPerSec is removed, it is never created again, unlike the other metrics. That is why it failed intermittently in Jenkins: if a previous test had an expired remote fetch, the metric was created and then removed for good. Fix it by removing it only in afterAll.
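The first fix could look roughly like this latch pattern (a minimal sketch with illustrative names, not the actual RLMTask code):
```
import java.util.concurrent.CountDownLatch;

public class LatchedVerificationSketch {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch verified = new CountDownLatch(1);

        Thread rlmTaskLike = new Thread(() -> {
            long metricValue = 1000; // first run computes the metric
            try {
                verified.await(); // block the second run until the test has checked the value
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            metricValue = 0; // the second run resets the value
        });
        rlmTaskLike.start();

        // Test side: assert on the metric first, then release the task.
        verified.countDown();
        rlmTaskLike.join();
    }
}
```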
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>
This pull request aims to implement RemoteLogSizeBytes from KIP-963.
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
People have raised concerns about using `Generic` as the name for the old rebalance protocol. We considered using `Legacy` but discarded it because there are still applications, such as Connect, using the old protocol. We settled on `Classic` for the `Classic Rebalance Protocol`.
The changes in this patch are extremely mechanical. It basically replaces the occurrences of `generic` with `classic`.
Reviewers: Divij Vaidya <diviv@amazon.com>, Lucas Brutschy <lbrutschy@confluent.io>
This patch wires the handling of markers written by the transaction coordinator via the WriteTxnMarkers API. In the old group coordinator, the markers are written to the logs and, as a second step, the group coordinator is informed to materialize the changes if the writes were successful. This approach does not really work with the new group coordinator, for two main reasons: 1) the second step would fail while the coordinator is loading, and there is no guarantee whether the loading has picked up the write or not; 2) it does not fit well with the new memory model, where the state is snapshotted by offset. In both cases, having a single writer to the `__consumer_offsets` partitions is more robust and preferable.
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
The new coordinator stops loading if the partition goes offline during load. However, the partition is still considered active. Instead, we should return the NOT_LEADER_OR_FOLLOWER error during load.
Another change is that we only invoke CoordinatorPlayback#updateLastCommittedOffset if the current offset (last written offset) is greater than or equal to the current high watermark. This ensures that when the high watermark is ahead of the current offset, we don't clear snapshots prematurely. The guard is sketched below.
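A minimal sketch of the guard, assuming illustrative names (the real logic lives in the coordinator runtime):
```
interface CoordinatorPlaybackLike {
    void updateLastCommittedOffset(long offset);
}

public class CommittedOffsetGuardSketch {
    private final CoordinatorPlaybackLike playback;

    CommittedOffsetGuardSketch(CoordinatorPlaybackLike playback) {
        this.playback = playback;
    }

    void maybeUpdateLastCommittedOffset(long lastWrittenOffset, long highWatermark) {
        // Only advance the committed offset (which allows older snapshots to be
        // freed) once the state machine has written up to the high watermark.
        if (lastWrittenOffset >= highWatermark) {
            playback.updateLastCommittedOffset(highWatermark);
        }
    }
}
```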
Reviewers: David Jacot <djacot@confluent.io>
We do not properly close Closeable resources in multiple places in the code base, especially when an exception occurs. This change fixes several of these leaks.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Luke Chen <showuon@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
This patch adds the group.coordinator.rebalance.protocols configuration, which accepts a list of protocols to enable; an example is shown below. At the moment, only generic and consumer are supported, and it is not yet possible to disable generic. When consumer is enabled, the new consumer rebalance protocol (KIP-848) is enabled alongside the new group coordinator. This patch also publishes all the new configurations introduced by KIP-848.
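A hypothetical example of setting the new configuration, based on the description above:
```
import java.util.Properties;

public class GroupCoordinatorConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // "generic" is the old protocol (it cannot be disabled yet); adding
        // "consumer" enables the new KIP-848 rebalance protocol alongside the
        // new group coordinator.
        props.put("group.coordinator.rebalance.protocols", "generic,consumer");
        System.out.println(props);
    }
}
```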
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Stanislav Kozlovski <stanislav@confluent.io>
This pull request aims to implement RemoteCopyLagSegments, RemoteDeleteLagBytes and RemoteDeleteLagSegments from KIP-963.
Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Breakdown of this PR:
* Extend the generator to support deprecated api versions
* Set deprecated api versions via the request json files
* Expose the information via metrics and the request log
The relevant section of the KIP:
> * Introduce metric `kafka.network:type=RequestMetrics,name=DeprecatedRequestsPerSec,request=(api-name),version=(api-version),clientSoftwareName=(client-software-name),clientSoftwareVersion=(client-software-version)`
> * Add boolean field `requestApiVersionDeprecated` to the request header section of the request log (alongside `requestApiKey`, `requestApiVersion`, `requestApiKeyName`, etc.).
Unit tests were added to verify the new generator functionality,
the new metric and the new request log attribute.
Reviewers: Jason Gustafson <jason@confluent.io>
We drive the consumer closing via events and rely on the still-running network thread to complete these operations.
This ticket encompasses several different tickets:
KAFKA-15696/KAFKA-15548
When closing the consumer, we need to perform a few tasks. Here is the top-level overview:
We want to keep the network thread alive until we are ready to shut down, i.e., until no more requests need to be sent out. To achieve this, I implemented a method, signalClose(), to signal the managers to prepare for shutdown. Once we signal the network thread to close, the managers prepare the requests to be sent out on the next event loop. The network thread can then be closed after issuing these events. The application thread's task is straightforward: 1. tell the background thread to perform the events, and 2. block on certain events until they succeed or the timer runs out. Once all requests are sent out, we close the network thread and other components as usual.
Here I outline the changes in detail:
- AsyncKafkaConsumer: shutdown procedures, and several utility functions to ensure proper exceptions are thrown during shutdown.
- AsyncKafkaConsumerTest: I examined each individual test and fixed ones that block for too long or log errors.
- CommitRequestManager: signalClose().
- FetchRequestManagerTest: changes due to the change in pollOnClose().
- ApplicationEventProcessor: handle CommitOnClose and LeaveGroupOnClose. The latter triggers leaveGroup(), which should be completed on the next heartbeat (or we time out on the application thread).
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>
Producers and consumers could be leaked in the AuthorizerIntegrationTest. In the teardown logic, `removeAllClientAcls()` is called before the super teardown method. If `removeAllClientAcls()` fails, the super method does not get a chance to close the producers and consumers. An example of such a failure is [here](https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14925/11/tests/).
As a new cluster is created for each test anyway, calling `removeAllClientAcls()` does not seem necessary. This patch removes it.
Reviewers: Jason Gustafson <jason@confluent.io>
This PR implements part of KIP-963, specifically for adding new metrics.
The metrics added in this PR are:
RemoteDeleteRequestsPerSec (emitted when expired log segments on remote storage being deleted)
RemoteDeleteErrorsPerSec (emitted when failed to delete expired log segments on remote storage)
BuildRemoteLogAuxStateRequestsPerSec (emitted when building remote log aux state for replica fetchers)
BuildRemoteLogAuxStateErrorsPerSec (emitted when failed to build remote log aux state for replica fetchers)
Reviewers: Luke Chen <showuon@gmail.com>, Nikhil Ramakrishnan <ramakrishnan.nikhil@gmail.com>, Christo Lolov <lolovc@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
This reverts commit ed7ad6d.
We have been seeing a lot of failures of TransactionsWithTieredStoreTest.testTransactionsWithCompression on trunk, and they seem to start with this PR. I can see how this PR influences the test via the change in TestUtils. The bad part is that it sometimes seems to kill the Gradle executors completely. So I'd suggest reverting the change, before investigating further, to stabilize CI.
Reviewers: Bruno Cadonna <cadonna@apache.org>
I observed several failed tests in PR builds. Let's first disable them and try to find a different way to test the async consumer with these tests.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
I ran this test 40 times without KAFKA-15653, with and without compression enabled.
With compression it failed 39/40 times; without it, it passed 40/40 times.
With KAFKA-15653 and compression, it passed 40/40 times locally.
Reviewers: Jason Gustafson <jason@confluent.io>
The consumer integration tests were experimentally disabled for the new `AsyncKafkaConsumer` variant with the aim of improving build stability. Several improvements have been made to the consumer code and other tests which seem to have made a difference. This patch re-enables the tests.
Reviewers: David Jacot <djacot@confluent.io>
Add a metric for the number of expired remote fetches per second, and a corresponding unit test to verify that the metric is marked on expiration.
kafka.server:type=DelayedRemoteFetchMetrics,name=ExpiresPerSec
Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Christo Lolov <lolovc@amazon.com>
This patch adds the logic for coordinating the `ConsumerRebalanceListener` callback invocations between the background thread (in `MembershipManagerImpl`) and the application thread (`AsyncKafkaConsumer`) and back again. It allowed us to enable more tests from `PlaintextConsumerTest` to exercise this code.
Reviewers: David Jacot <djacot@confluent.io>
Moves ELR from MetadataVersion IBP_3_7_IV3 into the new IBP_3_8_IV0 because the ELR feature was not completed before 3.7 reached feature freeze. Leaves IBP_3_7_IV3 empty -- it is a no-op and is not reused for anything. Adds the new MetadataVersion IBP_3_7_IV4 for the FETCH request changes from KIP-951, which were mistakenly never associated with a MetadataVersion. Updates the LATEST_PRODUCTION MetadataVersion to IBP_3_7_IV4 to declare both KRaft JBOD and the KIP-951 changes ready for production use.
Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@apache.org>, Justine Olshan <jolshan@confluent.io>
Rewrote the verification flow to pass a callback to execute after verification completes.
For the TxnOffsetCommit, we will call doTxnCommitOffsets. This allows us to do offset validations post verification.
I've reorganized the verification code and group coordinator code to make these code paths clearer. The followup refactor (https://issues.apache.org/jira/browse/KAFKA-15987) will further clean up the produce verification code.
Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
This pull request implements the first in the list of metrics in KIP-963: Additional metrics in Tiered Storage.
Since each partition of a topic is serviced by its own RLMTask, we need an aggregator object per topic. The aggregator object in this pull request is BrokerTopicAggregatedMetric. Since RemoteCopyLagBytes is a gauge, I have introduced a new GaugeWrapper. The GaugeWrapper is used by the metrics collection system to interact with the BrokerTopicAggregatedMetric. The RemoteLogManager interacts with the BrokerTopicAggregatedMetric directly.
Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>
This patch ensures that `offset.commit.timeout.ms` is enforced. It does so by adding a timeout to the CoordinatorWriteEvent; the idea is sketched below.
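A hedged sketch of the idea; the real change is inside the coordinator runtime's CoordinatorWriteEvent, and the names here are illustrative:
```
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class WriteEventTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        long offsetCommitTimeoutMs = 100;
        CompletableFuture<Void> writeResult = new CompletableFuture<>();

        // Fail the pending write with a TimeoutException if it has not
        // completed within offset.commit.timeout.ms.
        writeResult.orTimeout(offsetCommitTimeoutMs, TimeUnit.MILLISECONDS)
                .whenComplete((result, error) -> {
                    if (error != null)
                        System.out.println("offset commit failed: " + error);
                });

        Thread.sleep(200); // simulate a write that never completes in time
    }
}
```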
Reviewers: David Jacot <djacot@confluent.io>
Implement Consumer.listTopics and Consumer.partitionsFor in the new consumer. The topic metadata request manager already existed so this PR adds expiration to requests, removes some redundant state checking and adds tests.
Reviewers: Lucas Brutschy <lucasbru@apache.org>
Currently, the poll interval is not respected during consumer#poll. When the user stops polling the consumer, we should assume either that the consumer is too slow to respond or that it is already dead. In either case, we should let the group coordinator kick the member out of the group and reassign its partitions after the rebalance timeout expires.
If the consumer comes back alive, we should send a heartbeat, and the member will be fenced and will rejoin (and the partitions will be revoked).
This is the same behavior as the current implementation.
Reviewers: Lucas Brutschy <lucasbru@apache.org>, Bruno Cadonna <cadonna@apache.org>, Lianet Magrans <lianetmr@gmail.com>
Session expiration in ZkClient can lead to a thread leak, and it fails CI on master.
This happens in testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl, and possibly other tests.
Use try-with-resources to close the ZkClient if this happens; the pattern is sketched below.
This does not fix the underlying session expiration in ZK.
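The shape of the fix, shown as Java try-with-resources (the test itself is Scala, where the equivalent is scala.util.Using); connectZkClient() is a hypothetical stand-in for the test's client construction:
```
public class CloseZkClientExample {
    static AutoCloseable connectZkClient() {
        return () -> System.out.println("ZkClient closed, no leaked thread");
    }

    public static void main(String[] args) throws Exception {
        try (AutoCloseable zkClient = connectZkClient()) {
            // ... test body: even if the session expires and the test throws,
            // the client (and its threads) are closed on exit from this block.
        }
    }
}
```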
Reviewers: David Jacot <djacot@confluent.io>
In the new consumer, the commit request manager and the membership manager are separate components. The commit request manager is initialised with group information that it uses to construct `OffsetCommit` requests. However, the initial value of the member ID is `""` in some cases. When the consumer joins the group, it receives a `ConsumerGroupHeartbeat` response which tells it the member ID. The member ID was not being passed to the commit request manager, so it sent invalid `OffsetCommit` requests that failed with `UNKNOWN_MEMBER_ID`.
Reviewers: Bruno Cadonna <cadonna@apache.org>, David Jacot <djacot@confluent.io>
The support for regular expressions has not been implemented yet in the new consumer group protocol. This patch removes the `SubscribedTopicRegex` from the `ConsumerGroupHeartbeatRequest` in preparation for 3.7. It seems better to bump the version and add it back when we implement the feature, as part of https://issues.apache.org/jira/browse/KAFKA-14517, instead of having an unused field in the request.
Reviewers: Sagar Rao <sagarmeansocean@gmail.com>, Justine Olshan <jolshan@confluent.io>
- Add proper start & stop for AssignmentsManager's event loop
- Dedupe queued duplicate assignments
- Fix bug where directory ID is resolved too late
Co-authored-by: Gaurav Narula <gaurav_narula2@apple.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Allow using JBOD during ZK migration if MetadataVersion is at or above 3.7-IV2.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>
KAFKA-15361 (#14838) introduced a check for a non-empty directory list in broker registration requests
from MetadataVersion.IBP_3_7_IV2 or later, which enables directory assignment. However, ZK brokers
were not yet registering with a directory list. This patch addresses that. We also make the
directory list non-optional in BrokerLifecycleManager.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>
This new integration test verifies that a static member who temporarily left the group is removed after the session timeout expires. It also verifies that a new static member with the same instance id can't join the group until the previous static member has expired.
Reviewers: David Jacot <djacot@confluent.io>
DelegationTokenEndToEndAuthorizationWithOwnerTest can leak a thread, causing problems for many tests.
This is due to an admin client that isn't closed when a (flaky) test fails. Use the Scala util `Using` to close the auto-closeable admin client in case the validation fails.
Reviewers: David Jacot <djacot@confluent.io>, Bruno Cadonna <cadonna@apache.org>
Handle AssignReplicasToDirs requests, persist metadata changes
with new directory assignments and possible leader elections.
Reviewers: Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rndgstn@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
Add the CurrentControllerId metric as described in KIP-1001. This gives us an easy way to identify the current controller by looking at the metrics of any Kafka node (broker or controller).
Reviewers: David Arthur <mumrah@gmail.com>
Improvement for KIP-1000 to list client metrics resources in KafkaApis.scala. This uses functionality exposed by KIP-1000 to support the describe-all-metrics operation for KIP-714.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>
The Kafka consumer makes a variety of requests to brokers, such as fetching committed offsets and updating metadata. In the LegacyKafkaConsumer, the approach is typically to prepare RPC requests and then poll the network to wait for responses. In the AsyncKafkaConsumer, the approach is to enqueue an ApplicationEvent for processing by one of the request managers on the background thread. However, it is still important to wait for responses rather than spinning and enqueuing events for the request managers before they have had a chance to respond.
In general, the behaviour is not changed by this code. The PlaintextConsumerTest.testSeek test was flaky because operations such as KafkaConsumer.position did not properly wait for a response, which meant that subsequent operations were attempted in the wrong state. This test is no longer flaky.
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Bruno Cadonna <cadonna@apache.org>
After the new coordinator loads a __consumer_offsets partition, it logs the following exception when handling a read operation (fetch/list groups, etc.):
```
java.lang.RuntimeException: No in-memory snapshot for epoch 740745. Snapshot epochs are:
at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:178)
at org.apache.kafka.timeline.SnapshottableHashTable.snapshottableIterator(SnapshottableHashTable.java:407)
at org.apache.kafka.timeline.TimelineHashMap$ValueIterator.<init>(TimelineHashMap.java:283)
at org.apache.kafka.timeline.TimelineHashMap$Values.iterator(TimelineHashMap.java:271)
```
This happens because we don't have a snapshot at the last updated high watermark after loading. We cannot generate a snapshot at the high watermark after loading all batches, because they may contain records that have not yet been committed. We also don't know how far the high watermark will advance, so we need to generate a snapshot for each offset the loader observes that is greater than the current high watermark. Then, once we add the high watermark listener and the high watermark is updated, we can delete all of the older snapshots. The approach is sketched below.
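A hedged sketch of the approach; the real logic lives in the new coordinator's loader and snapshot registry, and these names are illustrative:
```
import java.util.TreeSet;

public class SnapshotOnLoadSketch {
    private final TreeSet<Long> snapshotOffsets = new TreeSet<>();
    private long highWatermark = -1L;

    // While loading, snapshot every observed offset beyond the current high
    // watermark, since we cannot know where the high watermark will land.
    void onBatchLoaded(long lastOffsetInBatch) {
        if (lastOffsetInBatch > highWatermark) {
            snapshotOffsets.add(lastOffsetInBatch);
        }
    }

    // Once the high watermark listener fires, older snapshots can be dropped.
    void onHighWatermarkUpdated(long newHighWatermark) {
        highWatermark = newHighWatermark;
        snapshotOffsets.headSet(newHighWatermark, false).clear();
    }
}
```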
Reviewers: David Jacot <djacot@confluent.io>
* Validate the client's configuration for server-side assignor selection, defined in the config group.remote.assignor
* Include the assignor taken from the config in the ConsumerGroupHeartbeat request, in the ServerAssignor field
* Properly handle UNSUPPORTED_ASSIGNOR errors that may be returned in the HB response if the server does not support the assignor defined by the consumer.
Includes simple integration tests for sending an invalid assignor to the broker and for using the range assignor with a single consumer.
Reviewers: David Jacot <djacot@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Bruno Cadonna <cadonna@apache.org>
The DELETE_RECORDS API can move the log-start-offset beyond the highest-copied-remote-offset. In such cases, we should allow deletion of local log segments, since they won't be eligible for upload to remote storage. The check is sketched below.
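A minimal sketch of the eligibility check described above, with illustrative names:
```
public class LocalSegmentDeletionSketch {
    // A local segment is safe to delete once it has been copied to remote
    // storage, or once it lies entirely below the log start offset (it can
    // then never become eligible for upload).
    static boolean safeToDeleteLocalSegment(long segmentEndOffset,
                                            long logStartOffset,
                                            long highestCopiedRemoteOffset) {
        return segmentEndOffset <= highestCopiedRemoteOffset
                || segmentEndOffset < logStartOffset;
    }

    public static void main(String[] args) {
        // DELETE_RECORDS moved the log start offset (200) past the highest
        // copied remote offset (100): a segment ending at 150 is deletable.
        System.out.println(safeToDeleteLocalSegment(150, 200, 100)); // true
    }
}
```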
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
This adds the new ListClientMetricsResources RPC to the Kafka protocol and puts support
into the Kafka admin client. The broker-side implementation in this PR just returns an empty
list. A future PR will obtain the list from the config store.
Includes a few unit tests for what is a very simple RPC. There are additional tests already written and
waiting for the PR that delivers the kafka-client-metrics.sh tool, which builds on this PR.
Reviewers: Jun Rao <junrao@gmail.com>
This patch adds an integration test which verifies that a static member gets its previous assignment back when rejoining.
Reviewers: David Jacot <djacot@confluent.io>
This patch adds the `Uniform` assignor to the default list of supported assignors. It also makes small changes in the code.
Reviewers: Justine Olshan <jolshan@confluent.io>
This is a follow-up to https://github.com/apache/kafka/pull/14687, as we found out that some parameterized tests do not include the test method name in their name. For context, the JUnit XML report does not include the name of the method by default but relies only on the display name provided.
Reviewers: David Arthur <mumrah@gmail.com>
This patch adds a ThreadLocal with a GrowableBufferSupplier so that each writing thread can reuse the same buffer instead of allocating a new one for each write; the pattern is sketched below. The patch relies on existing tests.
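A minimal sketch of the pattern with illustrative names (the actual change uses the runtime's GrowableBufferSupplier):
```
import java.nio.ByteBuffer;

public class ThreadLocalBufferSketch {
    // One buffer per writing thread, reused across writes.
    private static final ThreadLocal<ByteBuffer> BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(16 * 1024));

    static ByteBuffer borrowBuffer(int minCapacity) {
        ByteBuffer buffer = BUFFER.get();
        if (buffer.capacity() < minCapacity) {
            buffer = ByteBuffer.allocate(minCapacity); // grow and remember
            BUFFER.set(buffer);
        }
        buffer.clear();
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer first = borrowBuffer(1024);
        ByteBuffer second = borrowBuffer(1024);
        System.out.println(first == second); // true: same thread reuses the buffer
    }
}
```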
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
In this [build](https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14826/11/pipeline/12/), the following test hung forever.
```
Gradle Test Run :core:test > Gradle Test Executor 93 > PlaintextConsumerTest > testSeek(String, String) > testSeek(String, String).quorum=kraft+kip848.groupProtocol=consumer STARTED
```
As the new consumer is not entirely stable yet, we should add a Timeout to all those integration tests to ensure that builds are not blocked unnecessarily.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Justine Olshan <jolshan@confluent.io>