The purpose of this PR is to improve upon a test case in ProducerIdExpirationTest.scala. Specifically, the testTransactionAfterTransactionIdExpiresButProducerIdRemains(). This test was slightly flaky. Removed an assertion on the producerState size that caused flakiness in the test
Reviewers: Justine Olshan <jolshan@confluent.io>
This changes should fix the flakiness reported for DumpLogSegmentsTest#testDumpRemoteLogMetadataNonZeroStartingOffset.
I was not able to reproduce locally, but the issue was that the second segment was not created in time:
Missing required argument "[files]"
The fix consists of getting the log path directly from the rolled segment.
We were also creating the log twice, and that was producing this warning:
[2024-07-12 00:57:28,368] WARN [LocalLog partition=kafka-832386, dir=/tmp/kafka-2956913950351159820] Trying to roll a new log segment with start offset 0 =max(provided offset = Some(0), LEO = 0) while it already exists and is active with size 0. Size of time index: 873813, size of offset index: 1310720. (kafka.log.LocalLog:70)
This is also fixed.
Reviewers: Luke Chen <showuon@gmail.com>
As part of KIP-950, we want to split the RemoteLogManagerScheduledThreadPool into separate thread pools (one for copy and another for expiration). In this change, we are splitting it into three thread pools (one for copy, one for expiration, and another one for follower). We are reusing the same thread pool configuration for all three thread pools. We can introduce new user-facing configurations later.
Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>, Satish Duggana <satishd@apache.org>
Introduce the KRaftVersion enum to describe the current value of kraft.version. Change a bunch of places in the code that were using raw shorts over to using this new enum.
In BrokerServer.scala, fix a bug that could cause null pointer exceptions during shutdown if we tried to shut down before fully coming up.
Do not send finalized features that are finalized as level 0, since it is a no-op.
Reviewers: dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@apache.org>
Added handling of share group heartbeat and describe in KafkaApis. The Implementation of heartbeat and describe is with group coordinator.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Rahul <rahul.nirgude@mastercard.com>
When becoming the active KRaftMigrationDriver, there is another race condition similar to KAFKA-16171. This time, the race is due to a stale read from ZK. After writing to /controller and /controller_epoch, it is possible that a read on /migration is not linearized with the writes that were just made. In other words, we get a stale read on /migration. This leads to an inability to sync metadata to ZK due to incorrect zkVersion on the migration ZNode.
The non-linearizability of reads is in fact documented behavior for ZK, so we need to handle it.
To fix the stale read, this patch adds a write to /migration after updating /controller and /controller_epoch. This allows us to learn the correct zkVersion for the migration ZNode before leaving the BECOME_CONTROLLER state.
This patch also adds a check on the current leader epoch when running certain events in KRaftMigrationDriver. Historically, we did not include this check because it is not necessary for correctness. Writes to ZK are gated on the /controller_epoch zkVersion, and RPCs sent to brokers are gated on the controller epoch. However, during a time of rapid failover, there is a lot of processing happening on the controller (i.e., full metadata sync to ZK and full UMRs sent to brokers), so it is best to avoid running events we know will fail.
There is also a small fix in here to improve the logging of ZK operations. The log message are changed to past tense to reflect the fact that they have already happened by the time the log message is created.
Reviewers: Igor Soarez <soarez@apple.com>
ShareGroupHeartbeat API support as defined in KIP-932. The heartbeat persists Group and Member information on __consumer_offsets topic.
The PR also moves some of the ShareGroupConfigs to GroupCoordinatorConfigs as they should only be used in group coordinator.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
The first leader of a KRaft topic partition must rewrite the content of the bootstrap checkpoint (0-0.checkpoint) to the log so that it is replicated. Bootstrap checkpoints are not replicated to the followers.
The control records for KRaftVersionRecord and VotersRecord in the bootstrap checkpoint will be written in one batch along with the LeaderChangeMessage. The leader will write these control records before accepting data records from the state machine (Controller).
The leader determines that the bootstrap checkpoint has not been written to the log if the latest set of voters is located at offset -1. This is the last contained offset for the bootstrap checkpoint (0-0.checkpoint).
This change also improves the RaftClientTestContext to allow for better testing of the reconfiguration functionality. This is mainly done by allowing the voter set to be configured statically or through the bootstrap checkpoint.
Reviewers: José Armando García Sancio <jsancio@apache.org>, Colin P. McCabe <cmccabe@apache.org>
Co-authors: José Armando García Sancio <jsancio@apache.org>
As part of KIP-584, brokers expose a range of supported versions for each feature. For example, metadata.version might be supported from 1 to 21. (Note that feature level ranges are always inclusive, so this would include both level 1 and 21.)
These supported ranges are supposed to be able to include 0. For example, it should be possible for a broker to support a kraft.version between 0 and 1. However, in older software versions, there is an assertion in org.apache.kafka.common.feature.SupportedVersionRange that prevents this. This causes problems when the older software attempts to deserialize an ApiVersionsResponse containing such a range.
In order to resolve this dilemma, this PR bumps the version of ApiVersionsRequest from 3 to 4. Clients which send v4 promise to be able to handle ranges including 0. Clients which send v3 will not be exposed to these ranges. The feature will show up as having a minimum version of 1 instead. This work is part
of KIP-1022.
Similarly, this PR also introduces a new version of BrokerRegistrationRequest, and specifies that the
older versions of that RPC cannot handle supported version ranges including 0. Therefore, 0 is translated to 1 in the older requests.
Reviewers: Jun Rao <junrao@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>, Justine Olshan <jolshan@confluent.io>
This change adds the --remote-log-metadata-decoder flag to the kafka-dump-log.sh tool. This new flag can be used to decode the payload of the __remote_log_metadata records produced by the default RemoteLogMetadataManager.
Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>
Add a remote.log.disable.policy on a topic-level only as part of KIP-950
Reviewers: Kamal Chandraprakash <kchandraprakash@uber.com>, Luke Chen <showuon@gmail.com>, Murali Basani <muralidhar.basani@aiven.io>
The PR contains the following -
1. Added minor enhancement to not have a write state RPC call in case we have 0 states to be updated.
2. Removed unutilized function deliveryCount()
3. Added more unit tests covering functionalities across SharePartition.java
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Added ShareGroupMetrics inner class to SharePartitionManager.
Added code to record metrics at various checkpoints in code.
Reviewers: Andrew Schofield <aschofield@confluent.io>,Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
This change simply adds property methods to LogOffsetMetadata. It
changes all of the callers to use the new property methods instead of
using the fields directly.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Defined share group, member and sinmple assignor classes with API definition for Share Group Heartbeat and Describe API.
The ShareGroup and ShareGroupMember extends the common ModernGroup and ModernGroupMember respectively.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
As part of KIP-956, we have added quota for remote copies to remote storage. In this PR, we are adding the following metrics for remote copy throttling.
1. remote-copy-throttle-time-avg The average time in millis remote copies was throttled by a broker
2. remote-copy-throttle-time-max The max time in millis remote copies was throttled by a broker
Added unit test for the metrics.
Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
The release API exposed Partitions which should be an internal implementation detail for releaseAcquiredRecords API. Also lessen the scope for cached topic partitions method as it's not needed.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit <adixit@confluent.io>
Implemented the functionality which takes care of archiving the records when LSO moves past them. Implemented the following functions -
1. updateCacheAndOffsets - Updates the cached state, start and end offsets of the share partition as per the new log start offset. The method is called when the log start offset is moved for the share partition.
2. archiveAvailableRecordsOnLsoMovement - This function archives all the available records when they are behind the LSO.
3. archivePerOffsetBatchRecords - It archives all the available records in the per offset tracked batch passed to this function.
4. archiveCompleteBatch - It archives all the available records of the complete batch passed to this function.
Reviewers: Andrew Schofield <aschofield@confluent.io>,Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
As part of [KIP-956](https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas), we have added quota for remote fetches from remote storage. In this PR, we are adding the following metrics for remote fetch throttling.
remote-fetch-throttle-time-avg : The average time in millis remote fetches was throttled by a broker
remote-fetch-throttle-time-max : The max time in millis remote fetches was throttled by a broker
Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
Following the discussion and suggestion by @dajac, https://github.com/apache/kafka/pull/16054#discussion_r1613638293, the PR refactors the common classes to build TargetAssignment in `modern` package. `consumer` package has been moved inside `modern` package with classes exclusive to `consumer group`.
This PR completes the refactoring and base to introduce `share` package inside `modern`. The subsequent PRs will define the implementation specific to Share Groups while re-using the common functionality from `modern` package classes.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
About
This PR adds acknowledge code in SharePartitionManager. Internally, the record acknowledgements happen at the SharePartition level. SharePartitionManager identifies the SharePartitions and calls their acknowledge method to actually acknowledge the individual records
Testing
Added unit tests to cover the new functionality added in SharePartitionManagerTest
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
This patch partially reverts `group.version` in trunk. I kept the `GroupVersion` class but removed it from `Features` so it is not advertised. I also kept all the changes in the test framework. I removed the logic to require `group.version=1` to enable the new consumer rebalance protocol. The new protocol is enabled based on the static configuration.
For the context, I prefer to revert it in trunk now so we don't forget to revert it in the 3.9 release. I will bring it back for the 4.0 release.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
his PR adds KRaft support to the following tests in AlterUserScramCredentialsRequestNotAuthorizedTest
Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Prior to KIP-853, users were not allow to enumerate listeners specified in `controller.listener.names` in the `advertised.listeners`. This decision was made in 3.3 because the `controller.quorum.voters` property is in effect the list of advertised listeners for all of the controllers.
KIP-853 is moving away from `controller.quorum.voters` in favor of a dynamic set of voters. This means that the user needs to have a way of specifying the advertised listeners for controller.
This change allows the users to specify listener names in `controller.listener.names` in `advertised.listeners`. To make this change forwards compatible (use a valid configuration from 3.8 in 3.9), the controller's advertised listeners are going to get computed by looking up the endpoint in `advertised.listeners`. If it doesn't exist, the controller will look up the endpoint in the `listeners` configuration.
This change also includes a fix the to the BeginQuorumEpoch request where the default value for VoterId was 0 instead of -1.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Create 3 new metadata versions:
- 3.8-IV0, for the upcoming 3.8 release.
- 3.9-IV0, to add support for KIP-1005.
- 3.9-IV1, as the new release vehicle for KIP-966.
Create ListOffsetRequest v9, which will be used in 3.9-IV0 to support KIP-1005. v9 is currently an unstable API version.
Reviewers: Jun Rao <junrao@gmail.com>, Justine Olshan <jolshan@confluent.io>
About
Implemented release acquired records functionality in SharePartition. This functionality is used when a share session gets closed, hence all the acquired records should either move to AVAILABLE or ARCHIVED state. Implemented the following functions -
1. releaseAcquiredRecords - This function is executed when the acquisition lock timeout is reached. The function releases the acquired records.
2. releaseAcquiredRecordsForCompleteBatch - Function which releases acquired records maintained at a batch level.
3. releaseAcquiredRecordsForPerOffsetBatch - Function which releases acquired records maintained at an offset level.
Testing
Added unit tests to cover the new functionality added.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
About
Implemented releaseAcquiredRecords functionality in SharePartitionManager which will act as a bridge between the call from KafkaApis to SharePartition for releasing the acquired records when a share session gets closed.
Testing
The added function has been tested with unit tests.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>