No Kafka administrator action is needed for an InvalidProducerEpochException. Logging it at ERROR level is alarming, since the exception can occur by design for a variety of valid reasons.
Proposing to lower the log level from ERROR to INFO
Reviewers: Justine Olshan <jolshan@confluent.io>
A few of the share group configs in KIP-932 were defined with limits that do not match KIP-932. This PR corrects the limits.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
The share-partition leader keeps track of the state and delivery attempts for in-flight records. However, delivery attempts tracking follows at-least-once semantics.
The consumer processes the records and acknowledges them upon successful consumption. This successful attempt triggers a transition to the "Acknowledged" state.
The code implements the functionality to acknowledge the offset/batches in the request to in-memory cached data.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
The below remote log configs can be configured dynamically:
1. remote.log.manager.copy.max.bytes.per.second
2. remote.log.manager.fetch.max.bytes.per.second and
3. remote.log.index.file.cache.total.size.bytes
If those values are configured dynamically, then during the broker restart, it ensures the dynamic values are loaded instead of the static values from the config.
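The precedence described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the real broker resolves dynamic configs through its own config machinery rather than plain maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a dynamically configured value wins over the static
// value from the properties file, which in turn wins over the default.
final class EffectiveRemoteLogConfigSketch {
    static long effectiveValue(String key,
                               Map<String, Long> dynamicOverrides,
                               Map<String, Long> staticConfig,
                               long defaultValue) {
        Long dynamic = dynamicOverrides.get(key);
        if (dynamic != null) {
            return dynamic; // dynamic value survives a broker restart
        }
        Long fromFile = staticConfig.get(key);
        return fromFile != null ? fromFile : defaultValue;
    }

    public static void main(String[] args) {
        String key = "remote.log.manager.copy.max.bytes.per.second";
        Map<String, Long> staticConfig = new HashMap<>();
        staticConfig.put(key, 1_000_000L);
        Map<String, Long> dynamicOverrides = new HashMap<>();
        dynamicOverrides.put(key, 5_000_000L);
        System.out.println(effectiveValue(key, dynamicOverrides, staticConfig, Long.MAX_VALUE));
    }
}
```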
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
When broker configuration is incompatible with the current Metadata Version the Broker should log an error-level message but avoid shutting down.
Reviewers: Luke Chen <showuon@gmail.com>
Define the interfaces and RPCs for share-group persistence. (KIP-932). This PR is just RPCs and interfaces to allow building of the broker components which depend upon them. The implementation will follow in subsequent PRs.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
Added additional APIs for SharePartition which shall be used by SharePartitionManager.
The lock API on SharePartition prevents concurrent fetch requests from being issued to the replica manager for the same SharePartition. The updateCacheAndOffsets API updates the cache and the corresponding offsets when SharePartitionManager encounters an exception because the Log Start Offset has moved.
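A minimal sketch of how such a fetch lock could work, assuming a simple non-blocking flag. The class and method names here are hypothetical and are not taken from the PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a non-blocking lock that lets only one fetch at a time
// proceed for a given share partition.
final class FetchLockSketch {
    private final AtomicBoolean fetchLock = new AtomicBoolean(false);

    // Returns true only for the caller that flips the flag from false to true.
    boolean tryAcquireFetchLock() {
        return fetchLock.compareAndSet(false, true);
    }

    // Must be called by the holder once its fetch has completed or failed.
    void releaseFetchLock() {
        fetchLock.set(false);
    }

    public static void main(String[] args) {
        FetchLockSketch partition = new FetchLockSketch();
        System.out.println(partition.tryAcquireFetchLock()); // first caller proceeds
        System.out.println(partition.tryAcquireFetchLock()); // concurrent caller is turned away
        partition.releaseFetchLock();
        System.out.println(partition.tryAcquireFetchLock()); // lock is available again
    }
}
```

A caller that fails to acquire the lock simply skips the fetch for that partition instead of blocking.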
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Allow the committed offsets fetch to run for as long as needed. This handles the case where a user invokes Consumer.poll() with a very small timeout (including zero).
Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
About
KIP-932 introduces share sessions for share groups. This PR implements share sessions and contexts for incoming share fetch requests on broker. The changes include:
Defined CachedSharePartition class, instances of which are stored in share sessions.
Defined ShareSessionKey, ShareSession classes.
Defined ShareSessionCache class which caches all the share sessions and has an eviction policy defined as per KIP-932.
Defined the 2 types of contexts -
a. ShareSessionContext - for share session fetch request.
b. FinalContext - for final share fetch request (epoch = -1).
Defined newContext function which returns the created/updated context on receiving share fetch request on broker.
Testing
The added code has been tested with the help of unit tests present in the PR.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
The implementation for share-fetch next-fetch-offset in share partition and acquiring records from log.
The Next Fetch Offset (NFO) determines where the Share Partition should initiate the next data read from the Replica Manager. While it typically aligns with the last offset of the most recently returned batch, last offset + 1, there are exceptions. Messages marked available again due to release acknowledgements or lock timeouts can cause the NFO to shift.
The acquire method caches the batches as acquired in-memory and spawns a timer task for lock timeout.
Cache
Per-offset Metadata: Simple to implement but inefficient. Every offset requires in-memory storage and traversal, leading to high memory usage and processing overhead, especially for per-batch acknowledgements (mostly the way records would be acknowledged).
Per-Replica Fetch Batch: This approach aligns with the Replica Manager fetch batches. Since a full Replica Manager batch is retrieved whenever the requested offset falls within that batch's boundaries, a single Share Fetch request will likely receive an entire Replica Manager batch. However, there's a trade-off. Replica Manager batches are based on producer batching. If producers don't batch effectively, the in-flight metadata becomes heavily reliant on the producer's batching behavior.
For per-message acknowledgements, per-offset tracking will be necessary, which again requires splitting in-flight batches based on state. Splitting batches is inefficient as it requires a cache update which maintains sorted order. Therefore, we propose a hybrid approach:
Implemented a combination of option 2 (per-in-flight batch tracking) with option 1 (per-offset tracking). This aligns well with Replica Manager batching.
States shall be maintained per in-flight batch. If state inconsistencies arise within in-flight batches due to per-message acknowledgements, switch state tracking for the respective batch to option 1 (per-offset tracking).
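The hybrid approach can be illustrated with a small sketch. It assumes a batch starts in an acquired state and only switches to per-offset tracking when an acknowledgement covers part of the batch. All names are hypothetical and the state set is reduced for brevity.

```java
import java.util.Arrays;

// Hypothetical sketch of hybrid in-flight tracking: one state per batch until a
// partial acknowledgement forces per-offset states for that batch only.
final class InFlightBatchSketch {
    enum RecordState { AVAILABLE, ACQUIRED, ACKNOWLEDGED }

    final long firstOffset;
    final long lastOffset;
    private RecordState batchState = RecordState.ACQUIRED; // valid while offsetStates is null
    private RecordState[] offsetStates;                     // allocated lazily on a partial ack

    InFlightBatchSketch(long firstOffset, long lastOffset) {
        this.firstOffset = firstOffset;
        this.lastOffset = lastOffset;
    }

    void acknowledge(long from, long to) {
        if (from < firstOffset || to > lastOffset || from > to) {
            throw new IllegalArgumentException("range outside batch");
        }
        if (offsetStates == null && from == firstOffset && to == lastOffset) {
            batchState = RecordState.ACKNOWLEDGED; // whole batch: stay on per-batch tracking
            return;
        }
        if (offsetStates == null) {
            // Partial acknowledgement: switch this batch to per-offset tracking.
            offsetStates = new RecordState[(int) (lastOffset - firstOffset + 1)];
            Arrays.fill(offsetStates, batchState);
        }
        for (long offset = from; offset <= to; offset++) {
            offsetStates[(int) (offset - firstOffset)] = RecordState.ACKNOWLEDGED;
        }
    }

    boolean isTrackedPerOffset() {
        return offsetStates != null;
    }

    RecordState stateOf(long offset) {
        if (offset < firstOffset || offset > lastOffset) {
            throw new IllegalArgumentException("offset outside batch");
        }
        return offsetStates == null ? batchState : offsetStates[(int) (offset - firstOffset)];
    }

    public static void main(String[] args) {
        InFlightBatchSketch batch = new InFlightBatchSketch(10, 14);
        batch.acknowledge(11, 12);
        System.out.println(batch.isTrackedPerOffset() + " " + batch.stateOf(10) + " " + batch.stateOf(11));
    }
}
```

Batches acknowledged as a whole never pay the per-offset memory cost, which matches the expectation that most acknowledgements are per batch.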
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit <144765188+adixitconfluent@users.noreply.github.com>
When the node transitions from a leader to a follower for a partition, the tier-lag metrics should be reset to zero. Otherwise, the metrics would report false positives. This change also addresses a concurrency issue while emitting the metrics.
Reviewers: Satish Duggana <satishd@apache.org>, Francois Visconte <f.visconte@gmail.com>
The listRemoteLogSegments returns the metadata list sorted by the start-offset. However, the returned metadata list contains all the uploaded segment information including the duplicate and overlapping remote-log-segments. The reason for duplicate/overlapping remote-log-segments cases is explained [here](https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogLeaderEpochState.java#L103).
The list returned by the RLMM#listRemoteLogSegments can contain the duplicate segment metadata at the end of the list. So, while computing the next log-start-offset we should take the maximum of segments (end-offset + 1).
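A short sketch of why the maximum is needed, assuming a simplified segment type with only start and end offsets. The type and method names are hypothetical and do not mirror the RemoteLogManager code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: compute the next log-start-offset as the maximum of
// (end-offset + 1) over the segments, instead of trusting the last list entry.
final class NextLogStartOffsetSketch {
    static final class Segment {
        final long startOffset;
        final long endOffset;

        Segment(long startOffset, long endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    static long nextLogStartOffset(List<Segment> segments, long currentLogStartOffset) {
        long next = currentLogStartOffset;
        for (Segment segment : segments) {
            next = Math.max(next, segment.endOffset + 1);
        }
        return next;
    }

    public static void main(String[] args) {
        // Sorted by start-offset; the last entry is an overlapping duplicate
        // that ends earlier than the segment before it.
        List<Segment> segments = Arrays.asList(
            new Segment(0, 99), new Segment(100, 199), new Segment(100, 149));
        System.out.println(nextLogStartOffset(segments, 0));
    }
}
```

Taking only the last entry in this list would yield 150 and move the log-start-offset backwards relative to the segment that ends at 199.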
Reviewers: Satish Duggana <satishd@apache.org>
This commit implements KIP-899: Allow producer and consumer clients to rebootstrap. It introduces the new setting `metadata.recovery.strategy`, applicable to all the types of clients.
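A client could opt in roughly as follows. This sketch only builds the properties and does not create a client. The value `rebootstrap` is an assumption based on KIP-899 and is not stated in this commit message. The class name and broker address are placeholders.

```java
import java.util.Properties;

// Hypothetical sketch: build client properties that enable rebootstrapping.
final class RebootstrapConfigSketch {
    static Properties clientProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Assumed value from KIP-899; check the client documentation for the allowed values.
        props.setProperty("metadata.recovery.strategy", "rebootstrap");
        return props;
    }

    public static void main(String[] args) {
        Properties props = clientProperties("localhost:9092"); // placeholder address
        System.out.println(props.getProperty("metadata.recovery.strategy"));
    }
}
```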
Reviewers: Greg Harris <gharris1727@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
KAFKA-16570 FenceProducers API returns "unexpected error" when successful
* Client handling of ConcurrentTransactionsException as retriable
* Unit test
* Integration test
Reviewers: Chris Egerton <chrise@aiven.io>, Justine Olshan <jolshan@confluent.io>
This patch is the continuation of https://github.com/apache/kafka/pull/15964. It introduces records coalescing to the CoordinatorRuntime. It also introduces a new configuration `group.coordinator.append.linger.ms` which allows administrators to choose the linger time or disable it with zero. The new configuration defaults to 10ms.
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
- Added the integration of the quota manager to throttle copy requests to the remote storage. Reference KIP-956
- Added unit-tests for the copy throttling logic.
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Add support for KIP-853 KRaft Quorum reconfiguration in the DescribeQuorum request and response.
Also add support to AdminClient.describeQuorum, so that users will be able to find the current set of
quorum nodes, as well as their directories, via these RPCs.
Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Andrew Schofield <aschofield@confluent.io>
KIP-932 introduces a bunch of broker and dynamic configs for share groups. This PR adds those new configs. The changes include:
1. Defined ShareGroupConfigs class which stores various share group configurations.
2. Use the defined share configs in KafkaConfig.scala for making it available to BrokerServer
3. Adds a few tests to validate the conditions on these new configs.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Propagate metadata error from background thread to application thread.
So, this fix ensures that metadata errors are thrown to the user on consumer.poll()
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Philip Nee <pnee@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
This patch implements the handling of `includeAuthorizedOperations` flag in the ConsumerGroupDescribe API.
Reviewers: David Jacot <djacot@confluent.io>
Tests in BrokerLifecycleManagerTest do not close BrokerLifecycleManager
if an assertion fails.
This change makes BrokerLifecycleManager an instance variable that is
closed in an `@AfterEach` method.
Reviewers: Igor Soarez <i@soarez.me>
KAFKA-16606 (#15834) introduced a change that broke
ReassignPartitionsCommandTest.testReassignmentCompletionDuringPartialUpgrade.
The point was to validate that the MetadataVersion supports JBOD
in KRaft when multiple log directories are configured.
We do that by checking the version used in
kafka-features.sh upgrade --metadata, and the version discovered
via a FeatureRecord for metadata.version in the cluster metadata.
There's no point in checking inter.broker.protocol.version in
KafkaConfig, since in KRaft, that configuration is deprecated
and ignored — always assuming the value of MINIMUM_KRAFT_VERSION.
The test that was broken sets inter.broker.protocol.version in
KRaft mode and configures 3 directories. So alternatively, we
could change the test to not configure this property.
Since the property isn't forbidden in KRaft mode, just ignored,
and operators may forget to remove it, it seems better to remove
the fail condition in KafkaConfig.
Reviewers: Luke Chen <showuon@gmail.com>
Support for multiple log directories in KRaft exists from
MetadataVersion 3.7-IV2.
When migrating a ZK broker to KRaft, we already check that
the IBP is high enough before allowing the broker to startup.
With KIP-584 and KIP-778, Brokers in KRaft mode do not require
the IBP configuration - the configuration is deprecated.
In KRaft mode inter.broker.protocol.version defaults to
MetadataVersion.MINIMUM_KRAFT_VERSION (IBP_3_0_IV1).
Instead KRaft brokers discover the MetadataVersion by reading
the "metadata.version" FeatureLevelRecord from the cluster metadata.
This change adds a new configuration validation step upon discovering
the "metadata.version" from the cluster metadata.
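The validation step might look roughly like the sketch below. It is an assumption-laden illustration: the enum is a small, hypothetical subset of the real MetadataVersion values, and the class, method, and error text are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: once metadata.version is discovered from the cluster
// metadata, check that it supports the configured number of log directories.
final class LogDirValidationSketch {
    // Ordered subset for illustration only; the real enum has many more entries.
    enum Version { IBP_3_0_IV1, IBP_3_7_IV1, IBP_3_7_IV2 }

    // Returns an error message if the configuration is incompatible, else empty.
    static Optional<String> validate(List<String> logDirs, Version discovered) {
        boolean multipleDirs = logDirs.size() > 1;
        boolean supportsMultipleDirs = discovered.compareTo(Version.IBP_3_7_IV2) >= 0;
        if (multipleDirs && !supportsMultipleDirs) {
            return Optional.of("Multiple log directories are not supported with metadata.version "
                + discovered);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/data/a", "/data/b"); // placeholder paths
        System.out.println(validate(dirs, Version.IBP_3_0_IV1).isPresent());
        System.out.println(validate(dirs, Version.IBP_3_7_IV2).isPresent());
    }
}
```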
Reviewers: Mickael Maison <mickael.maison@gmail.com>
Improve consistency and correctness for user-provided timeouts at the Consumer network request layer, per the Java client Consumer timeouts design (https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts). While the changes introduced in KAFKA-15974 enforce timeouts at the Consumer's event layer, this change enforces timeouts at the network request layer.
The changes mostly fit into the following areas:
1. Create shared code and idioms so timeout handling logic is consistent across current and future RequestManager implementations
2. Use deadlineMs instead of expirationMs, expirationTimeoutMs, retryExpirationTimeMs, timeoutMs, etc.
3. Update "preemptive pruning" to remove expired requests that have had at least one attempt
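The deadline idiom and the pruning rule from the list above can be sketched as follows. This is a simplified illustration with hypothetical names, not the RequestManager code. It assumes non-negative times and treats a request as expired once the current time reaches its deadline.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: store an absolute deadlineMs per request and prune only
// expired requests that have already been attempted at least once.
final class DeadlineSketch {
    static final class PendingRequest {
        final String name;
        final long deadlineMs;
        int attempts;

        PendingRequest(String name, long nowMs, long timeoutMs) {
            this.name = name;
            this.deadlineMs = calculateDeadlineMs(nowMs, timeoutMs);
        }

        boolean isExpired(long nowMs) {
            return nowMs >= deadlineMs;
        }
    }

    // Saturates at Long.MAX_VALUE so a very large timeout cannot overflow.
    static long calculateDeadlineMs(long nowMs, long timeoutMs) {
        long deadline = nowMs + timeoutMs;
        return deadline < 0 ? Long.MAX_VALUE : deadline;
    }

    // Removes and returns expired requests with at least one attempt. Requests
    // that were never sent are kept, so a zero timeout still gets one attempt.
    static List<PendingRequest> pruneExpired(List<PendingRequest> pending, long nowMs) {
        List<PendingRequest> removed = new ArrayList<>();
        Iterator<PendingRequest> iterator = pending.iterator();
        while (iterator.hasNext()) {
            PendingRequest request = iterator.next();
            if (request.attempts >= 1 && request.isExpired(nowMs)) {
                iterator.remove();
                removed.add(request);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<PendingRequest> pending = new ArrayList<>();
        PendingRequest neverSent = new PendingRequest("never-sent", 1000, 0);
        PendingRequest sentOnce = new PendingRequest("sent-once", 1000, 0);
        sentOnce.attempts = 1;
        pending.add(neverSent);
        pending.add(sentOnce);
        for (PendingRequest request : pruneExpired(pending, 1000)) {
            System.out.println("pruned " + request.name);
        }
    }
}
```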
Reviewers: Lianet Magrans <lianetmr@gmail.com>, Bruno Cadonna <cadonna@apache.org>