* KAFKA-18086: Enable propagation of the error message when writing state
* Propagate the error message in the writing state when calling SharePartitionManager.acknowledge and SharePartitionManager.releaseSession, and add corresponding tests to verify that the expected error message is propagated.
* Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>
This patch does a few things:
1) It cleans up resolved regular expressions when they are unsubscribed from. It covers the regular leave/fenced paths for the new protocol and it also covers the LeaveGroup API as new members could be removed via the admin API.
2) It ensures that tombstones for resolved regular expressions are generated on the conversion patch from consumer to classic group.
3) It fixes [KAFKA-18116](https://issues.apache.org/jira/browse/KAFKA-18116) because I faced the same issue while working on the LeaveGroup API. It adds an integration test for this case too.
Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
Additional work on ShareConsumerTest.testAcquisitionLockTimeoutOnConsumer. First, mark the test as flaky since it fails occasionally and it would be best to get a decent number of passes before assuming it's no longer flaky. Then, change the assertions so that the test checks which records were received before it counts them (we might get too many records because the wrong records are being returned, or because records are actually being duplicated). The rare failures appear to be related to returning too many records, so it would be good to see whether we can learn more about the duplication.
There are a couple of other improvements too. Reducing the number of share-group state partitions so we don't have the overhead of creating 50 partitions when a few will do. Changing the warm-up logic since that has been observed very occasionally to assert.
Reviewers: ShivsundarR <shr@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
When there is a become follower transition on a transaction coordinator state partition, we intend to unload the state partition. However, we pass the new epoch to the method that does the unloading. In that method, we create a `TransactionPartitionAndLeaderEpoch` object comprising of the topic partition and the epoch that we use as a key to remove the partition from loading. However, we wouldn't ever expect to see this epoch in that map since we only load on the leader. See the code snippet: d00f0ecf1a/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala (L602)
We could have a partition load after the unloading occurs, and that partition will be stuck storing stale state on the broker until it restarts. While this may not immediately cause a correctness issue, we should try to properly clean up state.
Check that the epoch is less than the new epoch when removing the partition from loadingPartitions.
Added a test that failed before this change was made.
Reviewers: Artem Livshits <alivshits@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
1) Bump validVersions of ConsumerGroupDescribeRequest.json and ConsumerGroupDescribeResponse.json to "0-1".
2) Add MemberType field to ConsumerGroupDescribeResponse.json. Default value is -1 (unknown). 0 is for classic member and 1 is for consumer member.
3) When ConsumerGroupMember#useClassicProtocol is true, return MemberType field as 0. Otherwise, return 1.
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
There are two issues in KAFKA-18060:
1) New coordinator can't handle the TxnOffsetCommitRequest with empty member id, and TxnOffsetCommitRequest v0-v2 do definitely has an empty member ID, causing ConsumerGroup#validateOffsetCommit to throw an UnknownMemberIdException. This prevents the old producer from calling sendOffsetsToTransaction. Note that TxnOffsetCommitRequest versions v0-v2 are included in KIP-896, so it seems the new coordinator should support that operations
2) The deprecated API Producer#sendOffsetsToTransaction does not use v0-v2 to send TxnOffsetCommitRequest with an empty member ID. Unfortunately, it has been released for a while. Therefore, the new coordinator needs to handle TxnOffsetCommitRequest with an empty member ID for all versions.
Taken from the two issues above, we need to handle empty member id in all API versions when new coordinator are dealing with TxnOffsetCommitRequest.
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This patch relaxes requiring non-empty subscribed names and regex in the full heartbeat request. Without this, a consumer using client side regexes may not be able to join the group when the regex does not match any topics yet and this is inconsistent with the old protocol. Relaxing the subscribed regex is not strictly required but it seems better to keep it consistent.
Reviewers: Lianet Magrans <lmagrans@confluent.io>
This PR implement KIP-1011, kafka-configs.sh now uses incrementalAlterConfigs API to alter broker configurations instead of the deprecated alterConfigs API, and it will fall directly if the broker doesn't support incrementalAlterConfigs.
Reviewers: David Jacot <djacot@confluent.io>, OmniaGM <o.g.h.ibrahim@gmail.com>.
- integration tests for new subscribe api with RE2J pattern
- fix to ensure all topics are included in metadata requests when consumer is subscribed to RE2J pattern
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Convert testUncleanLeaderElectionEnable and testMetricsReporterUpdate to KRaft.
Remove testAdvertisedListenerUpdate, testAddRemoveSslListener, testAddRemoveSaslListeners since we no longer support dynamically adding or removing network listeners when in KRaft mode.
Make TestUtils.ensureConsistentKRaftMetadata robust against brokers that don't have sharedServer.loader initialized yet (or have shut down).
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
https://github.com/apache/kafka/pull/15881 changed our tests to utilize `using` blocks. But these blocks don't throw any errors, so if there is a failed assertion within the block, the test will still pass.
We should either check the failure using a corresponding `match` block with Success(_) and Failure(e), use `using.resource`, or use try/finally blocks to clean up resources.
See https://www.scala-lang.org/api/3.0.2/scala/util/Using$.html
Co-authored-by: frankvicky <kitingiao@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This patch introduces the asynchronous resolution of regular expressions. Let me unpack a few details about the implementations:
1) I have decided to finally update all the regular expressions within a consumer group together. My assumption is that the number of regular expressions in a group will be generally small but the number of topics in a cluster is large. Hence grouping has two benefits. Firstly, it allows to go through the list of topics once for all the regular expressions. Secondly, it reduces the number of potential rebalances because all the regular expressions are updated at the same time.
2) An update is triggered when the group is subscribed to at least one regular expressions.
3) An update is triggered when there is no ongoing update.
4) An update is triggered only of the previous one is older than 10s.
5) An update is triggered when the group has unresolved regular expressions.
6) An update is triggered when the metadata image has new topics.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
ShareConsumerTest.testShareAutoOffsetResetDefaultValue has been tightened up by making sure that records produced have been flushed before starting consumption. A possible but unlikely race condition seems the source of the flakiness and this should now be eliminated in the previous PR to this test case.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
KIP-1043 introduced Admin.listGroups as the way to list all types of groups. As a result, Admin.listShareGroups has been removed. This PR is the final step of the removal.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR introduces the unified GroupState enum for all group types from KIP-1043. This PR also removes ShareGroupState and begins the work to replace Admin.listShareGroups with Admin.listGroups. That will complete in a future PR.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>