In the response handler for ShareAcknowledge, we are passing the clientResponse.receivedTimeMs() to the handler methods. But when there is a disconnect or when the response received is null, we should be passing the current time instead.
This bug was causing consumer to hang as it did not call the handler methods on disconnect, and further requests were blocked waiting for its completion.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
1) Bump validVersions of ConsumerGroupDescribeRequest.json and ConsumerGroupDescribeResponse.json to "0-1".
2) Add MemberType field to ConsumerGroupDescribeResponse.json. Default value is -1 (unknown). 0 is for classic member and 1 is for consumer member.
3) When ConsumerGroupMember#useClassicProtocol is true, return MemberType field as 0. Otherwise, return 1.
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
There are two issues in KAFKA-18060:
1) New coordinator can't handle the TxnOffsetCommitRequest with empty member id, and TxnOffsetCommitRequest v0-v2 do definitely has an empty member ID, causing ConsumerGroup#validateOffsetCommit to throw an UnknownMemberIdException. This prevents the old producer from calling sendOffsetsToTransaction. Note that TxnOffsetCommitRequest versions v0-v2 are included in KIP-896, so it seems the new coordinator should support that operations
2) The deprecated API Producer#sendOffsetsToTransaction does not use v0-v2 to send TxnOffsetCommitRequest with an empty member ID. Unfortunately, it has been released for a while. Therefore, the new coordinator needs to handle TxnOffsetCommitRequest with an empty member ID for all versions.
Taken from the two issues above, we need to handle empty member id in all API versions when new coordinator are dealing with TxnOffsetCommitRequest.
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Update AutoOffsetResetStrategy.java to support duration based reset strategy
Update OffsetFetcher related classes and unit tests
Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
- integration tests for new subscribe api with RE2J pattern
- fix to ensure all topics are included in metadata requests when consumer is subscribed to RE2J pattern
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Introduce ShareMemberDescription and ShareMemberAssignment as distinct classes for share groups. Although the correspondence with consumer groups is fairly close, the concepts are likely to diverge over time and separating these concepts now makes sense.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
There was a bug in handling the ShareAcknowledgeResponse for commitAsync(). Currently after we receive a response, we send out a background event to the application thread to update the acknowledgement commit callbacks for EVERY TopicIdPartition.
The map that was sent was not cleared after sending the event. This meant we ended up sending responses for partitions that were already sent in the previous event. So there will be duplicate calls to the callback.
The PR fixes the bug and adds a unit test for the same.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
When a KafkaShareConsumer is constructed in AK 4.0, a log message is written warning about the early access nature of the feature.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
With KAFKA-14562, we implemented epoch bump on both the client and the server. Mentioned below are the different epoch bump scenarios we have on hand after enabled tv2
Non-Transactional Producers
• Epoch bumping is always allowed.
• Different code paths are used to handle epoch bumping.
Transactional Producers
No Epoch Bump Allowed
• coordinatorSupportsBumpingEpoch = false when initPIDVersion < 3 or initPIDVersion = null.
Client-Triggered Epoch Bump Allowed
• coordinatorSupportsBumpingEpoch = true when initPIDVersion >= 3.
• TransactionVersion2Enabled = false when endTxnVersion < 5.
Only Server-Triggered Epoch Bump Allowed
• TransactionVersion2Enabled = true and endTxnVersion >= 5.
We want to refine the code and make it more structured to correctly handle epoch bumping in the above mentioned cases.
The changes made in this patch are:
Rename epochBumpRequired to epochBumpTriggerRequired to symbolize a manual epoch bump request from the client
Modify canEpochBump method according to the above mentioned scenarios
Reviewers: Artem Livshits <alivshits@confluent.io>, Calvin Liu <caliu@confluent.io>, Justine Olshan <jolshan@confluent.io>
When disconnecting channels before rebootstrapping due to the rebootstrap conditions introduced in KIP-1102, we should ensure that inflight requests are aborted similar to other disconnections like request timeout in clients. With the earlier rebootstrapping from KIP-899, we only rebootstrapped when there were no connections, so no disconnections are required.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
KIP-1043 introduced Admin.listGroups as the way to list all types of groups. As a result, Admin.listShareGroups has been removed. This PR is the final step of the removal.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
- Deprecates OffsetResetStrategy enum
- Adds new internal class AutoOffsetResetStrategy
- Replaces all OffsetResetStrategy enum usages with AutoOffsetResetStrategy
- Deprecate old/Add new constructors to MockConsumer
Reviewers: Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR introduces the unified GroupState enum for all group types from KIP-1043. This PR also removes ShareGroupState and begins the work to replace Admin.listShareGroups with Admin.listGroups. That will complete in a future PR.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
What
There was a bug in handling piggyback acknowledgements in ShareConsumeRequestManager, where the fetchAcknowledgementsMap could be updated when the request was in flight and when the ShareFetch response is received, we were removing any acknowledgements(without actually sending them) which came when the request was in flight.
Fix
Now we are maintaining 2 separate maps(one which has the acknowledgements to send and one which keeps track of the acknowledgements in flight).
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Admin API operations have two phases: lookup and fulfilment. The lookup phase involves a METADATA request whose details depend upon the operation being performed.
For some operations, the METADATA request can be quite expensive to serve. For example, if the user calls Admin.listOffsets for 1000 topics, the METADATA request will include all 1000 topics and the response will contain the leader information for all of these topics. And then the actual fulfilment phase does the real work of the operation.
In cases where a long-running application is performing repeated admin operations which need the same metadata information about partition leadership, it is not necessary to send the METADATA request for every single admin operation.
This PR adds a cache of the mapping from topic-partition to leader id to the admin client. The cache doesn't need to be very sophisticated because the admin client will retry if the information becomes stale, and the cache can be updated as a result of the retry.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This patch contains changes to the handling of the INVALID_PRODUCER_ID_MAPPING error.
Quoted from KIP-890
Since we bump epoch on abort, we no longer need to call InitProducerId to fence requests. InitProducerId will only be called when the producer starts up to fence a previous instance.
With this change, some other calls to InitProducerId were inspected including the call after receiving an InvalidPidMappingException. This exception was changed to abortable as part of KIP-360: Improve reliability of idempotent/transactional producer. However, this change means that we can violate EOS guarantees. As an example:
Consider an application that is copying data from one partition to another
Application instance A processes to offset 4
Application instance B comes up and fences application instance A
Application instance B processes to offset 5
Application instances A and B are idle for transaction.id.expiration.ms, transaction id expires on server
Application instance A attempts to process offset 5 (since in its view, that is next) -- if we recover from invalid pid mapping, we can duplicate this processing
Thus, INVALID_PID_MAPPING should be fatal to the producer.
This is consistent with KIP-1050: Consistent error handling for Transactions where errors that are fatal to the producer are in the "application recoverable" category. This is a grouping that indicates to the client that the producer needs to restart and recovery on the application side is necessary. KIP-1050 is approved so we are consistent with that decision.
This PR also fixes the flakiness of TransactionsExpirationTest.
Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>, Calvin Liu <caliu@confluent.io>
Remove or update custom display names to make sure we actually include the test method as the first part of the display name.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Bill Bejeck <bill@confluent.io>
Implementation of https://cwiki.apache.org/confluence/display/KAFKA/KIP-1102%3A+Enable+clients+to+rebootstrap+based+on+timeout+or+error+code
- Introduces rebootstrap trigger interval config metadata.recovery.rebootstrap.trigger.ms, set to 5 minutes by default
- Makes rebootstrap the default for metadata.recovery.strategy
- Adds new error code REBOOTSTRAP_REQUIRED, introduces top-level error code in metadata response. On this error, clients rebootstrap.
- Configs apply to producers, consumers, share consumers, admin clients, Connect and KStreams clients.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
RoundRobinPartitioner does not handle the fact that on new batch creation, the partition method is called twice.
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Mickael Maison <mickael.maison@gmail.com>