The StreamsGroupHeartbeat API is the new core API used by streams application to form a group. The API allows members to initialize a topology, advertise their state, and their owned tasks. The group coordinator uses it to assign/revoke tasks to/from members. This API is also used as a liveness check.
This change adds the JSON definition of the RPC, as defined in KIP-1071.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Migrates KTableSuppressProcessorSupplier to use the the ProcessorSupplier#stores() method
Reviewers: Guozhang Wang <guozhang.wang.us@gmail.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
This patch transitions the KTable#filter implementation to provide the materialized store via the ProcessorSupplier so that it can be wrapped by the processor wrapper if the wrapper is configured
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
The issue has been fixed by https://issues.apache.org/jira/browse/KAFKA-18188. We can re-enable the test with the CONSUMER protocol.
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Andrew Schofield <aschofield@confluent.io>
KIP-848 will be release as GA in Apache Kafka 4.0. Hence we need to mark all the related public apis as stable.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
Instead of parsing the GE_ACCESS_TOKEN, the patch uses the newly minted DV_API_ACCESS secret for Develocity API access.
Reviewers: Mickael Maison <mickael.maison@gmail.com>
* Avoid attaching empty writes to empty batches.
* Handle flushes of empty batches, which would return a 0 offset otherwise.
Reviewers: David Jacot <djacot@confluent.io>
When a static member rejoins the group, the group state is rewritten to the partition in order to persist the change. If the write fails, the change is reverted. However, this is done without acquiring the group lock.
This is only try in the old group coordinator. The new one does not suffer from this issue.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
Kafka consumer supports auto.offset.reset config option, which is used when there is no initial offset in Kafka (or) if the current offset does not exist any more on the server. This config currently supports earliest/latest/none options. Currently consumer resets might force applications to reprocess large amounts of data from earlier offsets. With infinite storage, its beneficial to have a duration based offset reset strategy. This will allow applications to consume/initialise from a fixed duration when there is no initial offset in Kafka.
As part of KIP-932, we are adding support for share consumer groups. Share consumer groups supports dynamic group configuration property share.auto.offset.reset. This is used to set the initial Share-Partition Start Offset (SPSO) based on the share.auto.offset.reset configuration. Currently share.auto.offset.reset supports earliest and latest options to automatically reset the offset
Similar to the Kafka Consumer, we will add support for by_duration: config value for share.auto.offset.reset.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>
KAFKA-18026: supply stores for KTable#mapValues using ProcessorSupplier#stores
Reviewers: Guozhang Wang <guozhang.wang.us@gmail.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
- Disallow configuring -1 for copier and expiration thread pools dynamically
Co-authored-by: Peter Lee <peterxcli@gmail.com>
Reviewers: Peter Lee <peterxcli@gmail.com>, Satish Duggana <satishd@apache.org>
Appears to be a typo in the code, since the error message indicates this check is for taskId being null, but instead we accidentally check the streams metrics twice
Reviewers: Matthias Sax <mjsax@apache.org>, runo Cadonna <cadonna@apache.org>, Lucas Brutschy <lbrutschy@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
We have observed the below errors in some cluster:
Uncaught exception in scheduled task 'handleTxnCompletion-902667' exception.message:Trying to complete a transactional offset commit for producerId *** and groupId *** even though the offset commit record itself hasn't been appended to the log.
When a transaction is completed, the transaction coordinator sends a WriteTxnMarkers request to all the partitions involved in the transaction to write the markers to them. When the broker receives it, it writes the markers and if markers are written to the __consumer_offsets partitions, it informs the group coordinator that it can materialize the pending transactional offsets in its main cache. The group coordinator does this asynchronously since Apache Kafka 2.0, see this patch.
The above error appends when the asynchronous operation is executed by the scheduler and the operation finds that there are pending transactional offsets that were not written yet. How come?
There is actually an issue is the steps described above. The group coordinator does not wait until the asynchronous operation completes to return to the api layer. Hence the WriteTxnMarkers response may be send back to the transaction coordinator before the async operation is actually completed. Hence it is possible that the next transactional produce to be started also before the operation is completed too. This could explain why the group coordinator has pending transactional offsets that are not written yet.
There is a similar issue when the transaction is aborted. However on this path, we don't have any checks to verify whether all the pending transactional offsets have been written or not so we don't see any errors in our logs. Due to the same race condition, it is possible to actually remove the wrong pending transactional offsets.
PS: The new group coordinator is not impacted by this bug.
Reviewers: Justine Olshan <jolshan@confluent.io>
Introduces v2 of Vote RPC and implements the handling of the new version of the RPC.
Many references to "candidate" in the Vote RPC are changed to the more generic "replica". Replicas sending Vote request with PreVote set to true are not candidate. They are instead prospective candidate that are attempting to become candidate.
Replicas receiving PreVote requests (vote request with PreVote=true) with an epoch equal to their own will _not_ transition to Unattached state. They will only grant the vote if they have not recently fetched from leader and the request's last epoch and offset are up-to-date with theirs.
If a replica receives a PreVote request with an epoch greater than their current epoch, they will transition to Unattached state (setting their epoch to the one from the pre-vote request) and then grant the vote if the request's last epoch and offset are up-to-date with theirs.
To avoid a possible ping-pong scenario. For example, there is 3 node quorum, leader node A disconnects from quorum, node B goes into prospective state first before node C, node B sends pre-vote request to node C still in follower state and receives back that node A is leader, node B transitions to follower while node C transitions to prospective after election timeout. If you repeat this interaction, it is possible for such replicas to transition from Follower to Prospective in perpetuity. This issue is resolved by having follower state nodes grant pre-vote requests only if they have successfully fetched from the leader at least once after becoming a follower.
This change introduces a new suite called KafkaRaftClientPreVoteTest, for additional KRaft protocol tests with respect to pre-vote.
Reviewers: José Armando García Sancio <jsancio@apache.org>