In KAFKA-16424, we added fallback logic to delete the logs, but the file may have no parent. It would be better to add a guard against that case.
Signed-off-by: PoAn Yang <payang@apache.org>
Reviewers: Luke Chen <showuon@gmail.com>
Currently the code in ShareConsumeRequestManager works on the basis that there can only be one commitSync()/close() at a time. But there is a chance that these calls time out on the application thread and are still sent later on the background thread. In that case the incoming commitSync()/close() will not be processed, resulting in possible loss of acknowledgements.
To cover this case, we will now have a queue of AcknowledgeRequestStates to store the commitSync() calls and a separate requestState to store the close(). This queue will be processed one by one until it is empty. For close(), we are still assuming there can only be one active close() at a time.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
Failed tasks discovered when removed from the state updater during assignment or revocation are added to the task registry. From there they are retrieved and handled as normal tasks. This leads to a couple of IllegalStateExceptions because it breaks some invariants that ensure that only good tasks are assigned and processed.
This commit solves this bug by distinguishing failed from non-failed tasks in the task registry.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
Move reviewers.py and kafka-merge-pr.py into committer-tools. Also include a new find-unfinished-test.py
script which can be used for finding hanging tests on Jenkins or GitHub Actions.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
As a follow-up to #13827, this patch updates the stale PR workflow to automatically close PRs that have not had activity in 120 days.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Josep Prat <josep.prat@aiven.io>
Introduced a share fetch purgatory on the broker which delays share fetch requests that cannot be completed instantaneously. Introduced the following classes:
* DelayedShareFetch - Contains logic to instantaneously complete or force complete a share fetch request on timeout.
* DelayedShareFetchKey - Contains the key which can be used to watch the entries within the share fetch purgatory.
* ShareFetchUtils - This utility class contains functionality required for post-processing once the replica manager fetch is completed.
There are many scenarios which can cause a share fetch request to be delayed, and multiple scenarios in which completion of a delayed share fetch can be attempted. In this PR, we are only targeting the case where the record lock partition limit is reached, in which the ShareFetch should wait for up to MaxWaitMs for records to be released.
Reviewers: David Arthur <mumrah@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Jun Rao <junrao@gmail.com>
This is the equivalent of #16755 for the share group consumer.
The node request-latency-max and request-latency-avg metrics were not being recorded and were thus reported as NaN for the share group consumer.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
When KIP-714 was developed, the entity type of client-metrics was added to the kafka-configs.sh tool. The idea was to have two forms of specifying the name and type of a client metrics config resource, either --entity-type client-metrics --entity-name NAME or --client-metrics NAME. This style of alias is used for all of the entity types. Unfortunately, the --client-metrics form was not implemented. This PR corrects that and adds more tests.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, DL1231 <53332773+DL1231@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>
There is a race condition between KRaftMigrationDriver running its first poll() and being notified by Raft about a leader change. If onControllerChange is called before RecoverMigrationStateFromZKEvent is run, we will end up getting stuck in the INACTIVE state.
This patch fixes the race by enqueuing a RecoverMigrationStateFromZKEvent from onControllerChange if the driver has not yet initialized. If another RecoverMigrationStateFromZKEvent was already enqueued, the second one to run will just be ignored.
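The idea can be sketched in a few lines of Python. This is only an illustration of the idempotent-recovery pattern, not the KRaftMigrationDriver code: the class, the `initialized` flag, and all method names are hypothetical.

```python
from collections import deque


class DriverSketch:
    """Toy single-threaded event queue; all names here are hypothetical."""

    def __init__(self):
        self.initialized = False
        self.leader = None
        self.recoveries = 0
        self.queue = deque()

    def recover_event(self):
        # The second recovery event to run finds the driver already
        # initialized and does nothing.
        if self.initialized:
            return
        self.initialized = True
        self.recoveries += 1

    def leader_change_event(self, leader):
        if not self.initialized:
            raise RuntimeError("leader change handled before recovery")
        self.leader = leader

    def on_controller_change(self, leader):
        # The fix: make sure recovery is queued ahead of the leader change
        # when the driver has not initialized yet.
        if not self.initialized:
            self.queue.append(self.recover_event)
        self.queue.append(lambda: self.leader_change_event(leader))

    def first_poll(self):
        self.queue.append(self.recover_event)

    def run(self):
        while self.queue:
            self.queue.popleft()()
```

With this ordering, a leader change that arrives before the first poll still runs after a recovery, and the recovery queued by the first poll becomes a no-op.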
Reviewers: Luke Chen <showuon@gmail.com>
This PR implements exponential backoff for the state directory lock to increase the time between two consecutive attempts to acquire the lock.
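For readers unfamiliar with the technique, a minimal Python sketch of capped exponential backoff follows. The initial delay, multiplier, cap, and function names are assumptions chosen for illustration and are not the values or API used in Kafka Streams.

```python
import time


def backoff_ms(attempt, initial_ms=50, multiplier=2, max_ms=1000):
    """Delay before retry number `attempt` (0-based), capped at max_ms."""
    return min(max_ms, initial_ms * multiplier ** attempt)


def acquire_with_backoff(try_lock, max_attempts=5,
                         wait=lambda ms: time.sleep(ms / 1000)):
    """Call try_lock() until it succeeds, waiting longer after each failure."""
    for attempt in range(max_attempts):
        if try_lock():
            return True
        wait(backoff_ms(attempt))
    return False
```

Passing `wait` as a parameter keeps the retry loop testable without real sleeping.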
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
Add a Python script that analyzes our Git history to find top contributors. This can be used by committers to update
the list of contributors in .asf.yaml without a lot of tedious effort.
Co-authored-by: stevenbooke <steviebeee55@gmail.com>
Co-authored-by: Joao Pedro Fonseca <fonsdant@gmail.com>
Reviewers: David Arthur <mumrah@gmail.com>
Instead of attempting to set statuses on the head repo of a PR, the CI Complete workflow should create statuses on the commit in apache/kafka.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The 3.8 controller assumes that unknown features have a min version of 0, but KAFKA-17011 replaced min=0 with min=1 when BrokerRegistrationRequest < 4. Hence, to support upgrading from 3.8.0 to 3.9, this PR changes the implementation of ApiVersionsResponse (<4) and BrokerRegistrationRequest (<4) to skip features with a supported minVersion of 0 instead of replacing 0 with 1.
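The difference between skipping and replacing can be illustrated with a small Python sketch. The dict layout and the feature names are hypothetical and do not reflect the actual request schema.

```python
def skip_zero_min_features(features):
    """Drop features whose supported min version is 0.

    `features` maps a feature name to a (min_version, max_version) tuple.
    This is a hypothetical representation used only for illustration.
    """
    return {
        name: (min_v, max_v)
        for name, (min_v, max_v) in features.items()
        if min_v != 0
    }
```

Features with a min version of 0 are simply left out of the older request and response versions rather than being reported with a rewritten range.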
Reviewers: Jun Rao <junrao@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
This patch fixes a few bugs in the replay logic of the consumer group records:
* The first issue is that the logic assumed that the group or the member exists when tombstones are replayed. This is incorrect after a restart. The group or the member may not be there anymore if the __consumer_offsets partition only contains tombstones for the group or the member. The patch fixes this by treating tombstones as no-ops if the entity does not exist.
* The second issue is that the logic assumed that consumer group records are always in a specific order in the log, so it only created a consumer group when a `ConsumerGroupMemberMetadata` record was replayed. This is incorrect too. During the lifetime of a consumer group, the records may be in a different order. The patch fixes this by allowing any record to create a consumer group.
* The third issue is that it is possible to replay offset commit records for a specific consumer group before the consumer group is actually created while replaying its records. By default the OffsetMetadataManager creates a simple classic group to hold those offset commits. When the consumer group records are finally replayed, the logic fails because a classic group already exists. The patch fixes this by converting a simple classic group when records for a consumer group are replayed.
All those combinations are hard to test with unit tests. This patch adds an integration test which reproduces some of those interleavings of records. I used it to reproduce the issues described above.
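The three rules above can be sketched with a toy in-memory model in Python. The class, the record shapes, and the `simple_classic`/`consumer` type labels are hypothetical simplifications, not the group coordinator's data structures.

```python
class GroupStoreSketch:
    """Toy replay target; every name here is hypothetical."""

    def __init__(self):
        # group_id -> {"type": str, "members": set, "offsets": dict}
        self.groups = {}

    def replay_offset_commit(self, group_id, partition, offset):
        # Offsets replayed before any group record create a simple classic group.
        group = self.groups.setdefault(
            group_id, {"type": "simple_classic", "members": set(), "offsets": {}}
        )
        group["offsets"][partition] = offset

    def replay_member(self, group_id, member_id, tombstone=False):
        group = self.groups.get(group_id)
        if tombstone:
            # Rule 1: a tombstone for a missing group or member is a no-op.
            if group is not None:
                group["members"].discard(member_id)
            return
        if group is None:
            # Rule 2: any consumer group record may create the group.
            group = {"type": "consumer", "members": set(), "offsets": {}}
            self.groups[group_id] = group
        elif group["type"] == "simple_classic":
            # Rule 3: convert the placeholder group, keeping its offsets.
            group["type"] = "consumer"
        group["members"].add(member_id)
```

Replaying a tombstone first, then an offset commit, then a member record exercises all three rules in one sequence.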
Reviewers: TengYao Chi <kitingiao@gmail.com>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Trunk builds are run off of "push" events rather than "pull_request". We were missing some logic in the is-public-fork condition that mistakenly caused some trunk builds to skip the build scan.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Newline characters in the failure message of tests were causing the Markdown tables to be malformed.
This patch fixes that by replacing newlines with "<br>" tags and escaping other HTML that may appear in the message.
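A minimal Python sketch of this kind of sanitizing is shown below. The function name is hypothetical and the code is an illustration, not the script changed in this patch.

```python
import html


def table_safe(message):
    """Escape HTML in a failure message, then turn newlines into <br> tags."""
    # Escape first so that the <br> tags added afterwards stay intact.
    escaped = html.escape(message)
    return escaped.replace("\r\n", "<br>").replace("\n", "<br>")
```

Escaping before inserting the tags matters: doing it in the opposite order would escape the "<br>" tags as well.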
Reviewers: David Arthur <mumrah@gmail.com>
Tighten the condition to only run the ci-complete workflow if the triggering run ended in success or failure. Also,
add a status check failure if the PR did not produce the expected build scans.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Introduces the share coordinator. This coordinator is built on the new coordinator runtime framework. It
is responsible for persistence of share-group state in a new internal topic named "__share_group_state".
The responsibility for being a share coordinator is distributed across the brokers in a cluster.
Reviewers: David Arthur <mumrah@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>