Introduce new quarantinedTest that excludes tests tagged with "flaky". Also introduce two new build parameters "maxQuarantineTestRetries" and "maxQuarantineTestRetryFailures".
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR adds a Reporter instance that will add streams thread metrics to the telemetry pipeline.
For testing, the PR adds a unit test.
Reviewers: Matthias Sax <mjsax@apache.org>
I've added the release-version flag to the upgrade and downgrade commands. I've also added tests.
While working on this, I realized that the version-mapping and dependencies commands were returning non-production features. I have changed them to return only production features (except in tests) and added tests for this.
Reviewers: Jun Rao <jun@confluent.io>
Add purgatory actions to DelayedActionQueue when partition locks are released after fetch in forceComplete.
Reviewers: David Arthur <mumrah@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>, Jun Rao <junrao@gmail.com>
When a replica restarts in the follower state it is possible for the set of leader endpoints to not match the latest set of leader endpoints. Voters will discover the latest set of leader endpoints through the BEGIN_QUORUM_EPOCH request. This means that KRaft needs to allow for the replica to transition from Follower to Follower when only the set of leader endpoints has changed.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Alyssa Huang <ahuang@confluent.io>
In the existing implementation, if an operation modifying the classic group state fails, the group reverts but the group size counter does not. This creates an inconsistency between the group size metric and the actual group size.
Because relying on the appendFuture to revert the metrics when an operation fails would be complicated, this PR introduces a new implementation. A timeout task periodically refreshes the metrics based on the current groups' soft state. The refresh interval is hardcoded to 60 seconds.
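A minimal sketch of the periodic refresh idea follows. `GroupSizeGauge` and all of its members are hypothetical names, and a plain `ScheduledExecutorService` stands in for the coordinator's own timer, which is not shown in this message.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; not the coordinator's real classes.
class GroupSizeGauge {
    static final long REFRESH_INTERVAL_SECONDS = 60;

    // Soft state: group id -> member count.
    private final Map<String, Integer> groups = new ConcurrentHashMap<>();
    // Value exposed as the metric.
    private final AtomicLong reportedGroupCount = new AtomicLong();

    void putGroup(String groupId, int members) { groups.put(groupId, members); }

    void removeGroup(String groupId) { groups.remove(groupId); }

    // Recompute the metric from the current soft state instead of
    // incrementing or decrementing it on every operation.
    void refresh() { reportedGroupCount.set(groups.size()); }

    long reportedGroupCount() { return reportedGroupCount.get(); }

    // Schedule the refresh at a fixed interval; the caller owns the scheduler.
    ScheduledExecutorService start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh,
            REFRESH_INTERVAL_SECONDS, REFRESH_INTERVAL_SECONDS, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

With this shape a failed operation never leaves the metric permanently wrong: the value may be stale for up to one interval, but the next refresh reads it back from the reverted state.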
Reviewers: David Jacot <djacot@confluent.io>
This change introduces a retry mechanism for cleaning up remote segments that failed the copy to remote storage.
It also makes sure that we always update the remote segment state whenever we attempt a deletion.
When a segment copy fails, we immediately try to delete the segment, but this can also fail.
The RLMExpirationTask is now also responsible for retrying the cleanup of dangling segments.
This is how a segment state is updated in the above case:
1. COPY_SEGMENT_STARTED (copy task fails)
2. DELETE_SEGMENT_STARTED (copy task cleanup also fails)
3. DELETE_SEGMENT_STARTED (expiration task retries; self state transition)
4. DELETE_SEGMENT_FINISHED (expiration task completes)
5. COPY_SEGMENT_STARTED (copy task retries)
6. COPY_SEGMENT_FINISHED (copy task completes)
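The sequence above can be read as a small state machine. The sketch below is a simplified, hypothetical model (`SegmentTransitions` is not the real class): it allows the self transition on DELETE_SEGMENT_STARTED from step 3, and it assumes that the retried copy in step 5 starts from no prior state, which the list above does not spell out.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the states listed above.
class SegmentTransitions {
    enum State {
        COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED,
        DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED
    }

    private static final Map<State, Set<State>> ALLOWED = Map.of(
        State.COPY_SEGMENT_STARTED,
            EnumSet.of(State.COPY_SEGMENT_FINISHED, State.DELETE_SEGMENT_STARTED),
        State.COPY_SEGMENT_FINISHED,
            EnumSet.of(State.DELETE_SEGMENT_STARTED),
        // Self transition: a retried deletion re-records DELETE_SEGMENT_STARTED.
        State.DELETE_SEGMENT_STARTED,
            EnumSet.of(State.DELETE_SEGMENT_STARTED, State.DELETE_SEGMENT_FINISHED),
        State.DELETE_SEGMENT_FINISHED,
            EnumSet.noneOf(State.class));

    // Assumption: a null source state means a fresh copy attempt.
    static boolean isValid(State from, State to) {
        if (from == null) {
            return to == State.COPY_SEGMENT_STARTED;
        }
        return ALLOWED.get(from).contains(to);
    }
}
```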
Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Luke Chen <showuon@gmail.com>
This updates the versions of Java we test on from 8 and 21 to 11 and 21. This also removes unnecessary Check and Compile Java variations.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Create a schema checker that can validate that later versions of a KRPC schema are compatible with earlier ones.
Reviewers: David Arthur <mumrah@gmail.com>
Do not acquire the DynamicBrokerConfig lock in DynamicBrokerConfig.removeReconfigurable. It's not
necessary, because the list that these functions are modifying is a thread-safe
CopyOnWriteArrayList. In DynamicBrokerConfig.reloadUpdatedFilesWithoutConfigChange, I changed the
code to use a simple Java forEach rather than a Scala conversion, in order to feel more confident
that concurrent modifications to the List would not have any bad effects here. (forEach is always
safe on CopyOnWriteArrayList.)
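
As a small illustration of why the lock is unnecessary, the sketch below removes elements from a `CopyOnWriteArrayList` while `forEach` is iterating over it. The class and method names are hypothetical; the point is only that `forEach` walks the snapshot taken when it starts, so no `ConcurrentModificationException` is thrown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of DynamicBrokerConfig.
class SnapshotIterationDemo {
    // Visits every element with forEach while removing it from the same list.
    static List<String> visitWhileRemoving(List<String> source) {
        CopyOnWriteArrayList<String> reconfigurables = new CopyOnWriteArrayList<>(source);
        List<String> visited = new ArrayList<>();
        reconfigurables.forEach(r -> {
            visited.add(r);
            // Each remove copies the backing array; the running forEach
            // keeps iterating over the array it started with.
            reconfigurables.remove(r);
        });
        return visited;
    }
}
```

The same loop over a plain `ArrayList` would fail, which is why the backing list type matters here.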
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, David Arthur <mumrah@gmail.com>
This PR implements the metadata redirection feature of the ShareFetch and ShareAcknowledge responses where an error code of NOT_LEADER_OR_FOLLOWER or FENCED_LEADER_EPOCH along with current leader information in the response is used to optimise handling of leadership changes in the client. This is applying the logic of KIP-951 to share group consumers.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Adding some missing input checks and fixing a formatting issue.
Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Reviewers: Luke Chen <showuon@gmail.com>
This patch completely removes the compile-time dependency on core for both test and main sources by introducing two new modules.
1) `test-common` includes all the common test implementation code (including the dependency on :core for BrokerServer, ControllerServer, etc.)
2) `test-common:api` is a new sub-module that contains only interfaces, including our JUnit extension
Reviewers: David Arthur <mumrah@gmail.com>
When calling KafkaStreams#close from teardown methods in integration tests, we need to pass timeout to avoid potentially blocking forever during teardown.
Reviewers: Matthias J. Sax <matthias@confluent.io>
49d7ea6 updated the behavior of the UpdateFeaturesRequest/Response, but the MockAdminClient did not reflect those changes.
Now, if any feature fails, all the features fail and the correct message is written in the result. The features are only updated if every feature succeeds and the command is not validate-only.
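A minimal sketch of this all-or-nothing behaviour, under stated assumptions: `AllOrNothingFeatureUpdate` is a hypothetical stand-in rather than the real MockAdminClient, and the validation rule (levels must be non-negative) is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in; not the real MockAdminClient.
class AllOrNothingFeatureUpdate {
    final Map<String, Short> finalized = new HashMap<>();

    // Invented rule for illustration: a level must be non-negative.
    static Optional<String> validate(String feature, short level) {
        return level < 0
            ? Optional.of("Invalid level " + level + " for " + feature)
            : Optional.empty();
    }

    // Returns feature -> error message; an empty Optional means success.
    Map<String, Optional<String>> update(Map<String, Short> requested, boolean validateOnly) {
        Optional<String> firstError = Optional.empty();
        for (Map.Entry<String, Short> e : requested.entrySet()) {
            firstError = validate(e.getKey(), e.getValue());
            if (firstError.isPresent()) {
                break;
            }
        }
        // One failure fails every feature with the same message.
        Map<String, Optional<String>> results = new HashMap<>();
        for (String feature : requested.keySet()) {
            results.put(feature, firstError);
        }
        // Apply only when everything validated and this is not a dry run.
        if (!firstError.isPresent() && !validateOnly) {
            finalized.putAll(requested);
        }
        return results;
    }
}
```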
Reviewers: Jun Rao <jun@confluent.io>
This patch introduces a merging algorithm for persistent state batches in the share coordinator.
The algorithm removes any expired batches (lastOffset before startOffset) and then places the rest in a sorted map. It then identifies batch pairs which overlap and combine them while preserving the relative priorities of any intersecting sub-ranges. The resultant batches are placed back into the map. The algorithm ends when no more overlapping pairs can be found.
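The sketch below illustrates only the intended outcome, not the pairwise merging described above. It swaps in a simpler per-offset resolution: each offset keeps the highest priority among the batches covering it, and adjacent offsets with equal priority are coalesced. The `Batch` type and the integer `priority` field are assumptions; the patch's actual priority rule is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified illustration; not the share coordinator's code.
class BatchMergeSketch {
    // Inclusive offset range with an assumed integer priority.
    static final class Batch {
        final long first;
        final long last;
        final int priority;

        Batch(long first, long last, int priority) {
            this.first = first;
            this.last = last;
            this.priority = priority;
        }

        @Override
        public String toString() {
            return "[" + first + "-" + last + ":" + priority + "]";
        }
    }

    static List<Batch> merge(List<Batch> batches, long startOffset) {
        // Sorted map of offset -> winning priority.
        TreeMap<Long, Integer> best = new TreeMap<>();
        for (Batch b : batches) {
            if (b.last < startOffset) {
                continue; // expired: lastOffset is before startOffset
            }
            for (long o = b.first; o <= b.last; o++) {
                best.merge(o, b.priority, Integer::max);
            }
        }
        // Coalesce runs of consecutive offsets that share a priority.
        List<Batch> out = new ArrayList<>();
        Long runStart = null;
        long prev = 0;
        int prio = 0;
        for (Map.Entry<Long, Integer> e : best.entrySet()) {
            long o = e.getKey();
            int p = e.getValue();
            if (runStart != null && o == prev + 1 && p == prio) {
                prev = o;
                continue;
            }
            if (runStart != null) {
                out.add(new Batch(runStart, prev, prio));
            }
            runStart = o;
            prev = o;
            prio = p;
        }
        if (runStart != null) {
            out.add(new Batch(runStart, prev, prio));
        }
        return out;
    }
}
```

For example, with a start offset of 5, the batches `[0-4:1]`, `[5-10:1]` and `[8-12:2]` reduce to `[5-7:1]` and `[8-12:2]`. Iterating offset by offset is only practical for small ranges, which is one reason a real implementation would operate on batch pairs instead.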
Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur <mumrah@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>, Jun Rao <junrao@gmail.com>
This patch adds support for decoding the new KIP-932 record schemas in kafka-dump-log.sh
Reviewers: David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This change includes:
1. Dependency checking when updating the feature (all request versions)
2. Returning top level error and no feature level errors if any feature failed to update and using this error for all the features in the response. (all request versions)
3. Returning only top level none error for v2 and beyond
Reviewers: Jun Rao <jun@confluent.io>
Uses the `gh` CLI to find the latest trunk commit that has been cached by GitHub Actions. By basing PRs off of this
ref rather than HEAD, we will see fewer cache misses in our CI builds.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
This fixes some formatting issues with the control.plane.listener.format.name property. It was missing some new lines and code markup.
For testing, I built locally and viewed the output.
Reviewers: Justine Olshan <jolshan@confluent.io>
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewers: José Armando García Sancio <jsancio@apache.org>, Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Chris Egerton <fearthecellos@gmail.com>, Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>
WakeupTrigger was refactored as a result of changes in AsyncKafkaConsumer. This PR makes the equivalent changes in ShareConsumerImpl.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR implements exponential backoff for task initializations that fail due to lock exceptions. The wait between two consecutive initialization attempts grows with each failure.
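A minimal sketch of such a backoff, assuming a doubling factor, no jitter, and a cap; `InitBackoff` and the parameter values are hypothetical and may differ from what the PR uses.

```java
// Hypothetical helper; the factor, cap, and names are assumptions.
class InitBackoff {
    private final long initialMs;
    private final long maxMs;

    InitBackoff(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
    }

    // failedAttempts is the number of failed initializations so far.
    long delayMs(int failedAttempts) {
        // Cap the exponent so the shift cannot overflow for large counts.
        int exponent = Math.min(Math.max(failedAttempts, 0), 30);
        return Math.min(maxMs, initialMs * (1L << exponent));
    }
}
```

With `new InitBackoff(100, 10_000)`, the delays for 0, 1 and 3 failed attempts are 100 ms, 200 ms and 800 ms, and the delay stays at 10,000 ms once the cap is reached.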
Reviewers: Bruno Cadonna <cadonna@apache.org>