Commit Graph

13891 Commits

Author SHA1 Message Date
David Arthur d38a90df2b
KAFKA-17672 Run quarantined tests separately (#17329)
Introduce new quarantinedTest that excludes tests tagged with "flaky". Also introduce two new build parameters "maxQuarantineTestRetries" and "maxQuarantineTestRetryFailures".

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-06 14:09:24 +08:00
Chia-Ping Tsai f1d6549679
KAFKA-17706 Allow all imports for test-common and test-common-api (#17378)
Reviewers: David Arthur <mumrah@gmail.com>
2024-10-06 14:02:10 +08:00
Bill Bejeck 930f165546
KAFKA-17248: Add reporter for adding thread metrics to telemetry pipeline and a test [2/N] (#17376)
This PR adds a Reporter instance that will add streams thread metrics to the telemetry pipeline.
For testing, the PR adds a unit test.

Reviewers: Matthias Sax <mjsax@apache.org>
2024-10-05 18:28:31 -04:00
TaiJuWu 3bb408c4de
MINOR: rename ConfigName.RENAME to ConfigName.RENAMES for replaceFiled (#17369)
Reviewers: Andrew Schofield <aschofield@confluent.io>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-10-05 05:50:48 +08:00
TaiJuWu 529095ba34
KAFKA-17542: Automatically label small PRs (#17260)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2024-10-04 16:31:31 -04:00
Justine Olshan c3f13b5c57
KAFKA-16308 [4/4]: Add release-version flag to upgrade and downgrade commands (#17362)
I've added the release-version flag to the upgrade and downgrade commands. I've also added tests.

While working on this, I realized that non-production features were being returned by the version-mapping and dependencies commands. I have changed these commands to return only production features (except in tests) and added tests for this.

Reviewers: Jun Rao <jun@confluent.io>
2024-10-04 13:03:54 -07:00
Bill Bejeck c11a38f9df
KAFKA-17248: KIP-1076 add admin client test helper [1/N] (#17375)
No functional changes; this PR contains a test-helper class for working with AdminClient.

Reviewers: Matthias Sax <mjsax@apache.org>
2024-10-04 13:58:16 -04:00
Abhinav Dixit 455c79c339
KAFKA-17509: Introduce a delayed action queue to complete purgatory actions outside purgatory. (#17177)
Add purgatory actions to DelayedActionQueue when partition locks are released after fetch in forceComplete. 
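
The idea can be sketched in Java as follows. This is a minimal illustration, not Kafka's `DelayedActionQueue`: the class names `ActionQueueSketch` and `ForceCompleteSketch` are hypothetical, and a single `ReentrantLock` stands in for the partition locks. Actions discovered while the lock is held are only queued, and they run after the lock has been released.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical queue: buffers follow-up actions so they run outside any lock.
class ActionQueueSketch {
    private final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();

    void add(Runnable action) {
        queue.add(action);
    }

    // Drains and runs all queued actions; returns how many ran.
    int tryCompleteActions() {
        int completed = 0;
        Runnable action;
        while ((action = queue.poll()) != null) {
            action.run();
            completed++;
        }
        return completed;
    }
}

class ForceCompleteSketch {
    final ReentrantLock partitionLock = new ReentrantLock(); // stand-in for partition locks
    final ActionQueueSketch actions = new ActionQueueSketch();

    // Returns true if the queued purgatory action observed the lock as held.
    boolean forceComplete() {
        boolean[] ranUnderLock = {true};
        partitionLock.lock();
        try {
            // ... complete the fetch here, then defer the purgatory check ...
            actions.add(() -> ranUnderLock[0] = partitionLock.isHeldByCurrentThread());
        } finally {
            partitionLock.unlock();
        }
        actions.tryCompleteActions(); // the lock is already released at this point
        return ranUnderLock[0];
    }
}
```

Calling `new ForceCompleteSketch().forceComplete()` returns `false`, because the deferred action runs only after `unlock()`.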

Reviewers: David Arthur <mumrah@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>, Jun Rao <junrao@gmail.com>
2024-10-04 09:32:24 -07:00
José Armando García Sancio 16186eabcd
KAFKA-16927; Handle expanding leader endpoints (#17363)
When a replica restarts in the follower state it is possible for the set of leader endpoints to not match the latest set of leader endpoints. Voters will discover the latest set of leader endpoints through the BEGIN_QUORUM_EPOCH request. This means that KRaft needs to allow for the replica to transition from Follower to Follower when only the set of leader endpoints has changed.
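
A minimal sketch of the decision described above. It assumes a hypothetical `LeaderView` record holding the epoch, the leader id, and a listener-name-to-address map; KRaft's real state classes differ.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical view of what a follower knows about the current leader.
record LeaderView(int epoch, int leaderId, Map<String, String> endpoints) {}

class FollowerTransitionSketch {
    // A BEGIN_QUORUM_EPOCH for the same epoch and leader normally changes nothing,
    // but if it carries a different set of leader endpoints the replica should
    // transition Follower -> Follower to pick up the new endpoints.
    static boolean shouldRefreshFollowerState(LeaderView current, LeaderView received) {
        return current.epoch() == received.epoch()
            && current.leaderId() == received.leaderId()
            && !Objects.equals(current.endpoints(), received.endpoints());
    }
}
```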

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Alyssa Huang <ahuang@confluent.io>
2024-10-04 10:51:43 -04:00
Kamal Chandraprakash 5b3027dfcb
KAFKA-15859: Fix the Unsupported version error when new admin connects to old broker (#17358)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Luke Chen <showuon@gmail.com>
2024-10-04 13:47:40 +05:30
Dongnuo Lyu cbc02e006d
KAFKA-16106; Schedule timeout task to refresh classic group size metric (#17325)
In the existing implementation, if an operation that modifies the classic group state fails, the group reverts but the group size counter does not. This creates an inconsistency between the group size metric and the actual group size.

Because relying on the appendFuture to revert the metrics when an operation fails would be complicated, this PR introduces a different approach: a timeout task periodically refreshes the metrics based on the current soft state of the groups. The refresh interval is hardcoded to 60 seconds.
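
The refresh approach can be sketched as follows. `GroupSizeMetricSketch` and its state enum are hypothetical, and a `ScheduledExecutorService` stands in for the coordinator's timeout task. The point is that each refresh recomputes the counts from the groups currently in memory instead of adjusting a counter.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: recompute group counts from the in-memory (soft) state
// on a timer instead of maintaining an increment/decrement counter that can
// drift when a failed operation reverts a group.
class GroupSizeMetricSketch {
    enum GroupState { EMPTY, PREPARING_REBALANCE, COMPLETING_REBALANCE, STABLE, DEAD }

    static final long REFRESH_INTERVAL_MS = 60_000L; // the 60 s interval described above

    final Map<String, GroupState> groups = new ConcurrentHashMap<>();
    volatile Map<GroupState, Long> countsByState = Map.of();

    // Rebuild per-state counts from scratch, so any earlier drift is discarded.
    void refresh() {
        Map<GroupState, Long> counts = new EnumMap<>(GroupState.class);
        for (GroupState state : groups.values()) {
            counts.merge(state, 1L, Long::sum);
        }
        countsByState = counts;
    }

    // Schedule periodic refreshes; the caller owns and eventually shuts down the executor.
    ScheduledExecutorService start() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::refresh, 0, REFRESH_INTERVAL_MS, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```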

Reviewers: David Jacot <djacot@confluent.io>
2024-10-04 00:31:06 -07:00
Federico Valeri c8cfb4c7f1
KAFKA-17428: Add retry mechanism for cleaning up dangling remote segments (#17335)
This change introduces a retry mechanism for cleaning up remote segments whose copy to remote storage failed.
It also makes sure that we always update the remote segment state whenever we attempt a deletion.

When a segment copy fails, we immediately try to delete the segment, but this can also fail.
The RLMExpirationTask is now also responsible for retrying the cleanup of dangling segments.

This is how a segment state is updated in the above case:

1. COPY_SEGMENT_STARTED (copy task fails)
2. DELETE_SEGMENT_STARTED (copy task cleanup also fails)
3. DELETE_SEGMENT_STARTED (expiration task retries; self state transition)
4. DELETE_SEGMENT_FINISHED (expiration task completes)
5. COPY_SEGMENT_STARTED (copy task retries)
6. COPY_SEGMENT_FINISHED (copy task completes)
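
The transitions above can be checked with a small table. `SegmentState` here is a hypothetical enum, not Kafka's `RemoteLogSegmentState`, and the sketch assumes that the copy retry in steps 5 and 6 uploads a fresh segment with its own state history, so DELETE_SEGMENT_FINISHED is treated as terminal for the failed attempt.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;

// Hypothetical transition table covering the sequence above; not Kafka's own code.
enum SegmentState {
    COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED;

    private static final Map<SegmentState, EnumSet<SegmentState>> ALLOWED = Map.of(
        COPY_SEGMENT_STARTED, EnumSet.of(COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED),
        COPY_SEGMENT_FINISHED, EnumSet.of(DELETE_SEGMENT_STARTED),
        // The self-transition lets the expiration task retry a failed deletion.
        DELETE_SEGMENT_STARTED, EnumSet.of(DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED),
        // Terminal: a retried copy (steps 5-6) is assumed to start a new segment.
        DELETE_SEGMENT_FINISHED, EnumSet.noneOf(SegmentState.class));

    boolean canTransitionTo(SegmentState next) {
        return ALLOWED.get(this).contains(next);
    }

    // Checks that every consecutive pair in a state history is an allowed transition.
    static boolean isValidHistory(List<SegmentState> history) {
        for (int i = 1; i < history.size(); i++) {
            if (!history.get(i - 1).canTransitionTo(history.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```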

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Luke Chen <showuon@gmail.com>
2024-10-04 11:20:07 +08:00
TaiJuWu 894c4a9691
KAFKA-17525 Convert the UnknownServerException to InvalidRequestException when altering client-metrics config at runtime (#17168)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-10-04 10:19:54 +08:00
Chia-Chuan Yu 93e27c7413
KAFKA-17658 Move BootstrapControllersIntegrationTest to kafka.server (#17356)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-04 10:13:46 +08:00
David Arthur bbbf688f55
KAFKA-17684 Update our java build versions (#17350)
This updates the versions of Java we test on from 8 and 21 to 11 and 21. This also removes unnecessary Check and Compile Java variations.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-04 09:46:27 +08:00
Colin Patrick McCabe dbd50ff847
KAFKA-16469: Metadata schema checker (#15995)
Create a schema checker that can validate that later versions of a KRPC schema are compatible with earlier ones.

Reviewers: David Arthur <mumrah@gmail.com>
2024-10-03 12:13:38 -07:00
Colin Patrick McCabe 85bfdf4127
KAFKA-17613: Remove ZK migration code (#17293)
Remove the controller machinery for doing ZK migration in Kafka 4.0.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2024-10-03 12:01:14 -07:00
Ken Huang 8b5d755bf6
KAFKA-17685 Define common testRuntimeOnly dependencies (#17355)
Reviewers: David Arthur <mumrah@gmail.com>
2024-10-03 15:00:30 -04:00
Colin Patrick McCabe 0edf5dbd20
KAFKA-16649: Remove lock from DynamicBrokerConfig.removeReconfigurable (#15838)
Do not acquire the DynamicBrokerConfig lock in DynamicBrokerConfig.removeReconfigurable. It's not
necessary, because the list that these functions are modifying is a thread-safe
CopyOnWriteArrayList.  In DynamicBrokerConfig.reloadUpdatedFilesWithoutConfigChange, I changed the
code to use a simple Java forEach rather than a Scala conversion, in order to feel more confident
that concurrent modifications to the List would not have any bad effects here. (forEach is always
safe on CopyOnWriteArrayList.)
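
A small demonstration of the `forEach` guarantee this change relies on. This is illustrative code, not DynamicBrokerConfig, and the list contents are made up.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CowForEachDemo {
    // Removes every element while iterating with forEach and reports
    // {elements visited, elements left}.
    static int[] visitWhileRemoving() {
        List<String> reconfigurables = new CopyOnWriteArrayList<>(List.of("a", "b", "c"));
        int[] visited = {0};
        // forEach walks the array snapshot taken when it started; each remove()
        // publishes a new array, so no ConcurrentModificationException is thrown.
        reconfigurables.forEach(r -> {
            reconfigurables.remove(r);
            visited[0]++;
        });
        return new int[] {visited[0], reconfigurables.size()};
    }
}
```

All three elements are visited and the list ends up empty; the same loop over an `ArrayList` would throw `ConcurrentModificationException`.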

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, David Arthur <mumrah@gmail.com>
2024-10-03 09:25:17 -07:00
Sushant Mahajan d173842d36
KAFKA-17469: Move persister related classes to persister pkg. (#17349)
Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur <mumrah@gmail.com>
2024-10-03 11:00:22 -04:00
Lianet Magrans 1962917436
KAFKA-17674: Fix bug on update positions of newly added partitions (#17342)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-03 13:10:54 +02:00
Sean Quah 99e1d8fbb3
MINOR: Cache topic resolution in TopicIds set (#17285)
Looking up topics in a TopicsImage is relatively slow. Cache the results
in TopicIds to improve assignor performance. In benchmarks, we see a
noticeable improvement in performance in the heterogeneous case.

Before
```
Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt    Score   Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10         HOMOGENEOUS          1000  avgt    5   36.400 ± 3.004  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10       HETEROGENEOUS          1000  avgt    5  158.340 ± 0.825  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS          1000  avgt    5    1.329 ± 0.041  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10       HETEROGENEOUS          1000  avgt    5  382.901 ± 6.203  ms/op
```

After
```
Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt    Score   Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10         HOMOGENEOUS          1000  avgt    5   36.465 ± 1.954  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10       HETEROGENEOUS          1000  avgt    5  114.043 ± 1.424  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS          1000  avgt    5    1.454 ± 0.019  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10       HETEROGENEOUS          1000  avgt    5  342.840 ± 2.744  ms/op
```
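
The caching idea can be sketched as a memoizing wrapper around a slow lookup. `CachingTopicResolver` and the `String` topic ids are hypothetical stand-ins (Kafka's `TopicIds` and `TopicsImage` use their own types). The cache is a plain `HashMap`, so the sketch assumes one resolver per single-threaded assignment run.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memoizing resolver: each topic id hits the slow lookup at most once.
class CachingTopicResolver {
    private final Function<String, String> slowLookup; // stand-in for a TopicsImage query
    private final Map<String, String> cache = new HashMap<>(); // not thread-safe
    int slowLookups = 0; // counts cache misses, for illustration

    CachingTopicResolver(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    String nameOf(String topicId) {
        // computeIfAbsent calls the loader only on a miss; null results are not cached.
        return cache.computeIfAbsent(topicId, id -> {
            slowLookups++;
            return slowLookup.apply(id);
        });
    }
}
```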

---

Based heavily on https://github.com/apache/kafka/pull/16527.

Reviewers: David Arthur <mumrah@gmail.com>, David Jacot <djacot@confluent.io>
2024-10-03 00:40:25 -07:00
Chung, Ming-Yen 696e33ee6d
KAFKA-17451 Remove deprecated consumer#committed (#17320)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-03 13:53:32 +08:00
Eric Chang 63d65c6899
KAFKA-17511 Move ElectLeadersRequestOps to ElectLeadersRequest (#17312)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-03 13:27:58 +08:00
Andrew Schofield e3e3f25cc4
KAFKA-17500: Metadata redirection for NOT_LEADER_OR_FOLLOWER (#17279)
This PR implements the metadata redirection feature of the ShareFetch and ShareAcknowledge responses where an error code of NOT_LEADER_OR_FOLLOWER or FENCED_LEADER_EPOCH along with current leader information in the response is used to optimise handling of leadership changes in the client. This is applying the logic of KIP-951 to share group consumers.
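
A rough sketch of the client-side handling. It assumes a hypothetical `LeaderRedirectSketch` that keeps a partition-to-leader map and uses string error names. It is not the share consumer implementation, only the KIP-951 idea of applying a newer leader hint from the response instead of waiting for a full metadata refresh.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical client-side handler: use the leader hint carried in the error
// response when present, and fall back to a full metadata refresh otherwise.
class LeaderRedirectSketch {
    record LeaderHint(int leaderId, int leaderEpoch) {}

    final Map<String, LeaderHint> leaders = new HashMap<>();
    boolean fullRefreshRequested = false;

    void onPartitionError(String partition, String errorCode, Optional<LeaderHint> currentLeader) {
        if (!errorCode.equals("NOT_LEADER_OR_FOLLOWER") && !errorCode.equals("FENCED_LEADER_EPOCH")) {
            return;
        }
        if (currentLeader.isEmpty()) {
            fullRefreshRequested = true; // no hint: learn the new leader the slow way
            return;
        }
        LeaderHint known = leaders.get(partition);
        // Only accept a hint that is newer than what the client already knows.
        if (known == null || currentLeader.get().leaderEpoch() > known.leaderEpoch()) {
            leaders.put(partition, currentLeader.get());
        }
    }
}
```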

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-10-03 10:51:34 +05:30
bboyleonp666 8be808ea4a
KAFKA-17285 Consider using `Utils.closeQuietly` to replace `CoreUtils.swallow` when handling Closeable objects (#16843)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-03 10:45:01 +08:00
Federico Valeri d0ad84df5d
MINOR: producer perf improvements (#17348)
Adding some missing input checks and fixing a formatting issue.

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>
2024-10-03 10:29:19 +08:00
Chia-Ping Tsai 979740b49d
KAFKA-17589 Move JUnit extensions to test-common module (#17318)
This patch completely removes the compile-time dependency on core for both test and main sources by introducing two new modules.

1) `test-common`: includes all the common test implementation code (including the dependency on :core for BrokerServer, ControllerServer, etc.)
2) `test-common:api`: a new sub-module that includes only interfaces, including our JUnit extension

Reviewers: David Arthur <mumrah@gmail.com>
2024-10-03 10:28:37 +08:00
Matthias J. Sax 22a14d75c2
MINOR: improve KafkaStreams.State JavaDocs (#17351)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-03 10:23:50 +08:00
David Arthur be1929c44c
KAFKA-17680 Add timeout to streams test teardown (#17346)
When calling KafkaStreams#close from teardown methods in integration tests, we need to pass a timeout to avoid potentially blocking forever during teardown.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2024-10-02 16:36:21 -04:00
Justine Olshan ae6e53fab2
MINOR: Fix MockAdminClient to match the server side update features handling. (#17343)
49d7ea6 updated the behavior of the UpdateFeaturesRequest/Response, but the MockAdminClient did not reflect those changes.

Now, if any feature fails, all features fail and the correct message is written to the result. Features are also updated only if all of them succeed and the command is not validate-only.
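
A simplified model of these all-or-nothing semantics. `FeatureUpdateSketch`, the `"NONE"` placeholder result, and the validator callback are hypothetical; this is not MockAdminClient's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class FeatureUpdateSketch {
    // Returns the per-feature result: "NONE" on success, otherwise the first
    // error, repeated for every feature in the request.
    static Map<String, String> update(Map<String, Short> finalized,
                                      Map<String, Short> requested,
                                      Function<Map.Entry<String, Short>, Optional<String>> validator,
                                      boolean validateOnly) {
        Optional<String> firstError = Optional.empty();
        for (Map.Entry<String, Short> entry : requested.entrySet()) {
            firstError = validator.apply(entry);
            if (firstError.isPresent()) {
                break;
            }
        }
        String outcome = firstError.orElse("NONE");
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : requested.keySet()) {
            results.put(name, outcome);
        }
        // Apply only when every feature validated and this is not a dry run.
        if (firstError.isEmpty() && !validateOnly) {
            finalized.putAll(requested);
        }
        return results;
    }
}
```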

Reviewers: Jun Rao <jun@confluent.io>
2024-10-02 13:20:44 -07:00
Kuan-Po Tseng b480135b4f
KAFKA-16974 KRaft support in SslAdminIntegrationTest (#17251)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-03 01:10:53 +08:00
Chung, Ming-Yen 540fb91103
KAFKA-17258 Migrate AdminFenceProducersIntegrationTest to ClusterTestExtensions framework (#17311)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-10-03 00:47:29 +08:00
Sushant Mahajan 7b7eb6243f
KAFKA-17367: Share coordinator persistent batch merging algorithm. [3/N] (#17149)
This patch introduces a merging algorithm for persistent state batches in the share coordinator. 

The algorithm removes any expired batches (lastOffset before startOffset) and then places the rest in a sorted map. It then identifies pairs of overlapping batches and combines them while preserving the relative priorities of any intersecting sub-ranges. The resulting batches are placed back into the map, and the algorithm ends when no more overlapping pairs can be found.
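
A compact sketch of the result this merge is meant to produce. It uses a boundary sweep in place of the pairwise overlap loop described above, and it assumes that later batches in the input take priority where they overlap. `StateBatch` and `BatchMergeSketch` are hypothetical names, not the share coordinator's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical batch: an inclusive offset range with a delivery state.
record StateBatch(long firstOffset, long lastOffset, int deliveryState) {}

class BatchMergeSketch {
    // Input is ordered oldest to newest; where batches overlap, the newer one wins.
    static List<StateBatch> merge(List<StateBatch> batches, long startOffset) {
        List<StateBatch> live = new ArrayList<>();
        for (StateBatch b : batches) {
            if (b.lastOffset() >= startOffset) {
                live.add(b); // drop expired batches that end before startOffset
            }
        }
        // Every batch start and end+1 is a cut point; between two consecutive cuts,
        // each batch covers either the whole piece or none of it.
        TreeSet<Long> cutSet = new TreeSet<>();
        for (StateBatch b : live) {
            cutSet.add(b.firstOffset());
            cutSet.add(b.lastOffset() + 1);
        }
        List<Long> cuts = new ArrayList<>(cutSet);
        List<StateBatch> merged = new ArrayList<>();
        for (int i = 0; i + 1 < cuts.size(); i++) {
            long from = cuts.get(i);
            long to = cuts.get(i + 1) - 1;
            StateBatch owner = null;
            for (int j = live.size() - 1; j >= 0 && owner == null; j--) {
                StateBatch b = live.get(j);
                if (b.firstOffset() <= from && from <= b.lastOffset()) {
                    owner = b; // newest batch covering this piece
                }
            }
            if (owner == null) {
                continue; // gap between batches
            }
            StateBatch prev = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (prev != null && prev.lastOffset() + 1 == from && prev.deliveryState() == owner.deliveryState()) {
                // Coalesce adjacent pieces that ended up with the same state.
                merged.set(merged.size() - 1, new StateBatch(prev.firstOffset(), to, prev.deliveryState()));
            } else {
                merged.add(new StateBatch(from, to, owner.deliveryState()));
            }
        }
        return merged;
    }
}
```

For example, merging `[0-9, state 1]` with a newer `[5-14, state 2]` yields `[0-4, state 1]` and `[5-14, state 2]`.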

Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur <mumrah@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>, Jun Rao <junrao@gmail.com>
2024-10-02 11:30:51 -04:00
Andrew Schofield 12a16ecf28
KAFKA-16733: Add share group record support to OffsetsMessageParser (#17282)
This patch adds support for decoding the new KIP-932 record schemas in kafka-dump-log.sh

Reviewers: David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-10-02 10:39:37 -04:00
David Arthur 2547d750a3
MINOR Update asf.yaml collaborators (#17345)
Reviewers: Josep Prat <josep.prat@aiven.io>
2024-10-02 10:25:02 -04:00
xijiu cfd7d94108
KAFKA-17657 Replace the consumer-fetch-manager-metrics by groupName (#17317)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-02 12:38:40 +08:00
Justine Olshan 49d7ea6c6a
KAFKA-16308 [3/N]: Introduce feature dependency validation to UpdateFeatures command (#16443)
This change includes:

1. Dependency checking when updating features (all request versions)
2. If any feature fails to update, returning a top-level error and no feature-level errors, and using that error for all features in the response (all request versions)
3. Returning only the top-level NONE error for v2 and beyond
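
Dependency checking can be sketched as below. The dependency map shape, the feature names in the example, and `FeatureDependencySketch` are all hypothetical, and `proposed` is assumed to hold the finalized levels the cluster would have after the update.

```java
import java.util.Map;
import java.util.Optional;

class FeatureDependencySketch {
    // deps: feature -> level -> (required feature -> minimum level), all hypothetical.
    // Returns the first violated dependency, or empty if the proposal is consistent.
    static Optional<String> validate(Map<String, Short> proposed,
                                     Map<String, Map<Short, Map<String, Short>>> deps) {
        for (Map.Entry<String, Short> f : proposed.entrySet()) {
            Map<String, Short> required = deps.getOrDefault(f.getKey(), Map.of())
                                              .getOrDefault(f.getValue(), Map.of());
            for (Map.Entry<String, Short> r : required.entrySet()) {
                short actual = proposed.getOrDefault(r.getKey(), (short) 0); // absent = level 0
                if (actual < r.getValue()) {
                    return Optional.of(f.getKey() + " level " + f.getValue() + " requires "
                        + r.getKey() + " >= " + r.getValue() + " but found " + actual);
                }
            }
        }
        return Optional.empty();
    }
}
```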

Reviewers: Jun Rao <jun@confluent.io>
2024-10-01 14:21:38 -07:00
Matthias J. Sax 4312ce6d25
MINOR: improve RecordCollectorImpl (#17185)
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-10-01 12:33:42 -07:00
陳昱霖(Yu-Lin Chen) 4c90d3518b
KAFKA-17646 Fix flaky KafkaStreamsTest.testStateGlobalThreadClose (#17310)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-02 02:18:13 +08:00
Mickael Maison 7fb25a2b06
KAFKA-16769 Remove add.source.alias.to.metrics configuration (#17323)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-02 02:03:02 +08:00
David Arthur 5377595a5f
KAFKA-17673 Switch back to statuses API (#17336)
This reverts parts of #17299 related to the checks API 

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-01 13:57:38 -04:00
Chung, Ming-Yen e136d7611c
KAFKA-17656 Replace string concatenation with parameterized logging for PartitionChangeBuilder (#17334)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-02 01:53:39 +08:00
Joao Pedro Fonseca Dantas 84bcdc95af
KAFKA-17540: Create script for updating a reference of latest cached trunk commit (#17204)
Uses the `gh` CLI to find the latest trunk commit which has been cached by GitHub actions. By basing PRs off of this
ref rather than HEAD, we will see fewer cache misses in our CI builds.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2024-10-01 10:06:10 -04:00
Logan Zhu 06d2649e2e
KAFKA-17606: Include Rat errors in GitHub workflow summary (#17280)
Reviewers: David Arthur <mumrah@gmail.com>
2024-10-01 09:47:07 -04:00
Cheryl Simmons 35f55a84fe
MINOR: fixing formatting of control.plane.listener.name.doc (#17307)
This fixes some formatting issues with the control.plane.listener.name property documentation: it was missing some line breaks and code markup.

For testing, I built locally and viewed the output.

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-09-30 15:12:26 -07:00
Greg Harris 818ee8a581
KAFKA-17078: Add SecurityManagerCompatibility shim (#16522)
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewers: José Armando García Sancio <jsancio@apache.org>, Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Chris Egerton <fearthecellos@gmail.com>, Luke Chen <showuon@gmail.com>, David Arthur <mumrah@gmail.com>
2024-09-30 08:06:14 -07:00
Andrew Schofield 800de133bf
KAFKA-17634 Tweak wakeup logic to match WakeupTrigger changes (#17304)
WakeupTrigger was refactored as a result of changes in AsyncKafkaConsumer. This PR makes the equivalent changes in ShareConsumerImpl.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-30 21:50:44 +08:00
Alyssa Huang e27d0dfb17
MINOR: Fix kafkatest advertised listeners (#17294)
Followup for #17146

Reviewers: Bill Bejeck <bbejeck@apache.org>
2024-09-30 08:51:49 -04:00
Alieh Saeedi bb112570ae
KAFKA-17109: Move lock backoff retry to streams TaskManager (#17209)
This PR implements exponential backoff for task initializations that fail due to lock exceptions: the delay between consecutive attempts to initialize a task grows with each failure.
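
A minimal sketch of per-task exponential backoff. The class name, the task ids, and the 100 ms initial delay, 2x multiplier, and 1 s cap are assumptions for illustration, not Kafka Streams' actual values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-task backoff for lock failures during task initialization.
class LockBackoffSketch {
    static final long INITIAL_BACKOFF_MS = 100L; // assumed values, not Kafka's defaults
    static final double MULTIPLIER = 2.0;
    static final long MAX_BACKOFF_MS = 1_000L;

    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> nextAttemptMs = new HashMap<>();

    // Delay after the n-th consecutive failure: initial * multiplier^(n - 1), capped.
    static long backoffMs(int consecutiveFailures) {
        if (consecutiveFailures <= 0) {
            return 0L;
        }
        double delay = INITIAL_BACKOFF_MS * Math.pow(MULTIPLIER, consecutiveFailures - 1);
        return (long) Math.min(delay, MAX_BACKOFF_MS);
    }

    void recordLockFailure(String taskId, long nowMs) {
        int n = failures.merge(taskId, 1, Integer::sum);
        nextAttemptMs.put(taskId, nowMs + backoffMs(n));
    }

    boolean canAttempt(String taskId, long nowMs) {
        return nowMs >= nextAttemptMs.getOrDefault(taskId, 0L);
    }

    // A successful initialization resets the backoff for that task.
    void recordSuccess(String taskId) {
        failures.remove(taskId);
        nextAttemptMs.remove(taskId);
    }
}
```

With these assumed values the delays run 100, 200, 400, 800 ms and then stay at 1000 ms.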

Reviewer: Bruno Cadonna <cadonna@apache.org>
2024-09-30 13:30:54 +02:00