Commit Graph

14378 Commits

Author SHA1 Message Date
kevin-wu24 38aca3a045
KAFKA-17917: Convert Kafka core system tests to use KRaft (#17847)
- Remove some unused Zookeeper code

- Migrate group mode transactions, security rolling upgrade, and throttling tests to use KRaft

- Add KRaft downgrade tests to kraft_upgrade_test.py

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-11-21 13:40:49 -08:00
Colin Patrick McCabe 5fba067aaa
KAFKA-18063: SnapshotRegistry should not leak memory (#17898)
SnapshotRegistry needs to have a reference to all snapshot data structures. However, this should
not be a strong reference, but a weak reference, so that these data structures can be garbage
collected as needed. This PR also adds a scrub mechanism so that we can eventually reclaim the
slots used by GC'ed Revertable objects in the SnapshotRegistry.revertables array.
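
The weak-reference-plus-scrub idea described above can be sketched roughly as follows. This is only an illustration: `WeakRegistrySketch`, `register`, `scrub`, and `size` are hypothetical names, and the real SnapshotRegistry may be structured quite differently.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the actual SnapshotRegistry code.
final class WeakRegistrySketch {
    // Each slot holds only a weak reference, so the registry alone
    // never keeps a tracked object alive.
    private final List<WeakReference<Object>> slots = new ArrayList<>();

    // Track an object and return the weak reference stored for it.
    WeakReference<Object> register(Object tracked) {
        WeakReference<Object> ref = new WeakReference<>(tracked);
        slots.add(ref);
        return ref;
    }

    // Drop slots whose referent is gone (collected or cleared).
    // Returns the number of slots reclaimed.
    int scrub() {
        int before = slots.size();
        slots.removeIf(ref -> ref.get() == null);
        return before - slots.size();
    }

    int size() {
        return slots.size();
    }
}
```

Without a scrub step, the list would keep growing with empty references even though the referents themselves had been collected.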

Reviewers: David Jacot <david.jacot@gmail.com>
2024-11-21 13:27:53 -08:00
Matthias J. Sax 240efbb99d
MINOR: improve JavaDocs for Kafka Streams exceptions and error handlers (#17856)
Reviewers: Bill Bejeck <bill@confluent.io>
2024-11-21 11:46:23 -08:00
Matthias J. Sax 2519e4af0c
KAFKA-18038: fix flakey test StreamThreadTest.shouldLogAndRecordSkippedRecordsForInvalidTimestamps (#17889)
With KAFKA-17872, we changed some internals that affect the conditions
of this test, introducing a race condition in when the expected log
messages are printed.

This PR adds additional wait-conditions to the test to close the race
condition.

Reviewers: Bill Bejeck <bill@confluent.io>
2024-11-21 11:42:28 -08:00
Bill Bejeck 1c998f8ef3
KAFKA-17869: Adding tests to ensure KIP-1076 doesn't interfere with consumer metrics[1/3] (#17781)
Adding tests to ensure the KIP-1076 methods don't interfere with existing metrics in clients

Reviewers: Apoorv Mittal <amittal@confluent.io>, Matthias Sax <mjsax@apache.org>
2024-11-21 13:41:29 -05:00
Bill Bejeck f5781d59dd
Update streams docs with alive stream threads (#17868)
Add alive-stream-threads to Kafka Streams client metrics table
Reviewers: Matthias Sax <mjsax@apache.org>
2024-11-21 11:15:14 -05:00
Joao Pedro Fonseca Dantas 6c59e657c0
KAFKA-17640 Document Java 23 support and include release note (#17403)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-21 22:29:10 +08:00
Mickael Maison c0a092f562
MINOR: Cleanups in raft module (#17877)
Reviewers: Yash Mayya <yash.mayya@gmail.com>
2024-11-21 15:19:07 +01:00
Yash Mayya 4f1688742e
KAFKA-15387: Remove Connect's deprecated task configurations endpoint (#17412)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-11-21 19:43:54 +05:30
Joao Pedro Fonseca Dantas e9ccc2d6f5
KAFKA-16041: Replace Afterburn module with Blackbird (#17884)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-11-21 14:52:45 +01:00
Bill Bejeck fd9de50de1
KAFKA-18041: Update key for storing global consumer instance id for consistency (#17869)
This PR updates the key for storing the KIP-714 client instance id for the global consumer to follow a more consistent pattern of the other embedded Kafka Streams consumer clients.

Reviewers: Matthias Sax <mjsax@apache.org>
2024-11-20 16:14:03 -05:00
Abhinav Dixit aa7a3dbd30
KAFKA-18027: MINOR: Correct DelayedOperationPurgatory code around adding of an already completed operation (#17842)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Jun Rao <junrao@gmail.com>
2024-11-20 09:26:44 -08:00
Kuan-Po Tseng c6294aacef
KAFKA-17721 Enable to configure listener name and protocol for controller (#17525)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-20 23:06:29 +08:00
David Arthur e9fd0437d5
MINOR: Increase operations for stale PR workflow (#17854)
The Stale PRs workflow is only able to act on a relatively small number of PRs due to the API operations limit. This patch increases the limit from 100 to 500.

Reviewers: Josep Prat <josep.prat@aiven.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-11-20 19:50:19 +08:00
Dongnuo Lyu 9f7af93978
MINOR: Use JDK 17 in Vagrant after dropping JDK 8 (#17861)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-20 16:08:17 +08:00
David Arthur 441a6d0b79
MINOR fix test-catalog generation (#17866)
Fixes another issue introduced in #17725 where the streaming XML parser would skip over tests that followed a SKIPPED test. This caused a large number of tests to be removed from the test catalog e4a5eb8

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-19 15:37:41 -05:00
Cheryl Simmons 6dcca01efc
MINOR: Fixing typos in property definition: Adding an apostrophe and capitalizing ISR
Reviewers: Justine Olshan <jolshan@confluent.io>
2024-11-19 12:09:15 -08:00
Dongnuo Lyu 8ccb26de2e
KAFKA-17733: Protocol upgrade should allow empty member assignment in group conversion (#17853)
During conversion from classic to consumer group, if a member has empty assignment (e.g., the member just joined and has never synced), the conversion will fail because of the buffer underflow error when deserializing the member assignment. This patch allows empty assignment while deserializing the member assignment.
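
A minimal sketch of tolerating an empty assignment during deserialization, assuming a made-up wire format (an int32 count followed by that many int32 partition ids). `AssignmentDecodeSketch` and `decode` are hypothetical names and do not reflect Kafka's actual assignment schema.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the format here is invented for the example.
final class AssignmentDecodeSketch {
    static List<Integer> decode(ByteBuffer buffer) {
        // A member that has never synced has no bytes at all. Reading the
        // count from such a buffer would throw BufferUnderflowException,
        // so treat it as an empty assignment instead.
        if (!buffer.hasRemaining()) {
            return List.of();
        }
        int count = buffer.getInt();
        List<Integer> partitions = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            partitions.add(buffer.getInt());
        }
        return partitions;
    }
}
```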

Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>
2024-11-19 10:46:07 -08:00
Mickael Maison b5158aa3ad
MINOR: Bump Netty to 4.1.115.Final (#17860)
Reviewers: Josep Prat <josep.prat@aiven.io>
2024-11-19 17:27:27 +01:00
Ken Huang a4cd94e4ef
MINOR: Fix the leak "unknown" `group.coordinator.rebalance.protocols` on documentation (#17834)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2024-11-19 07:52:31 -08:00
Andrew Schofield 32c887b05e
KAFKA-17949: Introduce GroupState and replace ShareGroupState (#17763)
This PR introduces the unified GroupState enum for all group types from KIP-1043. This PR also removes ShareGroupState and begins the work to replace Admin.listShareGroups with Admin.listGroups. That will complete in a future PR.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-11-19 21:17:12 +05:30
David Jacot a211ee99b5
KAFKA-17593; [7/N] Introduce CoordinatorExecutor (#17823)
This patch introduces the `CoordinatorExecutor` construct into the `CoordinatorRuntime`. It allows scheduling asynchronous tasks from within a `CoordinatorShard` while respecting the runtime semantic. It will be used to asynchronously resolve regular expressions.

The `GroupCoordinatorService` uses a default `ExecutorService` with a single thread to back it at the moment. It seems that it should be sufficient. In the future, we could consider making the number of threads configurable.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2024-11-19 07:19:22 -08:00
David Arthur a334b1b6fd
MINOR Fix build scan artifact name in ci-complete (#17863)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-19 09:48:37 -05:00
Sebastien Viale 615c8c0e11
KAFKA-17850: fix leaking internal exception in state manager (#17711)
Following KIP-1033, a FailedProcessingException is passed to the Streams-specific uncaught exception handler.

This PR unwraps a FailedProcessingException into a StreamsException when an exception occurs while flushing or closing a store.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2024-11-19 10:51:07 +01:00
Mickael Maison 389f96aabd
MINOR: Various cleanups in coordinator modules (#17828)
Reviewers: David Jacot <djacot@confluent.io>, Ken Huang <s7133700@gmail.com>
2024-11-19 10:01:05 +01:00
Mickael Maison 624cd4f7d0
MINOR: Various cleanups in connect:runtime tests (#17827)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-19 15:55:54 +08:00
David Jacot 0685b73010
MINOR: Make `group.consumer.migration.policy` public (#17846)
This patch makes `group.consumer.migration.policy` a public config.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
2024-11-18 22:46:36 -08:00
David Arthur 5f4cbd4aa4
KAFKA-17767 Automatically quarantine new tests [5/n] (#17725)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-19 09:56:36 +08:00
Nick Telford 57299cfbb1
KAFKA-17954: Error getting oldest-iterator-open-since-ms from JMX (#17713)
The thread that evaluates the gauge for oldest-iterator-open-since-ms runs concurrently
with threads that open/close iterators (stream threads and interactive query threads). This PR
fixes a race condition between `openIterators.isEmpty()` and `openIterators.first()` by catching
a potential exception. Because we expect the race condition to be rare, we catch the
exception rather than introduce a guard via locking.
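
The catch-instead-of-lock approach might look roughly like this, assuming the open iterators are tracked as a sorted concurrent set of timestamps. `OldestIteratorGaugeSketch`, `oldestOpenSince`, and the `fallback` parameter are hypothetical and not taken from the Kafka Streams code.

```java
import java.util.NavigableSet;
import java.util.NoSuchElementException;

// Hypothetical illustration; not the actual metrics code.
final class OldestIteratorGaugeSketch {
    // Returns the smallest timestamp in the set, or the fallback value
    // if the set is (or just became) empty.
    static long oldestOpenSince(NavigableSet<Long> openTimestamps, long fallback) {
        try {
            // Another thread may remove the last element between an
            // isEmpty() check and first(), so first() can still throw.
            return openTimestamps.first();
        } catch (NoSuchElementException e) {
            return fallback;
        }
    }
}
```

The trade-off is that the rare losing side of the race pays for an exception, while the common path avoids any lock.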

Reviewers: Matthias J. Sax <matthias@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2024-11-18 17:45:49 -08:00
David Arthur 8f63a77ba1
MINOR quarantine flaky tests for Nov 18, 2024 (#17845)
2024-11-18 16:56:12 -05:00
Colin Patrick McCabe 130bf1054b
MINOR: some minor cleanups in the quorum controller. (#17819)
BrokerHeartbeatManager.java: fix an outdated comment.

Move an inefficient test method that is O(num_brokers) from ClusterControlManager.java into ReplicationControlManagerTest.java, so that it doesn't accidentally get used in production code.

Remove QuorumController.ImbalanceSchedule, etc. since it is no longer used.

Move the initialization of OffsetControlManager later in the QuorumController constructor and add a comment explaining why it should come last. This doesn't fix any bugs currently, but it's a good practice for the future.

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-11-18 11:15:38 -08:00
Bill Bejeck 50c15b94c9
KAFKA-17561: KIP-1091 add operator metrics (#17820)
Implementation of KIP-1091 adding operator metrics to Kafka Streams
Updated existing tests to validate added metrics
Reviewers: Bruno Cadonna <cadonna@apache.org>, Matthias Sax <mjsax@apache.org>
2024-11-18 10:30:09 -05:00
ShivsundarR eafa78d99d
KAFKA-18016: Modified handling of piggyback acknowledgements in ShareConsumeRequestManager. (#17824)
What
There was a bug in handling piggyback acknowledgements in ShareConsumeRequestManager: the fetchAcknowledgementsMap could be updated while a request was in flight, and when the ShareFetch response was received, any acknowledgements that had arrived during that window were removed without actually being sent.

Fix
We now maintain two separate maps: one holding the acknowledgements to send and one tracking the acknowledgements in flight.
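
The two-map idea can be sketched as below. All names (`AckTrackerSketch`, `acknowledge`, `onRequestSent`, `onResponseReceived`, `pendingCount`) are hypothetical, and partitions are represented as plain strings for brevity; the real ShareConsumeRequestManager is more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the actual ShareConsumeRequestManager code.
final class AckTrackerSketch {
    private final Map<String, List<Long>> toSend = new HashMap<>();
    private final Map<String, List<Long>> inFlight = new HashMap<>();

    // Queue an acknowledgement; safe to call while a request is in flight.
    void acknowledge(String partition, long offset) {
        toSend.computeIfAbsent(partition, k -> new ArrayList<>()).add(offset);
    }

    // Building a request moves everything queued so far into the in-flight map.
    Map<String, List<Long>> onRequestSent() {
        inFlight.clear();
        inFlight.putAll(toSend);
        toSend.clear();
        return new HashMap<>(inFlight);
    }

    // The response only settles what was actually sent; acknowledgements
    // queued in the meantime stay in toSend for the next request.
    void onResponseReceived() {
        inFlight.clear();
    }

    int pendingCount() {
        return toSend.values().stream().mapToInt(List::size).sum();
    }
}
```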

Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-11-18 17:15:42 +05:30
Dmitry Werner cd1bf196f0
MINOR: Delete unused logging argument from SocketServer#closeSocket (#17835)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-18 18:29:20 +08:00
TengYao Chi 53d8316b5d
KAFKA-14934 KafkaClusterTestKit makes FaultHandler accessible (#17774)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-18 18:23:54 +08:00
TengYao Chi 381fbc1359
MINOR: Fix incorrect scala example in README of test-common (#17837)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-18 18:03:17 +08:00
Kuan-Po Tseng 00cd9cd3f9
KAFKA-18003: Add test to make sure `Admin#deleteRecords` can handle the corrupted records (#17840)
Reviewers: Divij Vaidya <diviv@amazon.com>
2024-11-18 10:36:01 +01:00
David Jacot bc68011b62
MINOR: Various cleanups in CoordinatorRuntimeTest (#17829)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-11-18 10:30:54 +01:00
PoAn Yang 078d34f39d
KAFKA-17910 Create integration tests for Admin.listGroups and Admin.describeClassicGroups (#17712)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-11-18 16:35:48 +08:00
Andrew Schofield a592912ec9
KAFKA-17663 Add metadata caching in PartitionLeaderStrategy (#17367)
Admin API operations have two phases: lookup and fulfilment. The lookup phase involves a METADATA request whose details depend upon the operation being performed.

For some operations, the METADATA request can be quite expensive to serve. For example, if the user calls Admin.listOffsets for 1000 topics, the METADATA request will include all 1000 topics and the response will contain the leader information for all of these topics. And then the actual fulfilment phase does the real work of the operation.

In cases where a long-running application is performing repeated admin operations which need the same metadata information about partition leadership, it is not necessary to send the METADATA request for every single admin operation.

This PR adds a cache of the mapping from topic-partition to leader id to the admin client. The cache doesn't need to be very sophisticated because the admin client will retry if the information becomes stale, and the cache can be updated as a result of the retry.
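
A simple cache of this kind could look roughly like the sketch below, assuming partitions are keyed by a string such as "topic-0" and that the caller invalidates an entry when a request fails because the leader moved. `LeaderCacheSketch`, `leaderFor`, `invalidate`, and the injected `metadataLookup` function are hypothetical and are not the admin client's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration; not the actual PartitionLeaderStrategy code.
final class LeaderCacheSketch {
    private final Map<String, Integer> leaderByPartition = new ConcurrentHashMap<>();
    // Stand-in for the expensive METADATA round trip.
    private final Function<String, Integer> metadataLookup;

    LeaderCacheSketch(Function<String, Integer> metadataLookup) {
        this.metadataLookup = metadataLookup;
    }

    // Serve the leader id from the cache, looking it up only on a miss.
    int leaderFor(String topicPartition) {
        return leaderByPartition.computeIfAbsent(topicPartition, metadataLookup);
    }

    // Called when an operation fails because the cached leader is stale;
    // the next leaderFor call performs a fresh lookup.
    void invalidate(String topicPartition) {
        leaderByPartition.remove(topicPartition);
    }
}
```

No expiry or size bound is included because, as the message notes, a stale entry is corrected by the retry path.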

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-18 14:45:06 +08:00
TengYao Chi e1dcd383bc
KAFKA-17927 Disallow users to configure `max.in.flight.requests.per.connection` bigger than 5 (#17717)
Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2024-11-18 14:01:16 +08:00
Andrew Schofield ebbee397a7
KAFKA-17990: Flaky test improvements (#17814)
Several of the ShareConsumerTest integration tests are a bit flaky. This PR tightens up the logic with the aim of eliminating the flakes. Annoyingly the tests seem rock solid locally so this might take some experimentation.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Divij Vaidya <diviv@amazon.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-11-18 10:33:24 +05:30
Ritika Reddy e4c0034679
KAFKA-18019: Make INVALID_PRODUCER_ID_MAPPING a fatal error (#17822)
This patch contains changes to the handling of the INVALID_PRODUCER_ID_MAPPING error.
Quoted from KIP-890
Since we bump epoch on abort, we no longer need to call InitProducerId to fence requests. InitProducerId will only be called when the producer starts up to fence a previous instance.

With this change, some other calls to InitProducerId were inspected including the call after receiving an InvalidPidMappingException. This exception was changed to abortable as part of KIP-360: Improve reliability of idempotent/transactional producer. However, this change means that we can violate EOS guarantees. As an example:

Consider an application that is copying data from one partition to another:

1. Application instance A processes to offset 4
2. Application instance B comes up and fences application instance A
3. Application instance B processes to offset 5
4. Application instances A and B are idle for transaction.id.expiration.ms, transaction id expires on server
5. Application instance A attempts to process offset 5 (since in its view, that is next) -- if we recover from invalid pid mapping, we can duplicate this processing

Thus, INVALID_PID_MAPPING should be fatal to the producer.

This is consistent with KIP-1050: Consistent error handling for Transactions where errors that are fatal to the producer are in the "application recoverable" category. This is a grouping that indicates to the client that the producer needs to restart and recovery on the application side is necessary. KIP-1050 is approved so we are consistent with that decision.

This PR also fixes the flakiness of TransactionsExpirationTest.

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>, Calvin Liu <caliu@confluent.io>
2024-11-17 18:43:04 -08:00
Lianet Magrans 5cf9872e8f
KAFKA-18017: Fix call order for HB error and group manager (#17805)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-17 19:12:25 -05:00
Mickael Maison 7ef56a9313
KAFKA-17987 Remove ZooKeeper related windows scripts (#17811)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-18 02:21:24 +08:00
Ken Huang fde6ae1500
KAFKA-18029 remove the `kraft.version=1` from kafka.py (#17838)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-18 00:48:19 +08:00
Abhinav Dixit dfa5aa5484
KAFKA-18022: fetchOffsetMetadata handling for minBytes estimation in both common/uncommon cases of share fetch (#17825)
Reviewers: Jun Rao <junrao@gmail.com>
2024-11-16 07:26:30 -08:00
Justin Lee a8f84cab95
KAFKA-18001: Support UpdateRaftVoterRequest in KafkaNetworkChannel (#17773)
Adds support for UpdateRaftVoterRequest in KafkaNetworkChannel. This addresses the following scenario:

* Bootstrap a KRaft Controller quorum in dynamic mode
* Start additional controllers (as observers)
* Update kraft.version feature from 0 to 1
* Use kafka-metadata-quorum add-controller to promote an observer controller to a follower

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Alyssa Huang <ahuang@confluent.io>
2024-11-15 15:55:01 -05:00
PoAn Yang 5725a51453
KAFKA-16460: New consumer times out consuming records in multiple consumer_test.py system tests (#17777)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2024-11-15 19:41:39 +01:00
xijiu 283d56cf56
KAFKA-17904: Flaky testMultiConsumerSessionTimeoutOnClose (#17789)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2024-11-15 19:39:16 +01:00