14111 Commits

Author SHA1 Message Date
Andrew Schofield 3d9f88daf3
KAFKA-17546 Admin.listGroups and kafka-groups.sh (#17626)
This implements the kafka-groups.sh tool and Admin.listGroups method defined in KIP-1043.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-02 05:37:04 +08:00
Mahsa Seifikar b864a66439
MINOR: Add logging for ReplicationControlManager topic deletion (#17617)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-11-01 12:24:22 -07:00
kevin-wu24 568b9e8a6c
KAFKA-17803: LogSegment#read should return the base offset of the batch that contains startOffset rather than startOffset (#17528)
Reviewers: Jose Sancio <jsancio@gmail.com>, Jun Rao <junrao@gmail.com>
2024-11-01 09:32:00 -07:00
Ritika Reddy e14a81b4b2
KAFKA-14562 [2/3]: Implement epoch bump after every transaction (KIP-890) (#17402)
This patch includes changes to the client's end transaction response handling when transaction version 2 is enabled.
Version 5+ of the End Txn Response includes the producer Id and the producer epoch fields.

Upon receiving the response, the client updates its producer Id and epoch accordingly.

On receiving an EndTxnRequest the server would've either:

* Bumped the epoch for the given producer ID.
* On epoch overflow, sent a new producer Id with epoch 0.

This patch also includes changes to the endTxnRequest to send the right request version based on whether txnV2 is enabled.
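
The two server outcomes described above, and the client adopting whatever the response carries, can be sketched roughly as follows. This is an illustrative sketch only: `EpochBumpSketch`, `ProducerIdAndEpoch`, `bumpOnEndTxn`, `applyEndTxnResponse` and the `freshProducerId` parameter are hypothetical names, not classes from the patch, and treating `Short.MAX_VALUE` as the overflow point is a simplifying assumption.

```java
// Hypothetical sketch; none of these names come from the Kafka code base.
final class EpochBumpSketch {

    /** A producer identity: the producer ID plus its 16-bit epoch. */
    static final class ProducerIdAndEpoch {
        final long producerId;
        final short epoch;

        ProducerIdAndEpoch(long producerId, short epoch) {
            this.producerId = producerId;
            this.epoch = epoch;
        }
    }

    /**
     * Server side: bump the epoch for the given producer ID, or, if the epoch
     * cannot grow any further, hand out a fresh producer ID with epoch 0.
     * Assumption: overflow is modelled as epoch == Short.MAX_VALUE.
     */
    static ProducerIdAndEpoch bumpOnEndTxn(ProducerIdAndEpoch current, long freshProducerId) {
        if (current.epoch == Short.MAX_VALUE) {
            return new ProducerIdAndEpoch(freshProducerId, (short) 0);
        }
        return new ProducerIdAndEpoch(current.producerId, (short) (current.epoch + 1));
    }

    /** Client side: adopt the producer ID and epoch carried in the response. */
    static ProducerIdAndEpoch applyEndTxnResponse(long responseProducerId, short responseEpoch) {
        return new ProducerIdAndEpoch(responseProducerId, responseEpoch);
    }
}
```

In this sketch the client never computes the next epoch itself; it only copies the values the server returned.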

There was a test failure in the integration tests that allowed us to catch a bug in the PrepareComplete method where we update the transit metadata incorrectly. Added the bug fix in this patch where the lastProducerEpoch is updated correctly.

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-11-01 08:12:00 -07:00
David Arthur 2696a6d7a1
KAFKA-17767 Load test catalog data [3/n] (#17654)
Add a new "load-catalog" job to the workflow. This job will check out the test-catalog branch as it was 7 days prior and generate a text file of all the tests that were known at that time. This file is then passed down to the two parallel "test" jobs to be used as a source of data for the quarantined test behavior.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-01 10:38:47 -04:00
PoAn Yang 5a3b544d61
KAFKA-17880 Move integration test from streams module to streams/integration-tests module (#17615)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-01 18:21:06 +08:00
Andrew Schofield 346fdbafc5
KAFKA-17912 Align string representations of SharePartitionKey (#17656)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-01 15:43:32 +08:00
Jonah Hooper 18b8b992f9
[KAFKA-17870] Fail CreateTopicsRequest if total number of partitions exceeds 10k (#17604)
We fail the entire CreateTopicsRequest action if more than 10k total
partitions are being created across all topics in this specific request. The usual
pattern for this API is to try to succeed with some topics. Since the 10k limit applies
to all topics together, no topic should be created if the request exceeds it.
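
The all-or-nothing check described above might look roughly like the sketch below. It is not the code from the patch: `CreateTopicsLimitSketch`, `MAX_PARTITIONS_PER_REQUEST` and `exceedsLimit` are hypothetical names, and the request is reduced to a map from topic name to requested partition count.

```java
import java.util.Map;

// Hypothetical sketch; the real validation lives in Kafka's controller code.
final class CreateTopicsLimitSketch {

    /** The 10k limit from the commit message, applied to the request as a whole. */
    static final int MAX_PARTITIONS_PER_REQUEST = 10_000;

    /**
     * Returns true when the summed partition count of all topics in one
     * request exceeds the limit, in which case no topic should be created.
     */
    static boolean exceedsLimit(Map<String, Integer> partitionsByTopic) {
        long total = 0;
        for (int partitions : partitionsByTopic.values()) {
            total += partitions;
        }
        return total > MAX_PARTITIONS_PER_REQUEST;
    }
}
```

The point of summing first is that the decision is made once per request, before any individual topic is accepted.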

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-10-31 13:54:03 -07:00
TengYao Chi 6f040cabc7
KAFKA-17116 New consumer may not send effective leave group if member ID received after close (#17549)
KIP-1082 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1082%3A+Require+Client-Generated+IDs+over+the+ConsumerGroupHeartbeat+RPC)

Reviewers: Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-11-01 03:03:17 +08:00
TengYao Chi ea7da09e53
KAFKA-17899 Add more unit tests for NetworkReceive (#17637)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-01 02:38:43 +08:00
Yung 6094882315
KAFKA-17905 Remove the specified type of using lambda for BaseFunction (#17648)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-01 02:33:16 +08:00
David Arthur dd432c0ca1
MINOR: Add links test catalog commits (#17650)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-01 00:53:02 +08:00
David Arthur 7c536c9643
Mark PlaintextAdminIntegrationTest#testShareGroups as flaky (#17649)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Ron Dagostino <rndgstn@gmail.com>
2024-10-31 10:19:14 -04:00
Yung 0ba6f3212d
KAFKA-17903 Remove KafkaFuture#Function and KafkaFuture#thenApply (#17644)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-31 14:45:02 +08:00
Bill Bejeck 45982186dc
MINOR: Revert "Temporarily update Vagrantfile to Java 11" (#17642)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-31 13:23:04 +08:00
Apoorv Mittal ff116df015
KAFKA-17002: Integrated partition leader epoch for Persister APIs (KIP-932) (#16842)
The PR integrates leader epoch for partition while invoking Persister APIs. The write RPC is retried once on leader epoch failure.

Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>
2024-10-30 14:41:39 -07:00
PoAn Yang fb65dfeb11
KAFKA-17726: New consumer subscribe/subscribeFromPattern in background thread (#17569)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2024-10-30 22:15:13 +01:00
David Arthur 4fadec527d
MINOR: Quarantine the worst flaky tests (#17639)
Using the last 7 days of data on Oct 30 2024, this patch marks all flaky tests with more than 10% flakiness on trunk.
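
As a rough illustration of the 10% threshold, the sketch below selects tests whose flaky share of runs is above one in ten. The commit message does not define how flakiness is computed, so the ratio "flaky runs / total runs" is an assumption here, and `FlakyQuarantineSketch` and `testsToQuarantine` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; flakiness is assumed to mean flakyRuns / totalRuns.
final class FlakyQuarantineSketch {

    /**
     * runs maps a test name to {flakyRuns, totalRuns} over the window.
     * Returns, in name order, the tests with more than 10% flaky runs.
     */
    static List<String> testsToQuarantine(Map<String, int[]> runs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : new TreeMap<>(runs).entrySet()) {
            long flaky = entry.getValue()[0];
            long total = entry.getValue()[1];
            // flaky / total > 0.10, written with integers to avoid rounding.
            if (total > 0 && flaky * 10 > total) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```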

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-30 17:02:04 -04:00
David Arthur 835f256f70
MINOR: Don't run update-test-catalog on PRs (#17640)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-30 16:57:49 -04:00
Bill Bejeck 29881782c8
KAFKA-17609 Migrate broker compatibility test from ZK to KRaft (#17603)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-10-31 04:51:06 +08:00
Mickael Maison d7135b2a5b
MINOR: Various cleanups in metadata (#17633)
Reviewers: David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-10-31 02:48:33 +08:00
Ken Huang 30b1bdfc74
KAFKA-17835 Move ProducerIdManager and RPCProducerIdManager to transaction-coordinator module (#17562)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-31 02:40:47 +08:00
David Arthur 68c6c6da86
KAFKA-17767 Store the test catalog in Git [2/n] (2nd attempt) (#17627)
This patch adds a CI job to store our test catalog in an orphaned branch named "test-catalog" within this repo. 
This data will be used to help determine which tests should be quarantined.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-30 13:50:42 -04:00
Josep Prat 5859df9ee0
MINOR: Add Kafka 3.8.1 to system tests (#17629)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-31 01:00:37 +08:00
Luke Chen 67e2a0ae6f
MINOR: Improve the KIP-853 documentation (#17598)
In docs/ops.html, add a section discussing the difference between static and dynamic quorums. This section also discusses how to find out which quorum type you have. Also discuss the current limitations, such as the inability to transition from static quorums to dynamic.

Add a brief section to docs/upgrade.html discussing controller membership change.

Co-authored-by: Federico Valeri <fedevaleri@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
Reviewers: Justine Olshan <jolshan@confluent.io>
2024-10-30 09:41:19 -07:00
Bill Bejeck 58dd76817e
KAFKA-17609:[2/4]Convert system tests to kraft part 2 (#17321)
* Part 2 of 4 converting system tests to use KRaft

Reviewers: Matthias Sax <mjsax@apache.org>
2024-10-30 12:31:47 -04:00
Bill Bejeck 358d8775fb
KAFKA-17609:[3/4]Convert system tests to kraft part 3 (#17327)
Part 3 of 4 converting streams system tests to KRaft

Reviewers: Matthias Sax <mjsax@apache.org>
2024-10-30 12:20:58 -04:00
Bill Bejeck 3d2edf8de0
KAFKA-17609:[4/4]Convert system tests to kraft part 4 (#17328)
Part 4 of 4 converting streams system tests to KRaft

Reviewers: Matthias Sax <mjsax@apache.org>
2024-10-30 12:07:16 -04:00
David Jacot f29ca9ba9a
MINOR: Fix typo in ConsumerGroupHeartbeatResponse.json (#17635)
We forgot to change `INVALID_SUBSCRIPTION_REGEX` to `INVALID_REGULAR_EXPRESSION` in the spec.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lmagrans@confluent.io>
2024-10-30 08:21:50 -07:00
Mahsa Seifikar bed70d4d2e
MINOR: Correct error message in reassign_partitions_test.py (#17632)
Reviewers: Justine Olshan <jolshan@confluent.io>
2024-10-30 08:20:46 -07:00
David Jacot af5df59d2b
KAFKA-17593; [1/N] Introduce re2j dependency (#17634)
This patch is the first of a series of patches to introduce support for server-side regular expressions. It introduces the re2j dependency.

Co-authored-by: Lianet Magrans <lmagrans@confluent.io>

Reviewers: Lianet Magrans <lmagrans@confluent.io>
2024-10-30 08:20:11 -07:00
Ken Huang 2a46282b2a
KAFKA-17873: Add description to all packages in the public API (#17605)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-10-30 15:41:10 +01:00
PoAn Yang 7fb6e9ec1c
KAFKA-17840 Move ReplicationQuotaManager, ClientRequestQuotaManager and QuotaFactory to server module (#17609)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-30 21:18:28 +08:00
Kuan-Po Tseng f0a3960e3e
KAFKA-17867 Consider using zero-copy for PushTelemetryRequest (#17622)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-30 20:49:56 +08:00
wperlichek 49ea947095
MINOR: Fix spelling typo in Docker Compose examples in README (#17631)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-30 20:44:24 +08:00
Josep Prat c55d4f08c4
MINOR: Update the upgrade docs to include 3.8.1 version (#17628)
Signed-off-by: Josep Prat <josep.prat@aiven.io>

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-10-30 11:41:00 +01:00
Nick Telford 571f50817c
KAFKA-17411: Create local state Standbys on start (#16922)
Instead of waiting until Tasks are assigned to us, we pre-emptively
create a StandbyTask for each non-empty Task directory found on-disk.

We do this before starting any StreamThreads, and on our first
assignment (after joining the consumer group), we recycle any of these
StandbyTasks that were assigned to us, either as an Active or a
Standby.

We can't just use these "initial Standbys" as-is, because they were
constructed outside the context of a StreamThread, so we first have to
update them with the context (log context, ChangelogReader, and source
topics) of the thread that they have been assigned to.

The motivation for this is to (in a later commit) read StateStore
offsets for unowned Tasks from the StateStore itself, rather than the
.checkpoint file, which we plan to deprecate and remove.

There are a few additional benefits:

Initializing these Tasks on start-up, instead of on-assignment, will
reduce the time between a member joining the consumer group and beginning
processing. This is especially important when active tasks are being moved
over, for example, as part of a rolling restart.

If a Task has corrupt data on-disk, it will be discovered on startup and
wiped under EOS. This is preferable to wiping the state after being
assigned the Task, because another instance may have non-corrupt data and
would not need to restore (as much).

There is a potential performance impact: we open all on-disk Task
StateStores, and keep them all open until we have our first assignment.
This could require large amounts of memory, in particular when there are
a large number of local state stores on-disk.

However, since old local state for Tasks we don't own is automatically
cleaned up after a period of time, in practice, we will almost always
only be dealing with the state that was last assigned to the local
instance.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Bruno Cadonna <cadonna@apache.org>, Matthias Sax <mjsax@apache.org>
2024-10-29 12:59:25 -07:00
TengYao Chi 7366f0487a
KAFKA-17128 Make node.id immutable after removing zookeeper migration (#17616)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-29 23:29:18 +08:00
Josep Prat 4419677e06
Update docker_scan.yml for 3.8.1 (#17623)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-29 16:06:01 +01:00
Yung 984777f0b9
KAFKA-17875: Align KRaft controller count recommendations (#17600)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
Co-authored-by: yungh <yungh@nvidia.com>
2024-10-29 11:37:49 +01:00
Alieh Saeedi 4817eb9227
KAFKA-15344: Streams task should cache consumer nextOffsets (#17091)
This PR augments Streams messages with leader epoch. In case of empty buffer queues, the last offset and leader epoch are retrieved from the streams task's cache of nextOffsets.

Co-authored-by: Lucas Brutschy <lbrutschy@confluent.io>
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2024-10-29 09:30:11 +01:00
David Arthur 8c071b02e9
MINOR: Fix JDK versions in CI (#17621)
Our "validate" job was running on JDK 21 while the "test" job was running 11 and 23. This patch updates the validate job to 23 and fixes the test catalog step to only run on JDK 23 (instead of 21)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-29 13:58:38 +08:00
Kirk True 9e424755d4
KAFKA-17439: Make polling for new records an explicit action/event in the new consumer (#17035)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2024-10-28 15:46:37 -04:00
Sushant Mahajan 5f92f60bff
KAFKA-17329: DefaultStatePersister implementation (#17270)
Adds the DefaultStatePersister and other supporting classes for managing share state.

* Added DefaultStatePersister implementation. This is the entry point for callers who wish to invoke the share state RPC API.
* Added PersisterStateManager which is used by DefaultStatePersister to manage and send the RPCs over the network.
* Added code to BrokerServer and BrokerMetadataPublisher to instantiate the appropriate persister based on the config value for group.share.persister.class.name. If this is not specified, the DefaultStatePersister will be used. To force use of NoOpStatePersister, set the config to empty. This is an internal config, not to be exposed to the end user. This will be used to factory plug the appropriate persister.
* Using this persister, the internal __share_group_state topic will come to life and will be used for persistence of share group info.
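
The selection rule in the third bullet can be sketched as below. This is only an illustration: `PersisterChoiceSketch` and `persisterClassFor` are hypothetical, "not specified" is modelled as `null`, and returning any other value unchanged is an assumption about how a custom class name would be handled.

```java
// Hypothetical sketch of choosing a persister from group.share.persister.class.name.
final class PersisterChoiceSketch {

    /**
     * Maps the config value to the name of the persister to instantiate:
     * unset (null here) selects DefaultStatePersister, the empty string
     * forces NoOpStatePersister, anything else is taken as a class name.
     */
    static String persisterClassFor(String configValue) {
        if (configValue == null) {
            return "DefaultStatePersister";
        }
        if (configValue.isEmpty()) {
            return "NoOpStatePersister";
        }
        return configValue;
    }
}
```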

Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>
2024-10-28 14:11:04 -04:00
Colin Patrick McCabe 14a9130f6f
KAFKA-17793: Improve kcontroller robustness against long delays (#17502)
As described in KIP-500, the Kafka controller monitors the liveness of each broker in the cluster. It gathers this information from heartbeats sent from the brokers themselves.

In some rare cases, the main controller thread may get blocked for several seconds at a time. In the current code, this will result in the controller being unable to update the last contact times for the brokers during this time.

This PR changes the controller heartbeat handling to be partially lockless. Specifically, the last contact time for each broker will be updated locklessly prior to the rest of the heartbeat handling. This will ensure that heartbeats always get through.
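
One way to picture the last-contact update is the sketch below, where request threads record the timestamp in a concurrent map before anything is handed to the main controller thread. It is a minimal sketch under stated assumptions, not the PR's implementation: `LastContactSketch`, `touch` and `isStale` are hypothetical names, and the choice of `ConcurrentHashMap` with `AtomicLong` is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; the data structures are assumptions, not the PR's code.
final class LastContactSketch {

    private final ConcurrentHashMap<Integer, AtomicLong> lastContactNs = new ConcurrentHashMap<>();

    /**
     * Record a heartbeat without waiting for the controller thread.
     * Keeps the maximum timestamp so a late, older update cannot move time backwards.
     */
    void touch(int brokerId, long nowNs) {
        lastContactNs
            .computeIfAbsent(brokerId, id -> new AtomicLong(nowNs))
            .accumulateAndGet(nowNs, Math::max);
    }

    /** A broker is stale if it was never seen or its last contact is older than the timeout. */
    boolean isStale(int brokerId, long nowNs, long timeoutNs) {
        AtomicLong last = lastContactNs.get(brokerId);
        return last == null || nowNs - last.get() > timeoutNs;
    }
}
```

With this shape, a blocked controller thread delays the rest of the heartbeat handling but not the timestamp update itself.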

Additionally, this PR adds a PeriodicTaskControlManager to better manage periodic tasks. This should help handle the very common pattern where we want to schedule a background task at some frequency. We also want the background task to be immediately rescheduled if there is too much work to be done in one event.

Reviewers: Liu Zeyu <zeyu.luke@gmail.com>, David Arthur <mumrah@gmail.com>
2024-10-28 08:36:07 -07:00
Mickael Maison 6e88c10ed5
KAFKA-14483 Move LocalLog to storage module (#17587)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-28 20:41:46 +08:00
Dmitry Werner 12a60b8cd9
KAFKA-17878 Move ActionQueue to server module (#17602)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-28 20:35:23 +08:00
Yung db25c212ed
KAFKA-17883 Fix jvm error caused by UseParNewGC when running old kafka client in e2e (#17612)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-28 20:23:32 +08:00
xijiu 71f4001a21
KAFKA-17874: Fix the KRaft metric names in the documentation (#17599)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-10-28 11:12:18 +01:00
Yung 24689dc6ab
KAFKA-17879 test_performance_services.py should use DEV version to run kafka service (#17606)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-28 04:32:06 +08:00