Commit Graph

11383 Commits

Author SHA1 Message Date
David Jacot 2aa1555423
MINOR: rat should depend on processMessages task (#13854)
This fix the following issue that we occasionally see in [builds](https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-13848/4/pipeline/13/).

```
[2023-06-14T11:41:50.769Z] * What went wrong:
[2023-06-14T11:41:50.769Z] A problem was found with the configuration of task ':rat' (type 'RatTask').
[2023-06-14T11:41:50.769Z]   - Gradle detected a problem with the following location: '/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13848'.
[2023-06-14T11:41:50.769Z]     
[2023-06-14T11:41:50.769Z]     Reason: Task ':rat' uses this output of task ':clients:processTestMessages' without declaring an explicit or implicit dependency. This can lead to incorrect results being produced, depending on what order the tasks are executed.
[2023-06-14T11:41:50.769Z]     
[2023-06-14T11:41:50.769Z]     Possible solutions:
[2023-06-14T11:41:50.769Z]       1. Declare task ':clients:processTestMessages' as an input of ':rat'.
[2023-06-14T11:41:50.769Z]       2. Declare an explicit dependency on ':clients:processTestMessages' from ':rat' using Task#dependsOn.
[2023-06-14T11:41:50.769Z]       3. Declare an explicit dependency on ':clients:processTestMessages' from ':rat' using Task#mustRunAfter.
[2023-06-14T11:41:50.769Z]     
[2023-06-14T11:41:50.769Z]     Please refer to https://docs.gradle.org/8.1.1/userguide/validation_problems.html#implicit_dependency for more details about this problem.
```

Validated manually as well:

```
% ./gradlew rat

> Configure project :
Starting build with version 3.6.0-SNAPSHOT (commit id 874081ca) using Gradle 8.1.1, Java 17 and Scala 2.13.10
Build properties: maxParallelForks=10, maxScalacThreads=8, maxTestRetries=0

> Task :storage:processMessages
MessageGenerator: processed 4 Kafka message JSON files(s).

> Task :raft:processMessages
MessageGenerator: processed 1 Kafka message JSON files(s).

> Task :core:processMessages
MessageGenerator: processed 2 Kafka message JSON files(s).

> Task :group-coordinator:processMessages
MessageGenerator: processed 16 Kafka message JSON files(s).

> Task :streams:processMessages
MessageGenerator: processed 1 Kafka message JSON files(s).

> Task :metadata:processMessages
MessageGenerator: processed 20 Kafka message JSON files(s).

> Task :clients:processMessages
MessageGenerator: processed 146 Kafka message JSON files(s).

> Task :clients:processTestMessages
MessageGenerator: processed 4 Kafka message JSON files(s).

BUILD SUCCESSFUL in 8s
```

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-06-16 18:38:43 +02:00
Chris Egerton 73dd51e137
Revert "MINOR: Reduce MM2 integration test flakiness due to missing dummy offset commits (#13838)" (#13864)
Reviewers: Josep Prat <josep.prat@aiven.io>

Reverts commit 505c7b6487.
2023-06-16 12:10:26 -04:00
David Arthur 66f0cbc424
MINOR: Add ZK migration instructions to the operations documentation (#13257)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-06-16 11:03:57 -04:00
Divij Vaidya b10beaae77
MINOR: Add more information in assertion failure for non daemon threads (#13858)
Reviewers: Luke Chen <showuon@gmail.com>
2023-06-16 15:17:57 +02:00
Luke Chen 74238656dc
KAFKA-15066: add "remote.log.metadata.manager.listener.name" config to rlmm (#13828)
add "remote.log.metadata.manager.listener.name" config to rlmm to allow producer/consumer to connect to the server. Also add tests.

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-06-16 20:56:13 +08:00
hudeqi 88320a8140
MINOR: Fix illogical log in fetchOffsetAndTruncate method (#13719)
Co-authored-by: Deqi Hu <deqi.hu@shopee.com>

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, David Jacot <djacot@confluent.io>, Alexandre Dupriez <alexandre.dupriez@gmail.com>
2023-06-16 13:51:22 +02:00
Chris Egerton e1d59920f4
KAFKA-15059: Remove pending rebalance check when fencing zombie source connector tasks (#13819)
Discovered while researching KAFKA-14718

Currently, we perform a check during zombie fencing that causes the round of zombie fencing to fail when a rebalance is pending (i.e., when we've detected from a background poll of the config topic that a new connector has been created, that an existing connector has been deleted, or that a new set of connector tasks has been generated).

It's possible but not especially likely that this check causes issues when running vanilla Kafka Connect. Even when it does, it's easy enough to restart failed tasks via the REST API.

However, when running MirrorMaker 2 in dedicated mode, this check is more likely to cause issues as we write three connector configs to the config topic in rapid succession on startup. And in that mode, there is no API to restart failed tasks aside from restarting the worker that they are hosted on.

In either case, this check can lead to test flakiness in integration tests for MirrorMaker 2 both in dedicated mode and when deployed onto a vanilla Kafka Connect cluster.

This check is not actually necessary, and we can safely remove it. Copied from Jira:

>If the worker that we forward the zombie fencing request to is a zombie leader (i.e., a worker that believes it is the leader but in reality is not), it will fail to finish the round of zombie fencing because it won't be able to write to the config topic with a transactional producer.

>If the connector has just been deleted, we'll still fail the request since we force a read-to-end of the config topic and refresh our snapshot of its contents before checking to see if the connector exists.

>And regardless, the worker that owns the task will still do a read-to-end of the config topic and verify that (1) no new task configs have been generated for the connector and (2) the worker is still assigned the connector, before allowing the task to process any data.

In addition, while waiting on a fix for KAFKA-14718 that adds more granularity for diagnosing failures in the DedicatedMirrorIntegrationTest suite (#13284), some of the timeouts in that test are bumped to work better on our CI infrastructure.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Yash Mayya <yash.mayya@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
2023-06-16 11:58:36 +02:00
Divij Vaidya 11aa999d20
MINOR: Change file permissions of reviewers.py to make it executable (#13861)
Reviewers: David Jacot <djacot@confluent.io>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>
2023-06-15 17:01:37 +02:00
Christo Lolov c5df47a1cb
KAFKA-14133: Migrate StandbyTaskCreator mock in TaskManagerTest to Mockito (#13711)
Reviewers: Bruno Cadonna <cadonna@apache.org>
2023-06-15 14:55:55 +02:00
Greg Harris 505c7b6487
MINOR: Reduce MM2 integration test flakiness due to missing dummy offset commits (#13838)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-06-15 12:02:32 +02:00
Gantigmaa Selenge 930744c3a8
KAFKA-14709: Move content in connect/mirror/README.md to the docs (#13650)
Most of the contents in the README.md was already covered in the docs therefore only had to add the section for Exactly Once support.

Reviewers: Luke Chen <showuon@gmail.com>
2023-06-15 10:16:52 +08:00
Walker Carlson 4a5d1b3205
KAFKA-14936: Add On Disk Time Ordered Buffer (1/N) (#13756)
KAFKA-14936: Add On Disk Time Ordered Buffer

Add a time ordered key-value buffer stored on disk and implemented using RocksDBTimeOrderedKeyValueSegmentedBytesStore.

This will be used in the stream buffered for joins with a grace period.

Reviewers: Bruno Cadonna <cadonna@confluent.io> Victoria Xia <victoria.xia@confluent.io>
2023-06-14 15:16:55 -05:00
David Jacot 45a279ec70
MINOR: Move Timer/TimingWheel to server-common (#13820)
This patch rewrite `Timer` and the related classes in Java and moves them to `server-common` module. It is basically a one to one rewrite of the Scala code. Note that `MockTimer` is not moved as part of this patch. It will be done separately.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-06-14 18:21:30 +02:00
Manyanda Chitimbo 044d058e03
MINOR: remove unused field ProcessorNode#time (#13624)
Reviewers: Divij Vaidya <diviv@amazon.com>
2023-06-14 15:30:57 +02:00
David Jacot dfe050c8bf
KAFKA-15080; Fetcher's lag never set when partition is idle (#13843)
The PartitionFetchState's lag field is set to None when the state is created and it is updated when bytes are received for a partition. For idle partitions (newly created or not), the lag is never updated because `validBytes > 0` is never true. As a side effect, the partition is considered out-of-sync and could be incorrectly throttled.

Reviewers: Divij Vaidya <diviv@amazon.com>, Jason Gustafson <jason@confluent.io>
2023-06-13 15:18:54 +02:00
Calvin Liu 303b457049
MINOR: Make sure replicas will not be removed in initial ISR (#13844)
Reviewers: David Jacot <djacot@confluent.io>
2023-06-13 14:02:10 +02:00
Christo Lolov 7f0e45590a
KAFKA-14133: Migrate Admin mock in TaskManagerTest to Mockito (#13712)
This pull requests migrates the Admin mock in TaskManagerTest from EasyMock to Mockito.
The change is restricted to a single mock to minimize the scope and make it easier for review.

Reviewers: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2023-06-13 10:48:07 +02:00
David Jacot 7556ce366a
KAFKA-14462; [17/N] Add CoordinatorRuntime (#13795)
This patch introduces the CoordinatorRuntime. The CoordinatorRuntime is a framework which encapsulate all the common features requires to build a coordinator such as the group coordinator. Please refer to the javadoc of that class for the details.

Reviewers: Divij Vaidya <diviv@amazon.com>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2023-06-13 09:46:38 +02:00
José Armando García Sancio b7a6a8fd5f
KAFKA-15076; KRaft should prefer latest snapshot (#13834)
If the KRaft listener is at offset 0, the start of the log, and KRaft has generated a snapshot, it should prefer the latest snapshot instead of having the listener read from the start of the log.

This is implemented by having KafkaRaftClient send a Listener.handleLoadSnapshot event, if the Listener is at offset 0 and the KRaft partition has generated a snapshot.

Reviewers: Jason Gustafson <jason@confluent.io>, David Arthur <mumrah@gmail.com>
2023-06-12 07:25:42 -07:00
dengziming c0cb8dd4bc
KAFKA-15036: Add a test case for controller failover (#13832)
Add a test case for controller failover to avoid bug like KAFKA-15036.

Reviewers: Luke Chen <showuon@gmail.com>
2023-06-12 10:45:17 +08:00
Bruno Cadonna 6fe74f78dc
KAFKA-10199: Re-add revived tasks to the state updater after handling (#13829)
Fixes a bug regarding the state updater where tasks that experience corruption
during restoration are passed from the state updater to the stream thread
for closing and reviving but then the revived tasks are not re-added to
the state updater.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Walker Carlson <wcarlson@confluent.io>
2023-06-11 15:14:46 +02:00
Sushant Mahajan 5afce2de68
KAFKA-15077: Code to trim token in FileTokenRetriever (#13835)
The FileTokenRetriever class is used to read the access_token from a file on the clients system and then it is passed along with the jaas config to the OAuthBearerSaslServer. In case the token was sent using FileTokenRetriever on the client side, some EOL character is getting appended to the token, causing authentication to fail with the message:


Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2023-06-11 11:52:25 +05:30
Walker Carlson daba741826
KAFKA-14936: Change Time Ordered Buffer to not require Change<> 0/N (#13830)
Change the TimeOrderedKeyValueBuffer to take three types to include the store type so that it can be used for non change<V> operations as well.

Reviewers: Victoria Xia<victoria.xia@confluent.io> , Gabriel Gama <>
2023-06-10 17:22:32 -05:00
David Arthur 4fe96ddfcc
MINOR fix formatting in stale PR comment 2023-06-10 17:05:03 -04:00
David Arthur 9637cc7a32
MINOR: Enable schedule stale PR workflow (#13841)
Also fixes a few things in the workflow config (label name, better PR comment)

Reviewers: Josep Prat <josep.prat@aiven.io>
2023-06-10 16:46:55 -04:00
David Arthur da22d03605
KAFKA-15073: Add a Github action to mark PRs as stale (#13827)
This patch makes use of the Github action `actions/stale@v8` to add a `stale` label to PRs older than 90 days. 

Reviewers: Josep Prat <josep.prat@aiven.io>, David Jacot <djacot@confluent.io>
2023-06-10 10:03:07 -04:00
David Jacot 7eea2a3908
MINOR: Move MockTime to server-common (#13823)
This patch rewrite `MockTime` in Java and moves it to `server-common` module. This is a prerequisite to move `MockTimer` later on to `server-common` as well. 

Reviewers: David Arthur <mumrah@gmail.com>
2023-06-09 08:54:25 +02:00
Luke Chen d3e0b27b24
KAFKA-15040: trigger onLeadershipChange under KRaft mode (#13807)
When received LeaderAndIsr request, we'll notify remoteLogManager about this leadership changed to trigger the following workflow. But LeaderAndIsr won't be sent in KRaft mode, instead, the topicDelta will be received.

This PR fixes this issue by getting leader change and follower change from topicDelta, and triggering rlm.onLeadershipChange to notify remote log manager. Adding tests for remote storage enabled cases.

Reviewers: Satish Duggana <satishd@apache.org>
2023-06-09 09:53:46 +08:00
Lianet Magrans 4af4bccbbf
KAFKA-14966: Extract OffsetFetcher reusable logic (#13815)
The OffsetFetcher is internally used by the KafkaConsumer to fetch offsets, validate and reset positions. For the new KafkaConsumer with a refactored threading model, similar functionality will be needed.

This is an initial refactoring for extracting logic from the OffsetFetcher, that will be reused by the new consumer implementation. No changes to the existing logic, just extracting classes, functions or pieces of logic.

All the functionality moved out of the OffsetFetcher is already covered by tests in OffsetFetcherTest and FetcherTest. There were no individual tests for the extracted functions, so no tests were migrated.

Reviewers: Jun Rao <junrao@gmail.com>
2023-06-08 14:03:45 -07:00
Chris Egerton 6b128d7e30
KAFKA-14006: Parameterize WorkerConnectorTest suite (#12307)
Reviewers: Sagar Rao <sagarmeansocean@gmail.com>, Christo Lolov <lolovc@amazon.com>, Kvicii <kvicii.yu@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
2023-06-08 11:20:35 -04:00
Danica Fine 513e1c641d
KAFKA-14539: Simplify StreamsMetadataState by replacing the Cluster metadata with partition info map (#13751)
Replace usage of Cluster in StreamsMetadataState with Map<String, List>. Update StreamsPartitionAssignor#onAssignment method to pass existing Map<TopicPartition, PartitionInfo> instead of fake Cluster object.

Behavior remains the same; updated existing unit tests accordingly.

Reviewers:  Walker Carlson <wcarlson@confluent.io>, Bill Bejeck <bbejeck@apache.org>
2023-06-07 15:35:11 -04:00
Lucas Brutschy ff77b3ad04
KAFKA-14278: Fix InvalidProducerEpochException and InvalidTxnStateException handling in producer clients (#13811)
This PR fixes three issues:

InvalidProducerEpochException was not handled consistently. InvalidProducerEpochException used to be able to be return via both transactional response and produce response, but as of KIP-588 (2.7+), transactional responses should not return InvalidProducerEpochException anymore, only produce responses can. It can happen that older brokers may still return InvalidProducerEpochException for transactional responses; these must be converted to the newer ProducerFencedException. This conversion wasn't done for TxnOffsetCommit (sent to the group coordinator).

InvalidTxnStateException was double-wrapped in KafkaException, whereas other exceptions are usually wrapped only once. Furthermore, InvalidTxnStateException was not handled at all for in AddOffsetsToTxn response, where it should be a possible error as well, according to API documentation.

According to API documentation, UNSUPPORTED_FOR_MESSAGE_FORMAT is not possible for TxnOffsetCommit, but it looks like it is, and it is being handled there, so I updated the API documentation.

Reviewers: Justine Olshan <jolshan@confluent.io>
2023-06-07 09:48:14 -07:00
Jorge Esteban Quilcate Otoya 9cfc4b9373
KAFKA-15051: add missing GET plugin/config endpoint (#13803)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-06-07 11:06:22 -04:00
Erik van Oosten 59d30a06fc
KAFKA-10337: await async commits in commitSync even if no offsets given (#13678)
The contract for Consumer#commitSync() guarantees that the callbacks for all prior async commits will be invoked before it returns. Prior to this patch the contract could be violated if an empty offsets map were passed in to Consumer#commitSync().

Reviewers: Philip Nee <philipnee@gmail.com>, David Jacot <djacot@confluent.io>
2023-06-07 16:55:03 +02:00
José Armando García Sancio 8ad0ed3e61
KAFKA-15021; Skip leader epoch bump on ISR shrink (#13765)
When the KRaft controller removes a replica from the ISR because of the controlled shutdown there is no need for the leader epoch to be increased by the KRaft controller. This is accurate as long as the topic partition leader doesn't add the removed replica back to the ISR.

This change also fixes a bug when computing the HWM. When computing the HWM, replicas that are not eligible to join the ISR but are caught up should not be included in the computation. Otherwise, the HWM will never increase for replica.lag.time.max.ms because the shutting down replica is not sending FETCH request. Without this additional fix PRODUCE requests would timeout if the request timeout is greater than replica.lag.time.max.ms.

Because of the bug above the KRaft controller needs to check the MV to guarantee that all brokers support this bug fix before skipping the leader epoch bump.

Reviewers: David Mao <47232755+splett2@users.noreply.github.com>, Divij Vaidya <diviv@amazon.com>, David Jacot <djacot@confluent.io>
2023-06-07 07:20:40 -07:00
Aman Singh e1f29554a5
MINOR: Fix ZkAclMigrationClientTest.testAclsMigrateAndDualWrite test (#13809)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2023-06-07 15:41:08 +05:30
John Roesler f231fe97a4
MINOR: Retrigger collaborator invites: part 2
Adding and removing some users to re-trigger the invites.
2023-06-06 17:03:44 -05:00
John Roesler d0556d0e77
MINOR: Retrigger collaborator invites: part 1
Adding and removing some users to re-trigger the invites.
2023-06-06 10:33:58 -05:00
Alok Thatikunta 3d349ae0d6
MINOR; Add helper util Snapshots.lastContainedLogTimestamp (#13772)
This change refactors the lastContainedLogTimestamp to the Snapshots class, for re-usability. Introduces IdentitySerde based on ByteBuffer, required for using RecordsSnapshotReader. This change also removes the "recordSerde: RecordSerde[_]" argument from the KafkaMetadataLog constructor.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2023-06-06 08:29:15 -07:00
andymg3 db9d845702
KAFKA-14791; Create a builder for PartitionRegistration (#13788)
This creates a builder for PartitionRegistration. The motivation for the builder is that the constructor of PartitionRegistration has four arguments all of type int[] which makes it easy to make a mistake when using it.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2023-06-06 07:58:23 -07:00
David Jacot 7d147cf241
KAFKA-14462; [14/N] Add PartitionWriter (#13675)
This patch introduces the `PartitionWriter` interface in the `group-coordinator` module. The `ReplicaManager` resides in the `core` module and it is thus not accessible from the `group-coordinator` one. The `CoordinatorPartitionWriter` is basically an implementation of the interface residing in `core` which interfaces with the `ReplicaManager`.

One notable difference from the usual produce path is that the `PartitionWriter` returns the offset following the written records. This is then used by the coordinator runtime to track when the request associated with the write can be completed.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2023-06-06 16:24:48 +02:00
Greg Harris c8cb85274e
MINOR: Refactor DelegatingClassLoader to emit immutable PluginScanResult (#13771)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-06-06 09:37:57 -04:00
hudeqi 9ebe395c57
KAFKA-14866: Remove controller module metrics when broker is shutting down (#13473)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-06-06 11:10:16 +02:00
mojh7 04f2f6a26a
MINOR: Typo and unused method removal (#13739)
clean up unused private method and removed typos

Reviewers:  Divij Vaidya <diviv@amazon.com>,  Manyanda Chitimbo <manyanda.chitimbo@gmail.com>,  Daniel Scanteianu, Josep Prat <josep.prat@aiven.io>
2023-06-06 10:50:56 +02:00
Chris Egerton 17fd30e6b4
MINOR: Fix flaky DistributedHerderTest cases related to zombie fencing (#13806)
Reviewers: Yash Mayya <yash.mayya@gmail.com>, Chris Egerton <chrise@aiven.io>
2023-06-05 15:50:54 -04:00
Yash Mayya 383a8d6114
MINOR: Handle the config topic read timeout edge case in DistributedHerder's stopConnector method (#13750)
Reviewers: Chris Egerton
2023-06-05 11:10:59 -04:00
Yash Mayya fca7ee7270
MINOR: Remove reference to 'offset backing store' from exception message in KafkaBasedLog (#13810)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-06-05 10:04:17 -04:00
Yash Mayya 8e60368d90
MINOR: Re-introduce Transformation import to fix TransformationChain Javadoc (#13808)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-06-05 09:43:03 -04:00
Gabriel Oliveira 443bd1dd82
MINOR: Add "versions" tag to recently added ReplicaState field on Fetch Request (#13680)
Reviewers: David Jacot <djacot@confluent.io>
2023-06-05 13:40:20 +02:00
Divij Vaidya fe6a827e20
KAFKA-14633: Reduce data copy & buffer allocation during decompression (#13135)
After this change,

    For broker side decompression: JMH benchmark RecordBatchIterationBenchmark demonstrates 20-70% improvement in throughput (see results for RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize).
    For consumer side decompression: JMH benchmark RecordBatchIterationBenchmark a mix bag of single digit regression for some compression type to 10-50% improvement for Zstd (see results for RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize).

Reviewers: Luke Chen <showuon@gmail.com>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Ismael Juma <mail@ismaeljuma.com>
2023-06-05 15:04:49 +08:00