Commit Graph

12522 Commits

Author SHA1 Message Date
Kuan-Po (Cooper) Tseng bf9a27fefd
KAFKA-16388 add production-ready test of 3.3 - 3.6 release to MetadataVersionTest.testFromVersionString (#15563)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-24 13:09:21 +08:00
Nikolay 0f216b6448
MINOR: Tuple2 replaced with Map.Entry (#15560)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-23 11:44:05 +08:00
Johnny Hsu 6eba9057b1
KAFKA-16381 use volatile to guarantee KafkaMetric#config visibility across threads (#15550)
Reviewers: vamossagar12 <sagarmeansocean@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-23 11:37:55 +08:00
Sanskar Jhajharia 2e8d69b78c
KAFKA-16314: Introducing the AbortableTransactionException (#15486)
As a part of KIP-890, we are introducing a new class of Exceptions which when encountered shall lead to Aborting the ongoing Transaction. The following PR introduces the same with client side handling and server side changes.

On client Side, the code attempts to handle the exception as an Abortable error and ensure that it doesn't take the producer to a fatal state. For each of the Transactional APIs, we have added the appropriate handling. For the produce request, we have verified that the exception transitions the state to Aborted.
On the server side, we have bumped the ProduceRequest, ProduceResponse, TxnOffestCommitRequest and TxnOffsetCommitResponse Version. The appropriate handling on the server side has been added to ensure that the new error case is sent back only for the new clients. The older clients will continue to get the old Invalid_txn_state exception to maintain backward compatibility.

Reviewers: Calvin Liu <caliu@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-03-22 11:26:07 -07:00
Kirk True 159d25a7df
KAFKA-16276: Update transactions_test.py to support KIP-848’s group protocol config (#15567)
Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-22 14:58:40 +01:00
Kirk True f66610095c
KAFKA-16274: Update replica_scale_test.py to support KIP-848’s group protocol config (#15577)
Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-22 14:57:37 +01:00
Philip Nee fb03da0df4
KAFKA-16271: Upgrade consumer_rolling_upgrade_test.py (#15578)
Upgrading the test to use the consumer group protocol. The two tests are failing due to Mismatch Assignment

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-22 11:17:49 +01:00
Christo Lolov 997ca14f80
KAFKA-14133: Move stateDirectory mock in TaskManagerTest to Mockito (#15254)
This pull requests migrates the StateDirectory mock in TaskManagerTest from EasyMock to Mockito.
The change is restricted to a single mock to minimize the scope and make it easier for review.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Bruno Cadonna <cadonna@apache.org>
2024-03-22 10:43:53 +01:00
Alyssa Huang 7e85d7d32e
MINOR: KRaft upgrade tests should only use latest stable mv (#15566)
This should help us avoid testing MVs before they are usable (stable).
We revert back from testing 3.8 in this case since 3.7 is the current stable version.

Reviewers: Proven Provenzano <pprovenzano@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-03-21 11:06:40 -07:00
Philip Nee c5804e7649
KAFKA-16273: Update consumer_bench_test.py to use consumer group protocol (#15548)
Adding this as part of the greater effort to modify the system tests to incorporate the use of consumer group protocol from KIP-848. Following is the test results and the tests using protocol = consumer are expected to fail:

================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.11.4
session_id:       2024-03-16--002
run time:         76 minutes 36.150 seconds
tests run:        28
passed:           25
flaky:            0
failed:           3
ignored:          0
================================================================================

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>
2024-03-21 16:23:51 +01:00
Alyssa Huang 03f7b5aa3a
KAFKA-16206: Fix unnecessary topic config deletion during ZK migration (#14206)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ron Dagostino <rndgstn@gmail.com>
2024-03-21 15:38:42 +01:00
Johnny Hsu a41c10fd49
KAFKA-16318 : add javafoc for kafka metric (#15483)
Add the javadoc for KafkaMetric

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-21 19:20:05 +08:00
John Yu cfa7d39360
MINOR : Removed the depreciated information about Zk to Kraft migration. (#15552)
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-21 10:57:59 +08:00
Chris Egerton f1d741a9c1
KAFKA-16392: Stop emitting warning log message when parsing source connector offsets with null partitions (#15562)
Reviewers: Yash Mayya <yash.mayya@gmail.com>
2024-03-20 15:54:22 +00:00
Dongnuo Lyu 3e3c618bdc
KAFKA-16313: Offline group protocol migration (#15546)
This patch enables an empty classic group to be automatically converted to a new consumer group and vice versa.

Reviewers: David Jacot <djacot@confluent.io>
2024-03-20 00:49:11 -07:00
Nikolay b6183a4134
KAFKA-14589 ConsumerGroupCommand rewritten in java (#14471)
This PR contains changes to rewrite ConsumerGroupCommand in java and transfer it to tools module

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-20 15:34:45 +08:00
PoAn Yang 34d365fd8a
KAFKA-16222: desanitize entity name when migrate client quotas (#15481)
The entity name is sanitized when it's in Zk mode.
We didn't desanitize it when we migrate client quotas. Add Sanitizer.desanitize to fix it.

Reviewers: Luke Chen <showuon@gmail.com>
2024-03-20 14:53:23 +08:00
Manikumar Reddy 8c0fafba58 MINOR: Update upgrade docs to refer 3.6.2 version 2024-03-20 12:06:43 +05:30
Kuan-Po (Cooper) Tseng 12a1d85362
KAFKA-12187 replace assertTrue(obj instanceof X) with assertInstanceOf (#15512)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-20 10:36:25 +08:00
David Jacot c66d66dc67
KAFKA-16367; Full ConsumerGroupHeartbeat response must be sent when full request is received (#15533)
This patch fixes a bug in the logic which decides when a full ConsumerGroupHeartbeat response must be returned to the client. Prior to it, the logic only relies on the `ownedTopicPartitions` field to check whether the response was a full response. This is not enough because `ownedTopicPartitions` is also set in different situations. This patch changes the logic to check `ownedTopicPartitions`, `subscribedTopicNames` and `rebalanceTimeoutMs` as they are the only three non optional fields.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-03-19 13:48:41 -07:00
Johnny Hsu bf3f088c94
KAFKA-16341 fix the LogValidator for non-compressed type (#15476)
- Fix the verifying logic. If it's LOG_APPEND_TIME, we choose the offset of the first record. Else, we choose the record with the maxTimeStamp.
- rename the shallowOffsetOfMaxTimestamp to offsetOfMaxTimestamp

Reviewers: Jun Rao <junrao@gmail.com>, Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-19 23:00:30 +08:00
Artem Livshits 1d6e0b8727
KAFKA-16352: Txn may get get stuck in PrepareCommit or PrepareAbort state (#15524)
Now the removal of entries from the transactionsWithPendingMarkers map
checks the value and all pending marker operations keep the value along
with the operation state.  This way, the pending marker operation can
only delete the state it created and wouldn't accidentally delete the
state from a different epoch (which could lead to "stuck" transactions).

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-03-18 19:08:55 -07:00
José Armando García Sancio 67cb742b54
MINOR; Log reason for deleting a kraft snapshot (#15478)
There are three reasons why KRaft would delete a snapshot. One, it is older than the retention time. Two, the total number of bytes between the log and the snapshot excess the configuration. Three, the latest snapshot is newer than the log.

This change allows KRaft to log the exact reason why a snapshot is getting deleted.

Reviewers: David Arthur <mumrah@gmail.com>, Hailey Ni <hni@confluent.io>
2024-03-18 12:06:42 -07:00
Lucas Brutschy 5c929874b8
KAFKA-16312, KAFKA-16185: Local epochs in reconciliation (#15511)
The goal of this commit is to change the following internals of the reconciliation:

- Introduce a "local epoch" to the local target assignment. When a new target is received by the server, we compare it with the current value. If it is the same, no change. Otherwise, we bump the local epoch and store the new target assignment. Then, on the reconciliation, we also store the epoch in the reconciled assignment and keep using target != current to trigger the reconciliation.
- When we are not in a group (we have not received an assignment), we use null to represent the local target assignment instead of an empty list, to avoid confusions with an empty assignment received by the server. Similarly, we use null to represent the current assignment, when we haven't reconciled the assignment yet.
We also carry the new epoch into the request builder to ensure that we report the owned partitions for the last local epoch.
- To address KAFKA-16312 (call onPartitionsAssigned on empty assignments after joining), we apply the initial assignment returned by the group coordinator (whether empty or not) as a normal reconciliation. This avoids introducing another code path to trigger rebalance listeners - reconciliation is the only way to transition to STABLE. The unneeded parts of reconciliation (autocommit, revocation) will be skipped in the existing. Since a lot of unit tests assumed that not reconciliation behavior is invoked when joining the group with an empty assignment, this required a lot of the changes in the unit tests.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, David Jacot <djacot@confluent.io>
2024-03-18 11:52:23 +01:00
Edoardo Comar e9c50b1f4b
KAFKA-16369: Broker may not shut down when SocketServer fails to bind as Address already in use (#15530)
* KAFKA-16369: wait on enableRequestProcessingFuture

Add a Wait in in KafkaServer (ZK mode) for all the SocketServer ports
to be open, and the Acceptors to be started

The BrokerServer (KRaft mode) had such a wait,
which was missing from the KafkaServer (ZK mode).

Add unit test.
2024-03-18 10:14:43 +00:00
David Jacot 3599823288
MINOR: Remove unused client side assignor fields/classes (#15545)
In https://github.com/apache/kafka/pull/15364, we introduced, thoughtfully, a non-backward compatible record change for the new consumer group protocol. So it is a good opportunity for cleaning unused fields, mainly related to the client side assignor logic which is not implemented yet. It is better to introduce them when we need them and more importantly when we implement it.

Note that starting from 3.8, we won't make such changes anymore. Non-backward compatible changes are still acceptable now because we clearly said that upgrade won't be supported from the KIP-848 EA.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-18 00:52:08 -07:00
Matthias J. Sax 313574e329
MINOR: fix flaky EosIntegrationTest (#15494)
Bumping some timeout due to slow Jenkins build.

Reviewers: Bruno Cadonna <bruno@confluent.io>
2024-03-16 18:06:45 -07:00
TapDang 3f66602626
KAFKA-16190: Member should send full heartbeat when rejoining (#15401)
When the consumer rejoins, heartbeat request builder make sure that all fields are sent in the heartbeat request.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 18:41:25 +01:00
Kirk True 386347bbca
KAFKA-16270: Update snapshot_test.py to support KIP-848’s group protocol config (#15538)
Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:21:41 +01:00
Kirk True 57557df3ed
KAFKA-16269: Update reassign_partitions_test.py to support KIP-848’s group protocol config (#15540)
Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:20:46 +01:00
Kirk True 8359f947a7
KAFKA-16268: Update fetch_from_follower_test.py to support KIP-848’s group protocol config (#15539)
Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:19:56 +01:00
Kirk True 7945d322f6
KAFKA-16267: Update consumer_group_command_test.py to support KIP-848’s group protocol config (#15537)
* KAFKA-16267: Update consumer_group_command_test.py to support KIP-848’s group protocol config

Added a new optional group_protocol parameter to the test methods, then passed that down to the setup_consumer method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Note: this requires #15330.

* Update consumer_group_command_test.py

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:19:01 +01:00
Chris Holland e878654e95
MINOR: Cleanup BoundedList to Make Constructors More Safe (#15507)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-15 21:18:24 +08:00
Kirk True af0ec247cc
KAFKA-16231: Update consumer_test.py to support KIP-848’s group protocol config (#15330)
Added a new optional group_protocol parameter to the test methods, then passed that down to the setup_consumer method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:17:22 +01:00
Dongnuo Lyu fa190cf18e
MINOR: Only enable replay methods to modify timeline data structure (#15528)
The patch prevents the main method (the method generating records) from modifying the timeline data structure `groups`  by calling `getOrMaybeCreateConsumerGroup` in kip-848 new group coordinator. Only replay methods are able to add the newly created group to `groups`.

Reviewers: David Jacot <djacot@confluent.io>
2024-03-15 05:24:59 -07:00
Hector Geraldino 7b9f31e35c
KAFKA-16358: Sort transformations by name in documentation; add missing transformations to documentation; add hyperlinks (#15499)
Reviewers: Yash Mayya <yash.mayya@gmail.com>
2024-03-15 16:26:36 +05:30
A. Sophie Blee-Goldman 96bfac4216
MINOR: Update javadocs and exception string in "deprecated" ProcessorRecordContext#hashcode (#15508)
This PR updates the javadocs for the "deprecated" hashCode() method of ProcessorRecordContext, as well as the UnsupportedOperationException thrown in its implementation, to actually explain why the class is mutable and therefore unsafe for use in hash collections. They now point out the mutable field in the class (namely the Headers)

Reviewers: Matthias Sax <mjsax@apache.org>, Bruno Cadonna <cadonna@apache.org>
2024-03-14 23:08:39 -07:00
Kamal Chandraprakash e4c53d093e
KAFKA-15206: Fix the flaky RemoteIndexCacheTest.testClose test (#15523)
It is possible that due to resource constraint, ShutdownableThread#run might be called later than the ShutdownableThread#close method.

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>
2024-03-15 10:33:40 +08:00
Luke Chen 834efa6606
KAFKA-16342 fix getOffsetByMaxTimestamp for compressed records (#15474)
Fix getOffsetByMaxTimestamp for compressed records.

This PR adds:

1) For inPlaceAssignment case, compute the correct offset for maxTimestamp when traversing the batch records, and set to ValidationResult in the end, instead of setting to last offset always.

2) For not inPlaceAssignment, set the offsetOfMaxTimestamp for the log create time, like non-compressed, and inPlaceAssignment cases, instead of setting to last offset always.

3) Add tests to verify the fix.

Reviewers: Jun Rao <junrao@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-15 06:09:45 +08:00
Hector Geraldino 178761eb36
KAFKA-14683 Cleanup WorkerSinkTaskTest (#15506)
1) Rename WorkerSinkTaskMockitoTest back to WorkerSinkTaskTest
2) Tidy up the code a bit
3) rewrite "fail" by "assertThrow"

Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-15 03:50:57 +08:00
David Mao 37212bb242
MINOR: AddPartitionsToTxnManager performance optimizations (#15454)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-03-14 18:53:26 +01:00
Andras Katona 2c49a76573
KAFKA-13922: Adjustments for jacoco, coverage reporting (#11982)
Jacoco and scoverage reporting hasn't been working for a while. This commit fixes report generation. After this PR only subproject level reports are generated as Jenkins and Sonar only cares about that.
This PR doesn't change Kafka's Jenkinsfile.

Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
2024-03-14 10:36:46 +01:00
David Jacot e164d4d426
KAFKA-16249; Improve reconciliation state machine (#15364)
This patch re-work the reconciliation state machine on the server side with the goal to fix a few issues that we have recently discovered.
* When a member acknowledges the revocation of partitions (by not reporting them in the heartbeat), the current implementation may miss it. The issue is that the current implementation re-compute the assignment of a member whenever there is a new target assignment installed. When it happens, it does not consider the reported owned partitions at all. As the member is supposed to only report its own partitions when they change, the member is stuck.
* Similarly, as the current assignment is re-computed whenever there is a new target assignment, the rebalance timeout, as it is currently implemented, becomes useless. The issue is that the rebalance timeout is reset whenever the member enters the revocation state. In other words, in the current implementation, the timer is reset when there are no target available even if the previous revocation is not completed yet.

The patch fixes these two issues by not automatically recomputing the assignment of a member when a new target assignment is available. When the member must revoke partitions, the coordinator waits. Otherwise, it recomputes the next assignment. In other words, revoking is really blocking now.

The patch also proposes to include an explicit state in the record. It makes the implementation cleaner and it also makes it more extensible in the future.

The patch also changes the record format. This is a non-backward compatible change. I think that we should do this change to cleanup the record. As KIP-848 is only in early access in 3.7 and that we clearly state that we don't plane to support upgrade from it, this is acceptable in my opinion.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-03-14 00:54:28 -07:00
Matthias J. Sax 612a1fe1bb
MINOR: Kafka Streams docs fixes (#15517)
- add missing section to TOC
- add default value for client.id

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Bruno Cadonna <bruno@confluent.io>
2024-03-13 21:54:06 -07:00
Matthias J. Sax d88a97adef
MINOR: simplify consumer logic (#15519)
For static member, the `group.instance.id` cannot change.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lianetmr@gmail.com>, David Jacot <david.jacot@gmail.com>
2024-03-13 21:52:25 -07:00
José Armando García Sancio 722967a2b7
MINOR; Make string from array (#15526)
If toString is called on an array it returns the string representing the object reference.  Use mkString instead to print the content of the array.

Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>, Lingnan Liu <liliu@confluent.io>
2024-03-13 14:36:03 -07:00
Greg Harris aa7bef414e
MINOR: Resolve SSLContextFactory.getNeedClientAuth deprecation (#15468)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-03-13 11:45:38 -07:00
Cheryl Simmons 2c613b2d42
MINOR: Tweak streams config doc (#15518)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2024-03-12 15:21:43 -07:00
Bruno Cadonna 58ddd693e6
KAFKA-16227: Avoid IllegalStateException during fetch initialization (#15491)
The AsyncKafkaConsumer might throw an IllegalStateException during
the initialization of a new fetch. The exception is caused by
 the partition being unassigned by the background thread before
the subscription state is accessed during initialisation.

This commit avoids the IllegalStateException by verifying that
the partition was not unassigned each time the subscription state
is accessed.

Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-12 11:04:37 +01:00
Christo Lolov 8b72a2c72f
KAFKA-14133: Move consumer mock in TaskManagerTest to Mockito - part 3 (#15497)
The previous pull request in this series was #15261.

This pull request continues the migration of the consumer mock in TaskManagerTest test by test for easier reviews.

The next pull request in the series will be #15254 which ought to complete the Mockito migration for the TaskManagerTest class

Reviewer: Bruno Cadonna <cadonna@apache.org>
2024-03-11 12:51:20 +01:00