14504 Commits

Author SHA1 Message Date
Mickael Maison 57eb5fd7dc
KAFKA-14587: Move AclCommand to tools (#17880)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 20:05:46 +01:00
TengYao Chi e41373cef6
KAFKA-18242 The java code in core module is NOT configured with suitable release version (#18182)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 17:33:32 +08:00
Kamal Chandraprakash 139e5b15a1
KAFKA-17928: Make remote log manager thread-pool configs dynamic (#17859)
- Disallow configuring -1 for copier and expiration thread pools dynamically

Co-authored-by: Peter Lee <peterxcli@gmail.com>

Reviewers: Peter Lee <peterxcli@gmail.com>, Satish Duggana <satishd@apache.org>
2024-12-14 13:14:05 +05:30
A. Sophie Blee-Goldman 91575892d2
HOTFIX: RocksDBMetricsRecorder#init should null check taskId (#18151)
This appears to be a typo in the code: the error message indicates that the check is for taskId being null, but the code accidentally checks the streams metrics twice.

Reviewers: Matthias Sax <mjsax@apache.org>, Bruno Cadonna <cadonna@apache.org>, Lucas Brutschy <lbrutschy@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
2024-12-13 20:36:08 -08:00
Matthias J. Sax f2f19b7ad9
MINOR: update release scripts (#18178)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 12:15:20 +08:00
David Arthur f3f975ea67
MINOR: Add badge for flaky test report (#18179)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 12:14:50 +08:00
David Arthur 369b8b5607
KAFKA-18223 Add GHA to run report [2/n] (#18170)
Run the flaky test report daily at 6am UTC.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-13 20:59:04 -05:00
Kuan-Po Tseng 9e60fcc87f
KAFKA-18181 Refactor ShareConsumerTest (#18105)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 09:39:44 +08:00
David Jacot 450c10d00c
KAFKA-17507; WriteTxnMarkers API must not return until markers are written and materialized in group coordinator's cache (#18168)
We have observed the error below in some clusters:

Uncaught exception in scheduled task 'handleTxnCompletion-902667' exception.message:Trying to complete a transactional offset commit for producerId *** and groupId *** even though the offset commit record itself hasn't been appended to the log.

When a transaction is completed, the transaction coordinator sends a WriteTxnMarkers request to all the partitions involved in the transaction to write the markers to them. When the broker receives it, it writes the markers and if markers are written to the __consumer_offsets partitions, it informs the group coordinator that it can materialize the pending transactional offsets in its main cache. The group coordinator does this asynchronously since Apache Kafka 2.0, see this patch.

The above error happens when the asynchronous operation is executed by the scheduler and finds pending transactional offsets that were not written yet. How come?

There is actually an issue in the steps described above. The group coordinator does not wait until the asynchronous operation completes before returning to the API layer. Hence the WriteTxnMarkers response may be sent back to the transaction coordinator before the async operation has actually completed, and the next transactional produce may start before the operation has completed as well. This could explain why the group coordinator has pending transactional offsets that are not written yet.

There is a similar issue when the transaction is aborted. However, on this path we don't have any checks to verify whether all the pending transactional offsets have been written, so we don't see any errors in our logs. Due to the same race condition, it is possible to actually remove the wrong pending transactional offsets.
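
The race described above can be illustrated with a toy model. This is only a sketch and not Kafka's implementation: `ToyGroupCoordinator`, its method names, and the manually driven scheduler queue are all hypothetical. It contrasts a handler that answers before the scheduled materialization has run with one whose future completes only after it has run.

```python
from concurrent.futures import Future


class ToyGroupCoordinator:
    """Hypothetical stand-in for a coordinator with an async scheduler."""

    def __init__(self):
        self.scheduled = []  # tasks waiting for the (simulated) scheduler
        self.pending = {}    # producer_id -> pending transactional offsets
        self.cache = {}      # materialized offsets (the "main cache")

    def _materialize(self, producer_id):
        # Move the producer's pending offsets into the main cache.
        self.cache.update(self.pending.pop(producer_id, {}))

    def on_markers_written_racy(self, producer_id) -> Future:
        # Schedules the work but reports completion immediately,
        # so the caller can respond before the cache is updated.
        done = Future()
        self.scheduled.append(lambda: self._materialize(producer_id))
        done.set_result(None)
        return done

    def on_markers_written_waiting(self, producer_id) -> Future:
        # Completes the future only once the scheduled task has run.
        done = Future()

        def task():
            self._materialize(producer_id)
            done.set_result(None)

        self.scheduled.append(task)
        return done

    def run_scheduler(self):
        # Simulated scheduler thread: run queued tasks in order.
        while self.scheduled:
            self.scheduled.pop(0)()
```

In the racy variant the returned future is already done while `cache` is still empty, which is the window in which a following transaction could start. In the waiting variant the future stays incomplete until `run_scheduler()` has executed the task.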

PS: The new group coordinator is not impacted by this bug.

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-12-13 13:50:41 -08:00
Alyssa Huang b73e31eb15
KAFKA-17641; Update Vote RPC with PreVote field (#17807)
Introduces v2 of Vote RPC and implements the handling of the new version of the RPC.

Many references to "candidate" in the Vote RPC are changed to the more generic "replica". Replicas sending a Vote request with PreVote set to true are not candidates. They are instead prospective candidates that are attempting to become candidates.

Replicas receiving PreVote requests (vote requests with PreVote=true) with an epoch equal to their own will _not_ transition to Unattached state. They will only grant the vote if they have not recently fetched from the leader and the request's last epoch and offset are up-to-date with theirs.

If a replica receives a PreVote request with an epoch greater than their current epoch, they will transition to Unattached state (setting their epoch to the one from the pre-vote request) and then grant the vote if the request's last epoch and offset are up-to-date with theirs.
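
The two rules above can be sketched as a small decision function. This is a minimal illustration, not the KRaft code: the `Replica` class, its field names, and `handle_pre_vote` are hypothetical. It also assumes that "up-to-date" means the request's (last epoch, last offset) pair is at least the receiver's, and that requests with a lower epoch are rejected, as is usual in Raft.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical view of the receiving replica's state."""
    epoch: int
    state: str                 # e.g. "Follower" or "Unattached"
    last_epoch: int            # epoch of the last entry in the local log
    last_offset: int           # end offset of the local log
    recently_fetched_from_leader: bool = False


def log_up_to_date(replica: Replica, req_last_epoch: int, req_last_offset: int) -> bool:
    # Assumption: compare (epoch, offset) pairs lexicographically.
    return (req_last_epoch, req_last_offset) >= (replica.last_epoch, replica.last_offset)


def handle_pre_vote(replica: Replica, req_epoch: int,
                    req_last_epoch: int, req_last_offset: int) -> bool:
    """Return True if the pre-vote is granted; may update the replica's state."""
    if req_epoch < replica.epoch:
        # Assumption: stale requests are rejected.
        return False

    up_to_date = log_up_to_date(replica, req_last_epoch, req_last_offset)

    if req_epoch > replica.epoch:
        # Greater epoch: adopt it and transition to Unattached, then decide.
        replica.epoch = req_epoch
        replica.state = "Unattached"
        return up_to_date

    # Equal epoch: no state transition; grant only without a recent fetch.
    return (not replica.recently_fetched_from_leader) and up_to_date
```

For example, a follower at epoch 5 that has recently fetched from the leader rejects a pre-vote at epoch 5 and stays a follower, while the same request at epoch 6 moves it to Unattached at epoch 6 and is granted if the requester's log is up-to-date.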

This change also avoids a possible ping-pong scenario. For example, in a 3-node quorum, leader node A disconnects from the quorum, and node B enters the prospective state before node C does. Node B sends a pre-vote request to node C, which is still in the follower state, and learns from the response that node A is the leader. Node B then transitions to follower, while node C transitions to prospective after its election timeout. If this interaction repeats, such replicas can transition between Follower and Prospective in perpetuity. This issue is resolved by having follower state nodes grant pre-vote requests only if they have successfully fetched from the leader at least once after becoming a follower.

This change introduces a new suite called KafkaRaftClientPreVoteTest, for additional KRaft protocol tests with respect to pre-vote.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-12-13 16:24:30 -05:00
Lianet Magrans 84bc0c26ee
KAFKA-18224: Explicit group protocol setting in streams resetter (#18172)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2024-12-13 14:31:50 -05:00
Logan Zhu 497f500483
KAFKA-18183 replace BytesSerializer with ByteArraySerializer for producer/consumer (#18113)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 01:42:32 +08:00
Ken Huang 6af233e4fc
KAFKA-18203 Add a section for Java version in intellij idea in README (#18134)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 01:37:25 +08:00
Ken Huang 669d8610a2
KAFKA-18228 The MetricsDuringTopicCreationDeletionTest should delete topics to ensure that the metrics are recreated (#18163)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 01:35:08 +08:00
TaiJuWu 161d1cdf85
KAFKA-18218 fix Trogdor system test (#18156)
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 01:23:58 +08:00
TengYao Chi b37b89c668
KAFKA-9366 Upgrade log4j to log4j2 (#17373)
This pull request replaces Log4j with Log4j2 across the entire project, including dependencies, configurations, and code. The notable changes are listed below:

1. Introduce Log4j2 Instead of Log4j
2. Change Configuration File Format from Properties to YAML
3. Add warnings to notify users if they are still using Log4j properties, encouraging them to transition to Log4j2 configurations

Co-authored-by: Lee Dongjin <dongjin@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 01:14:31 +08:00
Sean Quah b94defa189
KAFKA-18199; Fix size calculation for nullable tagged structs (#18127)
When a struct field is tagged and nullable, it is serialized as
{ varint tag; varint dataLength; nullable data }, where
nullable is serialized as
{ varint isNotNull; if (isNotNull) struct s; }. The length field
includes the is-not-null varint.

This patch fixes a bug in serialization where the written value of
the length field and the value used to compute the size of the length
field differ by 1. In practice this has no impact unless the
serialized length of the struct is 127 bytes, since the varint encodings
of 127 and 128 have different lengths (0x7f vs 0x80 01).
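
The boundary effect can be checked with a small sketch of unsigned varint encoding (7 payload bits per byte, high bit set on every byte except the last). The function names here are hypothetical and this is not Kafka's serialization code. It only shows why an off-by-one between 127 and 128 changes the size of the length field.

```python
def encode_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # set the continuation bit
        value >>= 7
    out.append(value)
    return bytes(out)


def unsigned_varint_size(value: int) -> int:
    """Number of bytes encode_unsigned_varint would produce."""
    return len(encode_unsigned_varint(value))


def data_length_for(struct_len: int) -> int:
    # Per the layout above, dataLength covers the is-not-null varint
    # (assumed to take one byte here) plus the struct itself.
    return 1 + struct_len


struct_len = 127
correct = data_length_for(struct_len)  # 128
off_by_one = correct - 1               # 127

print(encode_unsigned_varint(off_by_one).hex())  # 7f
print(encode_unsigned_varint(correct).hex())     # 8001
print(unsigned_varint_size(off_by_one), unsigned_varint_size(correct))  # 1 2
```

For any other struct length in the one-byte or two-byte range, both values encode to the same number of bytes, which is why the mismatch only matters at this boundary.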

Reviewers: David Jacot <djacot@confluent.io>
2024-12-13 04:31:53 -08:00
PoAn Yang 770d64d2cc
KAFKA-16143: New JMX metrics for AsyncKafkaConsumer (#17199)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2024-12-13 07:20:27 -05:00
PoAn Yang d5ad9228cf
KAFKA-17750; Extend kafka-consumer-groups command line tool to support new consumer group (part 3) (#18141)
This patch extends the `kafka-consumer-groups` command line tool to support the new consumer group as described in KIP-1099.

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: David Jacot <djacot@confluent.io>
2024-12-13 01:31:39 -08:00
Gantigmaa Selenge 747dc172e8
KIP-1073: Return fenced brokers in DescribeCluster response (#17524)
Implementation of KIP-1073: Return fenced brokers in DescribeCluster response.
Add new unit and integration tests for describeCluster.

Reviewers: Luke Chen <showuon@gmail.com>
2024-12-13 10:58:11 +08:00
Matthias J. Sax d7c80a7257
MINOR: update Kafka version for docker scan
2024-12-12 16:30:24 -08:00
Logan Zhu 92352a96e8
MINOR: ensure SuppressWarnings annotation is effective for mockValidationIsolation (#18158)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-13 07:57:29 +08:00
Kuan-Po Tseng baa870a582
KAFKA-18214 TestUtils#waitForCondition does not honor the maxWaitMs (#18145)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-13 07:44:57 +08:00
Ken Huang ce77a7413e
KAFKA-18194 Flaky test_broker_rolling_bounce due to metadata update (#18153)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-13 07:33:03 +08:00
A. Sophie Blee-Goldman ef2e4600f3
KAFKA-18026: KIP-1112, migrate stream-stream joins to use ProcessorSupplier#stores (#18111)
Covers wrapping of processors and state stores for KStream-KStream joins.

Includes self-joins and the spurious results fix optimization

Reviewers: Guozhang Wang <guozhang.wang.us@gmail.com>
2024-12-12 14:54:58 -08:00
Almog Gavra 9b776ffc50
KAFKA-18026: KIP-1112 convert StreamToTableNode (#18149)
Covers wrapping of processors and state stores for StreamToTableSource

Reviewers: Guozhang Wang <guozhang.wang.us@gmail.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2024-12-12 14:52:21 -08:00
Jason Taylor 3b1bd3812e
KAFKA-16368: Update remote.log.manager.* default thread pool values for KIP-1030 (#18137)
Reviewers: Divij Vaidya <diviv@amazon.com>
2024-12-12 23:51:26 +01:00
santhoshct 5bb1ea403c
KAFKA-18223 Flaky test report script (#17938)
Adds a python script to generate a detailed flaky test report using the Develocity API

Reviewers: David Arthur <mumrah@gmail.com>
2024-12-12 16:50:17 -05:00
Nick Guo 671cbedc1b
KAFKA-18219 Use INFO level instead of ERROR after successfully performing an unclean leader election (#18159)
Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-13 03:57:14 +08:00
Lianet Magrans 7a64623e40
Set protocol for streams tests (#18160)
Reviewers: Bill Bejeck <bill@confluent.io>
2024-12-12 13:33:43 -05:00
Colin Patrick McCabe 65820acad2
MINOR: disable some rebootstrap tests, convert the others to KRaft (#17765)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-13 01:59:20 +08:00
TengYao Chi 772aa241b2
KAFKA-18136: Remove zk migration from code base (#18016)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-12 18:34:29 +01:00
Sushant Mahajan 4c5ea05ec8
KAFKA-18058: Share group state record pruning impl. (#18014)
In this PR, we've added a class ShareCoordinatorOffsetsManager, which tracks the last redundant offset for each share group state topic partition. We have also added a periodic timer job in ShareCoordinatorService which queries for the redundant offset at regular intervals and if a valid value is found, issues the deleteRecords call to the ReplicaManager via the PartitionWriter. In this way the size of the partitions is kept manageable.

Reviewers: Jun Rao <junrao@gmail.com>, David Jacot <djacot@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2024-12-12 07:38:03 +00:00
Matthias J. Sax a0a501952b
MINOR: improve Kafka Streams metrics documentation (#17900)
Reviewers: Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang.wang.us@gmail.com>
2024-12-11 18:34:43 -08:00
Almog Gavra 21563380f3
KAFKA-18026: KIP-1112, migrate table-table joins to use ProcessorSupplier#stores (#18048)
Covers wrapping of processors and state stores for KTable-KTable joins

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang.wang.us@gmail.com>
2024-12-11 17:37:34 -08:00
Ken Huang 010b9ff6c6
KAFKA-18186 Set `options.release` to make Intellij configure suitable language level automatically (#18104)
Reviewers: "A. Sophie Blee-Goldman" <ableegoldman@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-12 07:39:46 +08:00
snehashisp f4fe6064cc
KAFKA-18215: KIP-891 Connect Multiversioning Support (Configs and Validation changes for Connectors and Converters) (#17741)
Reviewers: Greg Harris <greg.harris@aiven.io>
2024-12-11 15:34:21 -08:00
Matthias J. Sax 6cdb8c352a
KAFKA-18015: add byDuration auto.offset.reset to Kafka Streams (#18115)
Part of KIP-1106.

Adds support for "by_duration" and "none" reset strategy
to the Kafka Streams runtime.

Reviewers: Bill Bejeck <bill@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2024-12-11 15:12:16 -08:00
Matthias J. Sax 990c8c750c
MINOR: remove old processor API MockInternalProcessorContext (#18103)
Reviewers: Bill Bejeck <bill@confluent.io>
2024-12-11 15:09:13 -08:00
Matthias J. Sax ab2facca58
KAFKA-12829: Remove deprecated KStream.process() for old Processor API (#18088)
Reviewers: Bill Bejeck <bill@confluent.io>
2024-12-11 14:28:47 -08:00
TengYao Chi de2ccb5789
KAFKA-18021: Disabled MirrorCheckpointConnector throws RetriableException on task config generation (#18098)
Reviewers: Greg Harris <greg.harris@aiven.io>
2024-12-11 13:56:38 -08:00
Apoorv Mittal a1703e2cca
KAFKA-17040: Removing exception on further calls to terminated telemetry reporter (#18143)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2024-12-11 15:47:45 -05:00
KApolinario1120 d83f09d014
KAFKA-18015: Add support for duration based offset reset strategy to Kafka Streams (#17973)
Part of KIP-1106.

Adds the public APIs to Kafka Streams to support the newly added "by_duration" reset policy,
and adds the missing "none" reset policy. Deprecates the enum `Topology.AutoOffsetReset` and
all related methods, and replaces them with new overloads using the new `AutoOffsetReset` class.

Co-authored-by: Matthias J. Sax <matthias@confluent.io>

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-12-11 10:47:25 -08:00
PoAn Yang 156d551603
MINOR: suppress deprecation warnings for MemberDescription (#18139)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-12 01:05:31 +08:00
Ken Huang 23de98cdc5
KAFKA-17554 disable testFutureCompletionOutsidePoll (#18138)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-12 00:54:36 +08:00
Mickael Maison fac8333f8d
MINOR: Remove ToolsUtils.scala (#18120)
Reviewers: Christo Lolov <lolovc@amazon.com>
2024-12-11 16:42:05 +00:00
Alyssa Huang e979fce94e
KAFKA-17030; Unattached voters will fetch from bootstrap servers (#17352)
Because the set of voters is dynamic (KIP-953), it is possible for a replica to believe it is a voter while the current leader doesn't have that replica in the voter set. In this state, the leader will not send BeginQuorumEpoch requests to such a replica. This means that such replicas will not be able to discover the leader.

This change will help Unattached replicas rediscover the leader by sending Fetch requests to the bootstrap servers.
Followers have a similar issue - if they are unable to communicate with the leader they should try contacting the bootstrap servers.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-12-11 11:38:14 -05:00
Kirk True d09e222846
KAFKA-18189: CoordinatorRequestManager log message can include incorrect coordinator disconnect time (#18109)
Fixed logic in markCoordinatorUnknown to ensure the warning log contains the correct number of milliseconds the client has been disconnected.

Reviewers: Christo Lolov <lolovc@amazon.com>
2024-12-11 16:22:51 +00:00
Christopher L. Shannon bd6d0fbf3d
KAFKA-16437 Upgrade to Jakarta and Jetty 12 (KIP-1032) (#16754)
This commit implements the changes for KIP-1032. This updates Kafka to Jakarta specs, JavaEE 10 and Jetty 12. The changes here primarily affect Kafka Connect and MM2.

Todo/Notes:

1) I bumped the connect modules to JDK 17 but I also had to bump a couple of other things that had a dependency on connect. The tools project depends on connect so that had to be bumped, and streams depends on tools so that needed to be bumped. This means we may need to separate some things if we don't want to enforce JDK 17 on streams.

2) There is an issue with a test in DedicatedMirrorIntegrationTest that I had to change for now. It involves escaping characters, and I am not quite sure what to do about it yet. The cause is the Servlet 6 spec changing what is allowed in the path. See: Jetty 12: 400: Ambiguous URI path encoding for path <%=FOO%>~1 (encoded: %3C%25%3DFOO%25%3E%7E1) jetty/jetty.project#11890

3) I had to configure the idle timeout in Jetty requests to match our request timeout so tests didn't fail. This was needed to fix the ConnectWorkerIntegrationTest#testPollTimeoutExpiry() test

Testing is being done by just using the existing tests for Connect and MM2 which should be sufficient.

Reviewers: Greg Harris <greg.harris@aiven.io>, David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-11 23:24:14 +08:00
Kuan-Po Tseng d2ad418cfd
KAFKA-18156 VerifiableConsumer should ignore "--session-timeout" when using CONSUMER protocol (#18036)
Reviewers: TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-11 21:12:46 +08:00