Commit Graph

11852 Commits

Author SHA1 Message Date
Gantigmaa Selenge 486d5f6c64
KAFKA-15566: Fix flaky tests in FetchRequestTest.scala in KRaft mode (#14573)
Fixed some of the failing tests in FetchRequestTest.

testFetchWithPartitionsWithIdError and testCreateIncrementalFetchWithPartitionsInErrorV12 fail with the following error when enabled with KRaft mode. These tests only fail sometimes when running locally but consistently failed when running in the Jenkins Pipeline.

Tests will call the utility function TestUtils.waitUntilLeaderIsKnown after creating the topic partitions so that they wait for the logs to be created on the leader before sending fetch requests.

Enabled all tests except checkLastFetchedEpochValidation with KRaft mode.
Looking at the build history in Jenkins, all the other tests except these 2 tests and checkLastFetchedEpochValidation were passing when they were enabled with KRaft mode. Therefore enabled them with KRaft mode again but left checkLastFetchedEpochValidation to be investigated further.

Reviewers: Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>
2023-10-20 09:59:21 +08:00
Calvin Liu af747fbfed
KAFKA-15581: Introduce ELR (#14312)
This patch introduces preliminary changes for Eligible Leader Replicas (KIP-966)

* New MetadataVersion 16 (3.7-IV1)
* New record versions for PartitionRecord and PartitionChangeRecord
* New tagged fields on PartitionRecord and PartitionChangeRecord
* New static config "eligible.leader.replicas.enable" to gate the whole feature

Reviewers: Artem Livshits <alivshits@confluent.io>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-10-19 14:05:15 -04:00
Calvin Liu 14029e2ddd
KAFKA-15582: Identify clean shutdown broker (#14465)
The PR includes:

* Added a new class of CleanShutdownFile which helps write and read from a clean shutdown file.
* Updated the BrokerRegistration API.
* Client side handling for the broker epoch.
* Minimum work on the controller side.

Reviewers: Jun Rao <junrao@gmail.com>
2023-10-19 10:25:23 -07:00
Hanyu Zheng bbdf6de88a
KAFKA-15527: Add reverseRange and reverseAll query over kv-store in IQv2 (#14477)
Implements KIP-985.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-10-19 10:16:19 -07:00
Apoorv Mittal 36abc8dcea
KAFKA-15604: Telemetry API request and response schemas and classes (KIP-714) (#14554)
Initial PR for [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) - [KAFKA-15601](https://issues.apache.org/jira/browse/KAFKA-15601).

This PR defines json request and response schemas for the new Telemetry APIs and implements the corresponding java classes.

Reviewers: 
Andrew Schofield <andrew_schofield@uk.ibm.com>, Kirk True <ktrue@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Walker Carlson <wcarlson@apache.org>
2023-10-19 10:55:21 -05:00
vamossagar12 8f3731e2bd
KAFKA-15454: Add support for OffsetCommit version 9 in admin client (#14571)
This patch adds support for OffsetCommit version 9 in the admin client. It mainly allows handling two new error codes `STALE_MEMBER_EPOCH` and `GROUP_ID_NOT_FOUND ` introduced as part of KIP-848.

Reviewers: David Jacot <djacot@confluent.io>
2023-10-19 07:48:12 -07:00
Apoorv Mittal 26aa353dc1
KAFKA-15616: Client telemetry states and transition (KIP-714) (#14566)
Part of KIP-714.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Philip Nee <pnee@confluent.io>, Kirk True <ktrue@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2023-10-18 21:43:05 -07:00
Apoorv Mittal 78166101eb
KAFKA-15613: Client API definition and configurations (KIP-714) (#14560)
Part of KIP-714.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2023-10-18 20:47:51 -07:00
Matthias J. Sax 72fdd9f62a
MINOR: add KIP-941 to Kafka Streams upgrade docs (#14577)
Reviewers: Hao Li <hli@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Bill Bejeck <bill@confluent.io>
2023-10-18 17:20:09 -07:00
Lianet Magrans 48449b68fd
KAFKA-15554: Client state changes for handling one assignment at a time & minor improvements (#14413)
This patch includes:
- target assignment changes : accepting only one at a time according to the updated protocol.
- changes for error handling, leaving responsibility in the heartbeatManager and exposing only the functionality for when the state needs to be updated (on successful HB, on fencing, on fatal failure)
- allow transitions for failures when joining
- tests & minor improvements/fixes addressing initial version review

Reviewers: Kirk True <ktrue@confluent.io>, Philip Nee <pnee@confluent.io>, David Jacot <djacot@confluent.io>
2023-10-18 08:10:18 -07:00
Arpit Goyal dc6a53e196
MINOR: Rename lock variable of the entry class (#14569)
The RemoteIndexCache has a variable lock and the child class also have a variable lock in the same class file. Renaming lock of the entry(child class) to avoid confusion.

Reviewers: Luke Chen <showuon@gmail.com>, hudeqi <1217150961@qq.com>
2023-10-18 18:20:55 +08:00
Mickael Maison 8aee297669
MINOR: Various Java cleanups in core (#14561)
Reviewers: Josep Prat <josep.prat@aiven.io>
2023-10-18 11:49:25 +02:00
Matthias J. Sax 9b468fb278
MINOR: Do not end Javadoc comments with `**/` (#14540)
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bill@confluent.io>, Hao Li <hli@confluent.io>, Josep Prat <josep.prat@aiven.io>
2023-10-17 21:11:04 -07:00
Jeff Kim abee8f711c
KAFKA-14519; [1/N] Implement coordinator runtime metrics (#14417)
Implements the following metrics:

kafka.server:type=group-coordinator-metrics,name=num-partitions,state=loading
kafka.server:type=group-coordinator-metrics,name=num-partitions,state=active
kafka.server:type=group-coordinator-metrics,name=num-partitions,state=failed
kafka.server:type=group-coordinator-metrics,name=event-queue-size
kafka.server:type=group-coordinator-metrics,name=partition-load-time-max
kafka.server:type=group-coordinator-metrics,name=partition-load-time-avg
kafka.server:type=group-coordinator-metrics,name=thread-idle-ratio-min
kafka.server:type=group-coordinator-metrics,name=thread-idle-ratio-avg
The PR makes these metrics generic so that in the future the transaction coordinator runtime can implement the same metrics in a similar fashion.

Also, CoordinatorLoaderImpl#load will now return LoadSummary which encapsulates the start time, end time, number of records/bytes.

Co-authored-by: David Jacot <djacot@confluent.io>

Reviewers:  Ritika Reddy <rreddy@confluent.io>, Calvin Liu <caliu@confluent.io>, David Jacot <djacot@confluent.io>, Justine Olshan <jolshan@confluent.io>
2023-10-17 16:06:23 -07:00
Lucas Brutschy e7e399b940
MINOR: allow removing a suspended task from task registry. (#14555)
When we get a suspended task re-assigned in the eager rebalance protocol, we have to add the task back to the state updater so that it has a chance to catch up with its change log.

This was prevented by a check in Tasks, which disallows removing SUSPENDED tasks from the task registry. I couldn't find a reason why this must be an invariant of the task registry, so this weakens the check.

The error happens in the integration between TaskRegistry and TaskManager. However, this change anyway adds unit tests to more closely specify the intended behavior of the two modules.

Reviewers: Bruno Cadonna <bruno@confluent.io>
2023-10-17 14:32:41 +02:00
Mickael Maison 9d04c7a045
MINOR: Various Scala cleanups in core (#14558)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-10-17 12:04:14 +02:00
Omnia G.H Ibrahim 9af1e74b5e
KAFKA-14596: Move TopicCommand to tools (#13201)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-10-17 11:40:15 +02:00
Ismael Juma 69e591db3a
MINOR: Rewrite/Move KafkaNetworkChannel to the `raft` module (#14559)
This is now possible since `InterBrokerSend` was moved from `core` to `server-common`.
Also rewrite/move `KafkaNetworkChannelTest`.

The scala version of `KafkaNetworkChannelTest` passed with the changes here (before I
deleted it).

Reviewers: Justine Olshan <jolshan@confluent.io>, José Armando García Sancio <jsancio@users.noreply.github.com>
2023-10-16 20:10:31 -07:00
Luke Chen 7376d2c5b1
MINOR: add quick start for tiered storage feature (#14528)
Some users complained they don't have a way to determine if there is something wrong in the RSM plug-in they implemented, or there's something wrong in Kafka itself. Also, if there are users who just want to try the tiered storage feature out before implementing anything, it would be good we have an RSM implementation by default.

Per the discussion in the KIP, there will be no default RSM implementation in Kafka, but we can use the LocalTieredStorage implemented for integration test, to resolve the issues above.

Reviewers: Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>
2023-10-17 10:30:11 +08:00
Hanyu Zheng 732bffcae6
KAFKA-15569: test and add test cases in IQv2StoreIntegrationTest (#14523)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-10-16 17:30:05 -07:00
mannoopj da314ee48c
KAFKA-15532: non active controllers return 0 for ZkWriteBeforelag (#14478)
Since only the active controller is performing the dual-write to ZK during a migration, it should be the only controller
to report the ZkWriteBehindLag metric.

Currently, if the controller fails over during a migration, the previous active controller will incorrectly report its last
value for ZkWriteBehindLag forever. Instead, it should report zero.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>
2023-10-16 15:22:50 -07:00
dengziming 5c9db5e735
KAFKA-15390: Do not return fenced broker in FetchResponse.preferredReplica (#14272)
Do not return fenced brokers from metadataCache.getPartitionReplicaEndpoints, since that could lead to
them getting used as preferred read replicas.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-10-16 15:08:40 -07:00
Ismael Juma 1073d434ec
KAFKA-14481: Move LogSegment/LogSegments to storage module (#14529)
A few notes:
* Delete a few methods from `UnifiedLog` that were simply invoking the related method in `LogFileUtils`
* Fix `CoreUtils.swallow` to use the passed in `logging`
* Fix `LogCleanerParameterizedIntegrationTest` to close `log` before reopening
* Minor tweaks in `LogSegment` for readability
 
For broader context on this change, please check:

* KAFKA-14470: Move log layer to storage module

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-10-16 06:37:30 -07:00
bachmanity1 eb187745cd
MINOR: Fix docs for ReplicationBytes(Out|In)PerSec metrics (#14228)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Taras Ledkov
2023-10-16 15:12:38 +02:00
Hector Geraldino 4150595b0a
KAFKA-14684: Replace EasyMock/PowerMock with Mockito in WorkerSinkTaskThreadedTest (#14505)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Christo Lolov <christololov@gmail.com>
2023-10-16 11:24:52 +02:00
hudeqi b0b8693c72
KAFKA-15536: Dynamically resize remoteIndexCache (#14511)
Dynamically resize remoteIndexCache

Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-10-16 15:24:36 +08:00
Matthias J. Sax d4c661c017
MINOR: cleanup warnings in Kafka Streams code base (#14549)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, A. Sophie Blee-Goldman <sophie@responsive.dev>
2023-10-15 19:32:32 -07:00
Matthias J. Sax 649e2cbc8f
MINOR: Fix `Consumed` to return new object instead of `this` (#14550)
We embrace immutability and thus should return a new object instead of
`this`, similar to other config classed we use in the DSL.

Side JavaDocs cleanup for a bunch of classes.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-10-15 19:28:54 -07:00
Yash Mayya 1c8bb61a43
KAFKA-15387: Deprecate Connect's redundant task configurations endpoint (#14361)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Sagar Rao <sagarmeansocean@gmail.com>
2023-10-14 14:46:50 +05:30
Matthias J. Sax cd1b7639cb
MINOR: cleanup some warning in Kafka Streams examples (#14547)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-10-13 19:00:22 -07:00
Matthias J. Sax 364bc3c5c4
MINOR: Fix directory name inconsistency in the Kafka Streams tutorial (#14541)
Reviewers: Bruno Cadonna <bruno@confluent.io>
2023-10-13 09:28:39 -07:00
Matthias J. Sax dc0f0db864
MINOR: fix typo (#14542)
Reviewers: Bruno Cadonna <bruno@confluent.io>
2023-10-13 09:28:00 -07:00
Satish Duggana cc951e3f81
KAFKA-15593: Add 3.6 to core upgrade and compatibility tests (#14527)
Reviewers:  Christo Lolov <lolovc@amazon.com>, Josep Prat <josep.prat@aiven.io>
2023-10-13 20:51:34 +05:30
Federico Valeri a86681b6f9
MINOR: Add upgrade documentation for 3.6.0 (#14534)
This change adds the upgrade documentation for 3.6.0 and fixes the position of the notable changes in 3.5.0.
In previous releases, notable changes always come after the upgrade instructions.

Reviewers:  Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2023-10-13 17:20:03 +05:30
Omnia G.H Ibrahim 4bad90835b
KAFKA-15465: Don't throw if MirrorMaker not authorized to create internal topics. (#14388)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ahmed Hibot
2023-10-13 12:53:09 +02:00
Lianet Magrans 58dfa1cc81
MINOR - KAFKA-15550: Validation for negative target times in offsetsForTimes (#14503)
The current KafkaConsumer offsetsForTimes fails with IllegalArgumentException if negative target timestamps are provided as arguments. This change includes the same validation and tests for the new consumer implementation (and some improved comments for the updateFetchPositions)

Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
2023-10-13 09:59:57 +02:00
Mickael Maison 13b2edd9af
KAFKA-15596: Upgrade ZooKeeper to 3.8.3 (#14535)
Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-10-12 17:30:23 +02:00
Ismael Juma 4cf86c5d2f
KAFKA-15492: Upgrade and enable spotbugs when building with Java 21 (#14533)
Spotbugs was temporarily disabled as part of KAFKA-15485 to support Kafka build with JDK 21. This PR upgrades the spotbugs version to 4.8.0 which adds support for JDK 21 and enables it's usage on build again.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-10-12 14:09:10 +02:00
Bruno Cadonna c7f730d9d9
MINOR: Only commit running active and standby tasks when tasks corrupted (#14508)
When tasks are found corrupted, Kafka Streams tries to commit
the non-corrupted tasks before closing and reviving the corrupted
active tasks. Besides active running tasks, Kafka Streams tries
to commit restoring active tasks and standby tasks. However,
restoring active tasks do not need to be committed since they
do not have offsets to commit and the current code does not
write a checkpoint. Furthermore, trying to commit restoring
active tasks with the state updater enabled results in the
following error:

java.lang.UnsupportedOperationException: This task is read-only
at org.apache.kafka.streams.processor.internals.ReadOnlyTask.commitNeeded(ReadOnlyTask.java:209)
...

since commitNeeded() is not a read-only method for active tasks.

In future, we can consider writing a checkpoint for active
restoring tasks in this situation. Additionally, we should
fix commitNeeded() in active tasks to be read-only.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2023-10-12 13:24:54 +02:00
Yash Mayya 41c0d44402
KAFKA-15570: Add unit tests for MemoryConfigBackingStore (#14518)
Reviewers: Chris Egerton <chrise@aiven.io>, Kalpesh Patel <kpatel@confluent.io>
2023-10-12 14:25:40 +05:30
Jeff Kim 7b5d640cc6
KAFKA-14987; Implement Group/Offset expiration in the new coordinator (#14467)
This patch implements the groups and offsets expiration in the new group coordinator.

Reviewers: Ritika Reddy <rreddy@confluent.io>, David Jacot <djacot@confluent.io>
2023-10-11 23:45:13 -07:00
Satish Duggana 4302653d9e
MINOR Updated documentation.html to have 3.5 and 3.6 previous release doc links (#14510)
Reviewers: Luke Chen <showuon@gmail.com>, kpatelatwork <kpatel@confluent.io>
2023-10-12 11:07:10 +05:30
Federico Valeri aec07f76d7
KAFKA-15537: Fix metadata downgrade documentation (#14484)
In KIP-778 we introduced the "unsafe" (lossy) downgrade in case metadata has changes in one of the versions between target and current, as defined in MetadataVersion.

The documentation says it is possible:

"Note that the cluster metadata version cannot be downgraded to a pre-production 3.0.x, 3.1.x, or 3.2.x version once it has been upgraded. However, it is possible to downgrade to production versions such as 3.3-IV0, 3.3-IV1, etc."

The command line tool shows that this doesn't work:

bin/kafka-features.sh --bootstrap-server :9092 downgrade --metadata 3.4 --unsafe
Could not downgrade metadata.version to 8. Invalid metadata.version 8. Unsafe metadata downgrade is not supported in this version.
1 out of 1 operation(s) failed.

In addition to unsafe, also safe metadata downgrades are not supported in practice. For example, when you upgrade to 3.5, you land on 3.5-IV2 as metadata version, which has metadata changes and won't let you to downgrade. This is true for every other release at the moment.

This change fixes the documentation to reflect that, and improves the error messages.

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>, Jakub Scholz <github@scholzj.com>
2023-10-12 11:12:44 +08:00
Levani Kokhreidze 7d1847c4c3
MINOR: Fix KafkaStreams#streamThreadLeaveConsumerGroup logging (#14526)
Fixes logging for KafkaStreams#streamThreadLeaveConsumerGroup.

In order not to lose the trace of the whole exception, passing Exception e as a second argument, while message is pre-formatted and passed as string as a first argument. With this, we won't loose the stack trace of the exception.

Reviewers: Anna Sophie Blee-Goldman <sophie@responsive.dev>
2023-10-11 16:14:25 -07:00
Levani Kokhreidze 5dd155f350
KAFKA-15571: `StateRestoreListener#onRestoreSuspended` is never called because `DelegatingStateRestoreListener` doesn't implement `onRestoreSuspended` (#14519)
With https://issues.apache.org/jira/browse/KAFKA-10575 StateRestoreListener#onRestoreSuspended was added. But local tests show that it is never called because DelegatingStateRestoreListener was not updated to call a new method

Reviewers: Anna Sophie Blee-Goldman <sophie@responsive.dev>, Bruno Cadonna <cadonna@confluent.io>
2023-10-11 16:04:34 -07:00
dengziming 674285b86b
MINOR: Disable flaky kraft-test in FetchRequestTest (#14525)
We introduced a bunch of flaky tests in #14295 , which are normal when running locally but will always fail in CI, lets rollback them unless we find the cause before the end of today.

Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>
2023-10-11 15:22:10 -07:00
Calvin Liu d46781d4db
KAFKA-15221; Fix the race between fetch requests from a rebooted follower. (#14053)
A race can happen in the following sequence.
1. Stale Fetch triggers the ISR expansion.
2. The first time we check whether the replica is eligible. Catch up? Yes. broker epoch match? Yes (the metadata cache update has not happened)
3. Metadata cache update happens.
4. During the second time check the eligibility
    a. Catch up? Yes
    b. A new fetch request comes in. It cancels the replica caught-up and updates the broker epoch
    c. broker epoch match? Yes. New fetch epoch = new metadata cache epoch
5. Send an AlterPartition request with the new broker epoch.
----------------
The solution is to make sure that the 4.a) ,4.c) and 5) use the same replica state.

Reviewers: David Mao <47232755+splett2@users.noreply.github.com>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>
2023-10-11 09:44:36 -07:00
Mickael Maison cc66d1feee
MINOR: Add javadoc to all ConfigDef.Types values (#14515)
Reviewers: Josep Prat <josep.prat@aiven.io>
2023-10-11 18:13:00 +02:00
Arnout Engelen 1983ebebc7
MINOR: fix dependencycheck warnings (#14476)
Add suppressions and skip benchmarking/testing projects

Reviewers: Josep Prat <josep.prat@aiven.io>
2023-10-11 16:18:19 +02:00
Christo Lolov a0e3d01fef
KAFKA-14133: Move MeteredTimestampedKeyValueStoreTest, ReadOnlyWindowStoreFacadeTest and TimestampedWindowStoreBuilderTest to Mockito (#14412)
Reviewers: Divij Vaidya <diviv@amazon.com>, Yash Mayya <yash.mayya@gmail.com>
2023-10-11 11:12:31 +02:00