Commit Graph

5840 Commits

Author SHA1 Message Date
Nick Guo 101e15bb1c
KAFKA-18867 add tests to describe topic configs with empty name (#19075)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-04 14:56:25 +08:00
Mahsa Seifikar 2154e55abf
MINOR: Prevent broker fencing by adjusting resendExponentialBackoff in BrokerLifecycleManager (#19061)
This PR reduces `maxInterval` for `resendExponentialBackoff` in
`BrokerLifecycleManager` class from `broker.session.timeout.ms` to half
of its value. Setting `maxInterval` to `broker.session.timeout.ms`
caused brokers to be fenced if a resend attempt occurred near the
timeout threshold, leading to unnecessary broker fencing.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-03-03 12:03:15 -08:00
Apoorv Mittal a6c53d0c37
KAFKA-18878: Added share session cache and delayed share fetch metrics (KIP-1103) (#19059)
The PR implements the ShareSessionCache and DelayedShareFetchMetrics as
defined in KIP-1103.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-03-03 16:44:34 +00:00
Lucas Brutschy a04dd21f26
KAFKA-18613: Auto-creation of internal topics in streams group heartbeat (#18981)
Implements auto-topic creation when handling the streams group
heartbeat.

Inside KafkaApis, the handler for streamsGroupHeartbeat uses the result
of the streams group heartbeat inside the group coordinator to attempt
to create all missing internal topics using AutoTopicCreationManager.
CREATE TOPIC ACLs are checked. The unit tests class
AutoTopicCreationManagerTest is brought back (it was recently deleted
during a ZK removal PR), but testing only the kraft-related
functionality.

Reviewers: Bruno Cadonna <bruno@confluent.io>

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
2025-03-03 08:48:00 +01:00
Xuan-Zhang Gong ceac4f0a1d
KAFKA-18880 Remove kafka.cluster.Broker and BrokerEndPointNotAvailableException (#19047)
Remove kafka.cluster.Broker and BrokerEndPointNotAvailableException as they were used by zk path.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-02 10:54:32 +08:00
TengYao Chi e0c77140b2
KAFKA-17039 KIP-919 supports for unregisterBroker (#19063)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-01 23:55:35 +08:00
Nick Guo 98bb79e732
KAFKA-17981 add Integration test for ConfigCommand to add config `key=[val1,val2]` (#17771)
Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-01 13:15:25 +08:00
Apoorv Mittal 8cf969e00a
KAFKA-18734: Implemented share partition metrics (KIP-1103) (#19045)
The PR implements the SharePartitionMetrics as defined in KIP-1103, with
one change. The metric `FetchLockRatio` is defined as `Meter` in KIP but
is implemented as `HIstogram`. There was a discussion about same on
KIP-1103 discussion where we thought that `FetchLockRatio` is
pre-aggregated but while implemeting the rate from `Meter` can go above
100 as `Meter` defines rate per time period. Hence it makes more sense
to implement metric `FetchLockRatio` as `Histogram`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-28 14:22:27 +00:00
Apoorv Mittal 8b605bd362
MINOR: Removing share partition manager flaky annotation (#19053)
There isn't any flaky test for SharePartitionManager in last 7 days, removing flaky annotation.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-28 08:49:59 +00:00
Dongnuo Lyu 36f19057e1
KAFKA-18813: ConsumerGroupHeartbeat API and ConsumerGroupDescribe API must check topic describe (#18989)
This patch filters out the topic describe unauthorized topics from the
ConsumerGroupHeartbeat and ConsumerGroupDescribe response.

In ConsumerGroupHeartbeat, 
- if the request has `subscribedTopicNames` set, we directly check the
authz in `KafkaApis` and return a topic auth failure in the response if
any of the topics is denied.
- Otherwise, we check the authz only if a regex refresh is triggered and
we do it based on the acl of the consumer that triggered the refresh. If
any of the topic is denied, we filter it out from the resolved
subscription.

In ConsumerGroupDescribe, we check the authz of the coordinator
response. If any of the topic in the group is denied, we remove the
described info and add a topic auth failure to the described group.
(similar to the group auth failure)

Reviewers: David Jacot <djacot@confluent.io>, Lianet Magrans
<lmagrans@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>,
Chia-Ping Tsai <chia7712@gmail.com>, TaiJuWu <tjwu1217@gmail.com>,
TengYao Chi <kitingiao@gmail.com>
2025-02-26 13:05:36 -05:00
Lucas Brutschy cb7c54ccd3
KAFKA-18614, KAFKA-18613: Add streams group request plumbing (#18979)
This change implements the basic RPC handling StreamsGroupHeartbeat and
StreamsGroupDescribe. This includes:
 - Adding an option to enable streams groups on the broker
- Passing describe and heartbeats to the right shard of the group
coordinator
- The handler inside the GroupMetadatManager for StreamsGroupDescribe is
fairly trivial, and is included directly in this PR.
- The handler for StreamsGroupHeartbeat is complex and not included in
this PR yet. Instead, a UnsupportedOperationException is thrown.
However, the interface is already defined: The result of a
streamsGroupHeartbeat is a response, together with a list of internal
topics to be created.

The heartbeat implementation inside the `GroupMetadataManager`, which
actually implements the assignment / reconciliation logic, will come in
a follow-up PR. Also, automatic creation of internal topics will be
created in a follow-up PR.

Reviewers: Bill Bejeck <bill@confluent.io>
2025-02-26 16:33:26 +01:00
Abhinav Dixit 4b5a16bf6f
KAFKA-18757: Create full-function SimpleAssignor to match KIP-932 description (#18864)
### About
The current `SimpleAssignor` in AK assigned all subscribed topic
partitions to all the share group members. This does not match the
description given in
[KIP-932](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255070434#KIP932:QueuesforKafka-TheSimpleAssignor).
Here are the rules as mentioned in the KIP by which the assignment
should happen. We have changed the step 3 implementation here due to the
reasons
[described](https://github.com/apache/kafka/pull/18864#issuecomment-2659266502)
-

1. The assignor hashes the member IDs of the members and maps the
partitions assigned to the members based on the hash. This gives
approximately even balance.
2. If any partitions were not assigned any members by (1) and do not
have members already assigned in the current assignment, members are
assigned round-robin until each partition has at least one member
assigned to it.
3. We combine the current and new assignment. (Original rule - If any
partitions were assigned members by (1) and also have members in the
current assignment assigned by (2), the members assigned by (2) are
removed.)

### Tests
The added code has been verified with unit tests and the already present
integration tests.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, TaiJuWu <tjwu1217@gmail.com>
2025-02-26 11:02:23 +00:00
José Armando García Sancio 4a8a0637e0
KAFKA-18723; Better handle invalid records during replication (#18852)
For the KRaft implementation there is a race between the network thread,
which read bytes in the log segments, and the KRaft driver thread, which
truncates the log and appends records to the log. This race can cause
the network thread to send corrupted records or inconsistent records.
The corrupted records case is handle by catching and logging the
CorruptRecordException. The inconsistent records case is handle by only
appending record batches who's partition leader epoch is less than or
equal to the fetching replica's epoch and the epoch didn't change
between the request and response.

For the ISR implementation there is also a race between the network
thread and the replica fetcher thread, which truncates the log and
appends records to the log. This race can cause the network thread send
corrupted records or inconsistent records. The replica fetcher thread
already handles the corrupted record case. The inconsistent records case
is handle by only appending record batches who's partition leader epoch
is less than or equal to the leader epoch in the FETCH request.

Reviewers: Jun Rao <junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai <chia7712@apache.org>
2025-02-25 20:09:19 -05:00
Apoorv Mittal df5839a9f4
KAFKA-17351: Improved handling of compacted topics in share partition (2/N) (#19010)
The PR handles fetch for `compacted` topics. The fix was required only
when complete batch disappears from the topic log, and same batch is
marked re-available in Share Partition state cache. Subsequent log reads
will not result the disappeared batch in read response hence respective
batch will be left as available in the state cache.

The PR checks for the first fetched/read batch base offset and if it's
greater than the position from where the read occurred (fetch offset)
then if there exists any `available` batches in the state cache then
they will be archived.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
2025-02-25 14:11:39 +00:00
xijiu 1edc30bf30
KAFKA-17836 Move RackAwareTest to server module (#19021)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-25 18:15:34 +08:00
TaiJuWu 1c82b89b4c
KAFKA-18712 Move Endpoint to server module (#18803)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Mickael Maison <mickael.maison@gmail.com>, Christo Lolov <lolovc@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-25 14:02:51 +08:00
PoAn Yang 10873e4210
KAFKA-18281: Kafka is improperly validating non-advertised listeners for routable controller addresses (#18387)
When a cluster is configured with a dynamic controller quorum, KRaft replica's endpoint are computed using the advertised.listeners property and not the quorum.controller.voters property. This change in the configuration makes it difficult to keeping all previous node configurations compatible with the new endpoint discovery functionality.

The least intrusive solution is to rely on Kafka's reverse hostname lookup when the hostname is not specified. The effective advertised controller listener now remove '0.0.0.0' hostname if the endpoint came from the listener configuration and not the advertised.listener configuration.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Alyssa Huang <ahuang@confluent.io>
2025-02-24 21:51:28 -05:00
Nick Guo d23a61738a
KAFKA-17937 Cleanup AbstractFetcherThreadTest (#18900)
- Remove AbstractFetcherThreadWithIbp26Test as it tests unsupported IBP
- cleanup AbstractFetcherThreadTest to remove unreachable paths, variables, and code

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-25 07:45:47 +08:00
Apoorv Mittal 48a506b7b8
KAFKA-18522: Slice records for share fetch (#18804)
The PR handles slicing of fetched records based on acquire response for
share fetch. There could be additional bytes fetched from log but
acquired offsets can be a subset, typically with `max fetch records`
configuration. Rather sending additional bytes of fetched data to client
we should slice the file and wire only needed batches.

Note: If the acquired offsets are within a batch then we need to send
the entire batch within the file record. Hence rather checking for
individual batches, PR finds the first and last acquired offset, and
trims the file for all batches between (inclusive) these two offsets.

Reviewers: Christo Lolov <lolovc@amazon.com>, Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>
2025-02-24 09:55:24 -08:00
mingdaoy 289e958c39
MINOR: Fix validateResourceNameIsNodeId's exception message (#19017)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-24 09:30:02 +08:00
Ismael Juma 13cb87c2d0
MINOR: Remove request log space added inadvertently (#19011)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-23 11:30:19 +08:00
Apoorv Mittal 6e45ab7d84
KAFKA-17351: Update tests and acquire API to allow discard batches from compacted topics (1/N) (#18978)
The PR does following:
1.  Adds `fetchOffset` to `acquire` API in SharePartition.
2. Adds a ShareFetchPartitionData class efficiently handle the
propagation of fetchOffset information.
3. Updates SharePartitionTests to make common code so such improvements
does not require all tests changes for future PRs.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:14:09 +00:00
Sushant Mahajan 4f28973bd1
KAFKA-18827: Initialize share state, share coordinator impl. [1/N] (#18968)
In this PR, we have added the share coordinator and KafkaApis side impl
of the intialize share group state RPC.
ref:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka#KIP932:QueuesforKafka-InitializeShareGroupStateAPI

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:12:08 +00:00
Apoorv Mittal f543eac4fe
KAFKA-18733: Implemented fetch ratio and partition acquire time metrics (3/N) (#18959)
PR implements the final set of ShareGroupMetrics,
RequestTopicPartitionsFetchRatio and TopicPartitionsAcquireTimeMs, as
defined in KIP-1103:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption

Note: Metric `RequestTopicPartitionsFetchRatio` is calculated as
percentage as Histogram API doesn't record double.


Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
2025-02-21 17:01:39 +00:00
Calvin Liu 8f13e7c207
MINOR: Move the ELR default version to 4.1 (#18954)
- ELR is enabled (ELRV_1) by default if the cluster is created with its bootstrap metadata version >= IBP_4_1_IV0.
- ELRV_1 can be manually enabled iff the metadata version is >= IBP_4_0_IV1.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>, David Jacot <djacot@confluent.io>
2025-02-21 16:13:11 +01:00
Shivsundar R 7da1a6cbff
KAFKA-18033: Remove flaky tag in ShareConsumerTest (#18995)
3 tests which were marked flaky in ShareConsumerTest do not have any
failure on trunk since the test was converted to use `ClusterTestExtensions`.

Reviewers: Sushant Mahajan <smahajan@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-21 13:50:08 +00:00
TengYao Chi 767a62ade6
KAFKA-18737 KafkaDockerWrapper setup functions fails due to storage format command (#18844)
The current Docker Hub documentation for Kafka is based on the use of static voters. Since Kafka 4.0 utilizes dynamic voters, users following the doc of docker hub may encounter unexpected behavior. Due to the limited time available for the 4.0.0 release, a simple and quick solution is to revert to using static voters within the Docker image. This can be achieved by adding a configuration file with static voter definitions to the kafka/docker folder, keeping it separate from the main kafka/config directory. This approach allows us to encourage the use of dynamic voters in typical deployments while maintaining compatibility within the Docker image.

Reviewers: Vedarth Sharma <142404391+VedarthConfluent@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-21 20:43:41 +08:00
TengYao Chi d31cbf59de
KAFKA-18831 Migrating to log4j2 introduce behavior changes of adjusting level dynamically (#18969)
fix the following behavior changes.

1) in log4j 1, users can't change the logger by parent if the logger is declared by properties explicitly. For example, `org.apache.kafka.controller` has level explicitly in the properties. Hence, we can't use "org.apache.kafka=INFO" to change the level of `org.apache.kafka.controller` to INFO. By contrast, log4j2 allows us to change all child loggers by the parent logger.

2) in log4j2, we can change the level of root to impact all loggers' level. By contrast, log4j 1 can't. 

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-21 16:12:58 +08:00
Calvin Liu 1eecd02ce8
MINOR: Deflake EligibleLeaderReplicasIntegrationTest (#18923)
Make sure to give enough time for the partition ISR updates.

Reviewers: David Jacot <djacot@confluent.io>
2025-02-20 05:14:15 -08:00
Matthias J. Sax 538a60e1b3
MINOR: disallow rawtypes and fail build (#18877)
Cleanup code to avoid rawtype, and add suppressions where necessary.
Change the build to fail on rawtype warning.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-19 13:11:49 -08:00
Ismael Juma 3a59a526d9
MIINOR: Remove redundant quorum parameter from *AdminIntegrationTest classes (#18965)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-02-19 15:57:47 -05:00
Shivsundar R 3603c8fe35
KAFKA-18829: Added check before converting to IMPLICIT mode (#18964)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-19 17:34:28 +00:00
Ismael Juma 3dba3125e9
KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845)
3.3.0 was the first KRaft release that was deemed production-ready and also
when KIP-778 (KRaft to KRaft upgrades) landed. Given that, it's reasonable
for 4.x to only support upgrades from 3.3.0 or newer (the metadata version also
needs to be set to "3.3" or newer before upgrading).

Noteworthy changes:
1. `AlterPartition` no longer includes topic names, which makes it possible to
simplify `AlterParitionManager` logic.
2. Metadata versions older than `IBP_3_3_IV3` have been removed and
`IBP_3_3_IV3` is now the minimum version.
3. `MINIMUM_BOOTSTRAP_VERSION` has been removed.
4. Removed `isLeaderRecoverySupported`, `isNoOpsRecordSupported`,
`isKRaftSupported`, `isBrokerRegistrationChangeRecordSupported` and
`isInControlledShutdownStateSupported` - these are always `true` now.
Also removed related conditional code.
5. Removed default metadata version or metadata version fallbacks in
multiple places - we now fail-fast instead of potentially using an incorrect
metadata version.
6. Update `MetadataBatchLoader.resetToImage` to set `hasSeenRecord`
based on whether image is empty - this was a previously existing issue that
became more apparent after the changes in this PR.
7. Remove `ibp` parameter from `BootstrapDirectory`
8. A number of tests were not useful anymore and have been removed.

I will update the upgrade notes via a separate PR as there are a few things that
need changing and it would be easier to do so that way.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluen.io>, Ken Huang <s7133700@gmail.com>
2025-02-19 05:35:42 -08:00
xijiu 4c4458c17a
KAFKA-18799 Remove AdminUtils (#18946)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-19 06:25:43 +08:00
PoAn Yang 1132f08c57
KAFKA-18773 Migrate the log4j1 config to log4j 2 for native image and README (#18872)
- update reflection-config.json and resource-config.json to include log4j2 and jackson
- remove unused jackson scala library
- fix the incorrect path of log4j2.yaml
- adopt workaround (--standalone) to make this PR work and it will be fixed by KAFKA-18737)

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-19 00:48:46 +08:00
TaiJuWu 934b0159bb
KAFKA-18089: Upgrade Caffeine lib to 3.1.8 (#18004)
- Fixed the RemoteIndexCacheTest that fails with caffeine > 3.1.1

Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-02-18 21:51:38 +05:30
Parker Chang ed366e6b89
MINOR: Align assertFutureThrows method signature with JUnit conventions (#18825)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-18 15:56:42 +00:00
Mickael Maison 0a2fab9310
KAFKA-14484: Decouple UnifiedLog and RemoteLogManager (#18460)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2025-02-18 15:10:31 +01:00
Andrew Schofield 6c14f64245
MINOR: Rename NoOpShareStatePersister for consistency (#18933)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-18 14:07:59 +00:00
Chirag Wadhwa 63229a768c
KAFKA-16718 [1/n]: Added DeleteShareGroupOffsets request and response schema (#18927)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-18 14:06:24 +00:00
Andrew Schofield 385b7ad355
MINOR: Align share group admin authz with consumer group (#18936)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2025-02-18 09:12:07 +00:00
Kamal Chandraprakash da3643c6b4
KAFKA-18787: RemoteIndexCache fails to delete invalid files on init (#18888)
The stale/invalid files that ends-with ".deleted" and ".tmp" should be cleaned when the broker gets restarted.

- fix the remote-index-cache test to use the logDir instead of topicDir
- fix the flaky test

Reviewers: Luke Chen <showuon@gmail.com>
2025-02-18 12:56:03 +05:30
Apoorv Mittal 06ce3e890b
KAFKA-18733: Updating share group record acks metric (2/N) (#18924)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-17 18:12:58 +00:00
PoAn Yang 2b6e868538
KAFKA-18784 Fix ConsumerWithLegacyMessageFormatIntegrationTest (#18889)
In PR #18267, we removed old message format for cases in ConsumerWithLegacyMessageFormatIntegrationTest. Although test cases can pass, they don't fulfill original purpose. We can't send old message format since 4.0, so I change cases to append old records by ReplicaManager directly.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-17 20:43:29 +08:00
Andrew Schofield 9b7ad6ec32
MINOR: Mark testQuotaOverrideDelete as flaky (#18925)
Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-17 15:20:35 +08:00
TengYao Chi 5cbe00e375
MINOR: Remove unused member in DynamicBrokerConfig (#18915)
Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-17 04:46:25 +08:00
Ming-Yen Chung e828767062
KAFKA-18790 Fix testCustomQuotaCallback (#18906)
Frequently updating the trust store can cause unexpected termination of the AsyncConsumer background thread.

1. To resolve this issue, reuse the same AdminClient instead of recreating it.
2. Add error logging when fail to initialize resources for the consumer network thread.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-15 03:07:59 +08:00
Jimmy Wang 6a6b80215d
KAFKA-16717 [1/2]: Add AdminClient.alterShareGroupOffsets (#18819)
KAFKA-16720 aims to add the support for the AlterShareGroupOffsets AdminClient. Key Changes in the PR:

1. Added handing of alterShareGroupOffsets() in KafkaAdminClient and introduce AlterShareGroupOffsetRequest/AlterShareGroupOffsetResponse/AlterShareGroupOffsetsOptions classes.
2. Corresponding test in KafkaAdminClientTest.
3. Added ALTER_SHARE_GROUP_OFFSETS API (will finish it in next PR and the share coordinator pieces)

Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-15 02:35:46 +08:00
Apoorv Mittal 53543bcf63
KAFKA-18733: Updating share group metrics (1/N) (#18826)
Reviewers: Sushant Mahajan <smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-02-14 08:48:41 +00:00
陳昱霖(Yu-Lin Chen) 2bbd25841e
KAFKA-18298 Fix flaky testConsumerGroupsDeprecatedConsumerGroupState and testConsumerGroups in PlaintextAdminIntegrationTest (#18513)
It's related to KAFKA-18298 and KAFKA-18297. The root cause of the flaky tests is member rejoin after member removal. To prevent members from rejoining after being removed, before removing group members, calling `consumers.close` in ConsumerThread . This fix also extract the flaky member removal test  to new test `testConsumerGroupWithMemberRemoval`.

Flow of member removal test: 
1. Set 2 static consumer + 1 dynamic consumer
2. Close all consumers.
3. remove one static member
4. remove remaining members
 
Before KIP-1092, the member count is different between ClassicConsumer/AsyncConsumer. (AsyncConsumer will remove dynamic member after consumer closed.)

To get more details, please refer to the discussion under KAFKA-18297 and this PR:
- discussion : [Link](https://issues.apache.org/jira/browse/KAFKA-18297?focusedCommentId=17912537&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17912537)
- review: https://github.com/apache/kafka/pull/18513#pullrequestreview-2589110367

This PR fixed below flaky errors:

1. **PlaintextAdminIntegrationTest#testConsumerGroups**
  a.  `org.opentest4j.AssertionFailedError: expected: <2> but was: <3>` ([Report](https://ge.apache.org/s/lt3lpviv45cns/tests/task/:core:test/details/kafka.api.PlaintextAdminIntegrationTest/testConsumerGroups(String%2C%20String)%5B1%5D?top-execution=1))
  b.  `org.opentest4j.AssertionFailedError: expected: <true> but was: <false>` ([Report](https://ge.apache.org/s/jlxo446xalpoa/tests/task/:core:test/details/kafka.api.PlaintextAdminIntegrationTest/testConsumerGroups(String%2C%20String)%5B1%5D?top-execution=1))

2. **PlaintextAdminIntegrationTest#testConsumerGroupsDeprecatedConsumerGroupState**
  a.  `org.opentest4j.AssertionFailedError: expected: <2> but was: <3>` ([Report](https://ge.apache.org/s/ndoj6s2stb446/tests/task/:core:test/details/kafka.api.PlaintextAdminIntegrationTest/testConsumerGroupsDeprecatedConsumerGroupState(String%2C%20String)%5B1%5D?top-execution=1))
  b. `org.opentest4j.AssertionFailedError: expected: <true> but was: <false>` ([Report](https://ge.apache.org/s/kh3jze2tc5qeu/tests/task/:core:test/details/kafka.api.PlaintextAdminIntegrationTest/testConsumerGroupsDeprecatedConsumerGroupState(String%2C%20String)%5B1%5D?top-execution=1))

Reviewers: David Jacot <djacot@confluent.io>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-14 07:28:45 +08:00
Andrew Schofield 952113e8e0
KAFKA-16720: Support multiple groups in DescribeShareGroupOffsets RPC (#18834)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2025-02-13 18:27:05 +00:00
Calvin Liu 9cb271f1e1
KAFKA-18654[2/2]: Transction V2 retry add partitions on the server side when handling produce request. (#18810)
During the transaction commit phase, it is normal to hit CONCURRENT_TRANSACTION error before the transaction markers are fully propagated. Instead of letting the client to retry the produce request, it is better to retry on the server side.

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
2025-02-13 09:30:58 -08:00
Apoorv Mittal a13d815a0d
MINOR: Updated share partition manager tests to close and other fixes (#18862)
Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-02-13 13:37:37 +00:00
Ken Huang 9494bebee6
KAFKA-18728 Move ListOffsetsPartitionStatus to server module (#18807)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-02-13 10:36:46 +05:30
Jhen-Yung Hsu b0e5cdfc57
KAFKA-18777 add `PartitionsWithLateTransactionsCount` to BrokerMetricNamesTest (#18869)
Rewrite BrokerMetricNamesTest using ReplicaManager.MetricNames, ensuring that all metrics are always included. This helps prevent issues like PartitionsWithLateTransactionsCount not being correctly included in the test before.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-12 22:09:42 +08:00
PoAn Yang 63fc9b3cb8
KAFKA-18771: fix Flaky test KRaftClusterTest .testDescribeQuorumRequestToControllers (#18859)
The case testDescribeQuorumRequestToControllers shutdowns raft client but not the controller. This makes client has chance to send a request to the controller and get NOT_LEADER_OR_FOLLOWER error. However, if the raft client finishes shutdown before handling the request, the request will not be handled. Shutdown the controller before doing KafkaFuture#get for the client request, so we can make sure the request is handled by another controller eventually.

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>
2025-02-12 16:16:43 +08:00
Justine Olshan 400363b7e2
KAFKA-18035: TransactionsTest testBumpTransactionalEpochWithTV2Disabled failed on trunk (#18451)
Sometimes we didn't get into abortable state before aborting, so the epoch didn't get bumped. Now we force abortable state with an attempt to send before aborting so the epoch bump occurs as expected.

Reviewers: Jeff Kim <jeff.kim@confluent.io>
2025-02-11 14:01:43 -08:00
Edoardo Comar 7e405ccc65
KAFKA-18758: NullPointerException in shutdown following InvalidConfigurationException (#18833)
* KAFKA-18758:  NullPointerException in shutdown following InvalidConfigurationException

Add checks for null in shutdown as BrokerLifecycleManager is not instantiaited if LogManager constructor throws an Exception
2025-02-11 10:06:55 +00:00
Sushant Mahajan 675a0889de
KAFKA-18764: Throttle on share state RPCs auth failure. (#18855)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-11 09:54:24 +00:00
Mickael Maison ece91e9247
KAFKA-14484: Move UnifiedLog static methods to storage (#18039)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 09:55:32 +01:00
TengYao Chi f5dd661cb5
KAFKA-18396: Migrate log4j1 configuration to log4j2 in KafkaDockerWrapper (#18394)
After log4j migration, we need to update the logging configuration in KafkaDockerWrapper from log4j1 to log4j2.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2025-02-11 13:25:23 +05:30
TaiJuWu 9fc7500684
KAFKA-18770 close the RM created by testDelayedShareFetchPurgatoryOperationExpiration (#18853)
it's crucial to utilize a try-finally block to ensure proper closure of the ReplicaManager. Failing to do so can result in an unreleased thread from the purgatory, potentially leading to errors in subsequent integration tests that incorporate thread leak detection.

Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 07:35:13 +08:00
Ken Huang 581e94840f
KAFKA-18366 Remove KafkaConfig.interBrokerProtocolVersion (#18820)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 06:18:02 +08:00
Jhen-Yung Hsu 4e36368d08
KAFKA-18743 Remove leader.imbalance.per.broker.percentage as it is not supported by Kraft (#18821)
Remove `leader.imbalance.per.broker.percentage` from config.
Add `leader.imbalance.per.broker.percentage` to release note

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 04:01:57 +08:00
Ken Huang 70adf746c4
KAFKA-18225 ClientQuotaCallback#updateClusterMetadata is unsupported by kraft (#18196)
This commit ensures that the ClientQuotaCallback#updateClusterMetadata method is executed in KRaft mode. This method is triggered whenever a topic or cluster metadata change occurs. However, in KRaft mode, the current implementation of the updateClusterMetadata API is inefficient due to the requirement of creating a full Cluster object. To address this, a follow-up issue (KAFKA-18239) has been created to explore more efficient mechanisms for providing cluster information to the ClientQuotaCallback without incurring the overhead of a full Cluster object creation.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 01:03:02 +08:00
PoAn Yang b22c7d5b5c
KAFKA-17833: Convert DescribeAuthorizedOperationsTest to use KRaft (#18252)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-02-07 15:44:27 +01:00
Piotr P. Karwasz 666571216b
KAFKA-18483 Disable `Log4jController` and `Loggers` if Log4j Core absent (#18496)
If Log4j Core is absent, most calls to Log4jController and Loggers will end up with a NoClassDefFoundError.

This changeset:

- Profits from the major version bump to rename k.util.Log4jController to LoggingController.
- Removes o.a.l.l.Level from the signature of public methods of o.a.k.connect.runtime.Loggers and replaces it with String.
- Provides an additional no-op implementation of k.util.LoggingController and o.a.k.connect.runtime.Loggers: if Log4j Core is not present on the runtime classpath the no-op implementation will be used.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-07 00:04:33 +08:00
Colin Patrick McCabe b2b2408692
KAFKA-18360 Remove zookeeper configurations (#18566)
Remove broker.id.generation.enable and reserved.broker.max.id, which are not used in KRaft mode.
Remove inter.broker.protocol.version, which is not used in KRaft mode.

Reviewers: PoAn Yang <payang@apache.org>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-06 22:22:11 +08:00
Ken Huang a3d9d881e1
KAFKA-18530 Remove ZooKeeperInternals (#18641)
Since zk has been removed in 4.0, config handlers no longer need to handle the "<default>" value. This PR streamlines the config update process by eliminating the unnecessary string checks for "<default>"

Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-06 17:48:17 +08:00
Ming-Yen Chung 34e7136b7a
MINOR: Fix wrong config property in KafkaConfigTest (#18815)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-06 17:09:52 +08:00
Kuan-Po Tseng b99be961b8
KAFKA-18206: EmbeddedKafkaCluster must set features (#18189)
related to KAFKA-18206, set features in EmbeddedKafkaCluster in both streams and connect module, note that this PR also fix potential transaction with empty records in sendPrivileged method as transaction version 2 doesn't allow this kind of scenario.

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-02-05 09:14:36 -08:00
Chirag Wadhwa 01587d09d8
KAFKA-18494-3: solution for the bug relating to gaps in the share partition cachedStates post initialization (#18696)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Abhinav Dixit <adixit@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-02-05 15:16:25 +00:00
Sanskar Jhajharia 7dbed2f6e8
[KAFKA-16720] AdminClient Support for ListShareGroupOffsets (2/2) (#18671)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Sushant Mahajan <smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-02-05 14:38:09 +00:00
TengYao Chi 66363160c5
KAFKA-18645: New consumer should align close timeout handling with classic consumer (#18702)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-05 09:08:51 -05:00
PoAn Yang 21645ebf0b
KAFKA-18705: Move ConfigRepository to metadata module (#18784)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Christo Lolov <lolovc@amazon.com>
2025-02-05 10:13:36 +00:00
Justine Olshan 00dddee347
MINOR: Add missing test tag to UnifiedLogTest.scala (#18794)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-04 13:56:14 -08:00
Sean Quah 42e7cbb67e
KAFKA-18690: Keep leader metadata for RE2J-assigned partitions (#18777)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-02-04 13:22:28 -05:00
Justine Olshan 822b8ab3d7
KAFKA-18691: Flaky test testFencingOnTransactionExpiration (#18793)
It appears this test was failing because the transaction was never aborting and the concurrent transactions errors would not go away.

ccab9eb introduced the test failure because it requires the transaction to complete, but I suspect the lack of completion was happening before the change.

The timeout for the write is based on the transactional timeout, and 100ms seemed too small -- thus the requests to update the state would often repeatedly time out.

Also removed the loop since it was not necessary.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Calvin Liu <caliu@confluent.io>
2025-02-04 08:45:34 -08:00
Luke Chen 612e1299e4
KAFKA-18230: Handle not controller or not leader error in admin client (#18165)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-04 16:51:24 +01:00
Calvin Liu ad031b99d3
KAFKA-18635: reenable the unclean shutdown detection (#18277)
We need to re-enable the unclean shutdown detection when in ELR mode, which was inadvertently removed during the development process.

Reviewers: David Mao <dmao@confluent.io>,  Jun Rao <junrao@gmail.com>
2025-02-03 22:26:57 -08:00
Ming-Yen Chung 9f78771a1f
KAFKA-18693 Remove PasswordEncoder (#18790)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-04 13:18:41 +08:00
Justine Olshan ab8ef87c7f
KAFKA-18654 [1/2]: Transaction Version 2 performance regression due to early return (#18720)
https://issues.apache.org/jira/browse/KAFKA-18575 solved a critical race condition by returning with CONCURRENT_TRANSACTIONS early when the transaction was still completing.
In testing, it was discovered that this early return could cause performance regressions.

Prior to KIP-890 the addpartitions call was a separate call from the producer. There was a previous change https://issues.apache.org/jira/browse/KAFKA-5477 that decreased the retry backoff to 20ms. With KIP-890 and making the call through the produce path, we go back to the default retry backoff which takes longer. Prior to 18575 we introduce a slight delay when sending to the coordinator, so prior to 18575, we are less likely to return quickly and get stuck in this backoff. However, based on results from produce benchmarks, we can still run into the default backoff in some scenarios.

This PR reverts KAFKA-18575, and doesn't return early and wait until the coordinator for checking if a transaction is ongoing. Instead, it will fix the handling with the verification guard so we don't hit the edge condition.

Also cleans up some of the verification text that was unclear.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Artem Livshits <alivshits@confluent.io>
2025-02-03 15:24:34 -08:00
Ken Huang 272d947f96
KAFKA-18545: Remove Zookeeper logic from LogManager (#18592)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Mickael Maison <mickael.maison@gmail.com>
2025-02-03 17:16:35 +00:00
Ken Huang 7fdd11295c
KAFKA-18685: Cleanup DynamicLogConfig constructor (#18764)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Christo Lolov <lolovc@amazon.com>
2025-02-03 15:38:05 +00:00
PoAn Yang f6f41dc5eb
KAFKA-17631 Convert SaslApiVersionsRequestTest to kraft (#18330)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-03 21:01:38 +08:00
Jhen-Yung Hsu 9ba2621620
MINOR: Remove the test for ZooKeeper metrics used by ZooKeeperClient (#18775)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-03 20:06:01 +08:00
David Jacot bf05d2c914
KAFKA-18672; CoordinatorRecordSerde must validate value version (#18749)
CoordinatorRecordSerde does not validate the version of the value to check whether the version is supported by the current version of the software. This is problematic if a future and unsupported version of the record is read by an older version of the software because it would misinterpret the bytes. Hence CoordinatorRecordSerde must throw an error if the version is unknown. This is also consistent with the handling in the old coordinator.

Reviewers: Jeff Kim <jeff.kim@confluent.io>
2025-02-03 02:19:27 -08:00
Ismael Juma 78aff4fede
KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0 (#18727)
Return produce v0-v2 as supported versions in `ApiVersionsResponse`, but disable support
for it everywhere else.

Since clients pick the highest supported version by both client and broker during version
negotiation, this solves the problem with minimal tech debt (even though it's not ideal that
`ApiVersionsResponse` becomes inconsistent with the actual protocol support).

Add one test for the socket server handling (in `ProcessorTest`) and one test for the
client behavior (in `ProduceRequestTest`). Adjust a couple of api versions tests to verify
the new behavior.

Finally, include a few clean-ups in `ApiKeys`, `Protocol`, `ProduceRequest`,
`ProduceRequestTest` and `BrokerApiVersionsCommandTest`.

Reference to related librdkafka issue:
https://github.com/confluentinc/librdkafka/issues/4956

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
2025-02-01 16:08:54 -08:00
kevin-wu24 184b891871
KAFKA-16524; Metrics for KIP-853 (#18304)
This change implement some of the metrics enumerated in KIP-853.

The KafkaRaftMetrics object now exposes number-of-voters, number-of-observers and uncommitted-voter-change. The number-of-observers and uncommitted-voter-change metrics are only present on the active controller or leader, since it does not make sense for other replicas to report these metrics.

In order to make these two metrics thread-safe, KafkaRaftMetrics needs to be passed into LeaderState, and therefore QuorumState. This introduces a circularity since the KafkaRaftMetrics constructor takes in QuorumState. To break the circularity for now, the logic using QuorumState will be moved to the KafkaRaftMetrics#initialize method.

The BrokerServerMetrics object now exposes ignored-static-voters. The ControllerServerMetrics object now exposes IgnoredStaticVoters. To implement both metrics for "ignored static voters", this PR introduces the ExternalKRaftMetrics interface, which allows for higher layer metrics objects to be accessible within the raft module.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2025-01-30 18:35:01 -05:00
Justine Olshan ccab9eb8b4
KAFKA-18660: Transactions Version 2 doesn't handle epoch overflow correctly (#18730)
Fixed the typo that used the wrong producer ID and epoch when returning so that we handle epoch overflow correctly.

We also had to rearrange the concurrent transaction handling so that we don't self-fence when we start the new transaction with the new producer ID.

I also tested this with a modified version of the code where epoch overflow happens on the first epoch bump (every request has a new producer id)

Reviewers: Artem Livshits <alivshits@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
2025-01-30 13:42:10 -08:00
Ken Huang 4b29fd6383
KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18548)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>
2025-01-30 11:22:54 -05:00
Sushant Mahajan be96807ac8
MINOR: Refactor share coord cache helper to share package. (#18743)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-30 13:33:42 +00:00
TengYao Chi 9dd73d43b0
KAFKA-18569: New consumer close may wait on unneeded FindCoordinator (#18590)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-29 14:15:56 -05:00
PoAn Yang 4dd0bcbde8
KAFKA-18383 Remove reserved.broker.max.id and broker.id.generation.enable (#18478)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-30 02:55:09 +08:00
Calvin Liu a3b34c1315
KAFKA-18662: Return CONCURRENT_TRANSACTIONS on produce request in TV2 (#18733)
While testing, it was found that the not_enough_replicas error was super common and could be easily confused. Since we are already bumping the request, we can signify that the produce request may return this error and new clients can handle it 

(Note, the java client should be able to handle this already as a retriable error, but other client libraries may need to implement this change)

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-01-29 10:15:48 -08:00
Sushant Mahajan 632aedcf4f
KAFKA-18632: Multibroker test improvements. (#18718)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-29 17:03:43 +00:00
Abhinav Dixit dd1f2b8aab
KAFKA-18653: Fix mocks and potential thread leak issues causing silent RejectedExecutionException in share group broker tests (#18725)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-29 16:24:30 +00:00
Ismael Juma ca5d2cf76d
KAFKA-18646: Null records in fetch response breaks librdkafka (#18726)
Ensure we always return empty records (including cases where an error is returned).
We also remove `nullable` from `records` since it is effectively expected to be
non-null by a large percentage of clients in the wild.

This behavior regressed in fe56fc9 (KAFKA-18269). Empty records were
previously set via `FetchResponse.recordsOrFail(partitionData)` in the
now-removed `maybeConvertFetchedData` method.

Added an integration test that fails without this fix and also update many
tests to set `records` to `empty` instead of leaving them as `null`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2025-01-29 07:04:12 -08:00
TengYao Chi 97a228070e
KAFKA-18619: New consumer topic metadata events should set requireMetadata flag (#18668)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-01-29 08:36:05 -05:00
Ismael Juma e6d72c9e60
KAFKA-18648: Add back support for metadata version 0-3 (#18716)
During testing, we identified that kafka-python (and aiokafka) relies on metadata request v0 and
hence we need to add these back to comply with the premise of KIP-896 - i.e. it should not
break the clients listed within it.

I reverted the changes from #18218 related to the removal of metadata versions 0-3.

I will submit a separate PR to undeprecate these API versions on the relevant 3.x branches.

kafka-python (and aiokafka) work correctly (produce & consume) with this change on
top of the 4.0 branch.

Reviewers: David Arthur <mumrah@gmail.com>
2025-01-28 18:35:33 -08:00
Apoorv Mittal c7619ef8d1
KAFKA-17951: Share parition rotate strategy (#18651)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
2025-01-28 11:44:48 +00:00
Sushant Mahajan f32932cc25
KAFKA-18629: Delete share group state impl [1/N] (#18712)
Reviewers: Christo Lolov <lolovc@amazon.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-28 11:43:01 +00:00
Ken Huang 5631be20a6
MINOR: Remove ZooKeeper mentions in comments (#18646)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-28 12:35:46 +01:00
Apoorv Mittal 04567cdb22
KAFKA-18657: Fixing SharePartitionManager flaky test (#18710)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-28 08:06:58 +00:00
TaiJuWu e89b30d14e
KAFKA-18528: MultipleListenersWithSameSecurityProtocolBaseTest and GssapiAuthenticationTest should run for async consumer (#18555)
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2025-01-27 15:49:44 -05:00
Sushant Mahajan b92cd9d236
KAFKA-18632: Added few share consumer multibroker tests. (#18679)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-27 12:56:56 +00:00
Chung, Ming-Yen a8f6fc9cc4
KAFKA-18631 Remove ZkConfigs (#18693)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-26 04:37:49 +08:00
PoAn Yang be7415cb8b
KAFKA-18555 Avoid casting MetadataCache to KRaftMetadataCache (#18632)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-25 23:02:28 +08:00
Ken Huang c40e7a1341
KAFKA-18533 Remove KafkaConfig zookeeper related logic (#18547)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-25 22:52:21 +08:00
Chung, Ming-Yen 43af241b50
KAFKA-18639 Enable the @Flaky annotation for some flaky tests (#18701)
The following tests were previously reported as flaky but were only annotated with a comment in pull request #18558 due to module dependency limitations:

    testAdminClientApisAuthenticationFailure
    testOutdatedCoordinatorAssignment
    testThrottledProducerConsumer

With the introduction of the new test infrastructure #18602 , which allows all modules to use the @Flaky annotation, these tests should now be updated to include the @Flaky annotation.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-25 22:44:35 +08:00
mingdaoy c23d4a0d73
KAFKA-18499 Clean up zookeeper from LogConfig (#18583)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-25 22:31:46 +08:00
TaiJuWu 023f9c26e6
KAFKA-18529: ConsumerRebootstrapTest should run for async consumer (#18554)
Reviewers: Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lmagrans@confluent.io>
2025-01-24 20:33:20 +01:00
Apoorv Mittal 70eab7778d
KAFKA-17894: Implemented broker topic metrics for Share Group 1/N (KIP-1103) (#18444)
The PR implements the BrokerTopicMetrics defined in KIP-1103.

The PR also corrected the share-acknowledgement-rate and share-acknowledgement-count metrics defined in KIP-932 as they are moved to BrokerTopicMetrics, necessary changes to KIP-932 broker metrics will be done once we complete KIP-1103.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
2025-01-24 09:34:54 -08:00
TengYao Chi 2f1bf2f2ab
KAFKA-18630: Clean ReplicaManagerBuilder (#18687)
Reviewers: Christo Lolov <lolovc@amazon.com>
2025-01-24 17:23:48 +00:00
David Arthur 8c0a0e07ce
KAFKA-17587 Refactor test infrastructure (#18602)
This patch reorganizes our test infrastructure into three Gradle modules:

":test-common:test-common-internal-api" is now a minimal dependency which exposes interfaces and annotations only. It has one project dependency on server-common to expose commonly used data classes (MetadataVersion, Feature, etc). Since this pulls in server-common, this module is Java 17+. It cannot be used by ":clients" or other Java 11 modules.

":test-common:test-common-util" includes the auto-quarantined JUnit extension. The @Flaky annotation has been moved here. Since this module has no project dependencies, we can add it to the Java 11 list so that ":clients" and others can utilize the @Flaky annotation

":test-common:test-common-runtime" now includes all of the test infrastructure code (TestKitNodes, etc). This module carries heavy dependencies (core, etc) and so it should not normally be included as a compile-time dependency.

In addition to this reorganization, this patch leverages JUnit SPI service discovery so that modules can utilize the integration test framework without depending on ":core". This will allow us to start moving integration tests out of core and into the appropriate sub-module. This is done by adding ":test-common:test-common-runtime" as a testRuntimeOnly dependency rather than as a testImplementation dependency. A trivial example was added to QuorumControllerTest to illustrate this.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 09:03:43 -05:00
Ken Huang 0c9df75295
KAFKA-18474: Remove zkBroker listener (#18477)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang <payang@apache.org>
2025-01-24 05:53:32 -08:00
David Jacot 80d2a8a42d
KAFKA-18616; Refactor DumpLogSegments's MessageParsers (#18688)
All the work that we have done to automate and to simplify the coordinator records allows us to simplify the related MessageParsers in DumpLogSegments. They can all share the same based implementation. This is nice because it ensures that we handle all those records similarly.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 04:59:30 -08:00
TengYao Chi 5d81fe20c8
KAFKA-18590 Cleanup DelegationTokenManager (#18618)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 20:12:03 +08:00
TengYao Chi fa2df3bca7
KAFKA-18559 Cleanup FinalizedFeatures (#18593)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 19:39:01 +08:00
Logan Zhu 356f0d815c
KAFKA-18597 Fix max-buffer-utilization-percent is always 0 (#18627)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 18:21:34 +08:00
TengYao Chi 66868fc1fa
KAFKA-18620: Remove UnifiedLog#legacyFetchOffsetsBefore (#18686)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-24 11:11:05 +01:00
TengYao Chi 40890faa1b
KAFKA-18592 Cleanup ReplicaManager (#18621)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Christo Lolov <lolovc@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 01:34:36 +08:00
TaiJuWu ce4eeaa379
MINOR: restore `testGetAllTopicMetadataShouldNotCreateTopicOrReturnUnknownTopicPartition` (#18633)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 01:27:18 +08:00
Sushant Mahajan 01afba8fdb
MINOR: Refactor ShareConsumerTest to use ClusterTestExtensions. (#18656)
Reviewers: ShivsundarR <shr@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-23 16:35:33 +00:00
Ken Huang 7e46087570
MINOR: rename `resendBrokerRegistrationUnlessZkMode` to `resendBrokerRegistration` (#18645)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 00:33:05 +08:00
TengYao Chi bdc92fd5a1
MINOR: Cleanup zk condition in TransactionsTest, QuorumTestHarness and PlaintextConsumerAssignorsTest (#18639)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-23 19:53:10 +08:00
David Jacot bc807083fb
KAFKA-18486; [1/2] Update LocalLeaderEndPointTest (#18666)
This patch is a first step towards removing `ReplicaManager#becomeLeaderOrFollower`. It updates the `LocalLeaderEndPointTest` tests.

Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>
2025-01-23 10:49:16 +01:00
Justine Olshan 94a1bfb128
KAFKA-18575: Transaction Version 2 doesn't correctly handle race condition with completing and new transaction(#18604)
There is a subtle race condition with transactions V2 if a transaction is still completing when checking if we need to add a partition, but it completes when the request reaches the coordinator.

One approach was to remove the verification for TV2 and just check the epoch on write, but a simpler one is to simply return concurrent transactions from the partition leader (before attempting to add the partition). I've done this and added a test for this behavior.

Locally, I reproduced the race but adding a 1 second sleep when handling the WriteTxnMarkersRequest and a 2 second delay before adding the partition to the AddPartitionsToTxnManager. Without this change, the race happened on every second transaction as the first one completed. With this change, the error went away.

As a followup, we may want to clean up some of the code and comments with respect to verification as the code is used by both TV0 + verification and TV2. But that doesn't need to complete for 4.0. This does :)

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Artem Livshits <alivshits@confluent.io>, Calvin Liu <caliu@confluent.io>
2025-01-22 13:44:08 -08:00
Lianet Magrans 410065a65d
KAFKA-18517: Enable ConsumerBounceTest to run for new async consumer (#18532)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>
2025-01-22 18:02:38 +01:00
Xiaobing Fang f4d90398cc
MINOR: Fix `LogCleanerManagerTest.testLogsUnderCleanupIneligibleForCompaction()` for `LogMessageTimestampType = "LogAppendTime"` (#12333)
While setting Defaults.LogMessageTimestampType to "LogAppendTime", `LogCleanerManagerTest.testLogsUnderCleanupIneligibleForCompaction()` fails with a InvalidTimestampException.

This PR fixes this by regenerating the records instead of previous approach of re-using same records in the test.

Reviewers: Divij Vaidya <diviv@amazon.com>, Kvicii <kvicii.yu@gmail.com>

---------

Co-authored-by: fangxiaobing <fangxiaobing@kuaishou.com>
2025-01-22 17:50:39 +01:00
Ken Huang 341e535942
KAFKA-18519: Remove Json.scala, cleanup AclEntry.scala (#18614)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-22 16:12:06 +01:00
Chung, Ming-Yen 084fcbd327
KAFKA-18599: Remove Optional wrapping for forwardingManager in ApiVersionManager (#18630)
`forwardingManager` is always present now.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-22 06:50:16 -08:00
TengYao Chi a3da6bbb0c
MINOR: Cleanup ControllerCOntext and StateChangeLogger (#18588)
These methods were previously invoked by ZK components, but we have just removed them.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-21 18:41:17 -08:00
David Jacot b368c38684
KAFKA-18302; Update CoordinatorRecord (#18512)
This patch does a few things:
1) Replace ApiMessageAndVersion by ApiMessage in CoordinatorRecord for the key
2) Leverage the fact that ApiMessage exposes the apiKey. Hence we don't need to specify the key anymore.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-21 18:11:26 +01:00
David Jacot 256ccd0c0d
KAFKA-18487; Remove ReplicaManager#stopReplicas (#18647)
This patch removes `ReplicaManager#stopReplicas`. I have ensured that removed unit tests are covered by other existing tests or are updated to use kraft.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-21 11:47:16 +01:00
Dimitar Dimitrov 31d8e68ed1
KAFKA-18583; Fix getPartitionReplicaEndpoints for KRaft (#18635)
Although `MetadataCache`'s `getPartitionReplicaEndpoints` takes a single topic-partition, the `KRaftMetadataCache` implementation iterates over all partitions of the matching topic. This is not necessary and can cause significant performance degradation when the topic has a relatively high number of partitions.

Note that this is not a recent regression - it has been a part of `KRaftMetadataCache` since its creation.

Reviewers: Ismael Juma <ismael@juma.me.uk>, David Jacot <djacot@confluent.io>
2025-01-21 10:51:59 +01:00
David Jacot 76bf38a4fd
KAFKA-18604; Update transaction coordinator (#18636)
This patch updates the transaction coordinator record to use the new coordinator record definition.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-21 08:36:23 +01:00
Ismael Juma 87b37a4065
KAFKA-14552: Assume a baseline of 3.0 for server protocol versions (#18497)
Kafka 4.0 will remove support for zk mode and will require conversion to kraft
before upgrading to 4.0. The minimum kraft version is 3.0 (aka 3.0-IV1).

This provides an opportunity to remove exclusively server side protocols versions
that only exist to allow direct upgrades from versions older than 3.0 or that are
used only by zk mode.

Since KRaft became production ready in 3.3, we should consider setting the
baseline to 3.3. But that requires more discussion and it can be done via a
separate change (KAFKA-18601).

Protocol changes:
* Remove RequestHeader v0 (only used by ControlledShutdown v0)
* Remove WriteTxnMarkers v0
* Remove all versions of ControlledShutdown, LeaderAndIsr, StopReplica, UpdateMetadata

In order to remove all versions safely, extend generator to support setting
"versions" to "none". In this case, we no longer generate the `*Data` classes,
but we still reserve the id for the relevant protocol api (so it doesn't get
accidentally used for something else). The protocol documentation is correct
after these changes.

We kept a simplified version of `LeaderAndIsr{Request|Response}` because
it's used by many tests that are still relevant in kraft mode. Once
KAFKA-18486 is done, it may be possible to remove it (I left a comment on
the ticket). Similarly, KAFKA-18487 may make it possible to remove
the introduced `StopReplicaPartitionState` (left a comment on that ticket too).

There are a number of places that were adjusted to include an
`ApiKeys.hasValidVersion` check.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-20 13:51:44 -08:00
TengYao Chi 837fb1ed02
MINOR: Remove unused QuotaConfgHandler (#18617)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-21 03:02:42 +08:00
TengYao Chi f1ee0557f8
MINOR: Remove zk related statement from ControllerConfigurationValidator (#18637)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-20 17:10:24 +01:00
Ken Huang 892a446789
KAFKA-18594: Cleanup BrokerLifecycleManager (#18626)
Reviewers: Christo Lolov <lolovc@amazon.com>
2025-01-20 15:17:57 +00:00
Ken Huang 71495a2013
KAFKA-18568: Fix flaky test ClientIdQuotaTest (#18612)
The reason for flakiness is PR #18080 which modifies the linger.ms config from 0 to 5. ClientIdQuotaTest are testing "Low enough quota that a producer sending a small payload in a tight loop should get throttled", thus this config change Influence this test scenario.

This commits uses the older value of 0ms for linger.ms for ClientIdQuotaTest tests.

Reviewers: Ismael Juma <ismael@juma.me.uk>, TaiJuWu <tjwu1217@gmail.com>, Divij Vaidya <diviv@amazon.com>
2025-01-20 16:05:47 +01:00
TengYao Chi a842c02b88
KAFKA-18553: Update javadoc and comments of ConfigType (#18567)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <amittal@confluent.io>
2025-01-20 15:20:36 +01:00
Sanskar Jhajharia bcbc72e29b
[KAFKA-16720] AdminClient Support for ListShareGroupOffsets (1/n) (#18571)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-20 07:47:14 +00:00
Ken Huang 96499029b7
KAFKA-18588 Remove TopicKey.scala (#18624)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-20 10:30:14 +08:00
TaiJuWu 20e616ecc1
KAFKA-18578: Remove `UpdateMetadataRequest` from `MetadataCacheTest` (#18628)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-19 12:18:43 -08:00
Ken Huang c044eb61a1
KAFKA-18593 Remove ZkCachedControllerId In MetadataCache (#18625)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-19 10:15:06 -08:00
TengYao Chi 697cf641bc
KAFKA-17668: Clean-up LogCleaner#maxOverCleanerThreads and LogCleanerManager#maintainUncleanablePartitions (#17390)
Since maxOverCleanerThreads does not have a corresponding unit test,
I have added a unit test for it. maintainUncleanablePartitions has been
thoroughly tested in tests, so I simply replaced the old implementation
with the new one.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 07:21:52 -08:00
Ken Huang 3082a85e0e
MINOR: Remove KafkaApis unused method (#18620)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 22:45:22 +08:00
Ken Huang 516d5240b9
KAFKA-18429 Remove ZkFinalizedFeatureCache and StateChangeFailedException (#18616)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 19:13:47 +08:00
TengYao Chi 485d36d3b3
KAFKA-18589 Remove unused interBrokerProtocolVersion from GroupMetadataManager (#18619)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 18:59:24 +08:00
Ken Huang eb1c8419aa
KAFKA-18516 Remove RackAwareMode (#18598)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 18:49:12 +08:00
Ken Huang ce5d7be1dc
KAFKA-18492 Cleanup RequestHandlerHelper (#18608)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 18:41:41 +08:00
Ken Huang bb3944c1d0
KAFKA-18427: Remove ZooKeeperClient (#18613)
Reviewers: Ismael Juma <ismael@juma.me.uk, TingIāu "Ting" Kì <51072200+frankvicky@users.noreply.github.com>
2025-01-18 21:14:46 -08:00
TaiJuWu 4772805cd5
KAFKA-18540: Remove `UpdataMetadataRequest` from `KafkaApisTest` (#18591)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-18 21:12:52 -08:00
Ken Huang 6eddaeba58
KAFKA-18532: Clean Partition.scala zookeeper logic (#18594)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-18 17:54:06 -08:00
TaiJuWu ff3de0cedc
MINOR: restore testUnauthorizedTopicMetadataRequest (#18578)
This was removed during removal of zk code (#18542), but
we should instead convert it to work with kraft.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-18 17:34:10 -08:00
PoAn Yang 85ec8b50c8
KAFKA-18423: Remove ZkData and related unused references (#18605)
Also removed:

* ZkNodeChangeNotificationListener
* KafkaZkClient

Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-18 17:07:42 -08:00
TengYao Chi 029d9184c6
KAFKA-18565 Cleanup SaslSetup (#18586)
Reviewers: Christo Lolov <lolovc@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 03:14:59 +08:00
Ken Huang a814e21da8
KAFKA-18430 Remove ZkNodeChangeNotificationListener (#18606)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 01:34:30 +08:00
PoAn Yang e124d3975b
KAFKA-806: Index may not always observe log.index.interval.bytes (#18012)
Currently, each log.append() will add at most 1 index entry, even when the appended data is larger than log.index.interval.bytes. One potential issue is that if a follower restarts after being down for a long time, it may fetch data much bigger than log.index.interval.bytes at a time. This means that fewer index entries are created, which can increase the fetch time from the consumers.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
2025-01-17 11:51:05 -08:00
Ken Huang a6faec179a
KAFKA-18515 Remove DelegationTokenManagerZk (#18595)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Christo Lolov <lolovc@amazon.com>
2025-01-17 16:47:33 +00:00
Ismael Juma 3996e90ff7
Remove casts to KRaftMetadataCache (#18579)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-17 07:09:34 -08:00
Alyssa Huang 4583b033f0
KAFKA-17642: PreVote response handling and ProspectiveState (#18240)
This PR implements the second part of KIP-996 and KAFKA-16164 (tasks KAFKA-16607, KAFKA-17642, KAFKA-17643, KAFKA-17675) which encompass the response handling of PreVotes, addition of new ProspectiveState, update to metrics, and addition of Raft simulation tests.

Voters now transition to ProspectiveState first before CandidateState to prevent unnecessary epoch bumps. Voters in ProspectiveState send PreVotes requests which are Vote requests with PreVote set to true.

Follower grants PreVotes if it has not yet fetched successfully from leader. Leader denies all PreVotes. Unattached, Prospective, Candidate, and Resigned will grant PreVotes if the requesting replica's log is at least as long as theirs. Granted PreVotes are not persisted like standard votes. It is possible for a voter to grant several PreVotes in the same epoch.

The only state which is allowed to transition directly to CandidateState is ProspectiveState. This happens on reception of majority of granted PreVotes or if at least one voter doesn't support PreVote requests.

Prospective will transition to Follower after election loss/timeout if it was already aware of last known leader and the leader's endpoint, or at any point if it discovers the leader.

Prospective will transition to Unattached after election loss/timeout if it does not know the leader endpoints.

After electionTimeout, Resigned now always transitions to Unattached and increases the epoch.

Prospective grants standard votes if it has not already granted a standard vote (no votedKey), has no leaderId, and the recipient's log is current enough

Candidate no longer backs off after election timeout. Candidate still backs off after election loss.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2025-01-17 09:38:03 -05:00
TengYao Chi 3191fe56fc
KAFKA-18413: Remove AdminZkClient (#18585)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-17 15:10:31 +01:00
PoAn Yang 5a1fb1588d
KAFKA-18373: Remove ZkMetadataCache (#18553)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2025-01-17 11:49:11 +01:00
PoAn Yang 0e502e0b47
KAFKA-18431: Remove KafkaController (#18573)
Remove KafkaController and related unused references:

* ControllerChannelContext
* ControllerChannelManager
* ControllerEventManager
* ControllerState
* PartitionStateMachine
* ReplicaStateMachine
* TopicDeletionManager
* ZkBrokerEpochManager

Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-16 18:56:05 -08:00
Kirk True 4f0a91393c
KAFKA-17623: Flaky testSeekPositionAndPauseNewlyAssignedPartitionOnPartitionsAssignedCallback (#18515)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-01-16 22:02:23 +01:00
Ken Huang cabbb613a2
KAFKA-18407: Remove ZkAdminManager, DelayedCreatePartitions, CreatePartitionsMetadata, ZkConfigRepository, DelayedDeleteTopics (#18574)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-16 19:32:22 +01:00
TengYao Chi 60cc2a0dea
KAFKA-18556: Remove JaasModule#zkDigestModule, JaasTestUtils#zkSections (#18568)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-16 18:17:06 +01:00
Mickael Maison 6eb44ad869
KAFKA-14485: Move LogCleaner exceptions to storage module (#18534)
Reviewers: Luke Chen <showuon@gmail.com>, Ken Huang <s7133700@gmail.com>
2025-01-16 17:26:29 +01:00
Jason Taylor 54fe0f0135
KAFKA-16368: Add a new constraint for segment.bytes to min 1MB for KIP-1030 (#18140)
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-16 16:07:00 +01:00
TengYao Chi 724cf84de9
MINOR: Replace deprecated RichOptional.asScala with toScala (#18529)
Reference: https://www.scala-lang.org/api/2.13.0/scala/jdk/OptionConverters$$RichOptional.html

Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-16 06:32:49 -08:00
Jason Taylor 23c459286b
KAFKA-16368: Update defaults for LOG_MESSAGE_TIMESTAMP_AFTER_MAX_MS_DEFAULT and NUM_RECOVERY_THREADS_PER_DATA_DIR_CONFIG (#18106)
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-16 15:30:00 +01:00
Ken Huang ce1b079884
KAFKA-18542 Cleanup AlterPartitionManager (#18552)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 19:50:02 +08:00
Ken Huang 762bbcb711
KAFKA-18406 Remove ZkBrokerEpochManager.scala (#18561)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 19:35:09 +08:00
PoAn Yang 25f2ed090a
KAFKA-18405 Remove ZooKeeper logic from DynamicBrokerConfig (#18508)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 19:23:59 +08:00
Jason Taylor 11c10fe4da
KAFKA-16368: Update default linger.ms to 5ms for KIP-1030 (#18080)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Divij Vaidya <diviv@amazon.com>
2025-01-16 10:50:06 +01:00
PoAn Yang 14daa23b59
KAFKA-18331: Make process.roles and node.id required configs (#18414)
In 4.0, there is no ZK mode and both of these configs are required in kraft mode.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-15 23:55:51 -08:00
Chung, Ming-Yen 60d08a7abb
KAFKA-18552: Remove unnecessary version check in testHandleOffsetFetch* (#18559)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-15 19:39:19 -08:00
Ken Huang b9ccab42fe
KAFKA-18472: Remove MetadataSupport (#18483)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 19:38:33 -08:00
Dmitry Werner 92fd99bda1
KAFKA-18479: Remove keepPartitionMetadataFile in UnifiedLog and LogMan… (#18491)
Reviewers: Jun Rao <junrao@gmail.com>
2025-01-15 13:59:28 -08:00
Apoorv Mittal 3fa998475b
KAFKA-18539 Remove optional managers in KafkaApis (#18550)
Removed Optional for SharePartitionManager and ClientMetricsManager as zookeeper code is being removed. Also removed asScala and asJava conversion in KafkaApis.handleListClientMetricsResources, moved to java stream.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 04:46:05 +08:00
Sushant Mahajan 47f22faac3
MINOR: Added flaky references for a few tests. (#18558)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-15 19:24:52 +00:00
Apoorv Mittal 06400177bb
KAFKA-18452: Implemented batch size in acquired records (#18459)
The PR implements a way to divide acquired batches into batch size as desirable by client.

The BatchSize is the soft limit and should align the batches in response and cached state in broker at the log batch boundaries.

Reviewers:  Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>, Jun Rao <junrao@gmail.com>
2025-01-15 11:08:05 -08:00
Kuan-Po Tseng d3b4c1bdf4
KAFKA-18401: Transaction version 2 does not support commit transaction without records (#18448)
Fix the issue where producer.commitTransaction under transaction version 2 throws error if no partition or offset is added to transaction. The solution is to avoid sending the endTxnRequest unless producer.send or producer.sendOffsetsToTransaction is triggered.

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-01-15 10:21:11 -08:00
Apoorv Mittal e12db663f0
KAFKA-18514 Remove server dependency on share coordinator (#18536)
The PR removes dependency of server module on share-coordinator, rather it should be other way. Moved the ShareCoordinatorConfig class from server to share-coordinator.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 00:47:01 +08:00
Sushant Mahajan e3a56f3162
KAFKA-18513: Validate share state topic records produced in tests. (#18521)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-15 16:10:07 +00:00
TaiJuWu d96b68252c
KAFKA-18399 Remove ZooKeeper from KafkaApis (12/N): clean up ZKMetadataCache, KafkaController and raftSupport (#18542)
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 23:28:57 +08:00
Ted Yan 5b8319e6c2
KAFKA-18399 Remove ZooKeeper from KafkaApis (11/N): CREATE_ACLS and DELETE_ACLS (#18540)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 18:46:50 +08:00
TaiJuWu 3fcaec7f4e
KAFKA-18399 Remove ZooKeeper from KafkaApis (10/N): ALTER_CONFIG and INCREMENETAL_ALTER_CONFIG (#18432)
Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 18:25:45 +08:00
PoAn Yang ae661dec34
KAFAK-18451: Flaky RemoteLogManagerTest#testRLMOpsWhenMetadataIsNotReady (#18520)
The REMOTE_LOG_MANAGER_TASK_INTERVAL_MS_PROP in RemoteLogManagerTest is 100 which is too small. If assertions verifyNoMoreInteractions can't run in 100ms, the scheduler will run RLMTask again and the case will fail.

Reviewers: Luke Chen <showuon@gmail.com>
2025-01-15 16:18:06 +08:00
Ismael Juma f3a93551fa
Revert "KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050)" (#18544)
This reverts commit 70d6312a3a.

Reviewers: Luke Chen <showuon@gmail.com>
2025-01-15 16:16:47 +08:00
TengYao Chi 118e1835cc
KAFKA-18502 Remove kafka.controller.Election (#18518)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 05:42:16 +08:00
mingdaoy 9f955973fe
KAFKA-18399 Remove ZooKeeper from KafkaApis (9/N): ALTER_CLIENT_QUOTAS and ALLOCATE_PRODUCER_IDS (#18465)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 05:06:16 +08:00
TaiJuWu ddc3faa88f
KAFKA-18399 Remove ZooKeeper from KafkaApis (8/N): ELECT_LEADERS , ALTER_PARTITION, UPDATE_FEATURES (#18453)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 04:53:03 +08:00
Ken Huang c00c576b65
MINOR: Fix some comments typo (#18509)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-14 15:00:02 +00:00
Sanskar Jhajharia e3e4c17959
Add DescribeShareGroupOffsets API [KIP-932] (#18500)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-14 14:33:39 +00:00
David Jacot f6912a9a6a
MINOR: Fix typo in DumpLogSegments' TransactionLogMessageParser (#18505)
The value should use `version` instead of `type`.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-14 08:10:10 +01:00
Peter Lee da0c3beffa
KAFKA-18491 Remove zkClient & maybeUpdateMetadataCache from ReplicaManager (#18507)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-14 03:20:23 +08:00
Ken Huang 70d6312a3a
KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050)
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2025-01-13 15:29:14 +01:00
Abhinav Dixit 4e24c50ba3
KAFKA-18404: Remove partitionMaxBytes usage from DelayedShareFetch (#17870)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-13 13:56:19 +00:00
TengYao Chi 5d95d91807
MINOR: Javadoc fixes in KRaftMetadataCache (#18493)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-13 14:37:03 +01:00
TaiJuWu 9f5d9f3cd4
KAFKA-18399 Remove ZooKeeper from KafkaApis (7/N): CREATE_TOPICS, DELETE_TOPICS, CREATE_PARTITIONS (#18433)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-13 21:23:59 +08:00
Ken Huang 0b60d08d23
KAFKA-18341: Remove KafkaConfig GroupType config check and warn log (#18320)
As ZK mode is being removed for 4.0, we don't need this check anymore.

Reviewers: David Jacot <djacot@confluent.io>
2025-01-13 02:42:17 -08:00
David Jacot 273719227e
KAFKA-18457; Update DumpLogSegments to use coordinator record json converters (#18480)
This patch updates the ShareGroupStateMessageParser and OffsetsMessageParser used by the DumpLogSegments command line tool to use the recently introduced json converters for those records. It basically means that new records are automatically supported.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-13 11:28:54 +01:00
Ken Huang 33556aedc3
KAFKA-18399 Remove ZooKeeper from KafkaApis (6/N): `handleCreateTokenRequest`, `handleRenewTokenRequestZk`, `handleExpireTokenRequestZk` (#18447)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-13 01:42:32 +08:00
Ken Huang 6a8ffe7f64
KAFKA-18399 Remove ZooKeeper from KafkaApis (5/N): ALTER_PARTITION_REASSIGNMENTS, LIST_PARTITION_REASSIGNMENTS (#18464)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-13 01:33:24 +08:00
PoAn Yang 3cf2e45dc4
KAFKA-18399 Remove ZooKeeper from KafkaApis (4/N): OFFSET_COMMIT and OFFSET_FETCH (#18461)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-12 20:54:10 +08:00
TengYao Chi 81b1508acb
KAFKA-18399 Remove ZooKeeper from KafkaApis (3/N): USER_SCRAM_CREDENTIALS (#18456)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-12 20:39:49 +08:00
Xuan-Zhang Gong 1ee715473c
KAFKA-18446 Remove MetadataCacheControllerNodeProvider (#18437)
Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-12 02:12:35 +08:00
Ismael Juma d4aee71e36
KAFKA-18465: Remove MetadataVersions older than 3.0-IV1 (#18468)
Apache Kafka 4.0 will only support KRaft and 3.0-IV1 is the minimum version supported by KRaft. So, we can assume that Apache Kafka 4.0 will only communicate with brokers that are 3.0-IV1 or newer.

Note that KRaft was only marked as production-ready in 3.3, so we could go further and set the baseline to 3.3. I think we should have that discussion, but it made sense to start with the non controversial parts.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <david.jacot@gmail.com>
2025-01-11 09:42:39 -08:00
Justine Olshan 32dbbe6a1f
KAFKA-18464: Empty Abort Transaction can fence producer incorrectly with Transactions V2 (#18467)
To avoid self-fencing in the commit/abort + empty abort scenario, return the concurrent transactions error when we have pending state and do the epoch check second. In this scenario, we will complete the previous transaction before proceeding to the empty abort.

Added a test that failed before the change.

Note -- only the pending state is checked earlier. This is because we don’t return from EndTxn (the first commit) until we already written to the log, transitioned to PrepareX, and have the pending CompleteX state. We don't need to worry about the cases of an EndTxn request coming in with PrepareX without the pending state because that would be an older request and/or retry which are already covered.

Reviewers: Artem Livshits <alivshits@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
2025-01-10 16:51:58 -08:00
PoAn Yang a7342a2e62
MINOR: fix flaky RemoteLogManagerTest#testStopPartitionsWithDeletion (#18474)
The test has become flakier recently and it's easy to reproduce by running the single test (vs
running the the class test suite).

The root cause is that following functions call `RemoteLogMetadataManager#listRemoteLogSegments`.
It returns iterator. If one of function goes through iterator first, another can't get expected result.
I changed `thenReturn` to `thenAnswer` to avoid the issue.

The race is between:
* RLMExpirationTask#cleanupExpiredRemoteLogSegments
* RemoteLogManager#deleteRemoteLogPartition

Reviewers: Ismael Juma <ismael@juma.me.uk>

Signed-off-by: PoAn Yang <payang@apache.org>
2025-01-10 06:33:10 -08:00
TaiJuWu 5684fc7a2e
KAFKA-18399 Remove ZooKeeper from KafkaApis (2/N): CONTROLLED_SHUTDOWN and ENVELOPE (#18422)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-10 04:12:12 +08:00
Ismael Juma cf7029c026
KAFKA-13093: Log compaction should write new segments with record version v2 (KIP-724) (#18321)
Convert v0/v1 record batches to v2 during compaction even if said record batches would be
written with no change otherwise. A few important details:

1. V0 compressed record batch with multiple records is converted into single V2 record batch
2. V0 uncompressed records are converted into single record V2 record batches
3. V0 records are converted to V2 records with timestampType set to `CreateTime` and the
timestamp is `-1`.
4. The `KAFKA-4298` workaround is no longer needed since the conversion to V2 fixes
the issue too.
5. Removed a log warning applicable to consumers older than 0.10.1 - they are no longer
supported.
6. Added back the ability to append records with v0/v1 (for testing only).
7. The creation of the leader epoch cache is no longer optional since the record version
config is effectively always V2.

Add integration tests, these tests existed before #18267 - restored, modified and
extended them.

Reviewers: Jun Rao <jun@confluent.io>
2025-01-09 09:37:23 -08:00
Peter Lee a116753cc8
KAFKA-17986 Fix ConsumerRebootstrapTest and ProducerRebootstrapTest (#18175)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-10 01:02:34 +08:00
Ken Huang 307059c770
KAFKA-18399 Remove ZooKeeper from KafkaApis (1/N): `LEADER_AND_ISR`, `STOP_REPLICA`, `UPDATE_METADATA` (#18417)
Delete the handlers for LEADER_AND_ISR, STOP_REPLICA, and UPDATE_METADATA. Also, remove the corresponding unit tests in KafkaApisTest.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-10 00:01:38 +08:00
TengYao Chi 6fa5537ceb
MINOR: Replace ImplicitConversions with CollectionConverters (#18412)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-09 10:00:55 +01:00
Colin Patrick McCabe c28d9a3486
KAFKA-18435 Remove zookeeper dependencies in build.gradle (#18450)
Remove Apache ZooKeeper from the Apache Kafka build. Also remove commons IO, commons CLI, and netty, which were dependencies we took only because of ZooKeeper.

In order to keep the size of this PR manageable, I did not remove all classes which formerly interfaced with ZK. I just removed the ZK types. Fortunately, Kafka generally wrapped ZK data structures rather than using them directly.

Some classes were pretty entangled with ZK, so it was easier just to stub them out. For ZkNodeChangeNotificationListener.scala, PartitionStateMachine.scala, ReplicaStateMachine.scala, KafkaZkClient.scala, and ZookeeperClient.scala, I replaced all the functions with "throw new UnsupportedOperationException". Since the tests for these classes have been removed, as well as the ZK-based broker code, this should be OK as an incremental step.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-09 10:25:17 +08:00
Logan Zhu 5efaae65c6
KAFKA-18432 Remove unused code from AutoTopicCreationManager (#18438)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-09 02:52:28 +08:00
PoAn Yang 4b1b67e3c4
KAFKA-18434: enrich the authorization error message of connecting to controller (#18436)
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-08 18:56:29 +01:00
TengYao Chi af3f9e3a5a
KAFKA-18426 Remove FinalizedFeatureChangeListener (#18441)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-09 01:24:03 +08:00
Justine Olshan f79d7dc3f4
KAFKA-18035: Readd flaky annotation back for testBumpTransactionalEpochWithTV2Disabled (#18426)
Reviewers: David Jacot <djacot@confluent.io>
2025-01-08 09:06:41 -08:00
TengYao Chi aa22676c48
KAFKA-18425 Remove OffsetTrackingListener (#18443)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-09 00:18:49 +08:00
Andrew Schofield 3f9d2c2db0
KAFKA-18433: Add BatchSize to ShareFetch request (1/N) (#18439)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2025-01-08 15:29:43 +00:00
Peter Lee 0377e807ff
MINOR: Use Producer interface and ClusterInstance producer factory (#18197)
Reviewers: David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 20:53:30 +08:00
Peter Lee 08ef22d888
KAFKA-18173 Remove duplicate `assertFutureError` (#18296)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 20:24:35 +08:00
Mickael Maison d1aa370db8
KAFKA-15599: Move SegmentPosition/TimingWheelExpirationService to raft module (#18094)
Reviewers: Divij Vaidya <diviv@amazon.com>, Jason Taylor <jastaylr@amazon.com>
2025-01-08 12:49:38 +01:00
TaiJuWu 0c435e3855
KAFKA-18353 Remove zk config `control.plane.listener.name` (#18329)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 18:38:37 +08:00
TengYao Chi 8568157f7f
KAFKA-18443 Remove ZkFourLetterWords (#18429)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 17:24:36 +08:00
Jhen-Yung Hsu f95726a211
KAFKA-18417 Remove controlled.shutdown.max.retries and controlled.shutdown.retry.backoff.ms (#18431)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 17:13:42 +08:00
Ken Huang 6aef94e9ec
KAFKA-18411 Remove ZkProducerIdManager (#18413)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 04:23:27 +08:00
TengYao Chi 2a073a14d2
KAFKA-18414 Remove KRaftRegistrationResult (#18401)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 04:09:28 +08:00
Ken Huang 9d93a4f68f
KAFKA-18384 Remove ZkAlterPartitionManager (#18364)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-07 18:08:57 +01:00
TengYao Chi af255a0c37
KAFKA-18412: Remove EmbeddedZookeeper (#18399)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-07 17:08:00 +01:00
NICOLAS GUYOMAR 2fc35c81be
MINOR : Improve Exception log in NotEnoughReplicasException(#12394)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2025-01-07 20:21:02 +05:30
David Jacot 48b522fe86
MINOR: Improve PlaintextAdminIntegrationTest#testConsumerGroups (#18409)
Looking at the [history](https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Europe%2FZurich&tests.container=kafka.api.PlaintextAdminIntegrationTest&tests.test=testConsumerGroups(String%2C%20String)%5B2%5D), I found out that one source of flakiness is due to syncCommit failing with CommitFailedException. We can ignore it and retry on the next iteration.

```
[2025-01-07 10:17:00,783] ERROR [Consumer instanceId=test_instance_id_1, clientId=test_client_id, groupId=test_group_id] OffsetCommit failed for member VfImExrxT3-w_HNJcTkqnw with stale member epoch error. Last epoch sent: 2 (org.apache.kafka.clients.consumer.internals.CommitRequestManager:773)
Exception in thread "Thread-6" org.apache.kafka.clients.consumer.CommitFailedException: OffsetCommit failed with stale member epoch.The member epoch is stale. The member must retry after receiving its updated member epoch via the ConsumerGroupHeartbeat API.
        at org.apache.kafka.clients.consumer.internals.CommitRequestManager.commitSyncExceptionForError(CommitRequestManager.java:481)
        at org.apache.kafka.clients.consumer.internals.CommitRequestManager.lambda$commitSyncWithRetries$7(CommitRequestManager.java:472)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
        at org.apache.kafka.clients.consumer.internals.CommitRequestManager$OffsetCommitRequestState.onResponse(CommitRequestManager.java:776)
        at org.apache.kafka.clients.consumer.internals.CommitRequestManager$RetriableRequestState.handleClientResponse(CommitRequestManager.java:893)
        at org.apache.kafka.clients.consumer.internals.CommitRequestManager$RetriableRequestState.lambda$buildRequestWithResponseHandling$0(CommitRequestManager.java:883)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
        at org.apache.kafka.clients.consumer.internals.NetworkClientDelegate$FutureCompletionHandler.onComplete(NetworkClientDelegate.java:433)
        at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154)
        at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:669)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:661)
        at org.apache.kafka.clients.consumer.internals.NetworkClientDelegate.poll(NetworkClientDelegate.java:153)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.runOnce(ConsumerNetworkThread.java:160)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:106)
```

Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-01-07 06:27:26 -08:00
Colin Patrick McCabe d8236bec44
MINOR: Remove RaftManager.maybeDeleteMetadataLogDir and AutoTopicCreationManagerTest.scala (#17365)
Remove RaftManager.maybeDeleteMetadataLogDir since it was only used during ZK migration, and that code has been removed.

Similarly, remove RaftManagerTest.testKRaftBrokerDoesNotDeleteMetadataLog which tested that function.

Remove AutoTopicCreationManagerTest since it tests the ZK-mode-only AutoTopicReationManager.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-07 21:18:58 +08:00
Ken Huang d874aa42f3
KAFKA-18368 Remove TestUtils#MockZkConnect and remove zkConnect from TestUtils#createBrokerConfig (#18352)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-07 21:03:13 +08:00
David Jacot 7b6e94642a
KAFKA-18303; Update ShareCoordinator to use new record format (#18396)
Following https://github.com/apache/kafka/pull/18261, this patch updates the Share Coordinator to use the new record format.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-06 23:59:07 -08:00
David Arthur c4840f5e93
KAFKA-16446: Improve controller event duration logging (#15622)
There are times when the controller has a high event processing time, such as during startup, or when creating a topic with many partitions. We can see these processing times in the p99 metric (kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs), however it's difficult to see exactly which event is causing high processing time.

With DEBUG logs, we see every event along with its processing time. Even with this, it's a bit tedious to find the event with a high processing time.

This PR logs all events which take longer than 2 seconds at ERROR level. This will help identify events that are taking far too long, and which could be disruptive to the operation of the controller. The slow event logging looks like this:

```
[2024-12-20 15:03:39,754] ERROR [QuorumController id=1] Exceptionally slow controller event createTopics took 5240 ms.  (org.apache.kafka.controller.EventPerformanceMonitor)
```

Also, every 60 seconds, it logs some event time statistics, including average time, maximum time, and the name of the event which took the longest. This periodic message looks like this:

```
[2024-12-20 15:35:04,798] INFO [QuorumController id=1] In the last 60000 ms period, 333 events were completed, which took an average of 12.34 ms each. The slowest event was handleCommit[baseOffset=0], which took 41.90 ms. (org.apache.kafka.controller.EventPerformanceMonitor)
```

An operator can disable these logs by adding the following to their log4j config:

```
org.apache.kafka.controller.EventPerformanceMonitor=OFF
```

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-01-06 13:34:46 -08:00
Zhihong Yu c116354908
MINOR: Use clear method to speed up removal (#18400)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-06 17:50:18 +00:00
Mickael Maison 64bbdb1a03
KAFKA-17616: Remove KafkaServer (#18384)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Ken Huang <s7133700@gmail.com>
2025-01-06 14:36:08 +01:00
Ken Huang a628d9bc4d
KAFKA-18365 Remove zookeeper.connect in Test (#18353)
Reviewers: TaiJuWu <tjwu1217@gmail.com>, PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-04 23:04:21 +08:00
Ismael Juma 73ab7ee4ea
MINOR: Use `Files.readString/writeString` and `String.repeat` to simplify code (#18372)
The 3 methods were introduced in Java 11.

Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-02 17:50:27 -08:00
David Arthur a2a8d87153
MINOR remove some flaky annotations (#18357)
Remove the flaky annotation from the following tests

* RemoteLogManagerTest#testFetchOffsetByTimestampWithTieredStorageDoesNotFetchIndexWhenExistsLocally
* All the children of BaseConsumerTest#testCoordinatorFailover
* TransactionsTest#testFailureToFenceEpoch
* TransactionsTest#testReadCommittedConsumerShouldNotSeeUndecidedData
* MetricsDuringTopicCreationDeletionTest#testMetricsDuringTopicCreateDelete
* ProduceRequestTest#testProduceWithInvalidTimestamp

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-02 11:04:50 -05:00
xijiu dbf4602c13
KAFKA-18367 Remove ZkConfigManager (#18363)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-01 21:16:25 +08:00
xijiu 5f8cf0e1f5
KAFKA-17421 Add integration tests for ConsumerRecord#leaderEpoch (#18254)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-01 00:37:31 +08:00
TengYao Chi 3161115ada
KAFKA-18361 Remove PasswordEncoderConfigs (#18347)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-31 00:18:23 +08:00