Commit Graph

179 Commits

Author SHA1 Message Date
Dongnuo Lyu 21d60eabab
KAFKA-16673; Simplify `GroupMetadataManager#toTopicPartitions` by using `ConsumerProtocolSubscription` instead of `ConsumerPartitionAssignor.Subscription` (#16309)
In `GroupMetadataManager#toTopicPartitions`, we generate a list of `ConsumerGroupHeartbeatRequestData.TopicPartitions` from the input deserialized subscription. Currently the input subscription is `ConsumerPartitionAssignor.Subscription`, where the topic partitions are stored as (topic-partition) pairs, whereas in `ConsumerGroupHeartbeatRequestData.TopicPartitions`, we need the topic partitions to be stored as (topic-partition list) pairs.

`ConsumerProtocolSubscription` is an intermediate data structure in the deserialization where the topic partitions are stored as (topic-partition list) pairs. This pr uses `ConsumerProtocolSubscription` instead as the input subscription to make `toTopicPartitions` more efficient. 

Reviewers: David Jacot <djacot@confluent.io>
2024-06-17 02:47:52 -07:00
Omnia Ibrahim e99da2446c
KAFKA-15853: Move KafkaConfig.configDef out of core (#16116)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 17:26:00 +02:00
gongxuanzhang 6d9ef0e12a
KAFKA-10787 Apply spotless to `group-coordinator` and `group-coordinator-api` (#16298)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 12:46:28 +08:00
Dongnuo Lyu 11c85a93c3
MINOR: Make online downgrade failure logs less noisy and update the timeouts scheduled in `convertToConsumerGroup` (#16290)
This patch: 
- changes the order of the checks in `validateOnlineDowngrade`, so that only when the last member using the consumer protocol leave and the group still has classic member(s), `online downgrade is disabled` is logged if the policy doesn't allow downgrade.
- changes the session timeout in `convertToConsumerGroup` from `consumerGroupSessionTimeoutMs` to `member.classicProtocolSessionTimeout().get()`.

Reviewers: David Jacot <djacot@confluent.io>
2024-06-13 02:11:01 -07:00
gongxuanzhang 596b945072
KAFKA-16643 Add ModifierOrder checkstyle rule (#15890)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 15:39:32 +08:00
David Jacot 638844f833
KAFKA-16770; [2/2] Coalesce records into bigger batches (#16215)
This patch is the continuation of https://github.com/apache/kafka/pull/15964. It introduces the records coalescing to the CoordinatorRuntime. It also introduces a new configuration `group.coordinator.append.linger.ms` which allows administrators to chose the linger time or disable it with zero. The new configuration defaults to 10ms.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-06-11 23:29:50 -07:00
David Jacot 98f7da9172
KAFKA-16930; UniformHeterogeneousAssignmentBuilder throws NPE when one member has no subscriptions (#16283)
Fix the following NPE:

```
java.lang.NullPointerException: Cannot invoke "org.apache.kafka.coordinator.group.assignor.MemberAssignment.targetPartitions()" because the return value of "java.util.Map.get(Object)" is null
	at org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.canMemberParticipateInReassignment(GeneralUniformAssignmentBuilder.java:248)
	at org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.balance(GeneralUniformAssignmentBuilder.java:336)
	at org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.buildAssignment(GeneralUniformAssignmentBuilder.java:157)
	at org.apache.kafka.coordinator.group.assignor.UniformAssignor.assign(UniformAssignor.java:84)
	at org.apache.kafka.coordinator.group.consumer.TargetAssignmentBuilder.build(TargetAssignmentBuilder.java:302)
	at org.apache.kafka.coordinator.group.GroupMetadataManager.updateTargetAssignment(GroupMetadataManager.java:1913)
	at org.apache.kafka.coordinator.group.GroupMetadataManager.consumerGroupHeartbeat(GroupMetadataManager.java:1518)
	at org.apache.kafka.coordinator.group.GroupMetadataManager.consumerGroupHeartbeat(GroupMetadataManager.java:2254)
	at org.apache.kafka.coordinator.group.GroupCoordinatorShard.consumerGroupHeartbeat(GroupCoordinatorShard.java:308)
	at org.apache.kafka.coordinator.group.GroupCoordinatorService.lambda$consumerGroupHeartbeat$0(GroupCoordinatorService.java:298)
	at org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$CoordinatorWriteEvent.lambda$run$0(CoordinatorRuntime.java:769)
	at org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime.withActiveContextOrThrow(CoordinatorRuntime.java:1582)
	at org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime.access$1400(CoordinatorRuntime.java:96)
	at org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$CoordinatorWriteEvent.run(CoordinatorRuntime.java:767)
	at org.apache.kafka.coordinator.group.runtime.MultiThreadedEventProcessor$EventProcessorThread.handleEvents(MultiThreadedEventProcessor.java:144)
	at org.apache.kafka.coordinator.group.runtime.MultiThreadedEventProcessor$EventProcessorThread.run(MultiThreadedEventProcessor.java:176) 
```

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-06-11 11:43:56 -07:00
David Jacot 049cfeac02
MINOR: Rename uniform assignor's internal builders (#16233)
This patch renames the uniform assignor's builders to match the `SubscriptionType` which is used to determine which one is called. It removes the abstract class `AbstractUniformAssignmentBuilder` which is not necessary anymore. It also applies minor refactoring.

Reviewers: Ritika Reddy <rreddy@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-10 05:26:56 -07:00
David Jacot 7d832cf74f
KAFKA-14701; Move `PartitionAssignor` to new `group-coordinator-api` module (#16198)
This patch moves the `PartitionAssignor` interface and all the related classes to a newly created `group-coordinator/api` module, following the pattern used by the storage and tools modules.

Reviewers: Ritika Reddy <rreddy@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-06 12:19:20 -07:00
Dongnuo Lyu 7ddfa64759
MINOR: Adjust validateOffsetCommit/Fetch in ConsumerGroup to ensure compatibility with classic protocol members (#16145)
During online migration, there could be ConsumerGroup that has members that uses the classic protocol. In the current implementation, `STALE_MEMBER_EPOCH` could be thrown in ConsumerGroup offset fetch/commit validation but it's not supported by the classic protocol. Thus this patch changed `ConsumerGroup#validateOffsetCommit` and `ConsumerGroup#validateOffsetFetch` to ensure compatibility.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>
2024-06-04 23:08:38 -07:00
Ritika Reddy 078dd9a311
KAFKA-16821; Member Subscription Spec Interface (#16068)
This patch reworks the `PartitionAssignor` interface to use interfaces instead of POJOs. It mainly introduces the `MemberSubscriptionSpec` interface that represents a member subscription and changes the `GroupSpec` interfaces to expose the subscriptions and the assignments via different methods.

The patch does not change the performance.

before:
```
Benchmark                                     (memberCount)  (partitionsToMemberRatio)  (topicCount)  Mode  Cnt  Score   Error  Units
TargetAssignmentBuilderBenchmark.build                10000                         10           100  avgt    5  3.462 ± 0.687  ms/op
TargetAssignmentBuilderBenchmark.build                10000                         10          1000  avgt    5  3.626 ± 0.412  ms/op
JMH benchmarks done
```

after:
```
Benchmark                                     (memberCount)  (partitionsToMemberRatio)  (topicCount)  Mode  Cnt  Score   Error  Units
TargetAssignmentBuilderBenchmark.build                10000                         10           100  avgt    5  3.677 ± 0.683  ms/op
TargetAssignmentBuilderBenchmark.build                10000                         10          1000  avgt    5  3.991 ± 0.065  ms/op
JMH benchmarks done
```

Reviewers: David Jacot <djacot@confluent.io>
2024-06-04 06:44:37 -07:00
David Jacot 7d82f7625e
MINOR: Log time taken to compute the target assignment (#16185)
The time taken to compute a new assignment is critical. This patches extending the existing logging to log it too. This is very useful information to have.

Reviewers: Luke Chen <showuon@gmail.com>
2024-06-04 06:38:56 -07:00
Jeff Kim d7bc43ed06
KAFKA-16664; Re-add EventAccumulator.poll(long, TimeUnit) (#16144)
We have revamped the thread idle ratio metric in https://github.com/apache/kafka/pull/15835. https://github.com/apache/kafka/pull/15835#discussion_r1588068337 describes a case where the metric loses accuracy and in order to set a lower bound to the accuracy, this patch re-adds a poll with a timeout that was removed as part of https://github.com/apache/kafka/pull/15430.

Reviewers: David Jacot <djacot@confluent.io>
2024-06-03 23:27:35 -07:00
TingIāu "Ting" Kì 7973aa6a39
KAFKA-16861: Don't convert to group to classic if the size is larger than group max size. (#16163)
Fix the bug where the group downgrade to a classic one when a member leaves, even though the consumer group size is still larger than `classicGroupMaxSize`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2024-06-03 11:36:07 -07:00
David Jacot 979f8d9aa3
MINOR: Small refactor in TargetAssignmentBuilder (#16174)
This patch is a small refactoring which mainly aims at avoid to construct a copy of the new target assignment in the TargetAssignmentBuilder because the copy is not used by the caller. The change relies on the exiting tests and it does not really have an impact on performance (e.g. validated with TargetAssignmentBuilderBenchmark).

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-03 11:32:39 -07:00
David Jacot fb566e48bf
KAFKA-16864; Optimize uniform (homogenous) assignor (#16088)
This patch optimizes uniform (homogenous) assignor by avoiding creating a copy of all the assignments. Instead, the assignor creates a copy only if the assignment is updated. It is a sort of copy-on-write. This change reduces the overhead of the TargetAssignmentBuilder when ran with the uniform (homogenous) assignor.

Trunk:

```
Benchmark                                     (memberCount)  (partitionsToMemberRatio)  (topicCount)  Mode  Cnt   Score   Error  Units
TargetAssignmentBuilderBenchmark.build                10000                         10           100  avgt    5  24.535 ± 1.583  ms/op
TargetAssignmentBuilderBenchmark.build                10000                         10          1000  avgt    5  24.094 ± 0.223  ms/op
JMH benchmarks done
```

```
Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt   Score   Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS           100  avgt    5  14.697 ± 0.133  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS          1000  avgt    5  15.073 ± 0.135  ms/op
JMH benchmarks done
```

Patch:

```
Benchmark                                     (memberCount)  (partitionsToMemberRatio)  (topicCount)  Mode  Cnt  Score   Error  Units
TargetAssignmentBuilderBenchmark.build                10000                         10           100  avgt    5  3.376 ± 0.577  ms/op
TargetAssignmentBuilderBenchmark.build                10000                         10          1000  avgt    5  3.731 ± 0.359  ms/op
JMH benchmarks done
```

```
Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt  Score   Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS           100  avgt    5  1.975 ± 0.086  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS          1000  avgt    5  2.026 ± 0.190  ms/op
JMH benchmarks done
```

Reviewers: Ritika Reddy <rreddy@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-05-31 13:17:59 -07:00
Andrew Schofield 2d9994e0de
KAFKA-16722: Introduce ConsumerGroupPartitionAssignor interface (#15998)
KIP-932 introduces share groups to go alongside consumer groups. Both kinds of group use server-side assignors but it is unlikely that a single assignor class would be suitable for both. As a result, the KIP introduces specific interfaces for consumer group and share group partition assignors.

This PR introduces only the consumer group interface, `o.a.k.coordinator.group.assignor.ConsumerGroupPartitionAssignor`. The share group interface will come in a later release. The existing implementations of the general `PartitionAssignor` interface have been changed to implement `ConsumerGroupPartitionAssignor` instead and all other code changes are just propagating the change throughout the codebase.

Note that the code in the group coordinator that actually calculates assignments uses the general `PartitionAssignor` interface so that it can be used with both kinds of group, even though the assignors themselves are specific.

Reviewers: Apoorv Mittal <amittal@confluent.io>, David Jacot <djacot@confluent.io>
2024-05-29 08:31:52 -07:00
Dongnuo Lyu eefd114c4a
KAFKA-16832; LeaveGroup API for upgrading ConsumerGroup (#16057)
This patch implements the LeaveGroup API to the consumer groups that are in the mixed mode.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>
2024-05-28 23:21:30 -07:00
Ritika Reddy a8d166c00e
KAFKA-16625; Reverse lookup map from topic partitions to members (#15974)
This patch speeds up the computation of the unassigned partitions by exposing the inverted target assignment. It allows the assignor to check whether a partition is assigned or not.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>
2024-05-25 09:06:15 -07:00
Jeff Kim d585a494a4
KAFKA-16831: CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size write limit (#16059)
CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size write limit. Otherwise, we default the write limit to the min buffer size of 16384 for the write limit. This causes the coordinator to threw RecordTooLargeException even when it's under the 1MB max batch size limit.

Reviewers: David Jacot <djacot@confluent.io>
2024-05-24 13:33:57 -07:00
Jeff Kim 520aa8665c
KAFKA-16626; Lazily convert subscribed topic names to topic ids (#15970)
This patch aims to remove the data structure that stores the conversion from topic names to topic ids which was taking time similar to the actual assignment computation. Instead, we reuse the already existing ConsumerGroupMember.subscribedTopicNames() and do the conversion to topic ids when the iterator is requested.

Reviewers: David Jacot <djacot@confluent.io>
2024-05-24 00:51:50 -07:00
Dongnuo Lyu 14b5c4d1e8
KAFKA-16793; Heartbeat API for upgrading ConsumerGroup (#15988)
This patch implements the heartbeat api to the members that use the classic protocol in a ConsumerGroup.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>
2024-05-22 23:27:00 -07:00
Jeff Kim e692feed34
MINOR: fix flaky testRecordThreadIdleRatio (#15987)
DelayEventAccumulator should return immediately if there are no events in the queue. Also removed some unused fields inside EventProcessorThread.

Reviewers: Gaurav Narula <gaurav_narula2@apple.com>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2024-05-22 23:24:23 -07:00
Mickael Maison affe8da54c
KAFKA-7632: Support Compression Levels (KIP-390) (#15516)
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
2024-05-21 17:58:49 +02:00
David Jacot b4c2d66801
KAFKA-16770; [1/N] Coalesce records into bigger batches (#15964)
We have discovered during large scale performance tests that the current write path of the new coordinator does not scale well. The issue is that each write operation writes synchronously from the coordinator threads. Coalescing records into bigger batches helps drastically because it amortizes the cost of writes. Aligning the batches with the snapshots of the timelines data structures also reduces the number of in-flight snapshots.

This patch is the first of a series of patches that will bring records coalescing into the coordinator runtime. As a first step, we had to rework the PartitionWriter interface and move the logic to build MemoryRecords from it to the CoordinatorRuntime. The main changes are in these two classes. The others are related mechanical changes.

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-05-20 23:47:09 -07:00
David Jacot 5b34574e86
MINOR: Refactor write timeout in CoordinatorRuntime (#15976)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-18 01:47:44 +08:00
Dongnuo Lyu b8c96389b4
KAFKA-16762: SyncGroup API for upgrading ConsumerGroup (#15954)
This patch implements the sync group api for the consumer groups that are in the mixed mode. In classicGroupSyncToConsumerGroup, the assignedPartitions calculated in the JoinGroup will be returned as the assignment in the sync response and the member session timeout will be rescheduled.

Reviewers: David Jacot <djacot@confluent.io>
2024-05-17 07:12:40 -07:00
David Jacot ffb31e172a
MINOR: Remove usage of Stream API in CoordinatorRecordHelpers (#15969)
This patch removes the usage of the Stream API in CoordinatorRecordHelpers. I saw it in a couple of profiles so it is better to remove it.

Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-15 23:13:55 -07:00
David Jacot bf88013a28
MINOR: Rename `Record` to `CoordinatorRecord` (#15949)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-15 13:57:19 +08:00
Dongnuo Lyu 0e023e1f73
MINOR: Add classic member session timeout to ClassicMemberMetadata (#15921)
The heartbeat api to the consumer group with classic protocol members schedules the session timeout. At present, there's no way to get the classic member session timeout in heartbeat to consumer group.

This patch stores the session timeout into the ClassicMemberMetadata in ConsumerGroupMemberMetadataValue and update it when it's provided in the join request.

Reviewers: David Jacot <djacot@confluent.io>
2024-05-14 20:41:20 +02:00
Jeff Kim df5735dda5
MINOR: fix flaky testRecordThreadIdleRatioTwoThreads test (#15937)
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-14 23:20:36 +08:00
Ritika Reddy ccd83cafea
KAFKA-16694; Remove Rack Awareness Code from the Server Side Assignors (#15903)
Reviewers: David Jacot <djacot@confluent.io>
2024-05-14 00:13:35 -07:00
David Jacot f9169b7d3a
KAFKA-16735; Deprecate offsets.commit.required.acks (#15931)
This patch deprecates `offsets.commit.required.acks` in Apache Kafka 3.8 as described in KIP-1041: https://cwiki.apache.org/confluence/x/9YobEg.

Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-13 11:30:34 -07:00
Ritika Reddy ee16eee5de
KAFKA-16587: Add subscription model information to group state (#15785)
This patch introduces the SubscriptionType to the group state and passes it along to the partition assignor. A group is "homogeneous" when all the members are subscribed to the same topics; or it is "heterogeneous" otherwise. This mainly helps the uniform assignor because it does not have to re-compute this information to determine which algorithm to use.

trunk:
Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionModel)  (topicCount)  Mode  Cnt    Score    Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false            100                         10          HOMOGENEOUS           100  avgt    5    0.136 ±  0.001  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false            100                         10          HOMOGENEOUS          1000  avgt    5    0.198 ±  0.002  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false           1000                         10          HOMOGENEOUS           100  avgt    5    1.767 ±  0.138  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false           1000                         10          HOMOGENEOUS          1000  avgt    5    1.540 ±  0.020  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10          HOMOGENEOUS           100  avgt    5   32.419 ±  7.173  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10          HOMOGENEOUS          1000  avgt    5   26.731 ±  1.985  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false            100                         10          HOMOGENEOUS           100  avgt    5    0.242 ±  0.006  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false            100                         10          HOMOGENEOUS          1000  avgt    5    1.002 ±  0.006  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false           1000                         10          HOMOGENEOUS           100  avgt    5    2.544 ±  0.168  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false           1000                         10          HOMOGENEOUS          1000  avgt    5   10.749 ±  0.207  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10          HOMOGENEOUS           100  avgt    5   26.832 ±  0.154  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10          HOMOGENEOUS          1000  avgt    5  106.209 ±  0.301  ms/op
JMH benchmarks done

patch:
Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt   Score   Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false            100                         10         HOMOGENEOUS           100  avgt    5   0.131 ± 0.001  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false            100                         10         HOMOGENEOUS          1000  avgt    5   0.185 ± 0.004  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false           1000                         10         HOMOGENEOUS           100  avgt    5   1.943 ± 0.091  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false           1000                         10         HOMOGENEOUS          1000  avgt    5   1.450 ± 0.139  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10         HOMOGENEOUS           100  avgt    5  30.803 ± 2.644  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10         HOMOGENEOUS          1000  avgt    5  24.251 ± 1.230  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false            100                         10         HOMOGENEOUS           100  avgt    5   0.155 ± 0.004  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false            100                         10         HOMOGENEOUS          1000  avgt    5   0.235 ± 0.010  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false           1000                         10         HOMOGENEOUS           100  avgt    5   1.602 ± 0.046  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false           1000                         10         HOMOGENEOUS          1000  avgt    5   1.901 ± 0.174  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS           100  avgt    5  16.098 ± 1.905  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS          1000  avgt    5  17.681 ± 0.174  ms/op
JMH benchmarks done

Reviewers: David Jacot <djacot@confluent.io>
2024-05-13 02:19:05 -07:00
Jeff Kim 8a9dd2beda
KAFKA-16663; Cancel write timeout TimerTask on successful event completion (#15902)
Write events create and add a TimerTask to schedule the timeout operation. The issue is that we pile up the number of timer tasks which are essentially no-ops if replication was successful. They stay in memory for 15 seconds (default write timeout) and as the rate of write increases, the impact on memory usage increases.

Instead, cancel the corresponding write timeout task when the write event is committed to the log. This also applies to complete transaction events.

Reviewers: David Jacot <djacot@confluent.io>
2024-05-13 00:18:32 -07:00
Jeff Kim 21bf715622
KAFKA-16307; Fix coordinator thread idle ratio (#15835)
This PR fixes the thread idle ratio. We take a similar approach to the kafka request handler idle ratio: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaRequestHandler.scala#L108-L117

Instead of calculating the actual ratio per thread, we record the time each thread stays idle while waiting for a new event, divided by the number of threads as an approximation.

Reviewers: David Jacot <djacot@confluent.io>
2024-05-07 06:21:09 -07:00
Dongnuo Lyu 459eaec666
KAFKA-16615; JoinGroup API for upgrading ConsumerGroup (#15798)
The patch implements JoinGroup API for the new consumer groups. It allow members using the classic rebalance protocol with the consumer embedded protocol to join a new consumer group.

Reviewers: David Jacot <djacot@confluent.io>
2024-05-06 23:59:10 -07:00
David Jacot 42754336e1
MINOR: Remove `ConsumerGroupPartitionMetadataValue.Epoch` field (#15854)
ConsumerGroupPartitionMetadataValue.Epoch is not used anywhere so we can remove it. Note that we already have non-backward compatible changes lined up for 3.8 so it is fine to do it.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-06 05:02:39 -07:00
Chia Chuan Yu 55a00be4e9
MINOR: Replaced Utils.join() with JDK API. (#15823)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-06 15:13:01 +08:00
David Jacot 2c0b8b6920
MINOR: ConsumerGroup#getOrMaybeCreateMember should not add the member to the group (#15847)
While reviewing https://github.com/apache/kafka/pull/15785, I noticed that the member is added to the group directly in `ConsumerGroup#getOrMaybeCreateMember`. This does not hurt but confuses people because the state must not be mutated at this point. It should only be mutated when records are replayed. I think that it is better to remove it in order to make it clear.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-03 06:24:26 -07:00
Dongnuo Lyu 1e8415160f
MINOR: Add replayRecords to CoordinatorResult (#15818)
The patch adds a boolean attribute `replayRecords` that specifies whether the records should be replayed.

Reviewers: David Jacot <djacot@confluent.io>
2024-04-30 09:14:02 -07:00
Dongnuo Lyu 994077e43e
MINOR: Fix the flaky testConsumerGroupHeartbeatWithStableClassicGroup by sorting the topic partition list (#15816)
We are seeing flaky test in `testConsumerGroupHeartbeatWithStableClassicGroup` where the error is caused by the different ordering in the expected and actual values. The patch sorts the topic partition list in the records to fix the issue.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Igor Soarez <soarez@apple.com>, David Jacot <djacot@confluent.io>
2024-04-29 00:43:49 -07:00
Dongnuo Lyu dc1d8fc330
KAFKA-16554: Online downgrade triggering and group type conversion (#15721)
Online downgrade from a consumer group to a classic group is triggered when the last consumer that uses the consumer protocol leaves the group. A rebalance is manually triggered after the group conversion. This patch adds consumer group downgrade validation and conversion.

Reviewers: David Jacot <djacot@confluent.io>
2024-04-25 07:44:25 -07:00
Omnia Ibrahim 363f4d2847
KAFKA-15853 Move consumer group and group coordinator configs out of core (#15684)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-17 20:41:22 +08:00
Dongnuo Lyu a9f65a5d7f
KAFKA-16436; Online upgrade triggering and group type conversion (#15662)
This patch introduces the conversion from a classic group to a consumer group when a member joins with the new consumer group protocol (epoch is 0) but only if the conversion is enabled.

Reviewers: David Jacot <djacot@confluent.io>
2024-04-17 04:57:44 -07:00
Dongnuo Lyu 619f27015f
KAFKA-16294: Add group protocol migration enabling config (#15411)
This patch adds the `group.consumer.migration.policy` config which controls how consumer groups can be converted from classic group to consumer group and vice versa. The config is kept as an internal one while we develop the feature.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>
2024-04-10 10:59:26 -07:00
Dongnuo Lyu 9bc48af1c1
MINOR: Add type check to classic group timeout operations (#15587)
When implementing the group type conversion from a classic group to a consumer group, if the replay of conversion records fails, the group should be reverted back including its timeouts.

A possible solution is to keep all the classic group timeouts and add a type check to the timeout operations. If the group is successfully upgraded, it won't be able to pass the type check and its operations will be executed without actually doing anything; if the group upgrade fails, the group map will be reverted and the timeout operations will be executed as is.

We've already have group type check in consumer group timeout operations. This patch adds similar type check to those classic group timeout operations.

Reviewers: David Jacot <djacot@confluent.io>
2024-04-10 00:36:49 -07:00
Erik van Oosten 8e61f04228
MINOR: Fix usage of none in javadoc (#15674)
- Use `Empty` instead of 'none' when referring to `Optional` values.
- `Headers.lastHeader` returns `null` when no header is found.
- Fix minor spelling mistakes.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-08 08:43:05 +08:00
Jeff Kim b3116f4f76
KAFKA-16148: Implement GroupMetadataManager#onUnloaded (#15446)
This patch completes all awaiting futures when a group is unloaded.

Reviewers: David Jacot <djacot@confluent.io>
2024-04-02 03:16:02 -07:00
Sean Quah ad960635a9
KAFKA-16386: Convert NETWORK_EXCEPTIONs from KIP-890 transaction verification (#15559)
KIP-890 Part 1 introduced verification of transactions with the
transaction coordinator on the `Produce` and `TxnOffsetCommit` paths.
This introduced the possibility of new errors when responding to those
requests. For backwards compatibility with older clients, a choice was
made to convert some of the new retriable errors to existing errors that
are expected and retried correctly by older clients.

`NETWORK_EXCEPTION` was forgotten about and not converted, but can occur
if, for example, the transaction coordinator is temporarily refusing
connections. Now, we convert it to:
 * `NOT_ENOUGH_REPLICAS` on the `Produce` path, just like the other
   retriable errors that can arise from transaction verification.
 * `COORDINATOR_LOAD_IN_PROGRESS` on the `TxnOffsetCommit` path. This
   error does not force coordinator lookup on clients, unlike
   `COORDINATOR_NOT_AVAILABLE`. Note that this deviates from KIP-890,
   which says that retriable errors should be converted to
   `COORDINATOR_NOT_AVAILABLE`.

Reviewers: Artem Livshits <alivshits@confluent.io>, David Jacot <djacot@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-03-25 16:08:23 -07:00