* Add MetadataHash field to ConsumerGroupMetadataValue,
ShareGroupMetadataValue, and StreamsGroupMetadataValue.
* Add metadataHash field to
GroupCoordinatorRecordHelpers#newConsumerGroupEpochRecord,
GroupCoordinatorRecordHelpers#newShareGroupEpochRecord, and
StreamsCoordinatorRecordHelpers#newStreamsGroupEpochRecord (sketched
below).
* Add a deprecation message to ConsumerGroupPartitionMetadataKey and
ConsumerGroupPartitionMetadataValue.
* ShareGroupPartitionMetadataKey / ShareGroupPartitionMetadataValue /
StreamsGroupPartitionMetadataKey / StreamsGroupPartitionMetadataValue
will be removed in the next PR.
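As a sketch of the helper change in the second bullet (stand-in types
and assumed field names; the real classes are generated from the record
JSON, so details may differ):
```java
// Illustrative stand-ins only -- not the generated Kafka classes -- showing
// the shape of the change: the epoch record now also carries a metadata hash.
record ConsumerGroupMetadataValue(int epoch, long metadataHash) { }

final class GroupCoordinatorRecordHelpersSketch {
    private GroupCoordinatorRecordHelpersSketch() { }

    // Previously newConsumerGroupEpochRecord(groupId, newEpoch); the new
    // metadataHash parameter is persisted alongside the epoch.
    static ConsumerGroupMetadataValue newConsumerGroupEpochRecord(
            String groupId, int newEpoch, long metadataHash) {
        java.util.Objects.requireNonNull(groupId, "groupId");
        return new ConsumerGroupMetadataValue(newEpoch, metadataHash);
    }
}
```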
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, David Jacot <djacot@confluent.io>
---------
Signed-off-by: PoAn Yang <payang@apache.org>
If the streams rebalance protocol is enabled in
StreamsUncaughtExceptionHandlerIntegrationTest, the streams application
does not shut down correctly upon error.
There are two causes for this. First, the SHUTDOWN_APPLICATION code was
sometimes only sent with the leave heartbeat, which is not handled
broker-side. Second, the SHUTDOWN_APPLICATION code wasn't properly
handled client-side at all.
Reviewers: Bruno Cadonna <cadonna@apache.org>, Bill Bejeck
<bill@confluent.io>, PoAn Yang <payang@apache.org>
This PR just resolves an NPE when a topic assigned in a share group is
deleted. The NPE is caused by code which uses the current metadata image
to convert from a topic ID to the topic name. For a deleted topic, there
is no longer any entry in the image. A future PR will properly handle
the topic deletion.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, PoAn Yang <payang@apache.org>
If a streams, share, or consumer group is described, all group IDs are
sent to all shards of the group coordinator. This change fixes that. It
is verified in unit tests, since it's somewhat inconvenient to test the
passed read-operation lambda.
Reviewers: David Jacot <djacot@confluent.io>, Andrew Schofield
<aschofield@confluent.io>
This patch does a few code changes:
* It cleans up the GroupCoordinatorService;
* It moves the helper methods to validate requests to Utils;
* It moves the helper methods to create the assignment for the
ConsumerGroupHeartbeatResponse and the ShareGroupHeartbeatResponse from
the GroupMetadataManager to the respective classes.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jeff Kim <jeff.kim@confluent.io>
The heartbeat logic for share groups is tricky when the set of
topic-partitions eligible for assignment changes. We have observed epoch
mismatches when brokers are restarted, which should not be possible.
This improves the logging so we can see the previous member epoch and
tally it with the logged state.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Sushant Mahajan <smahajan@confluent.io>
Change the log messages which used to warn that KIP-932 was an Early
Access feature to say that it is now a Preview feature. This will make
the broker logs far less noisy when share groups are enabled.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
This PR removes the group.share.max.groups config, which was used to
calculate the maximum size of the share session cache. The new config
group.share.max.share.sessions serves exactly this purpose, so the
ShareSessionCache is now initialized from the new config instead.
Refer: [KAFKA-19156](https://issues.apache.org/jira/browse/KAFKA-19156)
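A minimal sketch of the new wiring, with a stand-in cache type and an
illustrative (assumed) default value:
```java
import java.util.Map;

// Stand-in types: the cache size is now read from
// group.share.max.share.sessions instead of being derived from the
// removed group.share.max.groups config.
final class ShareSessionCacheWiring {
    static final String MAX_SHARE_SESSIONS_CONFIG = "group.share.max.share.sessions";

    record ShareSessionCache(int maxEntries) { }

    static ShareSessionCache buildCache(Map<String, Integer> brokerConfig) {
        // The fallback here is illustrative, not the real default.
        int maxSessions = brokerConfig.getOrDefault(MAX_SHARE_SESSIONS_CONFIG, 2000);
        return new ShareSessionCache(maxSessions);
    }
}
```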
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This patch extends the OffsetCommit API to support topic ids. From
version 10 of the API, topic ids must be used. Originally, we wanted to
support both using topic ids and topic names from version 10 but it
turns out that it makes everything more complicated. Hence we propose to
only support topic ids from version 10. Clients which only support topic
names can either look up the topic ids using the Metadata API or stay on
an earlier version.
The patch only contains the server side changes and it keeps version 10
unstable for now. We will mark the version as stable when the
client side changes are merged in.
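For illustration, one way a client that only knows topic names could
resolve the required topic ids via the Admin client (a sketch, not part
of this patch):
```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Uuid;

// Resolves topic names to topic ids using describeTopics, which is backed
// by the Metadata API under the covers.
public class TopicIdLookup {
    public static Map<String, Uuid> topicIds(Admin admin, Set<String> topicNames)
            throws Exception {
        Map<String, TopicDescription> descriptions =
            admin.describeTopics(topicNames).allTopicNames().get();
        return descriptions.entrySet().stream()
            .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().topicId()));
    }
}
```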
Reviewers: Lianet Magrans <lmagrans@confluent.io>, PoAn Yang <payang@apache.org>
This patch updates the `GroupCoordinator` interface to use
`AuthorizableRequestContext` instead of using `RequestContext`. It makes
the interface more generic. The only downside is that the request
version in `AuthorizableRequestContext` is an `int` instead of a `short`
so we had to adapt it in a few places. We opted for using `int` directly
wherever possible.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
This also adds metrics to StandardAuthorizer
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ken Huang
<s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu
<tjwu1217@gmail.com>
* When deleting share groups, only the initialized and initializing
information in the metadata record is considered. However, the deleting
topics may also contain information due to other RPCs (share group
offsets delete).
* We need to account for this existing information while writing the
metadata record in the delete flow.
* This PR adds that handling. New tests have been added to check the
functionality.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Update the `group.coordinator.append.linger.ms` documentation, alerting
users to the potential consequences of increasing its value: a larger
value improves write efficiency but also increases response latency for
requests.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>, PoAn Yang <payang@apache.org>
Previously, in the deleteShareGroupOffsets method in
GroupCoordinatorService, the user request was simply forwarded to the
persister without checking whether the requested share partitions were
initialized for the group. This PR introduces such a check to
make sure that the persister deleteState request is only called for
share partitions that have been initialized for the group.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Sushant Mahajan <smahajan@confluent.io>
Remove `OffsetConfig` which was used by old coordinator.
Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, David Jacot <djacot@confluent.io>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Enable KIP-1071 parameter in `StandbyTaskCreationIntegrationTest`.
Required a fix: In `ChangelogTopic.setup`, we actually need to return
both the source-topic (optimized) and the non-source-topic changelog
topics, since otherwise we will not find the partition number later on.
Extended `EmbeddedKafkaCluster` to set the number of standby replicas
dynamically for the group. We need to initialize it to one for the
integration test to go through.
Reviewers: Bill Bejeck <bbejeck@apache.org>
* Currently, the delete share group code flow uses
`group.subscribedTopicNames()` to fetch information about all the share
partitions to which a share group is subscribed.
* However, this is incorrect since once the group is EMPTY, a
precondition for delete, the aforementioned method will return an empty
list.
* In this PR we have leveraged the `ShareGroupStatePartitionMetadata`
record to grab the `initialized` and `initializing` partitions to build
the delete candidates, which remedies the situation.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Add a note to the group protocol configuration that streams groups are
in early access and should not be used in production.
Also update an outdated comment related to disabling the protocol.
Reviewers: Bruno Cadonna <cadonna@apache.org>
This PR adds the share group dynamic config `share.isolation.level`.
Until now, share groups only supported the `READ_UNCOMMITTED` isolation
level. With this PR, we add support for the `READ_COMMITTED` isolation
level for share groups.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
This PR contains the implementation of KafkaAdminClient and
GroupCoordinator for DeleteShareGroupOffsets RPC.
- Added `deleteShareGroupOffsets` to `KafkaAdminClient`
- Added implementation for `handleDeleteShareGroupOffsetsRequest` in
`KafkaApis.scala`
- Added `deleteShareGroupOffsets` to `GroupCoordinator` as well.
Internally, this makes use of `persister.deleteState` to persist the
changes in the share coordinator.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Sushant Mahajan <smahajan@confluent.io>
When a member indicates that the application should shut down, set a
soft-state flag on the streams group and continuously set the status
`SHUTDOWN_APPLICATION` to all members, until the group is empty, which
resets the flag.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>, Jeff Kim <jeff.kim@confluent.io>
Implements the use of session timeout, standby task, and heartbeat
interval configurations in the streams group heartbeat.
Piggy-backed on this is another test verifying that streams groups
react to changes in the topic metadata.
Reviewers: Bill Bejeck <bbejeck@apache.org>, Bruno Cadonna
<cadonna@apache.org>, Matthias J. Sax <matthias@confluent.io>
With the Streams rebalance protocol, when the partition count for
changelog topics is computed, the topic manager only considers external
source topics and throws if there are no source topics. However,
subtopologies are allowed to have only repartition topics as source
topics.
This commit makes the topic manager also consider repartition source
topics when computing the partition count for changelog topics.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
This PR approaches completion of Admin.listShareGroupOffsets() and
kafka-share-groups.sh --describe --offsets.
Prior to this patch, kafka-share-groups.sh was only able to describe the
offsets for partitions which were assigned to active members. Now,
Admin.listShareGroupOffsets() uses the persister's knowledge of the
share-partitions which have initialised state, and then uses this list
to obtain a complete set of offset information.
The PR also implements the topic-based authorisation checking. If
Admin.listShareGroupOffsets() is called with a list of topic-partitions
specified, the authz checking is performed on the supplied list,
returning errors for any topics to which the client is not authorised.
If Admin.listShareGroupOffsets() is called without a list of
topic-partitions specified, the list of topics is discovered from the
persister as described above, and then the response is filtered down to
only show the topics to which the client is authorised. This is
consistent with other similar RPCs in the Kafka protocol, such as
OffsetFetch.
Reviewers: David Arthur <mumrah@gmail.com>, Sushant Mahajan <smahajan@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>
This PR includes `share.session.timeout.ms` and
`share.heartbeat.interval.ms` in the `extractShareGroupConfigMap` method
in `ShareGroupConfig`. With this change, the default value of
`share.session.timeout.ms` and `share.heartbeat.interval.ms` for every
group will be set as the value of the static configs
`group.share.session.timeout.ms` and `group.share.heartbeat.interval.ms`
respectively.
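A hedged sketch of the propagation described above (stand-in plumbing,
not the real ShareGroupConfig code):
```java
import java.util.HashMap;
import java.util.Map;

// The broker-level group.share.* values become the per-group defaults by
// being included in the extracted config map.
final class ShareGroupConfigDefaultsSketch {
    static Map<String, Integer> extractShareGroupConfigMap(
            int groupShareSessionTimeoutMs, int groupShareHeartbeatIntervalMs) {
        Map<String, Integer> defaults = new HashMap<>();
        defaults.put("share.session.timeout.ms", groupShareSessionTimeoutMs);
        defaults.put("share.heartbeat.interval.ms", groupShareHeartbeatIntervalMs);
        return defaults;
    }
}
```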
Reviewers: Andrew Schofield <aschofield@confluent.io>
* Currently, when we get a heartbeat with new share partition
subscriptions, we return an initialize request to the caller which the
caller executes asynchronously as a timer task.
* Meanwhile, if a new heartbeat request comes in with the same or a null
subscription, the same initialize request will be returned, since the
`GroupMetadataManager` has no idea about the older in-flight request.
* In this PR, we have added a new field, `initializingTopics`, to the
`ShareGroupStatePartitionMetadata` record, where this information can be
recorded in the GroupMetadataManager. Consequently, subsequent
heartbeats can check this field and avoid returning duplicate initialize
requests (see the sketch below).
* If any errors are encountered while initializing, an additional method
`uninitializeShareGroupState` has been added, which the
`GroupCoordinatorService` uses to remove the requisite info from
`initializingTopics`.
* New tests have been added wherever applicable and older ones updated.
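A rough sketch of the dedup check (names and types assumed; not the
actual GroupMetadataManager code):
```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Partitions already recorded as initializing or initialized are filtered
// out before building a new initialize request.
final class InitializeDedupSketch {
    static Map<String, Set<Integer>> partitionsToInitialize(
            Map<String, Set<Integer>> subscribed,    // wanted by the group
            Map<String, Set<Integer>> initializing,  // from ShareGroupStatePartitionMetadata
            Map<String, Set<Integer>> initialized) {
        return subscribed.entrySet().stream()
            .map(e -> Map.entry(e.getKey(), e.getValue().stream()
                .filter(p -> !initializing.getOrDefault(e.getKey(), Set.of()).contains(p)
                          && !initialized.getOrDefault(e.getKey(), Set.of()).contains(p))
                .collect(Collectors.toSet())))
            .filter(e -> !e.getValue().isEmpty())
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```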
Reviewers: Andrew Schofield <aschofield@confluent.io>
This PR makes the required changes for retaining a share group forever
post creation. A new field `shouldExpire` is added to all types
implementing the interface `Group`. It returns false only in case of
`ShareGroup` and true otherwise. A check is also added in
`cleanupGroupMetadata` in `GroupCoordinatorShard`, which proceeds with
group expiration only if `shouldExpire` returns true.
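A simplified sketch of the shape of this change (signatures assumed):
```java
// Share groups opt out of expiration via the new shouldExpire flag on the
// Group interface.
interface Group {
    boolean shouldExpire();
}

final class ShareGroup implements Group {
    @Override
    public boolean shouldExpire() {
        return false; // share groups are retained forever once created
    }
}

final class GroupCleanupSketch {
    // cleanupGroupMetadata proceeds with expiration only when shouldExpire
    // is true and the usual retention criteria are met.
    static boolean maybeExpire(Group group, boolean expiredByRetentionPolicy) {
        return group.shouldExpire() && expiredByRetentionPolicy;
    }
}
```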
Reviewers: Andrew Schofield <aschofield@confluent.io>, PoAn Yang <payang@apache.org>
This PR adds `MaxRecords` to the share fetch request and
`AcquisitionLockTimeout` to the share fetch response. It also removes
the internal broker config `max.fetch.records`.
Reviewers: Andrew Schofield <aschofield@confluent.io>
In the GroupMetadataManager, we may call an RPC on an incorrect group
type. This adds unit tests to validate the behavior when an RPC is used
on an incorrect group type.
Reviewers: Bill Bejeck <bill@confluent.io>
Tests for joining with missing source topics,
internal topics, incorrectly partitioned topics,
and stale topologies.
Reviewers: Bill Bejeck <bill@confluent.io>
Basic streams group heartbeat handling. The main part is the unit tests
that make sure that we behave, for the most part, like a consumer
group.
- No support for static membership
- No support for configurations (using constants instead)
- No support for regular expressions
Reviewers: Bill Bejeck <bill@confluent.io>, Bruno Cadonna
<cadonna@apache.org>
* This PR adds impl for the initialize share groups call from the Group
Coordinator perspective.
* The initialize call on persister instance will be invoked by the
`GroupCoordinatorService`, based on the response of the
`GroupCoordinatorShard.shareGroupHeartbeat`. If there is a new topic
subscription or a member assignment change (topic partitions
incremented), the delta share partitions corresponding to the share
group in question are returned as an optional initialize request.
* The request is then sent to the share coordinator as an encapsulated
timer task, because we do not want the heartbeat response to block on
it.
* Tests have been added for `GroupCoordinatorService` and
`GroupMetadataManager`. Existing tests have also been updated.
* A new formatter `ShareGroupStatePartitionMetadataFormatter` has been
added for debugging.
Reviewers: Andrew Schofield <aschofield@confluent.io>
jira: https://issues.apache.org/jira/browse/KAFKA-18980
We should use the number of `records` rather than the number of topic
partitions, but this is not a bug, as the number of `records` should be
equal to the number of topic partitions.
Also, we previously used `expiredPartitions` for logging, but it is no
longer used, so we can remove it.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
This patch is the third of a series of patches to remove the old group
coordinator. With the release of Apache Kafka 4.0, the so-called new
group coordinator is the default and only option available now.
It removes the old group coordinator and cleans up the
`GroupCoordinator` interface.
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This patch is the second of a series of patches to remove the old group
coordinator. With the release of Apache Kafka 4.0, the so-called new
group coordinator is the default and only option available now.
The patch removes `group.coordinator.new.enable` (internal config) and
all its usages (integration tests, unit tests, etc.). It also cleans up
`KafkaApis` to remove logic only used by the old group coordinator.
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This change implements the broker-side configs proposed in KIP-1071.
The configurations implemented by this PR are only those that were specifically aimed to be included in `AK 4.1`.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
The current homogeneous SimpleAssignor for share groups is not very good
at revoking partitions which have previously been assigned when the
number of members increases. This PR improves the situation.
It also fixes the sorting of assignments in `kafka-consumer-groups.sh`
and `kafka-share-groups.sh` so that it sorts partition indices
numerically instead of alphabetically. It also adds the missing number
of partitions column for share groups.
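A sketch of the numeric sort (illustrative types, not the tool's actual
code):
```java
import java.util.Comparator;
import java.util.List;

// Compare partition indices as numbers rather than strings, so partition 10
// sorts after 9 instead of after 1.
record AssignmentRow(String topic, int partition) { }

class AssignmentSorting {
    static List<AssignmentRow> sorted(List<AssignmentRow> rows) {
        return rows.stream()
            .sorted(Comparator.comparing(AssignmentRow::topic)
                .thenComparingInt(AssignmentRow::partition))
            .toList();
    }
}
```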
This PR adds `scheduleShareGroupSessionTimeout` for all the persisted
members of a share group when the group coordinator is loaded.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Recently, we found a regression that could have been detected by static
analysis, since a local variable wasn't being passed to a method during
a refactoring, and was left unused. It was fixed in
[7a749b5](7a749b589f),
but almost slipped into 4.0. Unused variables are typically detected by
IDEs, but this is insufficient to prevent these kinds of bugs. This
change enables unused local variable detection in checkstyle for Kafka.
A few notes on the usage:
- There are two situations in which people actually want to have a local
variable but not use it. First, there are `for (Type ignored:
collection)` loops which have to loop `collection.length` number of
times, but that do not use `ignored` in the loop body. These are
typically still easier to read than a classical `for` loop. Second, some
IDEs detect it if a return value of a function such as `File.delete` is
not being used. In this case, people sometimes store the result in an
unused local variable to make ignoring the return value explicit and to
avoid the squiggly lines.
- In Java 22, unused local variables can be omitted by using a single
underscore `_`. This is supported by checkstyle. In pre-22 versions,
IntelliJ allows such variables to be named `ignored` to suppress the
unused local variable warning. This pattern is often (but not
consistently) used in the Kafka codebase. This is, however, not
supported by checkstyle.
Since we cannot switch to Java 22 yet, and we want to use automated
detection using checkstyle, we have to resort to prefixing the unused
local variables with `@SuppressWarnings("UnusedLocalVariable")`. We have
to apply this in 11 cases across the Kafka codebase. While not being
pretty, I'd argue it's worth it to prevent bugs like the one fixed in
[7a749b5](7a749b589f).
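A representative example of the suppression pattern described above:
```java
import java.io.File;

class UnusedLocalVariableExample {
    void cleanup(File file) {
        // Keeps "ignoring the return value" explicit while satisfying the
        // new checkstyle rule.
        @SuppressWarnings("UnusedLocalVariable")
        boolean ignored = file.delete();
    }
}
```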
Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur
<mumrah@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Bruno
Cadonna <cadonna@apache.org>, Kirk True <ktrue@confluent.io>
We defined multiple `ConfigDef`s in `GroupCoordinatorConfig` and then we
merge them in a few places because we always use them together. Having
multiple `ConfigDef`s does not seem necessary to me. This patch changes
it to have just one.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The code in GroupMetadataManager to request metadata refresh got pretty
ugly with the addition of share and stream groups. It seems preferable
to put the method in the base class.
Reviewers: Andrew Schofield <aschofield@confluent.io>
When regular expressions are resolved, they do not update the group by
topics data structure. Hence, topic changes (e.g. deletion) do not
trigger a rebalance of the group.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
The new group coordinator prints the following line at fixed interval
even if no groups were deleted:
```
Generated 0 tombstone records while cleaning up group metadata in 0 milliseconds. (org.apache.kafka.coordinator.group.GroupCoordinatorShard)
```
The time component has some value on its own, but it may be better not
to print the line when there are no records, in order to reduce the
spam in the logs.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Given that we now support Java 17 on our brokers, this PR replaces the
use of:
- `Collections.singletonList()` and `Collections.emptyList()` with
`List.of()`
- `Collections.singletonMap()` and `Collections.emptyMap()` with
`Map.of()`
- `Collections.singleton()` and `Collections.emptySet()` with `Set.of()`
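For example:
```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class FactoryMethodExamples {
    List<String> singleList = List.of("a");          // was Collections.singletonList("a")
    List<String> emptyList = List.of();              // was Collections.emptyList()
    Map<String, Integer> singleMap = Map.of("a", 1); // was Collections.singletonMap("a", 1)
    Set<String> singleSet = Set.of("a");             // was Collections.singleton("a")
}
```
Note that the `List.of()`/`Map.of()`/`Set.of()` factories reject null
elements, unlike some of the `Collections` counterparts, so call sites
that may pass null need care.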
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This patch filters out the topic describe unauthorized topics from the
ConsumerGroupHeartbeat and ConsumerGroupDescribe response.
In ConsumerGroupHeartbeat,
- if the request has `subscribedTopicNames` set, we directly check the
authz in `KafkaApis` and return a topic auth failure in the response if
any of the topics is denied.
- Otherwise, we check the authz only if a regex refresh is triggered,
and we do it based on the ACL of the consumer that triggered the
refresh. If any of the topics is denied, we filter it out from the
resolved subscription.
In ConsumerGroupDescribe, we check the authz of the coordinator
response. If any of the topics in the group is denied, we remove the
described info and add a topic auth failure to the described group
(similar to the group auth failure).
Reviewers: David Jacot <djacot@confluent.io>, Lianet Magrans
<lmagrans@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>,
Chia-Ping Tsai <chia7712@gmail.com>, TaiJuWu <tjwu1217@gmail.com>,
TengYao Chi <kitingiao@gmail.com>
This change implements the basic RPC handling StreamsGroupHeartbeat and
StreamsGroupDescribe. This includes:
- Adding an option to enable streams groups on the broker
- Passing describe and heartbeats to the right shard of the group
coordinator
- The handler inside the GroupMetadataManager for StreamsGroupDescribe
is fairly trivial, and is included directly in this PR.
- The handler for StreamsGroupHeartbeat is complex and not yet included
in this PR. Instead, an UnsupportedOperationException is thrown.
However, the interface is already defined: the result of a
streamsGroupHeartbeat is a response, together with a list of internal
topics to be created.
The heartbeat implementation inside the `GroupMetadataManager`, which
actually implements the assignment / reconciliation logic, will come in
a follow-up PR. Also, automatic creation of internal topics will be
added in a follow-up PR.
Reviewers: Bill Bejeck <bill@confluent.io>
### About
The current `SimpleAssignor` in AK assigns all subscribed topic
partitions to all the share group members. This does not match the
description given in
[KIP-932](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255070434#KIP932:QueuesforKafka-TheSimpleAssignor).
Here are the rules from the KIP by which the assignment should happen.
We have changed the step 3 implementation here for the reasons
[described](https://github.com/apache/kafka/pull/18864#issuecomment-2659266502)
-
1. The assignor hashes the member IDs of the members and maps the
partitions assigned to the members based on the hash. This gives
approximately even balance (see the sketch after this list).
2. If any partitions were not assigned any members by (1) and do not
have members already assigned in the current assignment, members are
assigned round-robin until each partition has at least one member
assigned to it.
3. We combine the current and new assignment. (Original rule - If any
partitions were assigned members by (1) and also have members in the
current assignment assigned by (2), the members assigned by (2) are
removed.)
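A rough sketch of the hashing step in rule 1 (illustrative, not the
actual SimpleAssignor code):
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hash each member ID onto a partition index, giving an approximately even
// spread of members across partitions.
final class HashingStepSketch {
    static Map<Integer, List<String>> assignByHash(List<String> memberIds, int numPartitions) {
        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (String memberId : memberIds) {
            int partition = Math.floorMod(memberId.hashCode(), numPartitions);
            assignment.computeIfAbsent(partition, p -> new ArrayList<>()).add(memberId);
        }
        return assignment;
    }
}
```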
### Tests
The added code has been verified with unit tests and the already present
integration tests.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, TaiJuWu <tjwu1217@gmail.com>
The last commit in this class mistakenly described the functions as
being for Streams Groups. Just a minor update.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Sushant Mahajan <smahajan@confluent.io>
I was looking into GroupCoordinatorConfigs to review configurations that
we will ship with Apache Kafka 4.0. I found out that it was pretty
disorganised. This patch cleans up the format and re-groups the
configurations which are related.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
* In this PR, we have added the GC-side implementation to call the
delete state share coordinator RPC using the persister.
* We will be using the existing `GroupCoordinatorService.deleteGroups`.
The logic will be modified as follows:
* After sanitization, we will call a new
`runtime.scheduleWriteOperation` (not read for consistency) with
callback `GroupCoordinatorShard.sharePartitions`. This will return a Map
of share partitions of the groups which are of SHARE type. We need to
pass all groups as WE CANNOT DETERMINE the type of the group in the
service class.
* Then using the map we will create requests which could be passed to
the persister and make the appropriate calls.
* Once this future completes, we will continue with the existing flow of
group deletion.
* If the group under inspection is not a share group, the read callback
should return an empty map.
* Tests have been added wherever applicable.
Reviewers: David Jacot <djacot@confluent.io>, Andrew Schofield <aschofield@confluent.io>
* Due to recent changes in the way group count metrics are initialized
and updated, the current share group count code has become obsolete as
well as non-functional.
* The update method for the share group count which should be called
from `ShareGroup` cannot be called either. This is because the
constructor has been changed to NOT accept the
`GroupCoordinatorShardMetrics` ref.
* In this PR, we remedy the situation by bringing the share group count
code on par with consumer and streams groups.
* Additionally the metric name for share groups with group state
attributes was not aligned with streams and consumer groups as mentioned
in https://github.com/apache/kafka/pull/17011#discussion_r1960309578.
This PR aligns them too.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Cleanup code to avoid rawtype, and add suppressions where necessary.
Change the build to fail on rawtype warning.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
This patch changes the default value of `group.coordinator.threads` to `4` and sets its priority to `HIGH`. This change makes it consistent with how we handle `num.network.threads` and `num.io.threads`. The patch also tweaks the upgrade notes.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Adds streams group to the GroupMetadataManager, and implements loading
the records from the offset topic into state. The state also contains
two timers (rebalance timeout and session timeout) that are started
after the group coordinator has been loaded.
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bill@confluent.io>
Implements a memory model for representing streams groups in the group coordinator, as well as group count and rebalance metrics.
Reviewers: Bill Bejeck <bill@confluent.io>, Bruno Cadonna <bruno@confluent.io>
At the moment, we require specifying built-in server-side assignors by their full class name. This is not convenient and also exposes their full class name as part of our public API. This patch changes it to accept specifying a built-in server-side assignor by its short name (uniform or range) while continuing to accept full class names for custom assignors.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
A class to build a new target assignment based on the provided parameters. As a result, it yields the records that must be persisted to the log and the new member assignments as a map.
Compared to the feature branch, I extended the unit tests (testing also standby and warm-up task logic) and adopted simplifications due to the TasksTuple class.
Reviewers: Bruno Cadonna <cadonna@apache.org>, Bill Bejeck <bbejeck@apache.org>
During testing, we discovered that the empty group count is not updated during group conversion, but when the new group transitions to another state, the empty group count is decremented. This could result in a negative empty group count.
We can have a new consumer group count implementation that follows the pattern we did for the classic group count. The timeout task periodically refreshes the metrics based on the current groups soft state.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
CoordinatorRecordSerde does not validate the version of the value to check whether the version is supported by the current version of the software. This is problematic if a future and unsupported version of the record is read by an older version of the software because it would misinterpret the bytes. Hence CoordinatorRecordSerde must throw an error if the version is unknown. This is also consistent with the handling in the old coordinator.
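A sketch of the added guard (exception type and bounds accessors
assumed):
```java
// An unknown value version must fail loudly instead of the bytes being
// misinterpreted by an older version of the software.
final class ValueVersionGuard {
    static void validateValueVersion(short version, short highestSupportedVersion) {
        if (version < 0 || version > highestSupportedVersion) {
            throw new IllegalStateException("Unknown record value version " + version
                + "; highest supported version is " + highestSupportedVersion);
        }
    }
}
```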
Reviewers: Jeff Kim <jeff.kim@confluent.io>
The full class name of the assignors is part of our public API. Hence, we should ensure that they are not changed by mistake. This patch adds a unit test verifying them.
Reviewers: Sean Quah <squah@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
A class with helper methods to create records stored in the __consumer_offsets topic.
Compared to the feature branch, I added unit tests (most functions were not tested) and adopted the new interface for constructing coordinator records introduced by David.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Implements the current assignment builder, analogous to the current assignment builder of consumer groups. The main difference is the underlying assigned resource, and slightly different logic around process IDs: We make sure to move a task only to a new client, once the task is not owned anymore by any client with the same process ID (sharing the same state directory) - in any role (active, standby or warm-up).
Compared to the feature branch, the main difference is that I refactored the separate treatment of active, standby and warm-up tasks into a compound datatype called TaskTuple (which is used in place of the more specific Assignment class). This also has effects on StreamsGroupMember.
Reviewers: Bruno Cadonna <cadonna@apache.org>, Bill Bejeck <bbejeck@apache.org>
This patch does a few things:
1) Replaces ApiMessageAndVersion with ApiMessage for the key in CoordinatorRecord.
2) Leverages the fact that ApiMessage exposes the apiKey, so we don't need to specify the key anymore.
Reviewers: Andrew Schofield <aschofield@confluent.io>
The internal topic manager takes a requested topology and returns a configured topology, as well as any internal topics that need to be created.
Shares with the client-side "InternalTopicManager" the name only.
Reviewers: Bruno Cadonna <cadonna@apache.org>
This patch updates the transaction coordinator record to use the new coordinator record definition.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Apache Kafka 4.0 will only support KRaft and 3.0-IV1 is the minimum version supported by KRaft. So, we can assume that Apache Kafka 4.0 will only communicate with brokers that are 3.0-IV1 or newer.
Note that KRaft was only marked as production-ready in 3.3, so we could go further and set the baseline to 3.3. I think we should have that discussion, but it made sense to start with the non controversial parts.
Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <david.jacot@gmail.com>
A simplified port of "CopartitionedTopicsEnforcer" from the client-side to the group coordinator.
This class is responsible for enforcing the number of partitions in copartitioned topics. For each copartition group, it checks whether the number of partitions for all topics in the group is the same, and enforces the right number of partitions for repartition topics whose number of partitions is not enforced by the topology.
Compared to the client-side version, the implementation uses immutable data structures, and returns the computed number of partitions instead of modifying mutable data structures and calling the admin client.
Reviewers: Bruno Cadonna <cadonna@apache.org>
This patch updates the GroupCoordinatorSerde and the ShareGroupCoordinatorSerde to leverage the CoordinatorRecordType to deserialize records. With this, newly added records are automatically picked up. In other words, the serdes work with all defined records without doing anything.
Reviewers: Andrew Schofield <aschofield@confluent.io>
A simplified port of "RepartitionTopics" from the client-side to the group coordinator.
Compared to the client-side version, the implementation uses immutable data structures, and returns the computed number of partitions instead of modifying mutable data structures and calling the admin client.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Adds a class that represents the topology of a Streams group sent by a Streams client in the Streams group heartbeat during initialization to the group coordinator.
This topology representation is used together with the partition metadata on the broker to create a configured topology.
Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
* KAFKA-18321: Add StreamsGroupMember, MemberState and Assignment classes
This commit adds the classes to represent a Streams group member in the
consumer coordinator.
Reviewers: Bill Bejeck <bill@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
A simplified port of "ChangelogTopics" from the client-side to the group coordinator.
Compared to the client-side version, the implementation uses immutable data structures, and returns the computed number of partitions instead of modifying mutable data structures and calling the admin client.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Clients in the Streams Rebalance Protocol send an "unconfigured" representation of the topology to the broker. That is, the number of input topics and (some) internal topics is not fixed, and regular expressions are not resolved. The broker takes this description of the topology and, together with the current state of the topics on the broker, derives a ConfiguredTopology. The configured topology is what is returned from StreamsGroupDescribe; it has all partition counts defined and regular expressions resolved. The configured topology also contains missing internal topics that need to be created, and potentially configuration errors, such as missing source topics.
In this change, we add the internal data structures for representing the configured topology. They differ in some details from the data structures used in the RPCs. Most importantly, they can be evolved independently of the public interface.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Introduces interfaces for defining task assignors. Task assignors are pure functions, mapping the state of the group and a topology to a target assignment. We include a mock assignor, which we will be able to use when testing / benchmarking without the complexities of the sticky task assignor and the high-availability task assignor. We may remove the mock assignor before the streams rebalance protocol goes GA.
Consumer groups introduced these interfaces to establish a clear separation between the group coordinator code and the pluggable assignors, which may live outside the group coordinator code. We have removed pluggable assignors in KIP-1071, but it still makes sense to keep these interfaces to give people a clean interface to code against. This will pay off if we later make the task assignors pluggable.
Reviewers: Bill Bejeck <bbejeck@gmail.com>, David Jacot <djacot@confluent.io>
Included in this change:
1. Remove deprecated protocol api versions from json files.
2. Remove fields that are no longer used from json files (affects ListOffsets, OffsetCommit, DescribeConfigs).
3. Remove record down-conversion support from KafkaApis.
4. No longer return `Errors.UNSUPPORTED_COMPRESSION_TYPE` on the fetch path[1].
5. Deprecate `TopicConfig.MESSAGE_DOWNCONVERSION_ENABLE_CONFIG` and make the relevant
configs (`message.downconversion.enable` and `log.message.downconversion.enable`) no-ops since
down-conversion is no longer supported. It was an oversight not to deprecate this via KIP-724.
6. Fix `shouldRetainsBufferReference` to handle null request schemas for a given version.
7. Simplify producer logic since it only supports the v2 record format now.
8. Fix tests so they don't exercise protocol api versions that have been removed.
9. Add upgrade note.
Testing:
1. System tests have a lot of failures, but those tests fail for trunk too and I didn't see any issues specific to this change - it's hard to be sure given the number of failing tests, but let's not block on that given the other testing that has been done (see below).
2. Java producers and consumers with version 0.9-0.10.1 don't have api versions support and hence they fail in an ungraceful manner: the broker disconnects and the clients reconnect until the relevant timeout is triggered.
3. The same thing seems to happen for the console producer 0.10.2, although it's unclear why since api versions should be supported. I will look into this separately; it's unlikely to be related to this PR.
4. Console consumer 0.10.2 fails with the expected error and a reasonable message[2].
5. Console producer and consumer 0.11.0 work fine, and newer versions should naturally also work fine.
6. kcat 1.5.0 (based on librdkafka 1.1.0) produce and consume fail with a reasonable message[3][4].
7. kcat 1.6.0-1.7.0 (based on librdkafka 1.5.0 and 1.7.0 respectively) consume fails with a reasonable message[5].
8. kcat 1.6.0-1.7.0 produce works fine.
9. kcat 1.7.1 (based on librdkafka 1.8.2) works fine for consume and produce.
10. confluent-go-client (librdkafka based) 1.8.2 works fine for consume and produce.
11. I will test more clients, but I don't think we need to block the PR on that.
Note that this also completes part of KIP-724: produce v2 and lower as well as fetch v3 and lower are no longer supported.
Future PRs will remove conditional code that is no longer needed (some of that has been done in KafkaApis,
but only what was required due to the schema changes). We can probably do that in master only as it does
not change behavior.
Note that I did not touch `ignorable` fields even though some of them could have been
changed. The reasoning is that this could result in incompatible changes for clients
that use new protocol versions without setting such fields _if_ we don't manually
validate their presence. I will file a JIRA ticket to look into this carefully for each
case (i.e. if we do validate their presence for the appropriate versions, we can
set them to ignorable=false in the json file).
[1] We would return this error if a fetch < v10 was used and the compression topic config was set
to zstd, but we would not do the same for the case where zstd was compressed at the producer
level (the most common case). Since there is no efficient way to do the check for the common
case, I made it consistent for both by having no checks.
[2] ```org.apache.kafka.common.errors.UnsupportedVersionException: The broker is too new to support JOIN_GROUP version 1```
[3]```METADATA|rdkafka#producer-1| [thrd:main]: localhost:9092/bootstrap: Metadata request failed: connected: Local: Required feature not supported by broker (0ms): Permanent```
[4]```METADATA|rdkafka#consumer-1| [thrd:main]: localhost:9092/bootstrap: Metadata request failed: connected: Local: Required feature not supported by broker (0ms): Permanent```
[5] `ERROR: Topic test-topic [0] error: Failed to query logical offset END: Local: Required feature not supported by broker`
Reviewers: David Arthur <mumrah@gmail.com>
This patch is the first one in a series to improve how coordinator records are managed. It focuses on making coordinator records first class citizen in the generator.
* Introduce `coordinator-key` and `coordinator-value` in the schema;
* Introduce `apiKey` for those. This is done to avoid relying on the version to determine the type.
* It also allows the generator to enforce some rules: the key cannot use flexible versions, the key must have a single version `0`, there must be a key and a value for a given api key, etc.
* It generates an enum with all the coordinator record types. This is pretty handy in the code.
The patch also updates the group coordinators to use those.
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Andrew Schofield <aschofield@confluent.io>
KIP-1071 specifies records that the group coordinator needs to store
into the consumer offset topic to persist the state of a Streams
group. These records are specified in json files from which the actual
classes for the records are generated.
This commit adds the records needed by the group coordinator to store
the state of a Streams group.
Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
KIP-848 will be released as GA in Apache Kafka 4.0. Hence, we need to mark all the related public apis as stable.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
The Kafka consumer supports the auto.offset.reset config option, which is used when there is no initial offset in Kafka or if the current offset does not exist any more on the server. This config currently supports the earliest/latest/none options. Consumer resets can currently force applications to reprocess large amounts of data from earlier offsets. With infinite storage, it's beneficial to have a duration-based offset reset strategy. This will allow applications to consume/initialise from a fixed duration when there is no initial offset in Kafka.
As part of KIP-932, we are adding support for share consumer groups. Share consumer groups support the dynamic group configuration property share.auto.offset.reset, which is used to set the initial Share-Partition Start Offset (SPSO). Currently, share.auto.offset.reset supports the earliest and latest options to automatically reset the offset.
Similar to the Kafka consumer, we will add support for the by_duration: config value for share.auto.offset.reset.
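A hedged example of setting the new value on a group via the Admin
client (the `by_duration:<ISO-8601 duration>` value syntax is assumed to
mirror the consumer-side format):
```java
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

// Sets share.auto.offset.reset so a new share group's initial SPSO starts
// one hour back instead of at earliest/latest.
public class ShareGroupOffsetResetExample {
    public static void setByDuration(Admin admin, String groupId) throws Exception {
        ConfigResource group = new ConfigResource(ConfigResource.Type.GROUP, groupId);
        AlterConfigOp op = new AlterConfigOp(
            new ConfigEntry("share.auto.offset.reset", "by_duration:PT1H"), // value syntax assumed
            AlterConfigOp.OpType.SET);
        admin.incrementalAlterConfigs(Map.of(group, List.of(op))).all().get();
    }
}
```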
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>
We have observed the below errors in some cluster:
Uncaught exception in scheduled task 'handleTxnCompletion-902667' exception.message:Trying to complete a transactional offset commit for producerId *** and groupId *** even though the offset commit record itself hasn't been appended to the log.
When a transaction is completed, the transaction coordinator sends a WriteTxnMarkers request to all the partitions involved in the transaction to write the markers to them. When the broker receives it, it writes the markers and if markers are written to the __consumer_offsets partitions, it informs the group coordinator that it can materialize the pending transactional offsets in its main cache. The group coordinator does this asynchronously since Apache Kafka 2.0, see this patch.
The above error happens when the asynchronous operation is executed by the scheduler and the operation finds that there are pending transactional offsets that were not written yet. How come?
There is actually an issue in the steps described above. The group coordinator does not wait until the asynchronous operation completes before returning to the api layer. Hence the WriteTxnMarkers response may be sent back to the transaction coordinator before the async operation is actually completed, and it is therefore possible for the next transactional produce to start before the operation is completed as well. This could explain why the group coordinator has pending transactional offsets that are not written yet.
There is a similar issue when the transaction is aborted. However, on this path, we don't have any checks to verify whether all the pending transactional offsets have been written, so we don't see any errors in our logs. Due to the same race condition, it is possible to actually remove the wrong pending transactional offsets.
PS: The new group coordinator is not impacted by this bug.
Reviewers: Justine Olshan <jolshan@confluent.io>
This pull request replaces Log4j with Log4j2 across the entire project, including dependencies, configurations, and code. The notable changes are listed below:
1. Introduce Log4j2 Instead of Log4j
2. Change Configuration File Format from Properties to YAML
3. Adds warnings to notify users if they are still using Log4j properties, encouraging them to transition to Log4j2 configurations
Co-authored-by: Lee Dongjin <dongjin@apache.org>
Reviewers: Luke Chen <showuon@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>