* Merging the refactoring to add standby tasks heartbeat response
added test
Add stability check and refactor partition assignments
Introduces a `membersStable` method in `StreamsGroup` to ensure group members are in a stable state before partition assignments. Refactors `EndpointToPartitionsManager` to streamline input topic handling and updates related tests for clarity and correctness. Adds integration test for validating partition assignment logic with multiple configurations.
* Refactor IQv2 metadata handling and endpoint partition mapping.
Streamlined metadata validation logic in IQv2 integration tests and removed redundant code. Simplified `maybeBuildEndpointToPartitions` by eliminating unnecessary stability checks, improving clarity and maintainability of endpoint-to-partition mapping.
* Remove unused `membersStable` method from StreamsGroup
* Refactor and enhance IQv2 test metadata validation.
Replaced redundant variable names and improved metadata handling for active and standby tasks in the IQv2 test. Added validations for state store names using `EXPECTED_STORE_NAME` and organized metadata assertions to enhance clarity and testing coverage. These changes improve maintainability and ensure better validation of stream metadata.
* Add inspection of standby topics
* Adding standby tasks heartbeat response
* Refactoring to add standby tasks heartbeat response
added test
* Addressing comments plus some minor checkstyle side cleanup
This PR integrates the KIP-1082 changes into KIP-1071. The change centers around clients creating the member id on startup, vs. waiting for the GroupCoordinator to create the member id and return it in the response to the initial heartbeat.
The changes were small and included some changes to the GroupMetadataManagerTest streams group related tests.
I'll add comments in-line regarding the updates. Overall, the impact can be best summarized by stating that certain actions will now happen one epoch earlier. For example, task assignment would previously happen on epoch 2 (bumping up from 1), since clients waited for the member id to be returned and the assignment occurred in the subsequent heartbeat request.
For testing, I updated the GroupMetadataManagerTest to get all tests passing.
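A minimal sketch of the client-side idea described above, assuming a random UUID string is an acceptable member id. The `ClientMemberIdSketch` and `FirstHeartbeat` types are hypothetical illustrations, not classes from this PR:

```java
import java.util.UUID;

// Illustrative only: these types are hypothetical, not Kafka classes.
final class ClientMemberIdSketch {

    // Hypothetical stand-in for the first heartbeat a member sends.
    static final class FirstHeartbeat {
        final String memberId;
        final int memberEpoch;

        FirstHeartbeat(String memberId, int memberEpoch) {
            this.memberId = memberId;
            this.memberEpoch = memberEpoch;
        }
    }

    // Assumption: a random UUID string is an acceptable member id format.
    static String newMemberId() {
        return UUID.randomUUID().toString();
    }

    // The member id is known before the first request is sent, so the
    // joining heartbeat (member epoch 0) can already carry it.
    static FirstHeartbeat firstHeartbeat(String memberId) {
        return new FirstHeartbeat(memberId, 0);
    }

    public static void main(String[] args) {
        FirstHeartbeat hb = firstHeartbeat(newMemberId());
        System.out.println(hb.memberId + " joins at epoch " + hb.memberEpoch);
    }
}
```

Because the coordinator no longer needs a round trip to hand out the id, work that used to wait for the second heartbeat can start on the first one.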
The Streams membership manager did not use the LeaveOnClose event
to leave the group faster on close. With the event, the Streams
membership manager leaves the group without invoking any callback
for releasing the active tasks.
With this commit, the Streams membership manager uses the LeaveOnClose
event.
This PR implements two changes in the KIP, and related logic on client & server side.
We replace topology ID with topology epoch. We only include the basic changes to the schema and basic validation for now (fail if topology is changed without a topology epoch bump) - since topology updating will follow later on.
We add status codes for topic validation, and implement the required logic to validate the internal topics.
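The basic topology epoch validation could look roughly like the sketch below. It assumes the topology can be compared through some string fingerprint; `TopologyEpochCheckSketch`, its parameters, and the exception type are hypothetical and not taken from this PR:

```java
// Illustrative only: hypothetical names, not the coordinator's actual code.
final class TopologyEpochCheckSketch {

    // Assumption: each topology is summarized by a comparable fingerprint string.
    static void validate(int storedEpoch, String storedTopology,
                         int requestEpoch, String requestTopology) {
        boolean topologyChanged = !storedTopology.equals(requestTopology);
        boolean epochBumped = requestEpoch > storedEpoch;
        // Fail if the topology changed without a topology epoch bump.
        if (topologyChanged && !epochBumped) {
            throw new IllegalArgumentException(
                "Topology changed without a topology epoch bump (epoch " + requestEpoch + ")");
        }
    }

    public static void main(String[] args) {
        validate(1, "topology-a", 1, "topology-a"); // unchanged topology: accepted
        validate(1, "topology-a", 2, "topology-b"); // changed with bump: accepted
        try {
            validate(1, "topology-a", 1, "topology-b"); // changed without bump
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```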
We merge the initialization of the Streams group into the heartbeat of the Streams group.
As a result the first heartbeat of a member (member epoch is zero) contains the topology
metadata required for initialization.
The streams membership manager manages the state of the member
and is responsible for reconciling the received task assignment
with the current task assignment of the member. Additionally,
it requests the call of the callback that installs the
reconciled task assignment in the Streams client.
* KSTREAMS-6456: Port tools for topic configuration
Ports several tools for topic configuration from the client side to the
broker side. Several things are refactored:
- Decoupling. On the client side, for example, RepartitionTopics was
using the CopartitionedTopicEnforcer, the InternalTopicCreator, the
TopologyMetadata and the Cluster objects. All of those are mocked
with Mockito during testing. This points to bad coupling. We
refactored all classes to be mostly self-sufficient, only relying on
themselves and simple interfaces.
- Tests use only JUnit 5, with no Hamcrest matchers and no other Streams
utilities, so as not to pollute the group coordinator module.
- All classes only modify the configurations -- the code does not
actually call into the AdminClient anymore.
- We map all errors to new errors in the broker, in particular,
the error for missing topics, inconsistent internal topics, and
invalid topologies.
We include the internal, mutable data structures, which are set- and
map-based to allow efficient algorithms. They are distinctly different
from the data represented in `StreamsGroupTopologyValue` and
`StreamsGroupInitializeRequest`, since regular expressions must
be resolved at this point.
Both the topic creation and internal topic validation will be
based on this code; the basics are implemented in the
`InternalTopicManager`.
Every time either the broker-side topology or the topic metadata
on the broker changes, we reconfigure the internal topics, check
consistency with the current topics on the broker, and possibly
trigger creation of the missing internal topics. These changes
will be built on top of this change.
* formatting fixes
* KSTREAMS-6456: Complete topic metadata for automatic topic creation
This change updates the current RPCs and schemas to add the following
metadata:
- copartition groups are added to topology record and initialize RPC
- multiple regexes are added to topology record and initialize RPC
- replication factors are added for each internal topic
We also add code to fill this information correctly from the
`InternalTopologyBuilder` object.
The fields `assignmentConfiguration` and `assignor` in the
`StreamsAssignmentInterface` are removed, because they are not
needed anymore.
* fixes
Throughout the RPCs, we were using subtopology instead of subtopologyId for
the string ID of a subtopology. We were also using the spelling
sub-topology. Cleaning this up before merging the automatic topic creation code.
This commit:
- schedules a timeout for the initialization call
- requests the initialization only from the one member
- silently drops initializations of unknown topologies or already
initialized topologies
The goal is to define initial record types for storing group metadata. The aim is to roughly match the data represented in the RPCs and the KIP-848 records. There is an extra record for the topology that is written when StreamsInitialize is called.
See https://github.com/lucasbru/kafka/pull/14
target assignment and topology
current assignment
member metadata
names and version
initial copy
Find a way to inject the StreamsHeartbeatRequestManager into AsyncKafkaConsumer from Kafka Streams
Also trigger assignment logic in Kafka Streams
See https://github.com/lucasbru/kafka/pull/12
Even if we do not implement client-side assignment in the POC, let's define all public interfaces that will be part of the KIP in the branch, so that we have a single source of truth.
Also updates the RPCs following discussions with Responsive
* Adds "userdata" for client-side assignment to the RPCs, as requested by Sophie.
* Use task offsets instead of task lags
* Allow the task offsets to be updated for 4 reasons
See https://github.com/lucasbru/kafka/pull/8
Removes the client side AddPartitionsToTxn/AddOffsetsToTxn calls so that the partition is implicitly added as part of KIP-890 part 2.
This change also requires updating the valid state transitions. The client side cannot know for certain whether a partition has been added on the server side when the request times out (partial completion). Thus, for TV2, the transition to PrepareAbort is now also valid from Empty, CompleteCommit, and CompleteAbort.
For readability, the V1 and V2 endTransaction methods have been separated.
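A simplified sketch of the relaxed transition rule, for illustration only. The enum and method names are hypothetical, and the baseline set (only Ongoing) is an assumption made to keep the example small; it is not the full set of transitions in the coordinator:

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: hypothetical names, not the transaction coordinator's code.
final class PrepareAbortTransitionSketch {

    enum State { EMPTY, ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT, COMPLETE_ABORT }

    // Assumption: simplified baseline of states that could already move to PrepareAbort.
    private static final Set<State> BASELINE = EnumSet.of(State.ONGOING);

    // Extra source states accepted under TV2, as listed in the description above.
    private static final Set<State> TV2_EXTRA =
        EnumSet.of(State.EMPTY, State.COMPLETE_COMMIT, State.COMPLETE_ABORT);

    static boolean canPrepareAbortFrom(State from, boolean tv2) {
        return BASELINE.contains(from) || (tv2 && TV2_EXTRA.contains(from));
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> PREPARE_ABORT: v1=" + canPrepareAbortFrom(s, false)
                + ", tv2=" + canPrepareAbortFrom(s, true));
        }
    }
}
```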
Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>, Ritika Reddy <rreddy@confluent.io>
This PR introduces the DescribeGroups v6 API as part of KIP-1043. This adds an error message for the described groups so that it is possible to get some context on the error. It also changes the behaviour when the group ID cannot be found by returning error code GROUP_ID_NOT_FOUND rather than NONE.
Reviewers: David Jacot <djacot@confluent.io>
In the response handler for ShareAcknowledge, we were passing clientResponse.receivedTimeMs() to the handler methods. But when there is a disconnect or when the response received is null, we should pass the current time instead.
This bug was causing the consumer to hang, as it did not call the handler methods on disconnect, and further requests were blocked waiting for its completion.
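The timestamp selection described above can be sketched as follows. `ShareAckHandlerTimeSketch` and its `Response` type are hypothetical stand-ins for the real client response, used only to show the rule:

```java
// Illustrative only: hypothetical types, not the consumer's actual classes.
final class ShareAckHandlerTimeSketch {

    // Hypothetical stand-in for a client response.
    static final class Response {
        final boolean disconnected;
        final long receivedTimeMs;

        Response(boolean disconnected, long receivedTimeMs) {
            this.disconnected = disconnected;
            this.receivedTimeMs = receivedTimeMs;
        }
    }

    // Use the received time only for a real response; on a disconnect or a
    // null response, fall back to the current time so the handler still runs.
    static long handlerTimeMs(Response response, long currentTimeMs) {
        if (response == null || response.disconnected) {
            return currentTimeMs;
        }
        return response.receivedTimeMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(handlerTimeMs(null, now));
        System.out.println(handlerTimeMs(new Response(true, 100L), now));
        System.out.println(handlerTimeMs(new Response(false, 100L), now));
    }
}
```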
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>