* Add `RackAwareAssignor`. It uses `racksForPartition` to check the rack
id of a partition and assign it to a member which has the same rack id.
* Add `ConsumerIntegrationTest#testRackAwareAssignment` to check
`racksForPartition` works correctly.
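The matching rule described above can be sketched in plain Java. This is only an illustration and not the actual `RackAwareAssignor`: the class `RackAwareSketch`, the method `assign`, and the map shapes (partition id to rack ids, member id to rack id) are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model; not the Kafka assignor API.
class RackAwareSketch {
    /**
     * Assigns each partition to the first member (in member-id order) whose
     * rack id appears in the partition's rack set. Partitions without a
     * matching member are left unassigned in this sketch.
     */
    static Map<String, List<Integer>> assign(
            Map<Integer, Set<String>> racksForPartition, // partition -> racks hosting a replica
            Map<String, String> memberRacks) {           // member id -> rack id
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String member : memberRacks.keySet()) {
            assignment.put(member, new ArrayList<>());
        }
        // Sorted copies make the iteration order deterministic.
        Map<String, String> sortedMembers = new TreeMap<>(memberRacks);
        for (Map.Entry<Integer, Set<String>> partition : new TreeMap<>(racksForPartition).entrySet()) {
            for (Map.Entry<String, String> member : sortedMembers.entrySet()) {
                if (partition.getValue().contains(member.getValue())) {
                    assignment.get(member.getKey()).add(partition.getKey());
                    break;
                }
            }
        }
        return assignment;
    }
}
```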
Reviewers: David Jacot <djacot@confluent.io>
---------
Signed-off-by: PoAn Yang <payang@apache.org>
In #19840, we broke de-duplication during ACL creation. This patch fixes
that and adds a test to cover this case.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
jira: [19328](https://issues.apache.org/jira/browse/KAFKA-19328)
problem:
1. doAnswer chaining works as intended only when calls are made
sequentially. In a multithreaded environment, its behavior is
unpredictable.
2. Errors in a worker thread can be swallowed and never surface in the
main thread.
3. A chain of 5 doAnswer calls is not enough for 100 threads. The last
answer in the chain is returned in most cases.
4. nextFetchOffset seems to be called before the doAnswer chain, so the
last values (25, 5, 26, 16) were always the ones found in the doAnswer
chain.
solution:
Delete the doAnswer chain so that the four problems above disappear.
Reviewers: Abhinav Dixit <adixit@confluent.io>, Apoorv Mittal
<apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
This PR implements all the options for `--delete --group grpId` and
`--delete --all-groups`
Tests: Integration tests and unit tests.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Andrew Schofield
<aschofield@confluent.io>
The `LIST_CLIENT_METRICS_RESOURCES` RPC was generalised to all config
resources in AK 4.1 and the RPC was renamed to `LIST_CONFIG_RESOURCES`.
This PR updates the RPC authorisation table in the documentation.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
### About
Replaced explicit `.close` calls with `try-with-resources` in the few
tests in `DelayedShareFetchTest` that need to use `mockStatic`.
### Testing
The code has been tested by running the unit tests.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
Adds support for the `urn:ietf:params:oauth:grant-type:jwt-bearer`
grant type (AKA `jwt-bearer`). Includes further refactoring of the
existing OAuth layer and the addition of a generic JWT assertion layer
that can be leveraged in the future.
This constitutes the main piece of the JWT Bearer grant type support.
Forthcoming commits/PRs will include improvements for both the
`client_credentials` and `jwt-bearer` grant types in the following
areas:
* Integration test coverage (KAFKA-19153)
* Unit test coverage (KAFKA-19308)
* Top-level documentation (KAFKA-19152)
* Improvements to and documentation for `OAuthCompatibilityTool`
(KAFKA-19307)
Reviewers: Manikumar Reddy <manikumar@confluent.io>, Lianet Magrans
<lmagrans@confluent.io>
---------
Co-authored-by: Zachary Hamilton <77027819+zacharydhamilton@users.noreply.github.com>
Co-authored-by: Lianet Magrans <98415067+lianetm@users.noreply.github.com>
This PR adds integration tests for `--list`
(Transferred from the feature branch `kip1071`) related ticket:
[KAFKA-18887](https://issues.apache.org/jira/browse/KAFKA-18887)
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
The `String.split` method never returns an array containing null
elements.
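A small sketch of the behavior this relies on: adjacent delimiters produce empty strings rather than null elements. The class and method names (`SplitSketch`, `hasNullElement`) are made up for illustration.

```java
import java.util.Arrays;

class SplitSketch {
    /** Returns true if any element produced by String.split is null. */
    static boolean hasNullElement(String input, String regex) {
        return Arrays.stream(input.split(regex)).anyMatch(s -> s == null);
    }

    public static void main(String[] args) {
        // The middle element is an empty string, not null.
        System.out.println(Arrays.toString("a,,b".split(","))); // prints [a, , b]
        System.out.println(hasNullElement("a,,b", ","));        // prints false
    }
}
```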
Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang
<s7133700@gmail.com>, Lan Ding <isDing_L@163.com>
## Problem
When a `txnProducer.abortTransaction()` operation encounters a
`TRANSACTION_ABORTABLE` error, it currently tries to transition to the
`ABORTABLE_ERROR` state. This can create an infinite retry loop since:
1. The abort operation fails with `TRANSACTION_ABORTABLE`
2. We transition to `ABORTABLE_ERROR` state
3. The application receives an instance of
`TransactionAbortableException` and retries the abort
4. The cycle repeats
## Solution
For the `txnProducer.abortTransaction()` API, convert
`TRANSACTION_ABORTABLE` errors to fatal errors (`KafkaException`) during
abort operations to ensure clean transaction termination. This prevents
retry loops by:
1. Treating abort failures as fatal errors at application layer
2. Ensuring the transaction can be cleanly terminated
3. Providing clear error messages to the application
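
The rule can be pictured as a tiny state transition. This is a toy model under stated assumptions, not the `TransactionManager` code: the enums and the method `onTransactionAbortable` are hypothetical names.

```java
// Toy model of the transition rule; all names here are hypothetical.
class AbortErrorSketch {
    enum State { ABORTABLE_ERROR, FATAL_ERROR }
    enum Op { COMMIT, ABORT }

    /** Next state when an end-transaction request fails with TRANSACTION_ABORTABLE. */
    static State onTransactionAbortable(Op op) {
        // Answering a failed abort with "please abort" would loop forever,
        // so the abort path escalates to a fatal error instead.
        return op == Op.ABORT ? State.FATAL_ERROR : State.ABORTABLE_ERROR;
    }
}
```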
## Changes
- Modified `EndTxnHandler.handleResponse()` to convert
`TRANSACTION_ABORTABLE` errors to `KafkaException` during abort
operations
- Set TransactionManager state to FATAL
- Updated test `testAbortableErrorIsConvertedToFatalErrorDuringAbort` to
verify this behavior
## Testing
- Added test case verifying that abort operations convert
`TRANSACTION_ABORTABLE` errors to `KafkaException`
- Verified that the commit API sets the TransactionManager to the
abortable state on a `TRANSACTION_ABORTABLE` error
- Verified that the abort API converts a `TRANSACTION_ABORTABLE` error
to a fatal error, i.e. `KafkaException`
## Impact
At application layer, this change improves transaction reliability by
preventing infinite retry loops during abort operations.
Reviewers: Justine Olshan <jolshan@confluent.io>
### Problem
- Currently, when a transactional producer encounters retriable errors
(like `COORDINATOR_LOAD_IN_PROGRESS`) and exhausts all retries, it
finally returns a retriable error to the application layer.
- Application retries can cause duplicate records. As a fix, we
transition all retriable errors to abortable errors in the transactional
producer path.
- Additionally, added `InvalidTxnStateException` as part of
https://issues.apache.org/jira/browse/KAFKA-19177
### Solution
- Modified the TransactionManager to automatically transition retriable
errors to abortable errors after all retries are exhausted. This ensures
that applications can abort transaction when they encounter
`TransactionAbortableException`
- `RefreshRetriableException` like `CoordinatorNotAvailableException`
will be refreshed internally
[[code](6c26595ce3/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java (L1702-L1705))]
until retries are exhausted, then it will be treated as a retriable
error and translated to `TransactionAbortableException`
- Similarly for InvalidTxnStateException
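
The retry-then-escalate behavior can be sketched as follows. This is an assumption-laden illustration, not the producer internals: `RetriableError`, `AbortableError`, and `callWithRetries` are hypothetical stand-ins for the real exception types and sender logic.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins; not Kafka's exception hierarchy.
class RetryExhaustedSketch {
    static class RetriableError extends RuntimeException {
        RetriableError(String message) { super(message); }
    }

    static class AbortableError extends RuntimeException {
        AbortableError(Throwable cause) { super(cause); }
    }

    /**
     * Runs the call up to maxAttempts times. A retriable failure is retried;
     * once attempts run out it is surfaced as an abortable error, so the
     * caller aborts the transaction instead of retrying on its own.
     */
    static <T> T callWithRetries(Supplier<T> call, int maxAttempts) {
        RetriableError last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RetriableError e) {
                last = e; // remember the failure and try again
            }
        }
        throw new AbortableError(last);
    }
}
```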
### Testing
Added test `testSenderShouldTransitionToAbortableAfterRetriesExhausted`
to verify in sender thread:
- Retriable errors are properly converted to abortable state after
retries
- Transaction state transitions correctly and subsequent operations fail
appropriately with TransactionAbortableException
Reviewers: Justine Olshan <jolshan@confluent.io>
* Use metadata hash to replace subscription metadata.
* Remove `ShareGroupPartitionMetadataKey` and
`ShareGroupPartitionMetadataValue`.
* Use `subscriptionTopicNames` and `metadataImage` to replace
`subscriptionMetadata` in `subscribedTopicsChangeMap` function.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot
<djacot@confluent.io>, Andrew Schofield <aschofield@confluent.io>
---------
Signed-off-by: PoAn Yang <payang@apache.org>
Update catch to handle compression errors
Before :

After
```
Sent message: KR Message 376
[kafka-producer-network-thread | kr-kafka-producer] INFO
org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter -
KR: Failed to compress telemetry payload for compression: zstd, sending
uncompressed data
Sent message: KR Message 377
```
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Bill Bejeck <bbejeck@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This is the initial documentation for KIP-932 preview in AK 4.1. The aim
is to get very minimal docs in before the cutoff. Longer term, more
comprehensive documentation will be provided for AK 4.2.
The PR includes:
* Generation of group-level configuration documentation
* Add link to KafkaShareConsumer to API docs
* Add a summary of share group rationale to design docs
* Add basic operations information for share groups to ops docs
* Add upgrade note describing arrival of KIP-932 preview in 4.1
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
---------
Co-authored-by: Apoorv Mittal <apoorvmittal10@gmail.com>
* Use metadata hash to replace subscription metadata.
* Remove `StreamsGroupPartitionMetadataKey` and
`StreamsGroupPartitionMetadataValue`.
* Check whether `configuredTopology` is empty. If it is, call
`InternalTopicManager.configureTopics` and set the result to the group.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
---------
Signed-off-by: PoAn Yang <payang@apache.org>
- Currently, read and write share state requests are allowed on
uninitialized share partitions (share partitions on which
initializeState has NOT been called). This should not be the case.
- This PR addresses the concern by adding error checks on read and
write. Other requests are allowed (initialize, readSummary, alter).
- Refactored `ShareCoordinatorShardTest` to reduce redundancy and added
some new tests.
- Some request/response classes have also been reformatted.
Reviewers: Andrew Schofield <aschofield@confluent.io>
According to the current code in AK, the offset reset strategy for share
groups was set using the flag `--offset-reset-strategy` in the
share_consumer_test.py tests, but that would mean that the admin client
call would be sent out by all members in the share group. This PR
changes that by introducing `set_group_offset_reset_strategy` method in
kafka.py, which runs the kafka-configs.sh script in one of the existing
docker containers, thereby changing the config only once.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Before 4.1, API key 74 was `ListClientMetricsResources`. From 4.1
onward, it is `ListConfigResources`. If a user sends a v0
`ListConfigResources` request to the broker, the request is not recorded
under the `ListClientMetricsResources` metric. This PR records the
`ListClientMetricsResources` metric when the request is a v0
`ListConfigResources`.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>
This PR adds system tests in share_consume_bench_test.py for testing the
Trogdor agent for share consumers.
Reviewers: Lan Ding <53332773+DL1231@users.noreply.github.com>, Andrew
Schofield <aschofield@confluent.io>
This PR streamlines the logs emitted when deletion of a share group or
its offsets is triggered. It also fixes the response returned when the
group is not found while deleting a share group.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Sushant Mahajan
<smahajan@confluent.io>
This PR sets `SV_1` as the latest-production share version. This means
for AK 4.1 it will be a preview feature, not enabled by default, but can
be enabled without turning on unstable features. This is analogous to
how ELR worked for AK 4.0.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
This PR includes some performance system tests utilizing the
kafka-share-consumer-perf.sh tool for share groups
Reviewers: Andrew Schofield <aschofield@confluent.io>
Currently some tests in StreamsBrokersBounceTest fail with the error
`The cluster does not support the STREAMS group protocol or does not
support the versions of the STREAMS group protocol used by this client
(used versions: 0 to 0).`
The reason is that in isolated KRaft mode, we failed to set both
`unstable.api.versions.enable` and `unstable.feature.versions.enable` to
true on all controllers, which causes `streams.version` to fall back to
0 on the broker side and the above error to be raised when a
StreamsGroupRequestHeartbeat reaches the broker.
This patch adds the missing configs to the controller properties if the
streams group protocol is used.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
This refactor improves the implementation of `awaitNonEmptyRecords` by:
- Removing the unreachable `throw new IllegalStateException` statement,
which was dead code due to `pollRecordsUntilTrue` throwing exceptions on
timeout.
- Eliminating the use of `return` inside the lambda, which relies on
non-local returns that can be confusing and error-prone in Scala.
Reviewers: Yung <yungyung7654321@gmail.com>, Ken Huang
<s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
---------
Co-authored-by: Jing-Jia Hung <jing@Jing-JiadeiMac.local>
Following the removal of the ZK-to-KRaft migration code in commit
85bfdf4, controller-to-broker communication is now handled by the
control-plane listener (`controller.listener.names`). The
`interBrokerListenerName` parameter in `ClusterControlManager` is no
longer referenced on the controller side and can be safely removed as
dead code.
Reviewers: Lan Ding <isDing_L@163.com>, Ken Huang <s7133700@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
The flakiness occurs when the offsets topic does not yet exist. Hence,
the issue is mitigated by creating the offsets topic in `setup()`. This
serves as a workaround. The root cause is tracked in
[KAFKA-19357](https://issues.apache.org/jira/browse/KAFKA-19357).
I ran the test 100 times on my Mac and all of them passed.
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
Removed the FetchResponse#of overload that is not used in production. The
test cases that originally invoked this method have been updated to call
the other
[FetchResponse#of](6af849f864/clients/src/main/java/org/apache/kafka/common/requests/FetchResponse.java (L232)),
which is currently used by `KafkaApis`, to maintain the integrity of
the tests.
Reviewers: Jun Rao <junrao@gmail.com>, PoAn Yang <payang@apache.org>,
Chia-Ping Tsai <chia7712@gmail.com>
Adds missing documentation to the `partitionsToOffsetAndMetadata`
methods in both `ListStreamsGroupOffsetsResult` and
`ListShareGroupOffsetsResult` classes to clarify the behavior when a
group does not have a committed offset for a specific partition.
As documented in `ListConsumerGroupOffsetsResult`:
> If the group doesn’t have a committed offset for a specific partition,
> the corresponding value in the returned map will be null.
This important detail was previously missing in the JavaDoc of the
stream and share group variants.
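A caller-side sketch of why the null matters: the key is present in the map, but its value is null. Plain `String` keys and `Long` values stand in for the real partition and offset types here, and `committedOrDefault` is a hypothetical helper.

```java
import java.util.Map;

// Stand-in types: String for the partition, Long for the committed offset.
class NullOffsetSketch {
    /** Returns the committed offset, or -1 when the map holds null for the partition. */
    static long committedOrDefault(Map<String, Long> offsets, String partition) {
        Long offset = offsets.get(partition);
        // A null value means "no committed offset"; unboxing it directly
        // would throw a NullPointerException.
        return offset == null ? -1L : offset;
    }
}
```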
Reviewers: Nick Guo <lansg0504@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
#15613 ensures that all `commitAsync` callbacks are triggered before
`commitSync` completes for `AsyncKafkaConsumer`. However, the related
changes to `ClassicKafkaConsumer`, #15693, were not merged. I assume
this might be because we intend to gradually move toward using AsyncConsumer
instead.
In short, this behavioral difference should be documented.
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
Minor updates to RangeSet:
* Disallow ranges with negative size
* Disallow ranges with more than Integer.MAX_VALUE elements
* Fix equals() so that all empty RangeSets are equal, to follow the Set
interface definition better.
* Reimplement hashCode() to follow the Set interface definition.
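One way to get the behavior listed above is to extend `java.util.AbstractSet`, whose inherited `equals` and `hashCode` follow the `Set` contract. The sketch below is not Kafka's `RangeSet`; `RangeSetSketch` and its constructor checks are an illustrative assumption.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative only; not the Kafka RangeSet implementation.
class RangeSetSketch extends AbstractSet<Integer> {
    private final int from; // inclusive
    private final int to;   // exclusive

    RangeSetSketch(int from, int to) {
        if (to < from) {
            throw new IllegalArgumentException("negative size: [" + from + ", " + to + ")");
        }
        // Compute the size in long arithmetic so the check cannot overflow.
        if ((long) to - (long) from > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("more than Integer.MAX_VALUE elements");
        }
        this.from = from;
        this.to = to;
    }

    @Override
    public int size() {
        return to - from;
    }

    @Override
    public boolean contains(Object o) {
        return o instanceof Integer && (Integer) o >= from && (Integer) o < to;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int cursor = from;

            @Override
            public boolean hasNext() {
                return cursor < to;
            }

            @Override
            public Integer next() {
                if (cursor >= to) {
                    throw new NoSuchElementException();
                }
                return cursor++;
            }
        };
    }
    // equals() and hashCode() are inherited from AbstractSet: two sets are
    // equal when they hold the same elements, and the hash is the sum of the
    // element hashes (computed by iterating, so O(n) in this sketch).
}
```

With this shape, `new RangeSetSketch(5, 5)` and `new RangeSetSketch(9, 9)` are equal because both are empty, regardless of their bounds.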
Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
Rewrite BaseConsumerTest and SaslPlainPlaintextConsumerTest in Java using
the new test infra and move them to the clients-integration-tests module.
The original BaseConsumerTest is still in use, so it is not removed for
now.
Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi
<frankvicky@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
Commit 57ae6d6706 mistakenly removed the `@Test` annotation from a test.
This patch adds it back.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>, Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>
1. Move `LogReadResult` to server module.
2. Rewrite `LogReadResult` in Java.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, TengYao Chi <frankvicky@apache.org>
This PR rewrites `ConsumerWithLegacyMessageFormatIntegrationTest.scala`
in Java and moves it to the `clients-integration-tests` module.
Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
* Return a resource-doesn't-exist message when users try to describe a
non-existent resource in kafka-configs.sh and kafka-client-metrics.sh.
* For the groups type, the command checks both existing groups and
non-existent groups that have dynamic config. If the group is found in
neither, return the resource-doesn't-exist message.
Reviewers: Lan Ding <53332773+DL1231@users.noreply.github.com>, Andrew
Schofield <aschofield@confluent.io>
---------
Signed-off-by: PoAn Yang <payang@apache.org>
`public void completeTransaction(PreparedTxnState preparedTxnState)`
The method compares the currently prepared transaction state and the
state passed in the argument.
1. Commit if the state matches
2. Abort the transaction otherwise.
If the producer is not in a prepared state (i.e., neither
prepareTransaction was called nor initTransaction(true) was called), we
return an INVALID_TXN_STATE error.
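The decision described above can be sketched as below. This is a simplified model, not the producer implementation: a `String` stands in for `PreparedTxnState`, `IllegalStateException` stands in for the `INVALID_TXN_STATE` error, and `CompleteTxnSketch` and `complete` are hypothetical names.

```java
// Simplified model; String stands in for PreparedTxnState.
class CompleteTxnSketch {
    enum Outcome { COMMITTED, ABORTED }

    /**
     * @param currentPrepared the state the producer prepared, or null if the
     *                        producer is not in a prepared state
     * @param expected        the state passed in by the caller
     */
    static Outcome complete(String currentPrepared, String expected) {
        if (currentPrepared == null) {
            // Stand-in for the INVALID_TXN_STATE error.
            throw new IllegalStateException("INVALID_TXN_STATE: producer is not in a prepared state");
        }
        // Commit only when the two states match; otherwise abort.
        return currentPrepared.equals(expected) ? Outcome.COMMITTED : Outcome.ABORTED;
    }
}
```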
Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
<alivshits@confluent.io>
Addresses:
[KAFKA-6629](https://issues.apache.org/jira/browse/KAFKA-6629)
Adds configuration for the SessionKeySchema and parameterises the
existing tests so that both WindowKeys and SessionKeys are tested under
the existing unit tests.
Reviewers: Bill Bejeck <bbejeck@apache.org>
---------
Co-authored-by: Lorcan <lorcanjames1@gmail.com>
- Due to the condition on the number of updates/snapshot in
`generateShareStateRecord`, share updates get written for write state
requests even if they have the highest state epoch seen so far.
- A share update cannot record the state epoch. As a result, this update
gets missed.
- This PR remedies the issue and adds a test as proof of the fix.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Use ClusterTest and Java to rewrite `EndToEndClusterIdTest` and move it
to the server module.
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
This PR includes system tests in the file share_group_command_test.py.
These tests exercise the functionality of the kafka-share-groups.sh tool.
Reviewers: Sushant Mahajan <smahajan@confluent.io>, Andrew Schofield
<aschofield@confluent.io>
Fix `updateBrokerContactTime` so that existing brokers still have their
contact time updated when they are already tracked. Also, update the
unit test to test this case.
Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Yung
<yungyung7654321@gmail.com>, TengYao Chi <frankvicky@apache.org>, Ken
Huang <s7133700@gmail.com>
- A new method `assignablePartitions` was added to the
`SubscribedTopicDescriber`in https://github.com/apache/kafka/pull/19026.
This method was required for computing assignments for share groups
(KIP-932).
- However, since the describer is a public interface and is used to
encapsulate methods which return all subscribed partitions (KIP-848),
`assignablePartitions` is deemed inconsistent with this interface.
- Hence, this PR extends the `GroupSpec` interface to add a method
`isPartitionAssignable` which will serve the same purpose. The
`assignablePartitions` has been removed from the describer.
- Tests have been updated for the assigners and spec and removed from
describer as required.
Reviewers: Andrew Schofield <aschofield@confluent.io>, David Jacot
<djacot@confluent.io>
This patch fixes a problem in AclControlManager where we are updating
the timeline data structures prematurely.
Reviewers: Alyssa Huang <ahuang@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, Andrew Schofield <aschofield@confluent.io>
Add integration tests for alter share group offsets API.
Reviewers: Lan Ding <53332773+DL1231@users.noreply.github.com>, Sushant
Mahajan <smahajan@confluent.io>, Apoorv Mittal
<apoorvmittal10@gmail.com>