Originally, we set commit-interval to MAX_VALUE for this test,
to ensure we only commit explicitly. However, we needed to decrease it
later on when adding the tx-timeout verification.
We did see test runs fail because the commit-interval was hit. This PR
increases the commit-interval to be close to the test-timeout, to avoid
the commit-interval from triggering.
Reviewers: Bruno Cadonna <bruno@confluent.io>
An issue in the component "GroovyEngine.execute" of jline-groovy versions through 3.24.1 allows attackers to cause an OOM (OutOfMemory) error. Please refer to https://devhub.checkmarx.com/cve-details/CVE-2023-50572 for more details.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Currently, a few documentation files are generated automatically, for example by the genConnectMetricsDocs task.
However, unwanted log output is also added to them,
and the format is not aligned with the other sections, which have the MBean name located in the third column.
I modified the generation logic so the format follows the other sections in ops.html.
I also disabled logging during generation, since everything written to stdout is taken as documentation.
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This PR contains some of the ConsoleGroupCommand tests rewritten in Java.
The intention of the separate PR is to reduce the change size and simplify review.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Kafka Streams supports asymmetric join windows. Depending on the window configuration,
we need to compute the window close time etc. differently.
This PR flips `joinSpuriousLookBackTimeMs`, because the values were not correct, and
introduces the `windowsAfterIntervalMs` field that is used to determine whether emitting records can be skipped.
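For illustration, a minimal sketch of configuring an asymmetric join window via the public DSL (topic names and durations are made up, not taken from this PR):
```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.StreamJoined;

public class AsymmetricJoinExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> left = builder.stream("left-topic");
        KStream<String, String> right = builder.stream("right-topic");

        // Asymmetric window: a left record joins right records from 5 seconds before
        // up to 20 seconds after its timestamp, so before-size != after-size.
        JoinWindows asymmetricWindow = JoinWindows
                .ofTimeDifferenceWithNoGrace(Duration.ofSeconds(5))
                .after(Duration.ofSeconds(20));

        left.join(right,
                (leftValue, rightValue) -> leftValue + "+" + rightValue,
                asymmetricWindow,
                StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));
    }
}
```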
Reviewers: Hao Li <hli@confluent.io>, Guozhang Wang <guozhang.wang.us@gmail.com>, Matthias J. Sax <matthias@confluent.io>
It seems likely that BrokerServer was built upon the KafkaServer codebase (#10113).
KafkaServer, using ZooKeeper, separates the control plane and the data plane to implement KIP-291.
In KRaft, the roles of the data plane and the control plane in KafkaServer seem to be divided between BrokerServer and ControllerServer.
It appears that the initial implementation of BrokerServer initialized and used the controlPlaneRequestProcessor, but it seems to have been removed, except for the code used in the shutdown method, through subsequent modifications (#10931).
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Remove the test constructor for PartitionAssignment and remove the TODO.
Also add KRaftClusterTest.testCreatePartitions to get more coverage for
createPartitions.
Reviewers: David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
In order to move ConfigCommand to tools we must move all of its dependencies, which include KafkaConfig and other core classes, to Java. This PR moves the log cleaner configuration to the CleanerConfig class of the storage module.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The javadocs for commitAsync() (w/o callback) say:
@throws org.apache.kafka.common.errors.FencedInstanceIdException
if this consumer instance gets fenced by broker.
If no callback is passed into commitAsync(), no offset commit callback invocation is submitted. However, we only check for a FencedInstanceIdException when we execute a callback. When the consumer gets fenced by another consumer with the same group.instance.id, and we do not use a callback, we miss the exception.
This change modifies the behavior to propagate the FencedInstanceIdException even if no callback is used. The code is kept very similar to the original consumer.
We also change the order - first try to throw the fenced exception, then execute callbacks. That is the order in the original consumer so it's safer to keep it this way.
For testing, we add a unit test that verifies that the FencedInstanceIdException is thrown in that case.
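As an illustration, a minimal, hypothetical sketch of the intended propagation behavior (names are made up; this is not the actual consumer code):
```java
import org.apache.kafka.common.errors.FencedInstanceIdException;

// Hypothetical, simplified sketch: remember a fencing error reported by the background
// thread and rethrow it on the next API call, even when commitAsync() got no callback.
class FencedErrorPropagationSketch {
    private volatile FencedInstanceIdException fencedError; // set when a commit response reports fencing

    void onCommitResponseFailure(RuntimeException error) {
        if (error instanceof FencedInstanceIdException) {
            fencedError = (FencedInstanceIdException) error;
        }
    }

    void maybeThrowFencedInstanceException() {
        // Throw before invoking completed commit callbacks, matching the original consumer's order.
        if (fencedError != null) {
            throw fencedError;
        }
    }
}
```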
Reviewers: Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>
* KAFKA-16288: Prevent ClassCastExceptions for strings in Values.convertToDecimal
* KAFKA-16289: Values inferred schemas for map and arrays should ignore element order
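For the first item, a hedged usage sketch (input values are made up; the exact output formatting may vary):
```java
import java.math.BigDecimal;
import org.apache.kafka.connect.data.Values;

public class ConvertToDecimalExample {
    public static void main(String[] args) {
        // Before the fix, a String input could surface a ClassCastException;
        // after the fix it should be parsed into a BigDecimal instead.
        BigDecimal converted = Values.convertToDecimal(null, "12.50", 2);
        System.out.println(converted); // expected: a decimal equal to 12.50
    }
}
```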
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewers: Chris Egerton <chrise@aiven.io>
A foreign-key-join might drop a "subscription response" message if the value-hash changed.
This PR adds support to record such an event via the existing "dropped records" sensor.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Currently, in the async Kafka consumer, updates to the group metadata
that are received via the heartbeat are propagated to the application thread
in the form of an event. Group metadata is updated when a new assignment is
received. The new assignment is directly set in the subscription without
sending an update event from the background thread to the application thread.
That means there might be a delay between the application thread becoming
aware of the update to the assignment and the application thread becoming
aware of the update to the group metadata. This delay can cause stale
group metadata to be returned to the application thread, which then causes
issues when data of the new assignment is committed. A concrete example is:
producer.sendOffsetsToTransaction(offsetsToCommit, groupMetadata)
The offsets to commit might already stem from the new assignment
but the group metadata might relate to the previous assignment.
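A short sketch of the affected read-process-write pattern (topic and variable names are made up):
```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.TopicPartition;

public class SendOffsetsExample {
    // Hypothetical read-process-write loop; only the relevant calls are shown.
    static void processBatch(KafkaConsumer<String, String> consumer,
                             KafkaProducer<String, String> producer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));

        Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            offsetsToCommit.put(new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1));
        }

        producer.beginTransaction();
        // If consumer.groupMetadata() lags behind the assignment that produced these offsets,
        // the commit can be attributed to the previous rebalance, which is the issue described above.
        producer.sendOffsetsToTransaction(offsetsToCommit, consumer.groupMetadata());
        producer.commitTransaction();
    }
}
```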
Reviewers: Kirk True <ktrue@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
This comment was added by #12862.
The method with the comment was originally named updateLastSend, but its name was later changed to onSendAttempt.
This method doesn't increment numAttempts.
It seems that numAttempts is only modified after a request succeeds or fails.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Global state stores are currently flushed at each commit, which may impact performance, especially with EOS (a commit every 200 ms).
The goal of this improvement is to flush global state stores only when the delta between the current offset and the last checkpointed offset exceeds a threshold.
This is the same logic we apply to local state stores, with a threshold of 10,000 records.
The implementation only flushes if the time interval has elapsed and the threshold of 10,000 records is exceeded.
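A minimal sketch of the flush decision described above (class and method names are hypothetical, not the actual global state update code):
```java
// Hypothetical, simplified decision logic: flush only when both the commit interval has
// elapsed and the offset delta since the last checkpoint exceeds the threshold.
public class GlobalStoreFlushSketch {
    private static final long OFFSET_DELTA_THRESHOLD = 10_000L;

    private final long flushIntervalMs;
    private long lastFlushMs;
    private long lastCheckpointedOffset;

    public GlobalStoreFlushSketch(long flushIntervalMs) {
        this.flushIntervalMs = flushIntervalMs;
    }

    public boolean shouldFlush(long currentOffset, long nowMs) {
        boolean intervalElapsed = nowMs - lastFlushMs >= flushIntervalMs;
        boolean thresholdExceeded = currentOffset - lastCheckpointedOffset > OFFSET_DELTA_THRESHOLD;
        return intervalElapsed && thresholdExceeded;
    }

    public void onFlushed(long currentOffset, long nowMs) {
        lastFlushMs = nowMs;
        lastCheckpointedOffset = currentOffset;
    }
}
```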
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Bruno Cadonna <cadonna@apache.org>
This minor pull request consists of upgrading the jqwik library to version 1.8.0, which brings some bug fixes and some enhancements; upgrading the version now will make future upgrades easier.
Regarding the breaking changes:
- We are not using ArbitraryConfiguratorBase, so there is no overriding of the configure method.
- We are not using TypeUsage.canBeAssignedTo(TypeUsage).
- Nothing breaks related to @Provide and @ForAll usage; no CannotFindArbitraryException is thrown while running the tests.
- There is no usage of StringArbitrary.repeatChars(0.0).
- We are not affected by the removal of the method TypeArbitrary.use(Executable).
- We are not affected by the removal of the methods ActionChainArbitrary.addAction(action) and ActionChainArbitrary.addAction(weight, action).
For more details check the release notes: https://jqwik.net/release-notes.html#180
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Yash Mayya <yash.mayya@gmail.com>
The internal SubscriptionState object keeps track of whether the assignment is user-assigned or auto-assigned. If there are no assigned partitions, the assignment resets to NONE. If you call SubscriptionState.assignFromSubscribed in this state, it fails.
This change makes sure to check SubscriptionState.hasAutoAssignedPartitions() so that assignFromSubscribed is only called when it is permitted.
Also, a minor refactoring to make clearing the subscription a bit easier to follow in MembershipManagerImpl.
Testing via a new unit test.
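A hedged sketch of the guard, using a simplified stand-in for the subscription state (not the actual MembershipManagerImpl code):
```java
import java.util.Set;
import org.apache.kafka.common.TopicPartition;

// Hypothetical, simplified illustration of the check described above: only call
// assignFromSubscribed when the subscription state still has auto-assigned partitions.
class AssignmentGuardSketch {
    interface Subscriptions {
        boolean hasAutoAssignedPartitions();
        void assignFromSubscribed(Set<TopicPartition> partitions);
    }

    void updateAssignment(Subscriptions subscriptions, Set<TopicPartition> newAssignment) {
        if (subscriptions.hasAutoAssignedPartitions()) {
            subscriptions.assignFromSubscribed(newAssignment);
        }
        // Otherwise the assignment was reset to NONE (no assigned partitions),
        // and calling assignFromSubscribed would fail.
    }
}
```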
Reviewers: Bruno Cadonna <cadonna@apache.org>, Andrew Schofield <aschofield@confluent.io>
When the consumer is closed, we perform a synchronous autocommit. We don't want to be woken up here, because we are already executing a close operation under a deadline. This is in line with the behavior of the old consumer.
This fixes PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup, which is flaky on trunk, because we return immediately from the synchronous commit with a WakeupException, which causes us to not wait for the commit to finish and thereby sometimes miss the committed offset when a new consumer is created.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <cadonna@apache.org>
I have noticed the following log when a __consumer_offsets partition is moved away from a broker. It happens because the event is queued up after the event that unloads the state machine. This patch fixes it and fixes another similar case.
```
[2024-02-06 17:14:51,359] ERROR [GroupCoordinator id=1] Execution of UpdateImage(tp=__consumer_offsets-28, offset=13251) failed due to This is not the correct coordinator.. (org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime)
org.apache.kafka.common.errors.NotCoordinatorException: This is not the correct coordinator.
```
Reviewers: Justine Olshan <jolshan@confluent.io>
There are a few minor issues with the event sub-classes in the
org.apache.kafka.clients.consumer.internals.events package that should be cleaned up:
- Update the names of subclasses to remove "Application" or "Background"
- Make toString() final in the base classes and clean up the implementations of toStringBase()
- Fix minor whitespace inconsistencies
- Make variable/method names consistent
Reviewer: Bruno Cadonna <cadonna@apache.org>
In KIP-848, we introduce the notion of group types based on the protocol type that the members in the consumer group use. As of now we support two types of groups:
* Classic: members use the classic consumer group protocol (the existing one).
* Consumer: members use the consumer group protocol introduced in KIP-848.
Currently, List Groups allows users to list all the consumer groups available. KIP-518 introduced filtering the consumer groups by the state they are in. We now want to allow users to filter consumer groups by type.
This patch includes the changes to the admin client and related files. It also includes changes to parameterize the tests to include permutations of the old GC and the new GC with the different protocol types.
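A hedged sketch of the intended admin-client usage; inStates comes from KIP-518, while the type filter (withTypes and GroupType) is an assumption based on this patch's description and may differ from the final API:
```java
import java.util.Set;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ConsumerGroupListing;
import org.apache.kafka.clients.admin.ListConsumerGroupsOptions;
import org.apache.kafka.common.ConsumerGroupState;
import org.apache.kafka.common.GroupType;

public class ListGroupsByTypeExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        try (Admin admin = Admin.create(java.util.Map.of("bootstrap.servers", "localhost:9092"))) {
            ListConsumerGroupsOptions options = new ListConsumerGroupsOptions()
                    .inStates(Set.of(ConsumerGroupState.STABLE))   // state filter from KIP-518
                    .withTypes(Set.of(GroupType.CONSUMER));        // assumed name for the new type filter
            for (ConsumerGroupListing listing : admin.listConsumerGroups(options).all().get()) {
                System.out.println(listing.groupId());
            }
        }
    }
}
```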
Reviewers: David Jacot <djacot@confluent.io>
This is the first part of the implementation of KIP-1005.
The purpose of this pull request is for the broker to start returning the correct offset when it receives -5 as a timestamp in a ListOffsets API request.
Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>
`poll(long timeout, TimeUnit unit)` is either used with `Long.MAX_VALUE` or `0`. This patch replaces it with `poll` and `take`. It removes the `awaitNanos` usage.
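For illustration, a small standalone sketch of the two patterns and their replacements, using a plain BlockingQueue rather than the actual Kafka code:
```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollTakeExample {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Timed variants that rely on awaitNanos internally:
        queue.add("event");
        String blocking = queue.poll(Long.MAX_VALUE, TimeUnit.MILLISECONDS); // effectively "wait forever"
        queue.add("event");
        String immediate = queue.poll(0, TimeUnit.MILLISECONDS);             // effectively "don't wait"

        // Equivalent replacements without a timeout:
        queue.add("event");
        String viaTake = queue.take(); // blocks until an element is available
        queue.add("event");
        String viaPoll = queue.poll(); // returns immediately, possibly null

        System.out.println(blocking + " " + immediate + " " + viaTake + " " + viaPoll);
    }
}
```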
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
Adding the following rebalance metrics to the consumer:
rebalance-latency-avg
rebalance-latency-max
rebalance-latency-total
rebalance-rate-per-hour
rebalance-total
failed-rebalance-rate-per-hour
failed-rebalance-total
Due to the difference in protocol, we need to redefine when a rebalance starts and ends.
Start of Rebalance:
- Current protocol: right before sending out the JoinGroup request
- Consumer group protocol: when the client receives a new assignment via the heartbeat
End of Rebalance - Successful Case:
- Current protocol: receiving the SyncGroup request after transitioning to "COMPLETING_REBALANCE"
- Consumer group protocol: after completing reconciliation and right before sending out the "ack" heartbeat
End of Rebalance - Failed Case:
- Current protocol: any failure in the JoinGroup/SyncGroup response
- Consumer group protocol: a failure in the heartbeat
Note: After all, we try to be consistent with the current protocol. Rebalances start and end with sending and receiving network requests, failures in those requests count as rebalance failures, and it is entirely possible to have multiple failures before a successful one.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
Performs an additional unwrap during the handshake after data from the client is processed, to support OpenSSL, which needs the extra unwrap to complete the handshake.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Add a test for concurrently updating metadata and fetching snapshot/cluster.
Reviewers: Jason Gustafson <jason@confluent.io>
Co-authored-by: Zhifeng Chen <ericzhifengchen@gmail.com>
Addresses KAFKA-16277: CooperativeStickyAssignor does not spread topics evenly among consumer group.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Introduce a call to the onPartitionsLost callback to release the assignment when a consumer proactively leaves the group due to poll timer expiration.
When the poll timer expires, the member sends a leave group request (reusing the same existing LEAVING state and logic), and then transitions to STALE to release its assignment and wait for the poll timer reset. Once both conditions are met, the consumer transitions out of the STALE state to rejoin the group. Note that while in the STALE state, the member is not part of the group, so it does not send heartbeats.
This PR also includes a fix to ensure that while STALE, or in any other state where the member is not in the group, heartbeat responses that may be received are ignored.
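A simplified, hypothetical sketch of the member state flow described above (not the actual MembershipManagerImpl implementation):
```java
enum MemberState { STABLE, LEAVING, STALE, JOINING }

// Hypothetical, simplified illustration of the transitions described above.
final class PollTimerExpirySketch {
    private MemberState state = MemberState.STABLE;

    void onPollTimerExpired() {
        state = MemberState.LEAVING; // send the leave group request, reusing the existing LEAVING logic
        state = MemberState.STALE;   // then release the assignment and stop sending heartbeats
    }

    void maybeRejoin(boolean assignmentReleased, boolean pollTimerReset) {
        // Rejoin only once the assignment has been released AND the poll timer was reset.
        if (state == MemberState.STALE && assignmentReleased && pollTimerReset) {
            state = MemberState.JOINING;
        }
    }

    boolean shouldSendHeartbeats() {
        return state != MemberState.STALE; // while STALE the member is not part of the group
    }
}
```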
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
Introduces a new filter in the ListTransactionsRequest API. This enables callers to filter for transactions that have been running for longer than a certain duration.
This PR includes the following changes:
- Bumps the version of the ListTransactionsRequest API to 1. The durationFilter is set to -1L when communicating with an older broker that does not support version 1.
- Bumps the version of ListTransactionsResponse to 1 without changing the response structure.
- Adds a durationFilter option to kafka-transactions.sh --list.
Tests:
- Client-side test to build the request with the correct combination of duration filter and API version: testBuildRequestWithDurationFilter
- Server-side test to filter transactions based on duration: testListTransactionsFiltering
- Added a test case for the kafka-transactions.sh change in TransactionsCommandTest
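A hedged sketch of how the new filter might be used from the admin client; listTransactions and ListTransactionsOptions exist today, while the duration-filter setter name (filterOnDuration) is an assumption based on this PR's description:
```java
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListTransactionsOptions;
import org.apache.kafka.clients.admin.TransactionListing;

public class ListLongRunningTransactionsExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        try (Admin admin = Admin.create(java.util.Map.of("bootstrap.servers", "localhost:9092"))) {
            // filterOnDuration is an assumed method name for the new duration filter (in milliseconds).
            ListTransactionsOptions options = new ListTransactionsOptions()
                    .filterOnDuration(60_000L); // only transactions running for longer than one minute
            for (TransactionListing listing : admin.listTransactions(options).all().get()) {
                System.out.println(listing.transactionalId() + " state=" + listing.state());
            }
        }
    }
}
```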
Reviewers: Justine Olshan <jolshan@confluent.io>, Raman Verma <rverma@confluent.io>
KRaft was only notifying listeners of the latest leader and epoch when the replica transitioned to a new state. This could result in the listener never getting notified if the registration happened after the replica had become a follower.
This problem doesn't exist for the active leader because the KRaft implementation attempts to notify the listener of the latest leader and epoch when the replica is the active leader.
This issue is fixed by notifying the listeners of the latest leader and epoch after processing the listener registration request.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Commit 1442862 introduced a test with an assertion inside a listener
callback. This improves the test by checking that the listener is
actually being executed, to avoid silent skipping of the assertion.
Reviewers: Matthias J. Sax <matthias@confluent.io>
The group coordinator expects the instance ID to always be sent when
leaving the group in a static membership configuration, see
ea94507679/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java (L814)
The failure was silent, because the group coordinator does not log
failed requests and the consumer doesn't wait for the heartbeat response
during close.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Kirk True <ktrue@confluent.io>, Bruno Cadonna <cadonna@apache.org>
Fix to ensure that a consumer that has been fenced by the coordinator stops sending heartbeats while it is in the FENCED state releasing its assignment (waiting for the onPartitionsLost callback to complete). Once the callback completes, the member transitions to JOINING, and only then does it resume sending heartbeats.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
a3528a3 removed this log but not the test asserting it.
Builds are currently red because, for some reason, these tests can't retry. We should address that as a follow-up.
Reviewers: Greg Harris <greg.harris@aiven.io>, Matthias J. Sax <matthias@confluent.io>