During the transaction commit phase, it is normal to hit CONCURRENT_TRANSACTION error before the transaction markers are fully propagated. Instead of letting the client to retry the produce request, it is better to retry on the server side.
Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
Rewrite BrokerMetricNamesTest using ReplicaManager.MetricNames, ensuring that all metrics are always included. This helps prevent issues like PartitionsWithLateTransactionsCount not being correctly included in the test before.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Implements a memory model for representing streams groups in the group coordinator, as well as group count and rebalance metrics.
Reviewers: Bill Bejeck <bill@confluent.io>, Bruno Cadonna <bruno@confluent.io>
The case testDescribeQuorumRequestToControllers shutdowns raft client but not the controller. This makes client has chance to send a request to the controller and get NOT_LEADER_OR_FOLLOWER error. However, if the raft client finishes shutdown before handling the request, the request will not be handled. Shutdown the controller before doing KafkaFuture#get for the client request, so we can make sure the request is handled by another controller eventually.
Signed-off-by: PoAn Yang <payang@apache.org>
Reviewers: Luke Chen <showuon@gmail.com>
Sometimes we didn't get into abortable state before aborting, so the epoch didn't get bumped. Now we force abortable state with an attempt to send before aborting so the epoch bump occurs as expected.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
The tests which set reassign_from_offset_zero=False have a setup phase which produces records with old timestamps to the topic and waits until they are cleaned by the retention in order to run the main phase of the test based on non-zero offsets. The setup phases did not wait enough for the cleaning task to kick in, mainly because the scheduled task was not started yet due to log.initial.task.delay.ms being set to 30s by default. Reducing it to 5s helps to stabilize the test. The patch also changes the sleep to 12s in order to have a bit more head room.
```
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id: 2025-02-11--016
run time: 26 minutes 9.451 seconds
tests run: 12
passed: 12
flaky: 0
failed: 0
ignored: 0
================================================================================
```
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
* KAFKA-18758: NullPointerException in shutdown following InvalidConfigurationException
Add checks for null in shutdown as BrokerLifecycleManager is not instantiaited if LogManager constructor throws an Exception
After log4j migration, we need to update the logging configuration in KafkaDockerWrapper from log4j1 to log4j2.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
it's crucial to utilize a try-finally block to ensure proper closure of the ReplicaManager. Failing to do so can result in an unreleased thread from the purgatory, potentially leading to errors in subsequent integration tests that incorporate thread leak detection.
Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Most of the changes are obvious clean-ups/fixes. A couple of noteworthy items:
1. Support for non LTS versions is clarified (we were incorrectly stating full support
for Java 23).
2. TLS version negotiation details are clarified.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This commit ensures that the ClientQuotaCallback#updateClusterMetadata method is executed in KRaft mode. This method is triggered whenever a topic or cluster metadata change occurs. However, in KRaft mode, the current implementation of the updateClusterMetadata API is inefficient due to the requirement of creating a full Cluster object. To address this, a follow-up issue (KAFKA-18239) has been created to explore more efficient mechanisms for providing cluster information to the ClientQuotaCallback without incurring the overhead of a full Cluster object creation.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
At the moment, we require specifying builtin server side assignors by their full class name. This is not convenient and also exposed their full class name as part of our public API. This patch changes it to accept specifying builtin server side assignor by their short name (uniform or range) while continuing to accept full class name for customer assignors.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
In the streams smoke test, flush records that are appended to the input topics, to advance the stream time so that all suppressed windows are flushed at the end of the test. The records are created with record time equal to current time + 2 days. caf0b67 changed the broker defaults so that records more than one hour in the future are rejected by the broker. This breaks the flush messages. By moving all record time stamps 2 days into the past, the existing logic should work correctly with the new default broker configuration.
A similar thing happens in the relational smoke test, where data is emitted 4 days into the future. To avoid running into retention / compaction, the window retention time is increased for both tests.
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bill@confluent.io>