Commit Graph

12413 Commits

Author SHA1 Message Date
Mayank Shekhar Narula ddfcc333f8
KAFKA-16226 Add test for concurrently updatingMetadata and fetching snapshot/cluster (#15385)
Add test for concurrently updatingMetadata and fetching snapshot/cluster

Reviewers: Jason Gustafson <jason@confluent.io>

Co-authored-by: Zhifeng Chen <ericzhifengchen@gmail.com>
2024-02-26 15:57:11 -08:00
Cameron Redpath 027fad4b2a
KAFKA-16277: AbstractStickyAssignor - Sort owned TopicPartitions by partition when reassigning (#15416)
Treats KAFKA-16277 - CooperativeStickyAssignor does not spread topics evenly among consumer group

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2024-02-26 13:33:16 -08:00
Lianet Magrans 9bc9fae942
KAFKA-16258: callback to release assignment when stale member leaves group (#15415)
Introduce call to onPartitionsLost callback to release assignment when a consumer pro-actively leaves the group due to poll timer expired.

When the poll timer expires, the member sends a leave group request (reusing same existing LEAVING state and logic), and then transitions to STALE to release it assignment and wait for the poll timer reset. Once both conditions are met, the consumer transitions out of the STALE state to rejoin the group. Note that while on this STALE state, the member is not part of the group so it does not send heartbeats.

This PR also includes the fix to ensure that while STALE or in any other state where the member is not in the group, heartbeat responses that may be received are ignored.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-02-26 11:39:33 +01:00
Anton Agestam e8cd661bec
MINOR: Document MetadataResponse invariants for name and ID (#15386)
Reviewers: Justine Olshan <jolshan@confluent.io>, Ziming Deng <dengziming1993@gmail.com>, Greg Harris <gharris1727@gmail.com>.
2024-02-26 14:49:59 +08:00
Yang Yu b4e96913cc
KAFKA-16265: KIP-994 (Part 1) Minor Enhancements to ListTransactionsRequest (#15384)
Introduces a new filter in ListTransactionsRequest API. This enables caller to filter on transactions that have been running for longer than a certain duration of time.

This PR includes the following changes:

bumps version for ListTransactionsRequest API to 1. Set the durationFilter to -1L when communicating with an older broker that does not support version 1.
bumps version for ListTransactionsResponse to 1 without changing the response structure.
adds durationFilter option to kafka-transactions.sh --list
Tests:

Client side test to build request with correct combination of duration filter and API version: testBuildRequestWithDurationFilter
Server side test to filter transactions based on duration: testListTransactionsFiltering
Added test case for kafka-transactions.sh change in TransactionsCommandTest

Reviewers: Justine Olshan <jolshan@confluent.io>, Raman Verma <rverma@confluent.io>
2024-02-24 06:09:23 -08:00
José Armando García Sancio 474f8c1ad6
KAFKA-16286; Notify listener of latest leader and epoch (#15397)
KRaft was only notifying listeners of the latest leader and epoch when the replica transition to a new state. This can result in the listener never getting notified if the registration happened after it had become a follower.

This problem doesn't exists for the active leader because the KRaft implementation attempts to notified the listener of the latest leader and epoch when the replica is the active leader.

This issue is fixed by notifying the listeners of the latest leader and epoch after processing the listener registration request.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-02-23 12:56:25 -08:00
Lucas Brutschy 7efdf40a77
MINOR: Improve test testCommitInRebalanceCallback (#15420)
Commit 1442862 introduced a test with an assertion inside a listener
callback. This improves the test by checking that the listener is
actually being executed, to avoid silent skipping of the assertion.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2024-02-23 17:21:48 +01:00
Lucas Brutschy 7ac50a8611
KAFKA-16152: Fix PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart (#15419)
The group coordinator expects the instance ID to always be sent when
leaving the group in a static membership configuration, see

ea94507679/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java (L814)

The failure was silent, because the group coordinator does not log
failed requests and the consumer doesn't wait for the heartbeat response
during close.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Kirk True <ktrue@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2024-02-23 16:43:50 +01:00
Lianet Magrans 2185004083
KAFKA-16251: Fix for not sending heartbeat while fenced (#15392)
Fix to ensure that a consumer that has been fenced by the coordinator stops sending heartbeats while it is on the FENCED state releasing its assignment (waiting for the onPartitionsLost callback to complete). Once the callback completes, the member transitions to JOINING and it's then when it should resume sending heartbeats again.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-02-23 10:56:05 +01:00
Daan Gerits 06392f7ae2
MINOR: Update of the PAPI testing classes to the latest implementation (#12740)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2024-02-22 18:15:24 -08:00
Justine Olshan 661e41c92f
KAFKA-16302: Remove check for log message that is no longer present (fix builds) (#15422)
a3528a3 removed this log but not the test asserting it.

Builds are currently red because for some reason these tests can't retry. We should address that as a followup.

Reviewers:  Greg Harris <greg.harris@aiven.io>,  Matthias J. Sax <matthias@confluent.io>
2024-02-22 17:10:11 -08:00
Stanislav Kozlovski a512ef1478
MINOR: Add 3.7 JBOD KRaft Early Acces disclaimer (#15418)
This patch adds a small disclaimer to the ops documentation page to mention that JBOD is in early access with 3.7
2024-02-22 23:55:19 +01:00
Josep Prat 98a658f871
MINOR: Update dependencies (#15404)
* MINOR: Update dependencies

Updates minor versions for our dependencies and build tool

- Jackson from 2.16.0 to 2.16.1
- JUnit from 5.10.0 to 5.10.2
  https://junit.org/junit5/docs/5.10.2/release-notes/ and https://junit.org/junit5/docs/5.10.1/release-notes/
- Mockito from 5.8.0 to 5.10.0 (only if JDK 11 or higher)
  https://github.com/mockito/mockito/releases/tag/v5.10.0 and https://github.com/mockito/mockito/releases/tag/v5.9.0
- Gradle from 8.5 to 8.6 https://docs.gradle.org/8.6/release-notes.html

Reviewers: Divij Vaidya <diviv@amazon.com>


Signed-off-by: Josep Prat <josep.prat@aiven.io>
2024-02-22 12:11:51 +01:00
Stanislav Kozlovski ea94507679
MINOR: Add 3.7 upgrade notes (#15407)
This patch adds the 3.7 upgrade notes.
2024-02-22 10:44:24 +01:00
Stanislav Kozlovski 9402d525ab
MINOR: Reconcile upgrade.html with kafka-site/36's version (#15406)
The usual flow of updating the upgrade.html docs is to first do it in apache/kafka/trunk, then cherry-pick to the relative release branch and then copy into the kafka-site repo.

It seems like this was not done with a few commits updating the 3.6.1, 3.5.2 and 3.5.1, resulting in kafka-site's latest upgrade.html containing content that isn't here. This was caught while we were adding the 3.7 upgrade docs.

This patch reconciles both files by taking the extra changes from kafka-site and placing them here. This was done by simply comparing a diff of both changes and taking the ones that apply
2024-02-22 10:39:58 +01:00
Bruno Cadonna 02ebfc6108
KAFKA-16194: Do not return records from poll if group metadata unknown (#15369)
Due to the asynchronous nature of the async consumer, it might happen that on the application thread the group metadata is not known after the first poll returns records. If the offsets of those records are then send to a transaction with

txnProducer.sendOffsetsToTransaction(offsetsToCommit, groupMetadata);

and then the transaction is committed, the group coordinator will raise an error saying that the member is not known since the member in groupMetadata is still -1 before the metadata is updated.

This commit avoids this error by not returning any records from poll() until the group metadata is updated, i.e., the member ID and the generation ID (a.k.a. member epoch) are known. This check is only done if group management is used.

Additionally, this commit resets the group metadata when the consumer unsubscribes.

Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
2024-02-21 18:19:38 +01:00
Matthias J. Sax a3528a316f
MINOR: remove unnecessary logging (#15396)
We already record dropping record via metrics and logging at WARN level
is too noise. This PR removes the unnecessary logging.

Reviewers: Kalpesh Patel <kpatel@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2024-02-21 08:01:11 -08:00
Anton Liauchuk 4c012c5c23
KAFKA-16278: Missing license for scala related dependencies (#15398)
Reviewers: Divij Vaidya <diviv@amazon.com>
2024-02-21 12:25:15 +01:00
Bruno Cadonna cc49fc7656
HOTFIX: Fix compilation error in TransactionManagerTest (#15405)
Variable metadataMock was removed by #15323 after the CI build of #15320 was run and before #15320 was merged.

Reviewers: Luke Chen <showuon@gmail.com>, Lucas Brutschy <lbrutschy@confluent.io>
2024-02-21 11:54:46 +01:00
Lianet Magrans 77ba06fa62
KAFKA-16033: Commit retry logic fixes (#15357)
This change modifies the commit manager for improved retry logic & fixing bugs:

- defines high level functions for each of the different types of commit: commitSync, commitAsync, autoCommitSync (used from consumer close), autoCommitAsync (on interval), autoCommitNow (before revocation).
 - moves retry logic to these caller functions, keeping a common response error handling that propagates errors that each caller functions retry as it needs.

Fixes the following issues:

- auto-commit before revocation should retry with latest consumed offsets
- auto-commit before revocation should only reset the timer once, when the rebalance completes
- StaleMemberEpoch error (fatal) is considered retriable only when committing offsets before revocation, where it is retried with backoff if the member has a valid epoch. All other commits will fail fatally on stale epoch. Note that auto commit on the interval (autoCommitAsync) does not have any specific retry logic for the stale epoch, but will effectively retry on the next interval (as it does for any other fatal error)
- fix duplicated and noisy logs for auto-commit

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-02-21 11:08:37 +01:00
Paolo Patierno 9118ad653f
Additional fix on the rollback migration documentation (#15317)
This is an additional PR to fix rollback migration documentation related to #15287

Reviewers: Luke Chen <showuon@gmail.com>
2024-02-21 17:43:05 +08:00
Lucas Brutschy fcbfd3412e
KAFKA-16284: Fix performance regression in RocksDB (#15393)
A performance regression introduced in commit 5bc3aa4 reduces the write performance in RocksDB by ~3x. The bug is that we fail to pass the WriteOptions that disable the write-ahead log into the DB accessor.

For testing, the time to write 10 times 1 Million records into one RocksDB each were measured:

Before 5bc3aa4: 7954ms, 12933ms
After 5bc3aa4: 30345ms, 31992ms
After 5bc3aa4 with this fix: 8040ms, 10563ms
On current trunk with this fix: 9508ms, 10441ms

Reviewers: Bruno Cadonna <bruno@confluent.io>, Nick Telford <nick.telford@gmail.com>
2024-02-21 09:01:53 +01:00
Artem Livshits c1fdcb2a27
MINOR: extend transaction unit test to validate drain (#15320)
There is a test already that checks that transactional messages are not
drained when partition is not added, this change just logically
completes the test to also show that messages can be drained after
parition is added.

Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Tzu-Li (Gordon) Tai <tzulitai@apache.org>, Justine Olshan <jolshan@confluent.io>
2024-02-20 15:13:03 -08:00
Matthias J. Sax 4c70581eb6
KAFKA-15770: IQv2 must return immutable position (#15219)
ConsistencyVectorIntegrationTest failed frequently because the return
Position from IQv2 is not immutable while the test assume immutability.
To return a Position with a QueryResult that does not change, we need to
deep copy the Position object.

Reviewers: John Roesler <john@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2024-02-20 12:24:32 -08:00
Omnia Ibrahim ead2431c37
MINOR: Remove unwanted debug line in LogDirFailureTest (#15371)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Justine Olshan <jolshan@confluent.io>, Igor Soarez <soarez@apple.com>
2024-02-20 11:25:17 +01:00
Lucas Brutschy 5854139cd8
KAFKA-16243: Make sure that we do not exceed max poll interval inside poll (#15372)
The consumer keeps a poll timer, which is used to ensure liveness of the application thread. The poll timer automatically updates while the Consumer.poll(Duration) method is blocked, while the newer consumer only updates the poll timer when a new call to Consumer.poll(Duration) is issued. This means that the kafka-console-consumer.sh tools, which uses a very long timeout by default, works differently with the new consumer, with the consumer proactively rejoining the group during long poll timeouts.

This change solves the problem by (a) repeatedly sending PollApplicationEvents to the background thread, not just on the first call of poll and (b) making sure that the application thread doesn't block for so long that it runs out of max.poll.interval.

An integration test is added to make sure that we do not rejoin the group when a long poll timeout is used with a low max.poll.interval.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2024-02-20 10:48:36 +01:00
runom a26a1d847f
MINOR: fix MetricsTest.testBrokerTopicMetricsBytesInOut (#14744)
The assertion to check BytesOut doesn't include replication was performed before replication occurred.
This PR fixed the position of the assertion.

Reviewers: Luke Chen <showuon@gmail.com>
2024-02-20 11:23:06 +08:00
Ahmed Sobeh 6eea40882d
KAFKA-15349: ducker-ak should fail fast when gradlew systemTestLibs fails (#15391)
In this modification, if ./gradlew systemTestLibs fails, the script will output an error message and terminate execution using the die function. This ensures that the script fails fast and prompts the user to address the error before continuing.

Reviewers: Luke Chen <showuon@gmail.com>
2024-02-20 11:19:53 +08:00
Josep Prat b71999be95
MINOR: Clean up core modules (#15279)
This PR cleans up: metrics, migration, network, raft, security, serializer, tools, utils, and zookeeper package classes

Mark methods and fields private where possible
Annotate public methods and fields
Remove unused classes and methods
Make sure Arrays are not printed with .toString
Optimize minor warnings

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-02-19 16:54:50 +01:00
Lucas Brutschy 1442862bbd
KAFKA-16009: Fix PlaintextConsumerTest. testMaxPollIntervalMsDelayInRevocation (#15383)
The wake-up mechanism in the new consumer is preventing from committing within a rebalance listener callback. The reason is that we are trying to register two wake-uppable actions at the same time.

The fix is to register the wake-uppable action more closely to where we are in fact blocking on it, so that the action is not registered when we execute rebalance listeneners and callback listeners.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2024-02-19 15:33:37 +01:00
Owen Leung 71a4e6fc0c
KAFKA-15140: improve TopicCommandIntegrationTest to be less flaky (#14891)
This PR improves TopicCommandIntegrationTest by :
    - using TestUtils.createTopicWithAdmin
    - replacing \n with lineSeperator
    - using waitForAllReassignmentsToComplete
    - adding more log when assertion fails

Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-02-19 19:37:31 +08:00
David Jacot e247bd03af
MINOR: Improve ListConsumerGroupTest.testListGroupCommand (#15382)
While reviewing https://github.com/apache/kafka/pull/15150, I found that our tests verifying the console output are really hard to read. Here is my proposal to make it better.

Reviewers: Ritika Reddy <rreddy@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-02-17 00:07:50 -08:00
Luke Chen 98fb3bd304
MINOR: log error when initialLoadFuture is not done in authorizer (#14953)
Currently, when initializing StandardAuthorizer, it'll wait until all ACL loaded and complete the initialLoadFuture. So, checking logs, we'll see:

2023-12-06 14:07:50,325 INFO [StandardAuthorizer 1] Initialized with 6 acl(s). (org.apache.kafka.metadata.authorizer.StandardAuthorizerData) [kafka-1-metadata-loader-event-handler]
2023-12-06 14:07:50,325 INFO [StandardAuthorizer 1] Completed initial ACL load process. (org.apache.kafka.metadata.authorizer.StandardAuthorizerData) [kafka-1-metadata-loader-event-handler]

But then, when shutting down the node, we will also see this error:

2023-12-06 14:12:32,752 ERROR [StandardAuthorizer 1] Failed to complete initial ACL load process. (org.apache.kafka.metadata.authorizer.StandardAuthorizerData) [kafka-1-metadata-loader-event-handler]
java.util.concurrent.TimeoutException
	at kafka.server.metadata.AclPublisher.close(AclPublisher.scala:98)
	at org.apache.kafka.image.loader.MetadataLoader.closePublisher(MetadataLoader.java:568)
	at org.apache.kafka.image.loader.MetadataLoader.lambda$removeAndClosePublisher$7(MetadataLoader.java:528)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
	at java.base/java.lang.Thread.run(Thread.java:840)

It's confusing. And it's because we'll try to complete authorizer initialLoad, and complete the initialLoadFuture if not done. But we'll log the error no matter it's completed or not. This patch improves the logging.

Reviewers: Josep Prat <josep.prat@aiven.io>
2024-02-17 13:58:11 +08:00
Calvin Liu 756f44a3e5
KAFKA-15665: Enforce partition reassignment should complete when all target replicas are in ISR (#15359)
When completing the partition reassignment, the new ISR should have all the target replicas.

Reviewers: Justine Olshan <jolshan@confluent.io>, David Mao <dmao@confluent.io>
2024-02-16 10:27:43 -08:00
Matthias J. Sax 0789952b68
MINOR: add note about Kafka Streams feature for 3.7 release (#15380)
Reviewers: Walker Carlson <wcarlson@confluent.io>
2024-02-16 08:59:48 -08:00
Paolo Patierno ddc5d1dc34
MINOR: Added ACLs authorizer change during migration (#15333)
This trivial PR makes clear when it's the right time to switch from AclAuthorizer to StandardAuthorizer during the migration process.

Reviewers: Luke Chen <showuon@gmail.com>
2024-02-16 18:57:22 +08:00
Luke Chen 501f82b91c
KAFKA-15670: add "inter.broker.listener.name" config in KRaft controller config (#14631)
During ZK migrating to KRaft, before entering dual-write mode, the KRaft controller will send RPCs (i.e. UpdateMetadataRequest, LeaderAndIsrRequest, and StopReplicaRequest) to the brokers. Currently, we use the inter broker listener to send the RPC to brokers from the controller. But in the doc, we didn't provide this info to users because the normal KRaft controller won't use inter.broker.listener.names.

This PR adds the missing config in the ZK migrating to KRaft doc.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Paolo Patierno <ppatierno@live.com>
2024-02-16 18:56:06 +08:00
Kirk True 7a07aefd86
KAFKA-16230: Update verifiable_consumer.py to support KIP-848’s group protocol config (#15328)
The Python VerifiableConsumer now passes in the --group-protocol and --group-remote-assignor command line arguments to VerifiableConsumer if the node is running 3.7.0+ and using the new consumer group.protocol.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2024-02-16 11:53:53 +01:00
Lianet Magrans 14d5e17070
KAFKA-16165: Fix invalid transition on poll timer expiration (#15375)
This fixes an invalid transition (leaving->stale) that was discovered in the system tests. The underlying issue was that the poll timer expiration logic was blindly forcing a transition to stale and sending a leave group, without considering that the member could be already leaving.
The fix included in this PR ensures that the poll timer expiration logic, whose purpose is to leave the group, is only applied if the member is not already leaving. Note that it also fixes the transition out of the STALE state, that should only happen when the poll timer is reset.

As a result of this changes:

If the poll timer expires while the member is not leaving, the poll timer expiration logic is applied: it will transition to stale, send a leave group, and remain in STALE state until the timer is reset. At that point the member will transition to JOINING to rejoin the group.
If the poll timer expires while the member is already leaving, the poll timer expiration logic does not apply, and just lets the HB continue. Not that this would be the case of member in PREPARE_LEAVING waiting for callbacks to complete (needs to continue sending HB), or LEAVING (needs to send the last HB to leave).

Reviewers: Kirk True <ktrue@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2024-02-16 11:38:46 +01:00
Kirk True 051d4274da
KAFKA-16167: re-enable PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup (#15358)
This integration test is now passing, presumably based on recent related changes. Re-enabling to ensure it is included in the test suite to catch any regressions.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-02-16 09:05:02 +01:00
Minha, Jeong 553f45bca8
MINOR: Fix toString method of IsolationLevel (#14782)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Ashwin Pankaj <apankaj@confluent.io>
2024-02-15 19:07:18 -08:00
Mayank Shekhar Narula ff90f78c70
KAFKA-16226; Reduce synchronization between producer threads (#15323)
As this [JIRA](https://issues.apache.org/jira/browse/KAFKA-16226) explains, there is increased synchronization between application-thread, and the background thread as the background thread started to synchronized methods Metadata.currentLeader() in [original PR](https://github.com/apache/kafka/pull/14384). So this PR does the following changes
1. Changes background thread, i.e. RecordAccumulator's partitionReady(), and drainBatchesForOneNode(), to not use `Metadata.currentLeader()`. Instead rely on `MetadataCache` that is immutable. So access to it is unsynchronized.
2.  This PR repurposes `MetadataCache` as an immutable snapshot of Metadata. This is a wrapper around public `Cluster`. `MetadataCache`'s API/functionality should be extended for internal client usage Vs public `Cluster`. For example, this PR adds `MetadataCache.leaderEpochFor()`
3. Rename `MetadataCache` to `MetadataSnapshot` to make it explicit its immutable.

**Note both `Cluster` and `MetadataCache` are not syncronized, hence reduce synchronization from the hot path for high partition counts.**

Reviewers: Jason Gustafson <jason@confluent.io>
2024-02-15 09:26:47 -08:00
David Jacot 5edf52359a
MINOR: Fix group metadata loading log (#15368)
Spotted the following log: 
```
[2024-02-14 09:59:30,103] INFO [GroupCoordinator id=1] Finished loading of metadata from 39 in __consumer_offsets-4ms with epoch 2 where 39ms was spent in the scheduler. Loaded 0 records which total to 0 bytes. (org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime)
```
The partition and the time are incorrect. This patch fixes it.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Bruno Cadonna <bruno@confluent.io>
2024-02-15 00:19:43 -08:00
David Jacot d378ad39fa
MINOR: Fix KafkaAdminClientTest.testClientInstanceId (#15370)
This patch tries to address the [flakiness](https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Europe%2FZurich&tests.container=org.apache.kafka.clients.admin.KafkaAdminClientTest&tests.sortField=FLAKY&tests.test=testClientInstanceId()) of `KafkaAdminClientTest.testClientInstanceId`. The test fails with `org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.`. I believe that it is because 10ms is not enough when our CI is busy. Let's try with 1s.

Reviewers: Bruno Cadonna <bruno@confluent.io>
2024-02-14 23:25:35 -08:00
Lucas Brutschy e8c70fce26
KAFKA-16155: Re-enable testAutoCommitIntercept (#15334)
The main bug causing this test to fail as described in the ticket was already fixed.

The test is still flaky if unchanged, because in the new consumer, the assignment can
change in between two polls. Interceptors are only executed inside poll (and have to be,
since they must run as part of the application thread), so we need to modify the
integration test to call poll once after observing that the assignment changed.

Reviewers: Bruno Cadonna <bruno@confluent.io>
2024-02-14 16:09:48 +01:00
David Jacot d24abe0ede
MINOR: Refactor GroupMetadataManagerTest (#15348)
`GroupMetadataManagerTest` class got a little under control. We have too many things defined in it. As a first steps, this patch extracts all the inner classes. It also extracts all the helper methods. However, the logic is not changed at all.

Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-02-13 23:29:29 -08:00
Omnia Ibrahim be6653c8bc
KAFKA-16225 [1/N]: Set metadata.log.dir to broker in KRAFT mode in integration test
Fix the flakiness of LogDirFailureTest by setting a separate metadata.log.dir for brokers in KRAFT mode.

The test was flaky because as we call causeLogDirFailure some times we impact the first log.dir which also is KafkaConfig.metadataLogDir as we don't have metadata.log.dir. So to fix the flakiness we need to explicitly set metadata.log.dir to diff log dir than the ones we could potentially fail for the tests. 

This is part 1 of the fixes. Delivering them separately as the other issues were not as clear cut.

Reviewers: Gaurav Narula <gaurav_narula2@apple.com>, Justine Olshan <jolshan@confluent.io>, Greg Harris <greg.harris@aiven.io>
2024-02-13 14:13:53 -08:00
Mickael Maison 0bf830fc9c
KAFKA-14576: Move ConsoleConsumer to tools (#15274)
Reviewers: Josep Prat <josep.prat@aiven.io>, Omnia Ibrahim <o.g.h.ibrahim@gmail.com>
2024-02-13 19:24:07 +01:00
Lucas Brutschy 8c0488b887
MINOR: ignore heartbeat response if leaving (#15362)
When the consumer enters state LEAVING, it sets the epoch to the leave epoch,
such as -1. When the timing is right, we may get a heartbeat response after
entering the state LEAVING, which resets the epoch to the member epoch on
the server. The result is that the consumer never leaves the group.

Seems like c6f4c604d8 changed the timing inside
the consumer to relatively frequently triggers this problem inside
`DescribeConsumerGroupTest`.

We fix it by ignoring any heartbeat responses when we are in state LEAVING.

Reviewers: David Jacot <djacot@confluent.io>
2024-02-13 19:08:13 +01:00
Gantigmaa Selenge fed3c3da84
KAFKA-14822: Allow restricting File and Directory ConfigProviders to specific paths (#14995)
Reviewers: Greg Harris <gharris1727@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
2024-02-13 18:28:28 +01:00