The choice of `name` rather than `user` as the parameter is deliberate: internally the record uses `name`, all tests using the StorageTool pass `name` as a parameter, KafkaPrincipals are created with `name`, and SCRAM credentials are created with `--entity-name`.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This PR adds the grace-period interface to the `Joined` object and wires it up to the buffer. The majority of the change is tests and relocation of some of the existing join logic.
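For orientation, a minimal sketch of how the new option might be used from the Streams DSL; the `withGracePeriod` name follows the PR description and should be treated as an assumption, and the serdes are placeholders:
```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.Joined;

public class JoinedGraceSketch {
    // Sketch only: withGracePeriod mirrors the grace-period setter described above;
    // treat the exact method name and signature as an assumption, not a reference.
    static Joined<String, Long, Double> joinedWithGrace() {
        return Joined
                .with(Serdes.String(), Serdes.Long(), Serdes.Double())
                .withGracePeriod(Duration.ofMinutes(5));
    }
}
```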
Reviewers: Victoria Xia <victoria.xia@confluent.io>, Bruno Cadonna <cadonna@apache.org>
Fixed a regression described in KAFKA-15053 where security.protocol only accepted uppercase values such as PLAINTEXT, SSL, SASL_PLAINTEXT, and SASL_SSL. With this fix, both lowercase and uppercase values are supported (e.g. PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL, plaintext, ssl, sasl_plaintext, sasl_ssl).
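For example, with this fix a client configuration like the following sketch is accepted; the bootstrap address and SASL mechanism are placeholders, not part of the change:
```java
import java.util.Properties;

public class LowercaseSecurityProtocolExample {
    // Illustrative client configuration only.
    static Properties clientProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("security.protocol", "sasl_ssl"); // lowercase now accepted, equivalent to SASL_SSL
        props.put("sasl.mechanism", "PLAIN");
        return props;
    }
}
```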
Reviewers: Chris Egerton <chrise@aiven.io>, Divij Vaidya <diviv@amazon.com>
When Kafka fails to perform an atomic file move, the error is silently swallowed. Kafka should log these cases at least at WARN level.
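A sketch of the intended behavior (illustrative only, not the actual Kafka utility): fall back to a non-atomic move, but log the atomic-move failure at WARN instead of swallowing it.
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AtomicMoveExample {
    private static final Logger log = LoggerFactory.getLogger(AtomicMoveExample.class);

    static void moveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException outer) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
                // Surface the original failure instead of dropping it on the floor.
                log.warn("Non-atomic move of {} to {} succeeded after atomic move failed",
                        source, target, outer);
            } catch (IOException inner) {
                inner.addSuppressed(outer);
                throw inner;
            }
        }
    }
}
```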
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Kirk True <kirk@kirktrue.pro>
If the follower has an empty log and fetches with offset 0, it is more efficient for the leader to reply with a snapshot id (redirecting the follower to FETCH_SNAPSHOT) than for the follower to continue fetching from the log segments.
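A rough sketch of the leader-side decision; every type and method below is hypothetical and only illustrates the idea, the real KRaft code paths differ:
```java
import java.util.OptionalLong;

// Entirely hypothetical types, for illustration only.
class SnapshotRedirectSketch {
    interface LeaderLog {
        OptionalLong latestSnapshotId();
    }

    interface FetchReply {}
    record SnapshotReply(long snapshotId) implements FetchReply {}
    record RecordsReply(long fromOffset) implements FetchReply {}

    FetchReply handleFetch(long fetchOffset, LeaderLog log) {
        OptionalLong snapshotId = log.latestSnapshotId();
        if (fetchOffset == 0 && snapshotId.isPresent()) {
            // The follower's log is empty; point it at the latest snapshot so it can
            // issue FETCH_SNAPSHOT instead of replaying every log segment from offset 0.
            return new SnapshotReply(snapshotId.getAsLong());
        }
        return new RecordsReply(fetchOffset);
    }
}
```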
Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>
Adds the following metrics, as described in KIP-890:
VerificationTimeMs – the number of milliseconds from adding partition info to the manager until the response is sent. This includes the round trip to the transaction coordinator, if it is called, and also accounts for verifications that fail before the coordinator is called.
VerificationFailureRate – the rate of verifications that failed, either via the AddPartitionsToTxn response or through errors in the manager.
AddPartitionsToTxnVerification metrics – separates the verification request metrics from the typical add-partitions ones, similar to how fetch-replication and fetch-consumer metrics are separated.
Reviewers: Divij Vaidya <diviv@amazon.com>
Update "requests" lib used in system tests to version "2.31.0" to fix CVE-2023-32681: Unintended leak of Proxy-Authorization header in requests
Reviewers: Divij Vaidya <diviv@amazon.com>
This fixes a regression introduced by the previous KAFKA-15109 commit (d0457f7360 on trunk).
Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>
RPCProducerIdManager initiates an async request to the controller to grab a block of producer IDs and then blocks waiting for a response from the controller.
This is done in the request handler threads while holding a global lock. This means that if many producers are requesting producer IDs and the controller is slow to respond, many threads can get stuck waiting for the lock.
This patch aims to:
* resolve the deadlock scenario mentioned above by not waiting for a new block and instead returning an error immediately (a sketch of this non-blocking shape follows the list)
* remove synchronization usages in RPCProducerIdManager.generateProducerId()
* handle errors returned from generateProducerId() so that KafkaApis does not log unexpected errors
* confirm producers backoff before retrying
* introduce backoff if manager fails to process AllocateProducerIdsResponse
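A minimal sketch of that non-blocking shape; the class and field names are illustrative, not the actual RPCProducerIdManager implementation:
```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: hand out ids from a locally cached block without blocking, and
// signal a retriable error when no block is available yet.
class NonBlockingProducerIdManagerSketch {
    private final AtomicReference<ProducerIdBlock> currentBlock = new AtomicReference<>();
    private final AtomicBoolean requestInFlight = new AtomicBoolean(false);

    long generateProducerId() {
        ProducerIdBlock block = currentBlock.get();
        Long id = block == null ? null : block.nextId();
        if (id == null) {
            maybeRequestNextBlock();
            // The caller maps this to a retriable error so the producer backs off and
            // retries, instead of a request handler thread blocking on the controller.
            throw new IllegalStateException("No producer id block available yet");
        }
        return id;
    }

    private void maybeRequestNextBlock() {
        if (requestInFlight.compareAndSet(false, true)) {
            // Send AllocateProducerIds to the controller asynchronously; the response
            // handler installs the new block into currentBlock and clears the flag.
        }
    }

    static class ProducerIdBlock {
        private final long firstId;
        private final long size;
        private long next;

        ProducerIdBlock(long firstId, long size) {
            this.firstId = firstId;
            this.size = size;
            this.next = firstId;
        }

        synchronized Long nextId() {
            return next < firstId + size ? next++ : null;
        }
    }
}
```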
Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>
An example of the warning:
> warning: [lossy-conversions] implicit cast from long to int in compound assignment is possibly lossy
There should be no change in behavior as part of these changes; runtime logic ensured we did not run into issues due to the lossy conversions.
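For context, a minimal example (not taken from the Kafka codebase) showing code that triggers the warning and the explicit cast that silences it:
```java
public class LossyConversionExample {
    public static void main(String[] args) {
        int total = 0;
        long delta = 42L;

        // total += delta;       // warning: [lossy-conversions] implicit cast from long to int

        total += (int) delta;    // the explicit cast makes the narrowing intentional
        System.out.println(total);
    }
}
```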
Reviewers: Divij Vaidya <diviv@amazon.com>
This patch introduces the GroupCoordinatorService. This is the new (incomplete) implementation of the group coordinator based on the coordinator runtime introduced in https://github.com/apache/kafka/pull/13795.
Reviewers: Divij Vaidya <diviv@amazon.com>, Justine Olshan <jolshan@confluent.io>
While in migration mode, the KRaft controller must always bump the leader epoch when shrinking an ISR.
This is required to maintain compatibility with the ZK brokers. Without the epoch bump, the ZK brokers
will ignore the partition state change present in the LeaderAndIsrRequest since it would not contain a new
leader epoch.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Use the thread-safe Caffeine library to cache indexes fetched from the remote tier locally. This PR removes lock contention that led to higher fetch latencies, as the IO threads spent time unnecessarily waiting on the global cache lock while a single thread fetched the index from the remote tier. See PR #13850 for details and rejected alternatives.
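For illustration, a minimal sketch of this kind of thread-safe Caffeine usage; the key and value types and the loader are placeholders, not the actual remote index cache code. Concurrent gets for the same key block only the single loading thread rather than every reader of the cache.
```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

public class RemoteIndexCacheSketch {
    // Placeholder types: the real key identifies a remote log segment and the value
    // holds its fetched index data.
    record SegmentKey(String topic, int partition, long baseOffset) {}
    record IndexData(byte[] bytes) {}

    private final LoadingCache<SegmentKey, IndexData> cache = Caffeine.newBuilder()
            .maximumSize(1024)                       // bound the number of cached indexes
            .build(this::fetchIndexFromRemoteTier);  // only the loading thread does the remote IO

    IndexData get(SegmentKey key) {
        // Readers of other keys proceed without contending on a global lock.
        return cache.get(key);
    }

    private IndexData fetchIndexFromRemoteTier(SegmentKey key) {
        // Placeholder for the remote fetch, deliberately not implemented in this sketch.
        throw new UnsupportedOperationException("remote fetch not shown");
    }
}
```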
Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
Fixes thread leaks by closing the ReplicaManager using try/finally at the end of each test. The leaks were leading to flaky test failures in ReplicaManagerTest.
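The shape of the fix, sketched in Java for brevity; the actual test class is Scala and the setup helper below is a placeholder:
```java
import kafka.server.ReplicaManager;
import org.junit.jupiter.api.Test;

public class ReplicaManagerCleanupSketch {
    @Test
    public void testSomethingWithReplicaManager() {
        ReplicaManager replicaManager = setUpReplicaManager(); // placeholder setup helper
        try {
            // exercise and assert on the replica manager ...
        } finally {
            // Runs even when an assertion fails, so fetcher and scheduler threads
            // started by the ReplicaManager never leak into later tests.
            replicaManager.shutdown(false); // boolean arg (skip HW checkpoint) assumed from the Scala signature
        }
    }

    private ReplicaManager setUpReplicaManager() {
        throw new UnsupportedOperationException("test fixture omitted in this sketch");
    }
}
```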
Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>
The Java rewrite is kept relatively close to the Scala original
to minimize potential newly introduced bugs and to make reviewing
simpler. The following details might be of note:
- The `Logging` trait, which moved to InterBrokerSendThread with the
rewrite of ShutdownableThread, has similarly been moved to any
subclasses that currently use it. InterBrokerSendThread's own
logging now uses ShutdownableThread's logger, which mimics the
prefix/log identifier that the trait provided.
- The RequestAndCompletionHandler case class has been made a
separate POJO class and the internal-use UnsentRequests class
has been kept as a static nested class.
- The relatively commonly used but internal (not part of the
public API) clients classes that InterBrokerSendThread relies on
have been allowlisted in the server-common import control.
- The accompanying test class has also been moved and rewritten
with one new test added and most of the pre-existing tests made
stricter.
Reviewers: David Jacot <djacot@confluent.io>
Replaces EasyMock and PowerMock with Mockito in TimeOrderedCachingPersistentWindowStoreTest.
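For context, a before/after sketch of this kind of conversion against a hypothetical collaborator; this is not code from the test in question:
```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

public class MockitoConversionSketch {
    interface WindowStore {
        String fetch(String key);
        void put(String key, String value);
    }

    // EasyMock style (before):
    //   WindowStore store = EasyMock.mock(WindowStore.class);
    //   EasyMock.expect(store.fetch("k")).andReturn("v");
    //   EasyMock.replay(store);
    //   ... exercise the store ...
    //   EasyMock.verify(store);

    void mockitoStyle() {
        WindowStore store = mock(WindowStore.class);
        when(store.fetch("k")).thenReturn("v");   // stub with no replay() step

        store.put("k", store.fetch("k"));         // exercise the mock

        verify(store).put("k", "v");              // verify the specific interaction
    }
}
```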
Reviewers: Divij Vaidya <diviv@amazon.com>, Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <cadonna@apache.org>
This PR expands the scope of ApiVersionManager a bit to include returning the current
MetadataVersion and features that are in effect. This is useful in general because that information
needs to be returned in an ApiVersionsResponse. It also allows us to fix the ApiVersionManager
interface so that all subclasses implement all methods of the interface. Having subclasses that
don't implement some methods is dangerous because they could cause exceptions at runtime in
unexpected scenarios.
On the KRaft controller, we were previously performing a read operation in the QuorumController
thread to get the current metadata version and features. With this PR, we now read a volatile
variable maintained by a separate MetadataVersionContextPublisher object. This will improve
performance and simplify the code. It should not change the guarantees we are providing; in both
the old and new scenarios, we need to be robust against version skew scenarios during updates.
Add a Features class which just has a 3-tuple of metadata version, features, and feature epoch.
Remove MetadataCache.FinalizedFeaturesAndEpoch, since it just duplicates the Features class.
(There are some additional feature-related classes that can be consolidated in a follow-on PR.)
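A minimal sketch of the 3-tuple shape described above; field names and types here are illustrative, not the actual Features class:
```java
import java.util.Collections;
import java.util.Map;

// Illustrative only: a value class bundling the metadata version, the finalized
// feature levels, and the epoch at which they were read.
public final class FeaturesSketch {
    private final String metadataVersion;               // the real class uses the MetadataVersion type
    private final Map<String, Short> finalizedFeatures; // feature name -> finalized level
    private final long featureEpoch;

    public FeaturesSketch(String metadataVersion, Map<String, Short> finalizedFeatures, long featureEpoch) {
        this.metadataVersion = metadataVersion;
        this.finalizedFeatures = Collections.unmodifiableMap(finalizedFeatures);
        this.featureEpoch = featureEpoch;
    }

    public String metadataVersion() { return metadataVersion; }
    public Map<String, Short> finalizedFeatures() { return finalizedFeatures; }
    public long featureEpoch() { return featureEpoch; }
}
```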
Create a java class, EndpointReadyFutures, for managing the futures associated with individual
authorizer endpoints. This avoids code duplication between ControllerServer and BrokerServer and
makes this code unit-testable.
Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>
Change Timer.java to implement AutoCloseable, because automatic bug finders will flag a warning when an instance of an AutoCloseable class is not closed properly in the code.
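A generic illustration of the benefit of this pattern; this is not the actual Kafka Timer API:
```java
// Once a class implements AutoCloseable, try-with-resources guarantees close() and
// static analysis tools can flag instances that are never closed.
public class TimerCloseExample {
    static class ExampleTimer implements AutoCloseable {
        void start() { /* schedule work */ }

        @Override
        public void close() { /* cancel scheduled work, release resources */ }
    }

    public static void main(String[] args) {
        try (ExampleTimer timer = new ExampleTimer()) {
            timer.start();
            // use the timer ...
        } // close() runs here even if the body throws
    }
}
```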
Reviewers: Divij Vaidya <diviv@amazon.com>
This test (testRackAwareRangeAssignor in FetchFromFollowerIntegrationTest, per the stack trace below) still fails regularly with the following error:
```
Error
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Timed out while awaiting expected assignment Set(topicWithAllPartitionsOnAllRacks-0, topicWithSingleRackPartitions-0). The current assignment is []
Stacktrace
java.util.concurrent.ExecutionException: org.opentest4j.AssertionFailedError: Timed out while awaiting expected assignment Set(topicWithAllPartitionsOnAllRacks-0, topicWithSingleRackPartitions-0). The current assignment is []
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
at integration.kafka.server.FetchFromFollowerIntegrationTest.$anonfun$testRackAwareRangeAssignor$9(FetchFromFollowerIntegrationTest.scala:211)
at integration.kafka.server.FetchFromFollowerIntegrationTest.$anonfun$testRackAwareRangeAssignor$9$adapted(FetchFromFollowerIntegrationTest.scala:211)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573)
```
I propose increasing the timeouts to 30 seconds to mitigate this. The test already uses 30-second timeouts in many places; this patch uses 30 seconds everywhere. This solution is not optimal, but it is better than having a flaky test.
Reviewers: Justine Olshan <jolshan@confluent.io>
I have seen failures like the following one in a few builds:
```
Build / JDK 11 and Scala 2.13 / testDescribeSimpleConsumerGroup() – kafka.admin.DescribeConsumerGroupTest
org.apache.kafka.common.errors.TopicExistsException: Topic '__consumer_offsets' already exists.
```
Many tests still use `TestUtils.createOffsetsTopic(zkClient, servers)` to create the offsets topic. This method does not handle the case where the topic already exists (e.g. on a retry). This patch adds that handling.
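A sketch of the added tolerance; the real helper is the Scala TestUtils method named above, and the Admin-based creation, partition count, and replication factor here are placeholders:
```java
import java.util.Collections;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.errors.TopicExistsException;

public class OffsetsTopicCreationSketch {
    static void createOffsetsTopicIfNeeded(Admin admin) throws Exception {
        NewTopic offsetsTopic = new NewTopic("__consumer_offsets", 1, (short) 1);
        try {
            admin.createTopics(Collections.singleton(offsetsTopic)).all().get();
        } catch (ExecutionException e) {
            if (!(e.getCause() instanceof TopicExistsException)) {
                throw e; // only tolerate "already exists"; propagate anything else
            }
        }
    }
}
```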
Reviewers: Divij Vaidya <diviv@amazon.com>, Justine Olshan <jolshan@confluent.io>