Commit Graph

3933 Commits

Author SHA1 Message Date
Divij Vaidya 0a50005408
KAFKA-13929: Replace legacy File.createNewFile() with NIO.2 Files.createFile() (#12197)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-06-10 13:28:55 +02:00
Joel Hamill 249cd4461f
MINOR: Fix typo in Kafka config docs (#12268)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-06-08 13:51:41 -07:00
José Armando García Sancio 21490af989
MINOR; Test last committed record offset for Controllers (#12249)
As part of KIP-835, LastCommittedRecordOffset was added to the
KafkaController metric type. Make sure to test that metric.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-06-08 10:45:04 -07:00
Jason Gustafson 4542acdc14
MINOR: Convert `ReassignPartitionsIntegrationTest` to KRaft (#12258)
Updates relevant tests in `ReassignPartitionsIntegrationTest` for KRaft. We skip JBOD tests since it is not supported and we skip `AlterPartition` upgrade tests since they are not relevant.

Reviewers: Kvicii <Karonazaba@gmail.com>, David Arthur <mumrah@gmail.com>
2022-06-07 20:59:24 -07:00
Jason Gustafson 49a0e0d72e
MINOR: Fix kraft timeout in LogOffsetTest (#12262)
Fixes the timeouts we have been seeing in `LogOffsetTest` when KRaft is enabled. The problem is the dependence on `MockTime`. In the KRaft broker, we need a steadily advancing clock for events in `KafkaEventQueue` to get executed. In the case of the timeouts, the broker was stuck with the next heartbeat event in the queue. We depended on the execution of this event in order to send the next heartbeat and complete the `initialCatchUpFuture` and finish broker startup. This caused the test to get stuck during initialization, which is probably why the `@Timeout` wasn't working. The patch fixes the problem by using system time instead because there was not a strong dependence on `MockTime`.

Reviewers: David Arthur <mumrah@gmail.com>
2022-06-07 17:50:45 -07:00
David Arthur 806098ffe1
KAFKA-13410; Add a --release-version flag for storage-tool (#12245)
This patch removes the --metadata-version and adds a --release-version to the kafka-storage tool. This change is not a breaking change since we are removing --metadata-version which was introduced on May 18, but it has not been released yet.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>
2022-06-07 11:25:40 -07:00
David Jacot 151ca12a56
KAFKA-13916; Fenced replicas should not be allowed to join the ISR in KRaft (#12240)
This PR implements the first part of KIP-841. Specifically, it implements the following:

1. Adds a new metadata version.
2. Adds the InControlledShutdown field to the BrokerRegistrationRecord and BrokerRegistrationChangeRecord and bump their versions. The newest versions are only used if the new metadata version is enabled.
3. Writes a BrokerRegistrationChangeRecord with InControlledShutdown set when a broker requests a controlled shutdown.
4. Ensures that fenced and in controlled shutdown replicas are not picked as leaders nor included in the ISR.
5. Adds or extends unit tests.

Reviewes: José Armando García Sancio <jsancio@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>
2022-06-07 10:37:20 -07:00
Kvicii 09570f2540
KAFKA-13592; Fix flaky test ControllerIntegrationTest.testTopicIdUpgradeAfterReassigningPartitions (#11687)
Fixes several race conditions in the test case causing the flaky failure.

Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Jason Gustafson <jason@confluent.io>

Co-authored-by: Kvicii <Karonazaba@gmail.com>
2022-06-06 17:56:48 -07:00
Divij Vaidya 601051354b
MINOR: Correctly mark some tests as integration tests (#12223)
Also fix package name of `ListOffsetsIntegrationTest`.

Reviewers: dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-06-06 11:18:24 -07:00
dengziming 0b3ab4687e
KIP-835: metadata.max.idle.interval.ms shoud be much bigger than broker.heartbeat.interval.ms (#12238)
The active quorum controller will append NoOpRecord periodically to increase metadata LEO, however, when a broker startup, we will wait until its metadata LEO catches up with the controller LEO, we generate NoOpRecord every 500ms and send heartbeat request every 2000ms.

It's almost impossible for a broker to catch up with the controller LEO if the broker sends a query request every 2000ms but the controller LEO increases every 500ms, so the tests in KRaftClusterTest will fail.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, showuon <43372967+showuon@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@confluent.io>
2022-06-03 11:39:30 -07:00
Rittika Adhikari 3467036e01
KAFKA-13803: Refactor Leader API Access (#12005)
This PR refactors the leader API access in the follower fetch path.

Added a LeaderEndPoint interface which serves all access to the leader.

Added a LocalLeaderEndPoint and a RemoteLeaderEndPoint which implements the LeaderEndPoint interface to handle fetches from leader in local & remote storage respectively.

Reviewers: David Jacot <djacot@confluent.io>, Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
2022-06-03 09:12:06 -07:00
Kvicii 7e71483aed
MINOR: fix doc (#12243)
Reviewers: Luke Chen <showuon@gmail.com>
2022-06-03 15:56:13 +08:00
RivenSun d8d92f0f80
MINOR: Update the kafka-reassign-partitions script command in documentation (#12237)
Reviewers: Luke Chen <showuon@gmail.com>
2022-06-02 21:30:22 +08:00
Luke Chen fa33fb4d3c
KAFKA-13773: catch kafkaStorageException to avoid broker shutdown directly (#12136)
When logManager startup and loadLogs, we expect to catch any IOException (ex: out of space error) and turn the log dir into offline. Later, we'll handle the offline logDir in ReplicaManage, so that the cleanShutdown file won't be created when all logDirs are offline. The reason why the broker shutdown with cleanShutdown file after full disk is because during loadLogs and do log recovery, we'll write leader-epoch-checkpoint fil. And if any IOException thrown, we'll wrap it as KafkaStorageException and rethrow. And since we don't catch KafkaStorageException, so the exception is caught in the other place and go with clean shutdown path.

This PR is to fix the issue by catching the KafkaStorageException with IOException cause exceptions during loadLogs, and mark the logDir as offline to let the ReplicaManager handle the offline logDirs.

Reviewers: Jun Rao <jun@confluent.io>, Alok Thatikunta <alok123thatikunta@gmail.com>
2022-06-02 14:15:51 +08:00
Jason Gustafson 0f9f7e6c78
MINOR: Enable kraft support in quota integration tests (#12217)
Enable kraft support in BaseQuotaTest and its extensions.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>
2022-06-01 16:56:10 -07:00
Colin Patrick McCabe 0ca9cd4d2d
MINOR: Several fixes and improvements for FeatureControlManager (#12207)
This PR fixes a bug where FeatureControlManager#replay(FeatureLevelRecord) was throwing an
exception if not all controllers in the quorum supported the feature being applied. While we do
want to validate this, it needs to be validated earlier, before the record is committed to the log.
Once the record has been committed to the log it should always be applied if the current controller
supports it.

Fix another bug where removing a feature was not supported once it had been configured. Note that
because we reserve feature level 0 for "feature not enabled", we don't need to use
Optional<VersionRange>; we can just return a range of 0-0 when the feature is not supported.

Allow the metadata version to be downgraded when UpgradeType.UNSAFE_DOWNGRADE has been set.
Previously we were unconditionally denying this even when this was set.

Add a builder for FeatureControlManager, so that we can easily add new parameters to the
constructor in the future. This will also be useful for creating FeatureControlManagers that are
initialized to a specific MetadataVersion.

Get rid of RemoveFeatureLevelRecord, since it's easier to just issue a FeatureLevelRecord with
the level set to 0.

Set metadata.max.idle.interval.ms to 0 in RaftClusterSnapshotTest for more predictability.

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>
2022-06-01 16:09:38 -07:00
dengziming 1d6e3d6cb3
KAFKA-13845: Add support for reading KRaft snapshots in kafka-dump-log (#12084)
The kafka-dump-log command should accept files with a suffix of ".checkpoint". It should also decode and print using JSON the snapshot header and footer control records.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-06-01 14:49:00 -07:00
José Armando García Sancio 7d1b0926fa
KAFKA-13883: Implement NoOpRecord and metadata metrics (#12183)
Implement NoOpRecord as described in KIP-835. This is controlled by the new
metadata.max.idle.interval.ms configuration.

The KRaft controller schedules an event to write NoOpRecord to the metadata log if the metadata
version supports this feature. This event is scheduled at the interval defined in
metadata.max.idle.interval.ms. Brokers and controllers were improved to ignore the NoOpRecord when
replaying the metadata log.

This PR also addsffour new metrics to the KafkaController metric group, as described KIP-835.

Finally, there are some small fixes to leader recovery. This PR fixes a bug where metadata version
3.3-IV1 was not marked as changing the metadata. It also changes the ReplicaControlManager to
accept a metadata version supplier to determine if the leader recovery state is supported.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-06-01 10:48:24 -07:00
Clara Fang 31a84dd72e
KAFKA-13946; Add missing parameter to kraft test kit `ControllerNode.setMetadataDirectory()` (#12225)
Added parameter `metadataDirectory` to `setMetadataDirectory()` so that `this.metadataDirectory` would not be set to itself.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-05-30 15:55:07 -07:00
Jason Gustafson 645c1ba526
MINOR: Fix buildResponseSend test cases for envelope responses (#12185)
The test cases we have in `RequestChannelTest` for `buildResponseSend` construct the envelope request incorrectly. The request is created using the envelope context, but also a reference to the wrapped envelope request object. This patch fixes `TestUtils.buildEnvelopeRequest` so that the wrapped request is built properly. It also fixes the dependence on this incorrect construction and consolidates the tests in `RequestChannelTest` to avoid duplication.

Reviewers: dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
2022-05-30 11:34:36 -07:00
bozhao12 620ada9888
MINOR: Fix typo in ClusterTestExtensionsTest (#12218)
Reviewers: Kvicii <Karonazaba@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-05-27 13:38:54 -07:00
Colin Patrick McCabe 7143267f71
MINOR: Fix some bugs with UNREGISTER_BROKER
Fix some bugs in the KRaft unregisterBroker API and add a junit test.

1. kafka-cluster-tool.sh unregister should fail if no broker ID is passed.

2. UnregisterBrokerRequest must be marked as a KRaft broker API so 
that KRaft brokers can receive it.

3. KafkaApis.scala must forward UNREGISTER_BROKER to the controller.

Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
2022-05-26 14:07:29 -07:00
David Arthur 4efdc1a310
MINOR: Consolidate FinalizedFeatureCache into MetadataCache (#12214)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-05-26 16:25:58 -04:00
Jason Gustafson 43160bc476
MINOR: Add timeout to LogOffsetTest (#12213)
Reviewers: Kvicii <Karonazaba@gmail.com>, David Arthur <mumrah@gmail.com>
2022-05-26 16:07:54 -04:00
David Jacot 76477ffd2d
KAFKA-13858; Kraft should not shutdown metadata listener until controller shutdown is finished (#12187)
When the kraft broker begins controlled shutdown, it immediately disables the metadata listener. This means that metadata changes as part of the controlled shutdown do not get sent to the respective components. For partitions that the broker is follower of, that is what we want. It prevents the follower from being able to rejoin the ISR while still shutting down. But for partitions that the broker is leading, it means the leader will remain active until controlled shutdown finishes and the socket server is stopped. That delay can be as much as 5 seconds and probably even worse.

This PR revises the controlled shutdown procedure as follow:
* The broker signals to the replica manager that it is about to start the controlled shutdown.
* The broker requests a controlled shutdown to the controller.
* The controller moves leaders off from the broker, removes the broker from any ISR that it is a member of, and writes those changes to the metadata log.
* When the broker receives a partition metadata change, it looks if it is in the ISR. If it is, it updates the partition as usual. If it is not or if there is no leader defined--as would be the case if the broker was the last member of the ISR--it stops the fetcher/replica. This basically stops all the partitions for which the broker was part of their ISR.

When the broker is a replica of a partition but it is not in the ISR, the controller does not do anything. The leader epoch is not bumped. In this particular case, the follower will continue to run until the replica manager shuts down. In this time, the replica could become in-sync and the leader could try to bring it back to the ISR. This remaining issue will be addressed separately.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-25 16:09:01 -07:00
dengziming 54d60ced86
KAFKA-13833: Remove the min_version_level from the finalized version range written to ZooKeeper (#12062)
Reviewers: David Arthur <mumrah@gmail.com>
2022-05-25 14:02:34 -04:00
Divij Vaidya f6ba10ef9c
MINOR: Fix flaky test TopicCommandIntegrationTest.testDescribeAtMinIsrPartitions(String).quorum=kraft (#12189)
Flaky test as failed in CI https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-12184/1/tests/

The test fails because it does not wait for metadata to be propagated across brokers before killing a broker which may lead to it getting stale information. Note that a similar test was done in #12104 for a different test.

Reviewers: Kvicii Y, Ziming Deng, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-05-21 10:33:44 -07:00
dengziming 6380652a5a
KAFKA-13863; Prevent null config value when create topic in KRaft mode (#12109)
This patch ensures consistent handling of null-valued topic configs between the zk and kraft controller. Prior to this patch, we returned INVALID_REQUEST in zk mode and it was not an error in kraft. After this patch, we return INVALID_CONFIG consistently for this case.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-19 09:46:48 -07:00
Jason Gustafson 8efdbce523
KAFKA-13837; Return an error from Fetch if follower is not a valid replica (#12150)
When a partition leader receives a `Fetch` request from a replica which is not in the current replica set, the behavior today is to return a successful fetch response, but with empty data. This causes the follower to retry until metadata converges without updating any state on the leader side. It is clearer in this case to return an error, so that the metadata inconsistency is visible in logging and so that the follower backs off before retrying. 

In this patch, we use `UNKNOWN_LEADER_EPOCH` when the `Fetch` request includes the current leader epoch. The way we see this is that the leader is validating the (replicaId, leaderEpoch) tuple. When the leader returns `UNKNOWN_LEADER_EPOCH`, it means that the leader does not expect the given leaderEpoch from that replica. If the request does not include a leader epoch, then we use `NOT_LEADER_OR_FOLLOWER`. We can take a similar interpretation for this case: the leader is rejecting the request because it does not think it should be the leader for that replica. But mainly these errors ensure that the follower will retry the request.

As a part of this patch, I have refactored the way that the leader updates follower fetch state. Previously, the process is a little convoluted. We send the fetch from `ReplicaManager` down to `Partition.readRecords`, then we iterate over the results and call `Partition.updateFollowerFetchState`. It is more straightforward to update state directly as a part of `readRecords`. All we need to do is pass through the `FetchParams`. This also prevents an unnecessary copy of the read results.

Reviewers: David Jacot <djacot@confluent.io>
2022-05-18 20:58:20 -07:00
bozhao12 b4f35c9ce0
MINOR: Fix typo in ReplicaManagerTest (#12178)
Reviewer: Luke Chen <showuon@gmail.com>
2022-05-19 10:28:47 +08:00
Jason Gustafson 1802c6dcb5
MINOR: Enable KRaft in `TransactionsTest` (#12176)
Enable support for KRaft in `TransactionsTest`. 

Reviewers: David Arthur <mumrah@gmail.com>
2022-05-18 14:07:59 -07:00
David Arthur 1135f22eaf
KAFKA-13830 MetadataVersion integration for KRaft controller (#12050)
This patch builds on #12072 and adds controller support for metadata.version. The kafka-storage tool now allows a
user to specify a specific metadata.version to bootstrap into the cluster, otherwise the latest version is used.

Upon the first leader election of the KRaft quroum, this initial metadata.version is written into the metadata log. When
writing snapshots, a FeatureLevelRecord for metadata.version will be written out ahead of other records so we can
decode things at the correct version level.

This also includes additional validation in the controller when setting feature levels. It will now check that a given
metadata.version is supportable by the quroum, not just the brokers.

Reviewers: José Armando García Sancio <jsancio@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>, Alyssa Huang <ahuang@confluent.io>
2022-05-18 12:08:36 -07:00
runom cf34a2e4b0
MINOR: Replace string literal with constant in RequestChannel (#12134)
Replace the "RequestsPerSec" literal value with the pre-existing constant `RequestsPerSec`.

Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-05-18 11:31:15 -07:00
dengziming 67d00e25e9
MINOR: Enable some AdminClient integration tests (#12110)
Enable KRaft in `AdminClientWithPoliciesIntegrationTes`t and `PlaintextAdminIntegrationTest`. There are some tests not enabled or not as expected yet:

- testNullConfigs, see KAFKA-13863
- testDescribeCluster and testMetadataRefresh, currently we don't get the real controller in KRaft mode so the test may not run as expected

This patch also changes the exception type raised from invalid `IncrementalAlterConfig` requests with the `SUBTRACT` and `APPEND` operations. When the configuration value type is not a list, we now raise `INVALID_CONFIG` instead of `INVALID_REQUEST`.

Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-05-18 09:39:26 -07:00
David Jacot a1cd1d1839
MINOR: Followers should not have any remote replica states left over from previous leadership (#12138)
This patch ensures that followers don't have any remote replica states left over from previous leadership.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-18 09:32:48 +02:00
bozhao12 f36de0744b
MINOR: Remove redundant metric reset in KafkaController (#12158)
The following variables in `KafkaController` are used for metrics:
```
    offlinePartitionCount 
    preferredReplicaImbalanceCount
    globalTopicCount 
    globalPartitionCount
    topicsToDeleteCount 
    replicasToDeleteCount 
    ineligibleTopicsToDeleteCount 
    ineligibleReplicasToDeleteCount 
```
When the controller goes from active to non-active, these variables will be reset to 0. Currently, this is done explicitly in in `KafkaController.onControllerResignation()` and also after every loop iteration in `KafkaController.updateMetrics()` .
The first of these is redundant and can be removed. This patch fixes this and also simplifies `updateMetrics`. 

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-17 15:40:05 -07:00
dengziming 5f039bae1c
KAFKA-13905: Fix failing ServerShutdownTest.testCleanShutdownAfterFailedStartupDueToCorruptLogs (#12165)
Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
2022-05-17 16:31:28 +08:00
David Jacot 972b76561a
MINOR: Rename remaining `zkVersion` to `partitionEpoch` in `PartitionTest` (#12147)
Reviewers:  Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-05-17 08:58:43 +02:00
Jason Gustafson 1103c76d63
KAFKA-13899: Use INVALID_CONFIG error code consistently in AlterConfig APIs (#12162)
In the AlterConfigs/IncrementalAlterConfigs zk handler, we return `INVALID_REQUEST` and `INVALID_CONFIG` inconsistently. The problem is in `LogConfig.validate`. We may either return `ConfigException` or `InvalidConfigException`. When the first of these is thrown, we catch it and convert to `INVALID_REQUEST`. If the latter is thrown, then we return `INVALID_CONFIG`. It seems more appropriate to return `INVALID_CONFIG` consistently, which is what the KRaft implementation already does this. This patch fixes this and converts a few integration tests to KRaft.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-05-16 17:41:23 -07:00
Joel Hamill 06051988a2
MINOR: Clarify impact of num.replica.fetchers (#12153)
The documentation for `num.replica.fetchers` should emphasize the fact that the count applies to each source broker individually. Also mention the tradeoff.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-16 09:38:06 -07:00
Divij Vaidya 5fae84e4d1
KAFKA-13851: Add integration tests for DeleteRecords API (#12087)
Reviewers: Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>
2022-05-16 15:50:50 +08:00
Colin Patrick McCabe a3e0af94f2
MINOR: convert some tests to KRaft (#12155)
Convert EndToEndClusterIdTest, ConsumerGroupCommandTest,
ListConsumerGroupTest, and LogOffsetTest to test KRaft mode.

Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>
2022-05-13 17:29:47 -07:00
vamossagar12 f96e381387
KAFKA-13746: Attempt to fix flaky test by waiting on metadata update (#12104)
Reviewers: dengziming <dengziming1993@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2022-05-13 17:09:47 -07:00
Colin Patrick McCabe fa59be4e77
KAFKA-13649: Implement early.start.listeners and fix StandardAuthorizer loading (#11969)
Since the StandardAuthorizer relies on the metadata log to store its ACLs, we need to be sure that
we have the latest metadata before allowing the authorizer to be used. However, if the authorizer
is not usable for controllers in the cluster, the latest metadata cannot be fetched, because
inter-node communication cannot occur. In the initial commit which introduced StandardAuthorizer,
we punted on the loading issue by allowing the authorizer to be used immediately. This commit fixes
that by implementing early.start.listeners as specified in KIP-801. This will allow in superusers
immediately, but throw the new AuthorizerNotReadyException if non-superusers try to use the
authorizer before StandardAuthorizer#completeInitialLoad is called.

For the broker, we call StandardAuthorizer#completeInitialLoad immediately after metadata catch-up
is complete, right before unfencing. For the controller, we call
StandardAuthorizer#completeInitialLoad when the node has caught up to the high water mark of the
cluster metadata partition.

This PR refactors the SocketServer so that it creates the configured acceptors and processors in
its constructor, rather than requiring a call to SocketServer#startup A new function,
SocketServer#enableRequestProcessing, then starts the threads and begins listening on the
configured ports. enableRequestProcessing uses an async model: we will start the acceptor and
processors associated with an endpoint as soon as that endpoint's authorizer future is completed.

Also fix a bug where the controller and listener were sharing an Authorizer when in co-located
mode, which was not intended.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-12 14:48:33 -07:00
José Armando García Sancio e94934b6b7
MINOR; DeleteTopics version tests (#12141)
Add a DeleteTopics test for all supported versions. Convert the
DeleteTopicsRequestTest to run against both ZK and KRaft mode.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>
2022-05-12 13:04:48 -07:00
dengziming 6ab4d047d5
MINOR: Fix flaky testDescribeUnderReplicatedPartitions (#12112)
Currently, we are waiting for metadataCache to bookkeeper the partition info, this isn't enough, we should wait until the partition ISR is less than AR.

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <divijvaidya13@gmail.com>
2022-05-12 20:55:42 +08:00
Jason Gustafson 7730476603
MINOR: Create case class to encapsulate fetch parameters and simplify handling (#12082)
This patch adds a new case class `FetchParams` which encapsulates the parameters of the fetch request. It then uses this class in `DelayedFetch` directly instead of `FetchMetadata`. The intent is to reduce the number of things we need to change whenever we need to pass through new parameters. The patch also cleans up `ReplicaManagerTest` for more consistent usage.

Reviewers: David Jacot <djacot@confluent.io>
2022-05-10 13:24:23 -07:00
Colin Patrick McCabe 1cfc7c25fd
MINOR: install Exit.exit handler in BrokerMetadataPublisherTest (#12142)
Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-10 12:44:39 -07:00
dengziming 0c1cde1080
KAFKA-13862; Support Append/Subtract multiple config values in KRaft mode (#12108)
We can append/subtract multiple config values in kraft mode using the `IncrementalAlterConfig` RPC. For example: append/subtract topic config "cleanup.policy" with value="delete,compact" will end up treating "delete,compact" as a value not 2 values. This patch fixes the problem. Additionally, it update the zk logic to correctly handle duplicate additions.

Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
2022-05-10 12:41:17 -07:00
David Jacot b485f92647
KAFKA-13790; ReplicaManager should be robust to all partition updates from kraft metadata log (#12085)
This patch refactors the `Partition.makeLeader` and `Partition.makeFollower` to be robust to all partition updates from the KRaft metadata log. Particularly, it ensures the following invariants:

- A partition update is accepted if the partition epoch is equal or newer. The partition epoch is updated by the AlterPartition path as well so we accept an update from the metadata log with the same partition epoch in order to fully update the partition state.
- The leader epoch state offset is only updated when the leader epoch is bumped.
- The follower states are only updated when the leader epoch is bumped.
- Fetchers are only restarted when the leader epoch is bumped. This was already the case but this patch adds unit tests to prove/maintain it.

In the mean time, the patch unifies the state change logs to be similar in both ZK and KRaft world.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-09 20:47:14 +02:00
RivenSun df507e56e2
KAFKA-13793: Add validators for configs that lack validators (#12010)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Luke Chen <showuon@gmail.com>, Chris Egerton <fearthecellos@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <divijvaidya13@gmail.com>
2022-05-09 20:29:17 +02:00
Luke Chen 16324448a2
KAFKA-13804: Output the reason why broker exit unexpectedly during startup (#12028)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-05-06 11:25:33 +02:00
Lucas Wang 0a9893cac0
KAFKA-13815: Avoid reinitialization for a replica that is being deleted (#12029)
This PR tries to avoid the reinitialization of the leader epoch cache
and the partition metadata if the corresponding replica is being deleted.
With this change, the asyncDelete method can run more efficiently,
which means a StopReplica request with many partitions to be deleted can be
processed more quickly.

Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
2022-05-04 11:41:34 -07:00
Akhilesh Chaganti 430f75ba22
KAFKA-13861; Fix the validateOnly behavior for CreatePartitions requests in KRaft mode (#12106)
The KRaft implementation of the `CreatePartitions` ignores the `validateOnly` flag in the
request and creates the partitions if the validations are successful. Fixed the behavior
not to create partitions upon validation if the `validateOnly` flag is true.

Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-05-04 10:31:46 -07:00
Joel Hamill c6d3bcbd16
MINOR: Improve docs about how to provide multiple log.dir (#12119)
Reviewer:  Luke Chen <showuon@gmail.com>
2022-05-04 11:15:29 +08:00
dengziming bf7cd675f8
MINOR: Remove duplicated test cases in MetadataVersionTest (#12116)
These tests belongs to ApiVersionsResponseTest, and accidentally copied them to MetadataVersionTest when working on #12072.

Reviewer: Luke Chen <showuon@gmail.com>
2022-05-04 11:10:39 +08:00
Alyssa Huang 8245c9a3d5
KAFKA-13854 Refactor ApiVersion to MetadataVersion (#12072)
Refactoring ApiVersion to MetadataVersion to support both old IBP versioning and new KRaft versioning (feature flags)
for KIP-778.

IBP versions are now encoded as enum constants and explicitly prefixed w/ IBP_ instead of KAFKA_, and having a
LegacyApiVersion vs DefaultApiVersion was not necessary and replaced with appropriate parsing rules for extracting
the correct shortVersions/versions.

Co-authored-by: David Arthur <mumrah@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2022-05-02 16:27:52 -07:00
Colin Patrick McCabe 4a6287e832
MINOR: fix ClientQuotasRequestTest (#12107)
Fix ClientQuotasRequestTest.testAlterClientQuotasBadIp so that it uses actually unresolvable hostnames.
The previous choices "ip" and "abc-123" are now resolvable.

Reviewers: David Jacot <djacot@confluent.io>, Andrew Choi <andrew.choi@uwaterloo.ca>, Divij Vaidya <divijvaidya13@gmail.com>
2022-05-02 09:06:16 -07:00
bozhao12 fbcff567d0
KAFKA-13865: Fix ResponseSendTimeMs metric in RequestChannel is removed twice (#12111)
Fix ResponseSendTimeMs metric in RequestChannel is removed twice

Reviewers: Luke Chen <showuon@gmail.com>
2022-05-02 17:02:53 +08:00
Jason Gustafson f2a782a4d7
MINOR: Rename `AlterIsrManager` to `AlterPartitionManager` (#12089)
Since we have changed the `AlterIsr` API to `AlterPartition`, it makes sense to rename `AlterIsrManager` as well and some of the associated classes.

Reviewers: dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
2022-04-26 09:34:18 -07:00
Alyssa Huang 2a7fdd7670
MINOR; enable KRaft in ConfigCommandIntegrationTest (#11732)
Adding KRaft and ZK params to ConfigCommandIntegrationTest wherever appropriate.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
2022-04-25 15:11:14 -07:00
David Jacot a5f7c82a86
MINOR: Refactor `kafka.cluster.Replica` (#12081)
This patch refactors kafka.cluster.Replica, it usages and tests. This is part of the work in KAFKA-13790.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-04-25 21:43:32 +01:00
Jason Gustafson 25ee7f147c
MINOR: Change `AlterPartition` validation order in `KafkaController` (#12032)
Currently we validate recovery state before checking leader epoch in `KafkaController`. It seems more intuitive to validate leader epoch first since the leader might be working with stale state, which is what we do in KRaft. This patch fixes this and adds a couple additional validations to make the behavior consistent. 

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-04-25 09:37:03 -07:00
Hongten ff3d42a18c
KAFKA-13852: Kafka Acl documentation bug for wildcard '*' (#12090)
The wildcard * in command without wrapped by single quote will be replaced into the file name under the current folder by bash. So we need to wrap with single quote. Update the doc and command option description.

Reviewers: dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>
2022-04-24 16:50:44 +08:00
David Jacot 7c8c65fc54
MINOR: Rename `ZkVersion` to `PartitionEpoch` (#12071)
This patch does some initial cleanups in the context of KAFKA-13790. Mainly, it renames `ZkVersion` field to `PartitionEpoch` in the `LeaderAndIsrRequest`, the `LeaderAndIsr` and the `Partition`.

Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>
2022-04-22 20:38:17 +02:00
Mickael Maison 2ed09db6c4
MINOR: Scala cleanups in core (#12058)
Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>
2022-04-20 15:10:46 +02:00
Divij Vaidya 301c6f44d6
KAFKA-10095: Add stricter assertion in LogCleanerManagerTest (#12004)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-04-20 12:45:03 +02:00
Colin Patrick McCabe 9c3f605fc7
KAFKA-13835: Fix two bugs related to dynamic broker configs in KRaft (#12063)
Fix two bugs related to dynamic broker configs in KRaft. The first bug is that we are calling reloadUpdatedFilesWithoutConfigChange when a topic configuration is changed, but not when a
broker configuration is changed. This is backwards. This function must be called only for broker 
configs, and never for topic configs or cluster configs.

The second bug is that there were several configurations such as max.connections which are related
to broker listeners, but which do not involve changing the registered listeners. We should support
these configurations in KRaft. This PR fixes the configuration change validation to support this case.

Reviewers: Jason Gustafson <jason@confluent.io>, Matthew de Detrich <mdedetrich@gmail.com>
2022-04-19 13:17:16 -07:00
dengziming 9ec232fde8
KAFKA-13832: Fix flaky testAlterAssignment (#12060)
In KRaft mode the metadata is not propagate in time, so we should should wait for it before make assertions.

Reviewers:  Luke Chen <showuon@gmail.com>
2022-04-19 15:08:29 +08:00
Luke Chen 44906bdcdf
KAFKA-8785: fix request timeout by waiting for metadata cache up-to-date (#11681)
The reason why this test is flaky is because we have race condition at the beginning of the test, when brokers are staring up, and the adminClient is requesting for brokers metadata. Once the adminClient only got partial metadata, the test will fail, because in these tests, brokers will be shutdown to test leader election.

Fix this issue by explicitly waiting for metadata cache up-to-date in waitForReadyBrokers, and let admin client get created after waitForReadyBrokers.

Reviewers: Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>, dengziming <dengziming1993@gmail.com>
2022-04-19 14:13:21 +08:00
Colin Patrick McCabe 1521813a3a
KAFKA-13807: Fix incrementalAlterConfig and refactor some things (#12033)
Ensure that we can set log.flush.interval.ms at the broker or cluster level via
IncrementalAlterConfigs. This was broken by KAFKA-13749, which added log.flush.interval.ms as the
second synonym rather than the first. Add a regression test to DynamicConfigChangeTest.

Create ControllerRequestContext and pass it to every controller API. This gives us a uniform way to
pass through information like the deadline (if there is one) and the Kafka principal which is
making the request (in the future we will want to log this information).

In ControllerApis, enforce a timeout for broker heartbeat requests which is equal to the heartbeat
request interval, to avoid heartbeats piling up on the controller queue. This should have been done
previously, but we overlooked it.

Add a builder for ClusterControlManager and ReplicationControlManager to avoid the need to deal
with a lot of churn (especially in test code) whenever a new constructor parameter gets added for
one of these.

In ControllerConfigurationValidator, create a separate function for when we just want to validate
that a ConfigResource is a valid target for DescribeConfigs. Previously we had been re-using the
validation code for IncrementalAlterConfigs, but this was messy.

Split out the replica placement code into a separate package and reorganize it a bit.

Reviewers: David Arthur <mumrah@gmail.com
2022-04-15 16:07:23 -07:00
Xiaobing Fang f4e3ccd81a
MINOR: update comment in LocalLog.replaceSegments() (#12054)
Reviewers: Luke Chen <showuon@gmail.com>
2022-04-15 21:00:11 +08:00
dengziming 77cd827104
MINOR: Move some TopicCommand and ConfigCommand integration tests to unit tests (#12024)
Move some TopicCommand and ConfigCommand integration tests to unit tests to speed up the tests

Reviewers: Luke Chen <showuon@gmail.com>
2022-04-15 16:35:52 +08:00
David Arthur 55ff5d3603
KAFKA-13823 Feature flag changes from KIP-778 (#12036)
This PR includes the changes to feature flags that were outlined in KIP-778.  Specifically, it
changes UpdateFeatures and FeatureLevelRecord to remove the maximum version level. It also adds
dry-run to the RPC so the controller can actually attempt the upgrade (rather than the client). It
introduces an upgrade type enum, which supersedes the allowDowngrade boolean. Because
FeatureLevelRecord was unused previously, we do not need to introduce a new version.

The kafka-features.sh tool was overhauled in KIP-778 and now includes the describe, upgrade,
downgrade, and disable sub-commands.  Refer to
[KIP-778](https://cwiki.apache.org/confluence/display/KAFKA/KIP-778%3A+KRaft+Upgrades) for more
details on the new command structure.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>
2022-04-14 10:04:32 -07:00
Lee Dongjin 01e4ceba52
KAFKA-12613: Fix inconsistent validation logic between KafkaConfig and LogConfig (#10472)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-04-14 11:58:57 +02:00
dengziming 87aa8259dd KAFKA-13743: Prevent topics with conflicting metrics names from being created in KRaft mode #11910
In ZK mode, the topic "foo_bar" will conflict with "foo.bar" because of limitations in metric
names. We should implement this in KRaft mode.  This PR also changes TopicCommandIntegrationTest to
support KRaft mode.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-04-13 11:59:29 -07:00
José Armando García Sancio a6d86b9998
MINOR: Verify stopReplica if broker epoch not stale (#12040)
Verify that ReplicaManager.stopReplica is called if the stop replica
request doesn't result in a stale broker epoch error.

Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
2022-04-13 09:05:27 -07:00
RivenSun 1df232c839
MINOR: Supplement the description of `Valid Values` in the documentation of `compression.type` (#11985)
Because a validator is added to ProducerConfig.COMPRESSION_TYPE_CONFIG and KafkaConfig.CompressionTypeProp, the corresponding testCase is improved to verify whether the wrong value of compression.type will throw a ConfigException.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2022-04-12 21:24:57 -07:00
José Armando García Sancio 317fff9bb0
MINOR: Re-use counter in mocking of LogSegment.size (#12021)
When migrating from Easymock to Mockito, the mockito implemetnation
didn't have the same semantic as the Easymock implementation.

Without this fix the mocking of LogSegment.size() always returns 0 because
a new AtomicInteger was getting created for each invocation of
LogSegment.size()

Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
2022-04-11 13:37:47 -07:00
David Jacot 6e9cd0c7f5
MINOR: A few code cleanups in DynamicBrokerConfig (#12015)
Reviewers: Luke Chen <showuon@gmail.com>
2022-04-09 11:42:42 +02:00
Alok Nikhil 7a5f0cfaef
MINOR: Fix DescribeLogDirs API error handling for older API versions (#12017)
With KAFKA-13527 / KIP-784 we introduced a new top-level error code for
the DescribeLogDirs API for versions 3 and above. However, the change
regressed the error handling for versions less than 3 since the response
converter fails to write the non-zero error code out (rightly) for
versions lower than 3 and drops the response to the client which
eventually times out instead of receiving an empty log dirs response and
processing that as a Cluster Auth failure.

With this change, the API conditionally propagates the error code out to
the client if the request API version is 3 and above. This keeps the
semantics of the error handling the same for all versions and restores
the behavior for older versions.

See current behavior in the broker log:
```bash
ERROR] 2022-04-08 01:22:56,406 [data-plane-kafka-request-handler-10] kafka.server.KafkaApis - [KafkaApi-0] Unexpected error handling request RequestHeader(apiKey=DESCRIBE_LOG_DIRS, apiVersion=0, clientId=sarama, correlationId=1) -- DescribeLogDirsRequestData(topics=null)
org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default errorCode at version 0
[ERROR] 2022-04-08 01:22:56,407 [data-plane-kafka-request-handler-10] kafka.server.KafkaRequestHandler - [Kafka Request Handler 10 on Broker 0], Exception when handling request
org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default errorCode at version 0
```

Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-04-08 12:54:09 -07:00
Xavier Léauté 60c0916bfd
KAFKA-13801: Kafka server does not respect MetricsReporter contract for dynamically configured reporters (#11998)
MetricsReporter.contextChange contract states the method should always be called first before MetricsReporter.init is called. This is done correctly for reporters enabled by default (e.g. JmxReporter) but not for metrics reporters configured dynamically.

This fixes the call ordering for dynamically configured metrics reporter and updates tests to enforce ordering.

Reviewers: David Jacot <djacot@confluent.io>
2022-04-07 10:13:15 +02:00
sciclon2 92305c2cf2
KAFKA-13687: Limiting the amount of bytes to be read in a segment logs (#11842)
This PR allows to limit the output batches while they are inspected via the kafka-dump-log.sh script.

The idea is to take samples from the logsegments without affecting a production cluster as the current script will read the whole files, this could create issues related to performance.

Please see the KIP-824

Reviewers: Jun Rao <junrao@gmail.com>
2022-04-06 08:48:43 -07:00
Anastasia Vela 206ad4d2b5
MINOR: Fix flaky testIdleConnection() test (#11996)
The test expects that the connection becomes idle before the mock time is moved forward, but the processor thread runs concurrently and may run some activity on the connection after the mock time is moved forward, thus the connection never expires.

The solution is to wait until the message is received on the socket, and only then wait until the connection is unmuted (it's not enough to wait for unmuted without waiting for message being received on the socket, because the channel might have not been muted yet).

Reviewers: David Jacot <djacot@confluent.io>
2022-04-06 16:35:25 +02:00
bozhao12 4218fc61fe
KAFKA-13778: Fetch from follower should never run the preferred read replica selection (#11965)
The current preferred read replica selection logic relies on `partition.leaderReplicaIdOpt` to determine if the selection must be run. The issue is that `partition.leaderReplicaIdOpt` is defined for both the leader and the followers thus the logic is ran all the time. The impact is not too bad as the leader is selected most of the time when the logic is ran by the follower and the leader is filtered out. However there are cases where the selection on a follower could redirect the consumer to another follower under certain rare conditions. For instance with the `RackAwareReplicaSelector `, the follower must have stale replica states from a previous leadership and must have other followers in the same rack for instance. Other implementation of the selection logic could be more impacted.

This patch ensures that the preferred read replica selection is only ran by the leader.

Reviewers: David Jacot <djacot@confluent.io>
2022-04-05 18:56:23 +02:00
Anastasia Vela ae45c59e61
MINOR: Fix flaky testClientDisconnectionUpdatesRequestMetrics() (#11987)
Reviewers: David Jacot <djacot@confluent.io>
2022-04-04 09:10:33 +02:00
Colin Patrick McCabe 62ea4c46a9
KAFKA-13749: CreateTopics in KRaft must return configs (#11941)
Previously, when in KRaft mode, CreateTopics did not return the active configurations for the
topic(s) it had just created. This PR addresses that gap. We will now return these topic
configuration(s) when the user has DESCRIBE_CONFIGS permission. (In the case where the user does
not have this permission, we will omit the configurations and set TopicErrorCode. We will also omit
the number of partitions and replication factor data as well.)

For historical reasons, we use different names to refer to each topic configuration when it is set
in the broker context, as opposed to the topic context. For example, the topic configuration
"segment.ms" corresponds to the broker configuration "log.roll.ms". Additionally, some broker
configurations have synonyms. For example, the broker configuration "log.roll.hours" can be used to
set the log roll time instead of "log.roll.ms". In order to track all of this, this PR adds a
table in LogConfig.scala which maps each topic configuration to an ordered list of ConfigSynonym
classes. (This table is then passed to KafkaConfigSchema as a constructor argument.)

Some synonyms require transformations. For example, in order to convert from "log.roll.hours" to
"segment.ms", we must convert hours to milliseconds. (Note that our assumption right now is that
topic configurations do not have synonyms, only broker configurations. If this changes, we will
need to add some logic to handle it.)

This PR makes the 8-argument constructor for ConfigEntry public. We need this in order to make full
use of ConfigEntry outside of the admin namespace. This change is probably inevitable in general
since otherwise we cannot easily test the output from various admin APIs in junit tests outside the
admin package.

Testing:

This PR adds PlaintextAdminIntegrationTest#testCreateTopicsReturnsConfigs. This test validates
some of the configurations that it gets back from the call to CreateTopics, rather than just checking
if it got back a non-empty map like some of the existing tests. In order to test the
configuration override logic, testCreateDeleteTopics now sets up some custom static and dynamic
configurations.

In QuorumTestHarness, we now allow tests to configure what the ID of the controller should be. This
allows us to set dynamic configurations for the controller in testCreateDeleteTopics. We will have
a more complete fix for setting dynamic configuations on the controller later.

This PR changes ConfigurationControlManager so that it is created via a Builder. This will make it
easier to add more parameters to its constructor without having to update every piece of test code
that uses it. It will also make the test code easier to read.

Reviewers: David Arthur <mumrah@gmail.com>
2022-04-01 10:50:25 -07:00
RivenSun 1bdd35d8d8
KAFKA-13786: Add a note in`control.plane.listener.name` doc (#11978)
Add a note in `control.plane.listener.name` doc to mention the value can't be identical with `inter.broker.listener.name`.

Reviewers: Luke Chen <showuon@gmail.com>
2022-04-01 16:23:29 +08:00
dengziming 502f2caca4
MINOR: Remove some unused codes (#11935)
`validateChars` and `BaseEnum` are used in old version of clients. Remove them.

Reviewers: Luke Chen <showuon@gmail.com>
2022-04-01 11:39:50 +08:00
Yang Yu eefdf9d6a7
KAFKA-12875: Change Log layer segment map mutations to avoid absence of active segment (#11950)
Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
2022-03-31 10:56:07 -07:00
Yu 430f9c9901
KAFKA-13772: Partitions are not correctly re-partitioned when the fetcher thread pool is resized (#11953)
Partitions are assigned to fetcher threads based on their hash modulo the number of fetcher threads. When we resize the fetcher thread pool, we basically re-distribute all the partitions based on the new fetcher thread pool size. The issue is that the logic that resizes the fetcher thread pool updates the `fetcherThreadMap` while iterating over it. The `Map` does not give any guarantee in this case - especially when the underlying map is re-hashed - and that led to not iterating over all the fetcher threads during the process and thus in leaving some partitions in the wrong fetcher threads.

Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-03-31 14:45:59 +02:00
dengziming 669a49063d
MINOR: Fix an uncompatible bug in GetOffsetShell (#11936)
In KIP-815 we replaced KafkaConsumer with AdminClient in GetOffsetShell. In the previous implementation, partitions were just ignored if there is no offset for them, however, we will print -1 instead now, This PR fix this inconsistency.

Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>
2022-03-31 10:34:39 +08:00
Xiaobing Fang 8965240da3
MINOR: Fix doc variable typos in `TopicConfig` (#11972)
Reviewers: Luke Chen <showuon@gmail.com>
2022-03-31 09:54:42 +08:00
Jason Gustafson b2cb6caa1e
MINOR: Move `KafkaYammerMetrics` to server-common (#11970)
With major server components like the new quorum controller being moved outside of the `core` module, it is useful to have shared dependencies moved into `server-common`. An example of this is Yammer metrics which server components still rely heavily upon. All server components should have access to the default registry used by the broker so that new metrics can be registered and metric naming conventions should be standardized. This is particularly important in KRaft where we are attempting to recreate identically named metrics in the controller context. This patch takes a step in this direction. It moves `KafkaYammerMetrics` into `server-common` and it implements
standard metric naming utilities there. 

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-03-30 13:59:22 -07:00
Edoardo Comar bb60eb86e1
MINOR: Increase wait in ZooKeeperClientTest (#11973)
Increase wait in ZooKeeperClientTest.testReinitializeAfterAuthFailure
so that the testcase of https://github.com/apache/kafka/pull/11563
actually fails without the corresponding source code fix.
Followup of https://issues.apache.org/jira/browse/KAFKA-13461.

Co-Authored-By: Gantigmaa Selenge <gantigmaa.selenge1@uk.ibm.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2022-03-30 17:09:19 +01:00
David Jacot f2aa0c439c
MINOR: Disable SocketServerTest.closingChannelWithBufferedReceives and SocketServerTest.remoteCloseWithoutBufferedReceives (#11960)
This reverts commit d706d6cac4.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-03-29 14:31:12 +02:00
bozhao12 db2485cb59
KAFKA-13767; Fetch from consumers should return immediately when preferred read replica is defined by the leader (#11942)
When a replica selector is configured, the partition leader computes a preferred read replica for any fetch from the consumers. When the preferred read replica is not the leader, the leader returns the preferred read replica with `FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata, MemoryRecords.EMPTY)` to the `ReplicaManager`. This causes the fetch to go into in the fetch purgatory because the exit conditions are not met. In turns out that the delayed fetch is not completed until the timeout is reached because the delayed fetch ignores partition with an unknown offset (-1). If the fetch contains only one partition, the fetch is unnecessarily delayed by the timeout time (500ms by default) to only inform the consumer that it has to read from a follower.

This patch fixes the issue by completing the fetch request immediately when a preferred read replica is defined.

Reviewers: David Jacot <djacot@confluent.io>
2022-03-29 10:13:05 +02:00
Sanjana Kaundinya 2a270591f8
MINOR: Improved display names for parameterized KRaft and ZK tests (#11957)
This patch adds display names for KRaft and ZK tests. Without this, it becomes hard to understand in Jenkins test reports which test failed. With this addition, it becomes more clear which method in the test suite fails.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-03-28 16:20:21 -07:00
David Jacot d706d6cac4
MINOR: Renable SocketServerTest.closingChannelWithBufferedReceives and SocketServerTest.remoteCloseWithoutBufferedReceives (#11927)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-25 10:18:15 -07:00
Luke Chen 0461382adb
KAFKA-4801: don't verify assignment during broker up and down in testConsumptionWithBrokerFailures (#11949)
In this test, we have another thread to let broker down and up, to test if consumer can still work as expected. During the broker down and up, we tried to verify the assignment is as what we expected. But the rebalance will keep triggering while broker down and up. It doesn't make sense to verify the assignment here. Remove it to make the test reliable.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-25 10:16:50 -07:00
David Jacot 12bb23157c
MINOR: A few cleanups in BrokerToControllerChannelManager (#11937)
Make the code style more consistent

Reviewers: Luke Chen <showuon@gmail.com>
2022-03-24 15:49:25 +08:00
Liam Clarke-Hutchinson e8f09007e4
KAFKA-13672: Race condition in DynamicBrokerConfig (#11920)
Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>
2022-03-24 11:54:05 +08:00
dengziming c9c03dd7ef
MINOR: Remove scala KafkaException (#11913)
Use the standard org.apache.kafka.common.KafkaException instead of kafka.common.KafkaException.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@confluent.io>
2022-03-21 14:56:25 -07:00
dengziming d449f850e1
MINOR: show LogRecoveryState in MetadataShell and fix log message
Show the LeaderRecoveryState in MetadataShell.

Fix a case where we were comparing a Byte type with an enum type.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-03-21 14:33:51 -07:00
David Jacot 72558da976
MINOR: Small cleanups in the AclAuthorizer (#11921)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-03-21 11:23:31 +01:00
Luke Chen 3a8f6b17a6
KAFKA-7540: commit offset sync before close (#11898)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-21 16:51:21 +08:00
José Armando García Sancio 8d6968e832
KAFKA-13682; KRaft Controller auto preferred leader election (#11893)
Implement auto leader rebalance for KRaft by keeping track of the set of topic partitions which have a leader that is not the preferred replica. If this set is non-empty then schedule a leader balance event for the replica control manager.

When applying PartitionRecords and PartitionChangeRecords to the ReplicationControlManager, if the elected leader is not the preferred replica then remember this topic partition in the set of imbalancedPartitions.

Anytime the quorum controller processes a ControllerWriteEvent it schedules a rebalance operation if the there are no pending rebalance operations, the feature is enabled and there are imbalance partitions.

This KRaft implementation only supports the configurations properties auto.leader.rebalance.enable and leader.imbalance.check.interval.seconds. The configuration property leader.imbalance.per.broker.percentage is not supported and ignored.

Reviewers: Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>
2022-03-18 14:30:52 -07:00
José Armando García Sancio 52621613fd
KAFKA-13587; Implement leader recovery for KIP-704 (#11733)
Implementation of the protocol for starting and stopping leader recovery after an unclean leader election. This includes the management of state in the controllers (legacy and KRaft) and propagating this information to the brokers. This change doesn't implement log recovery after an unclean leader election.

Protocol Changes
================

For the topic partition state znode, the new field "leader_recovery_state" was added. If the field is missing the value is assumed to be RECOVERED.

ALTER_PARTITION was renamed from ALTER_ISR. The CurrentIsrVersion field was renamed to PartitionEpoch. The new field LeaderRecoveryState was added.

The new field LeaderRecoverState was added to the LEADER_AND_ISR request. The inter broker protocol version is used to determine which version to send to the brokers.

A new tagged field for LeaderRecoveryState was added to both the PartitionRecord and PartitionChangeRecord.

Controller
==========

For both the KRaft and legacy controller the LeaderRecoveryState is set to RECOVERING, if the leader was elected out of the ISR, also known as unclean leader election. The controller sets the state back to RECOVERED after receiving an ALTER_PARTITION request with version 0, or with version 1 and with the LeaderRecoveryState set to RECOVERED.

Both controllers preserve the leader recovery state even if the unclean leader goes offline and comes back online before an RECOVERED ALTER_PARTITION is sent.

The controllers reply with INVALID_REQUEST if the ALTER_PARTITION either:

    1. Attempts to increase the ISR while the partition is still RECOVERING
    2. Attempts to change the leader recovery state to RECOVERING from a RECOVERED state.

Topic Partition Leader
======================

The topic partition leader doesn't implement any log recovery in this change. The topic partition leader immediately marks the partition as RECOVERED and sends that state in the next ALTER_PARTITION request.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-03-18 09:24:11 -07:00
dengziming 5cebe12a66
KAFKA-13509; Support max timestamp in GetOffsetShell (KIP-815) (#11173)
This patch implements KIP-815 as described here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-815%3A++Support+max-timestamp+in+GetOffsetShell.

Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>, David Jacot <djacot@confluent.io>
2022-03-17 17:53:37 +01:00
Jason Gustafson 76d287c967
KAFKA-13727; Preserve txn markers after partial segment cleaning (#11891)
It is possible to clean a segment partially if the offset map is filled before reaching the end of the segment. The highest offset that is reached becomes the new dirty offset after the cleaning completes. The data above this offset is nevertheless copied over to the new partially cleaned segment. Hence we need to ensure that the transaction index reflects aborted transactions from both the cleaned and uncleaned portion of the segment. Prior to this patch, this was not the case. We only collected the aborted transactions from the cleaned portion, which means that the reconstructed index could be incomplete. This can cause the aborted data to become effectively committed. It can also cause the deletion of the abort marker before the corresponding data has been removed (i.e. the aborted transaction becomes hanging).

Reviewers: Jun Rao <junrao@gmail.com>
2022-03-15 12:26:23 -07:00
wangyap e8a762eee4
MINOR: set batch-size option into batch.size config in consoleProducer (#11855)
Reviewers: Luke Chen <showuon@gmail.com>
2022-03-15 19:40:11 +08:00
Guozhang Wang cad4985a0a
MINOR: Disable those flaky tests (#11895)
I collected a list of the most flaky tests observed lately, checked / created their corresponding tickets, and mark them as ignored for now. Many of these failures are:

0. Failing very frequently in the past (at least in my observations).
1. not investigated for some time.
2. have a PR for review (mostly thanks to @showuon !), but not reviewed for some time.

Since 0), these tests failures are hindering our development; and from 1/2) above, people are either too busy to look after them, or honestly the tests are not considered as providing values since otherwise people should care enough to panic and try to resolve. So I think it's reasonable to disable all these tests for now. If we later learned our lesson a hard way, it would motivate us to tackle flaky tests more diligently as well.

I'm only disabling those tests that have been failed for a while, and if for such time no one have been looking into them, I'm concerned that just gossiping around about those flakiness would not bring people's attention to them either. So my psychological motivation is that "if people do not care about those failed tests for weeks (which, is not a good thing! :P), let's teach ourselves the lesson a hard way when it indeed buries a bug that bites us, or not learn the lesson at all --- that indicates those tests are indeed not valuable". For tests that I only very recently saw I did not disable them.

Reviewers: John Roesler <vvcephei@apache.org>, Matthias J. Sax <mjsax@apache.org>, Luke Chen <showuon@gmail.com>, Randall Hauch <rhauch@gmail.com>
2022-03-14 21:32:28 -07:00
xuexiaoyue f025a93c7c
MINOR: Fix comments in TransactionsTest (#11880)
Reviewer: Luke Chen <showuon@gmail.com>
2022-03-11 15:42:44 +08:00
Vincent Jiang 798275f254
KAFKA-13717: skip coordinator lookup in commitOffsetsAsync if offsets is empty (#11864)
Reviewer: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-03-10 10:52:05 +08:00
David Jacot 69926b5193
MINOR: Clean up AlterIsrManager code (#11832)
Reviewers: Justine Olshan <jolshan@confluent.io>, Jason Gustafson <jason@confluent.io>
2022-03-09 07:31:07 +01:00
Vincent Jiang b27000ec6a
MINOR: Fix flaky test cases SocketServerTest.remoteCloseWithoutBufferedReceives and SocketServerTest.remoteCloseWithIncompleteBufferedReceive (#11861)
When a socket is closed, corresponding channel should be retained only if there is complete buffered requests.

Reviewers: David Jacot <djacot@confluent.io>
2022-03-08 19:03:11 +01:00
Luke Chen 1848f049e1
KAFKA-13710: bring the InvalidTimestampException back for record error (#11853)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Ricardo Brasil <anribrasil@gmail.com>
2022-03-08 14:28:16 +08:00
RivenSun 3be978464c
KAFKA-13694: Log more specific information when the verification record fails on brokers. (#11830)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-04 10:45:44 -08:00
wangyap ae76b9d45a
KAFKA-13466: delete unused config batch.size in kafka-console-producer.sh (#11517)
delete unused config batch.size in kafka-console-producer.sh

Reviewer: Andrew Eugene Choi <andrew.choi@uwaterloo.ca>, Luke Chen <showuon@gmail.com>,
2022-03-04 09:47:23 +08:00
Colin Patrick McCabe 07553d13f7
MINOR: create KafkaConfigSchema and TimelineObject (#11809)
Create KafkaConfigSchema to encapsulate the concept of determining the types of configuration keys.
This is useful in the controller because we can't import KafkaConfig, which is part of core. Also
introduce the TimelineObject class, which is a more generic version of TimelineInteger /
TimelineLong.

Reviewers: David Arthur <mumrah@gmail.com>
2022-03-02 14:26:31 -08:00
Kowshik Prakasam 67e99a4236
MINOR: Ensure LocalLog.flush is thread safe to recoveryPoint changes (#11814)
Issue:
Imagine a scenario where two threads T1 and T2 are inside UnifiedLog.flush() concurrently:

KafkaScheduler thread T1 -> The periodic work calls LogManager.flushDirtyLogs() which in turn calls UnifiedLog.flush(). For example, this can happen due to log.flush.scheduler.interval.ms here.
KafkaScheduler thread T2 -> A UnifiedLog.flush() call is triggered asynchronously during segment roll here.
Supposing if thread T1 advances the recovery point beyond the flush offset of thread T2, then this could trip the check within LogSegments.values() here for thread T2, when it is called from LocalLog.flush() here. The exception causes the KafkaScheduler thread to die, which is not desirable.

Fix:
We fix this by ensuring that LocalLog.flush() is immune to the case where the recoveryPoint advances beyond the flush offset.

Reviewers: Jun Rao <junrao@gmail.com>
2022-03-01 10:55:17 -08:00
Jason Gustafson 5f91aa7b4c
KAFKA-13698; KRaft authorizer should use host address instead of name (#11807)
Use `InetAddress.getHostAddress` in `StandardAuthorizer` instead of `InetAddress.getHostName`.

Reviewers: Colin Patrick McCabe <cmccabe@confluent.io>
2022-02-26 10:52:34 -08:00
Jason Gustafson 2c90447a59
KAFKA-13697; KRaft authorizer should support AclOperation.ALL (#11806)
KRaft authorizer should support AclOperation.ALL correctly.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-02-25 15:43:21 -08:00
Jason Gustafson 711b603ddc
MINOR: Cleanup admin creation logic in integration tests (#11790)
There seemed to be a little sloppiness in the integration tests in regard to admin client creation. Not only was there duplicated logic, but it wasn't always clear which listener the admin client was targeting. This made it difficult to tell in the context of authorization tests whether we were indeed testing with the right principal. As an example, we had a method in TestUtils which was using the inter-broker listener implicitly. This meant that the test was using the broker principal which had super user privilege. This was intentional, but I think it would be clearer to make the dependence on this listener explicit. This patch attempts to clean this up a bit by consolidating some of the admin creation logic and making the reliance on the listener clearer.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-02-24 07:37:28 -08:00
dengziming a4b1e50f08
MINOR; Remove unused AdminZkClient in MetadataSupport (#11785)
Remove unused AdminZkClient in MetadataSupport

Reviewers: Luke Chen <showuon@gmail.com>
2022-02-18 14:31:44 +08:00
Wenjun Ruan 81e709c4e2
MINOR: Remove unused params in `ZkConfigManager` (#11763)
Remove `changeExpirationMs` and `time` in `ZkConfigManager`, since these two parameters are not used.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-02-16 10:12:07 -08:00
Jason Gustafson b765a2b44e
MINOR: Remove redundant forwarding integration tests (#11766)
There are a few integration tests for the forwarding logic which were added prior to kraft being ready for integration testing. Now that we have enabled kraft in integration tests, these tests are redundant and can be removed.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-02-15 18:28:34 -08:00
Wenjun Ruan 77cb8e0a5a
MINOR: Remove repeat creation of `ZkConfigRepository` (#11762)
In `KafkaServer, `ZkConfigRepository` is just a wrapper of `zkClient`, so  we don't need to create a new one.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-02-15 13:09:14 -08:00
Luke Chen 71cbff62b6
MINOR: Fix and clarify kraft README and example configuration files (#11616)
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Jason Gustafson <jason@confluent.io>
2022-02-15 10:27:38 -08:00
dengziming dd36331a81
MINOR: Enable kraft in ApiVersionTest (#11667)
This patch enables `ApiVersionsTest` to test both kraft brokers and controllers. It fixes a minor bug in which the `Envelope` request to be exposed from `ApiVersions` requests to the kraft broker. 

Reviewers: Jason Gustafson <jason@confluent.io>
2022-02-15 10:16:03 -08:00
David Jacot c8fbe26f3b
KAFKA-13435; Static membership protocol should let the leader skip assignment (KIP-814) (#11688)
This patch implements KIP-814 as described here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-814%3A+Static+membership+protocol+should+let+the+leader+skip+assignment.

Reviewers: Luke Chen <showuon@gmail.com>, Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>
2022-02-14 11:55:38 +01:00
Jason Gustafson fc20c551d6
MINOR: Clearer field names for ProducerIdsRecord and related classes (#11747)
The current naming of the fields in `ProducerIdsRecord` is a little confusing in regard to whether the block range was inclusive or exclusive. This patch tries to improve naming to make this clearer. In the record class, instead of `ProducerIdsEnd`, we use `NextProducerId`. We have also updated related classes such as `ProducerIdsBlock.java` with similar changes.

Reviewers: dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>
2022-02-11 16:14:31 -08:00
Jason Gustafson e43916c148
KAFKA-13661; Consistent permissions in KRaft for CreatePartitions API (#11745)
In #11649, we fixed one permission inconsistency between kraft and zk authorization for the `CreatePartitions` request. Previously kraft was requiring `CREATE` permission on the `Topic` resource when it should have required `ALTER`. A second inconsistency is that kraft was also allowing `CREATE` on the `Cluster` resource, which is not supported in zk clusters and was not documented in KIP-195: https://cwiki.apache.org/confluence/display/KAFKA/KIP-195%3A+AdminClient.createPartitions. This patch fixes this inconsistency and adds additional test coverage for both cases.

Reviewers: José Armando García Sancio <jsancio@gmail.com>
2022-02-11 15:01:08 -08:00
Mickael Maison 0269edfc80
KAFKA-13577: Replace easymock with mockito in kafka:core - part 3 (#11674)
Reviewers: Tom Bentley <tbentley@redhat.com>
2022-02-11 16:16:25 +01:00
dengziming 590df2c8be
KAFKA-13316; Enable KRaft mode in CreateTopics tests (#11655)
This PR follows #11629 to enable `CreateTopicsRequestWithForwardingTest` and `CreateTopicsRequestWithPolicyTest` in KRaft mode.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-02-10 14:10:23 -08:00
Alexandre Garnier 4da515da94
MINOR: Fix storage meta properties comparison (#11546)
This patch adds missing `equals` and `hashCode` implements for `RawMetaProperties`. This is relied on by the storage tool for detecting when two log directories have different `meta.properties` files. 

Reproduce current issue:

```shell
$ sed -i 's|log.dirs=/tmp/kraft-combined-logs|+log.dirs=/tmp/kraft-combined-logs,/tmp/kraft-combined-logs2' ./config/kraft/server.properties

$ ./bin/kafka-storage.sh format -t R19xNyxMQvqQRGlkGDi2cg -c ./config/kraft/server.properties
Formatting /tmp/kraft-combined-logs
Formatting /tmp/kraft-combined-logs2

$ ./bin/kafka-storage.sh info -c ./config/kraft/server.properties
Found log directories:
  /tmp/kraft-combined-logs
  /tmp/kraft-combined-logs2

Found metadata: {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}

Found problem:
  Metadata for /tmp/kraft-combined-logs2/meta.properties was {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}, but other directories featured {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}
```

It's reporting that same metadata are not the same...

With this fix:

```shell
$ ./bin/kafka-storage.sh info -c ./config/kraft/server.properties
Found log directories:
  /tmp/kraft-combined-logs
  /tmp/kraft-combined-logs2

Found metadata: {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}
```

Reviewers: Igor Soarez <soarez@apple.com>, Jason Gustafson <jason@confluent.io>
2022-02-10 09:55:14 -08:00
prince-mahajan 78a3789496
KAFKA-13636: Fix for the group coordinator issue where the offsets are deleted for unstable groups (#11742)
This patch ensures that the committed offsets are not expired while the group is rebalancing. The issue is that we can't rely on the subscribed topics if the group is not stable.

Reviewers: David Jacot <djacot@confluent.io>
2022-02-10 17:21:17 +01:00
Joseph (Ting-Chou) Lin 2e25ca1355
MINOR: Fix JavaDoc of OffsetIndex#append (#11744)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-02-10 11:27:34 +01:00
Mickael Maison 57235a0cb4
KAFKA-13577: Replace easymock with mockito in kafka:core - part 2 (#11673)
Reviewers: Tom Bentley <tbentley@redhat.com>
2022-02-10 09:15:58 +01:00
Colin Patrick McCabe d35283f011
KAFKA-13646; Implement KIP-801: KRaft authorizer (#11649)
Currently, when using KRaft mode, users still have to have an Apache ZooKeeper instance if they want to use AclAuthorizer. We should have a built-in Authorizer for KRaft mode that does not depend on ZooKeeper. This PR introduces such an authorizer, called StandardAuthorizer. See KIP-801 for a full description of the new Authorizer design.

Authorizer.java: add aclCount API as described in KIP-801. StandardAuthorizer is currently the only authorizer that implements it, but eventually we may implement it for AclAuthorizer and others as well.

ControllerApis.scala: fix a bug where createPartitions was authorized using CREATE on the topic resource rather than ALTER on the topic resource as it should have been.

QuorumTestHarness: rename the controller endpoint to CONTROLLER for consistency (the brokers already called it that). This is relevant in AuthorizerIntegrationTest where we are examining endpoint names. Also add the controllerServers call.

TestUtils.scala: adapt the ACL functions to be usable from KRaft, by ensuring that they use the Authorizer from the current active controller.

BrokerMetadataPublisher.scala: add broker-side ACL application logic.

Controller.java: add ACL APIs. Also add a findAllTopicIds API in order to make junit tests that use KafkaServerTestHarness#getTopicNames and KafkaServerTestHarness#getTopicIds work smoothly.

AuthorizerIntegrationTest.scala: convert over testAuthorizationWithTopicExisting (more to come soon)

QuorumController.java: add logic for replaying ACL-based records. This means storing them in the new AclControlManager object, and integrating them into controller snapshots. It also means applying the changes in the Authorizer, if one is configured. In renounce, when reverting to a snapshot, also set newBytesSinceLastSnapshot to 0.

Reviewers: YeonCheol Jang <YeonCheolGit@users.noreply.github.com>,  Jason Gustafson <jason@confluent.io>
2022-02-09 10:38:52 -08:00
Mickael Maison 150fecf6d3
KAFKA-13577: Replace easymock with mockito in kafka:core - part 1 (#11672)
Reviewers: Tom Bentley <tbentley@redhat.com>
2022-02-09 17:02:27 +01:00
RivenSun 4b468a9d81
KAFKA-13310 : KafkaConsumer cannot jump out of the poll method, and the… (#11340)
Title: KafkaConsumer cannot jump out of the poll method, and cpu and traffic on the broker side increase sharply
description: The local test has been passed, the problem described by jira can be solved

JIRA link : https://issues.apache.org/jira/browse/KAFKA-13310

Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2022-02-08 23:05:42 -08:00
Ismael Juma 7c2d672413
MINOR: Update library dependencies (Q1 2022) (#11306)
- scala 2.13: 2.13.6 -> 2.13.8
  * Support Java 18 and improve Android compatibility
  * https://www.scala-lang.org/news/2.13.7
  * https://www.scala-lang.org/news/2.13.8
- scala 2.12: 2.12.14 -> 2.12.15. 
  * The `-release` flag now works with Scala 2.12, backend parallelism
    can be enabled via `-Ybackend-parallelism N` and string interpolation
    is more efficient.
  * https://www.scala-lang.org/news/2.12.5
- gradle versions plugin: 0.38.0 -> 0.42.0
  * Minor fixes
  * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.40.0
  * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.41.0
  * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.42.0
- gradle dependency check plugin: 6.1.6 -> 6.5.3
  * Minor fixes
- gradle spotbugs plugin: 4.7.1 -> 5.0.5
  * Fixes and minor improvements
  * There were too many releases to include all the links, include the major version bump
  * https://github.com/spotbugs/spotbugs-gradle-plugin/releases/tag/5.0.0
- gradle scoverage plugin: 5.0.0 -> 7.0.0
  * Support newer Gradle versions and other improvements
  * https://github.com/scoverage/gradle-scoverage/releases/tag/6.0.0
  * https://github.com/scoverage/gradle-scoverage/releases/tag/6.1.0
  * https://github.com/scoverage/gradle-scoverage/releases/tag/7.0.0
- gradle shadow plugin: 7.0.0 -> 7.1.2
  * Support gradle toolchains and security fixes
  * https://github.com/johnrengelman/shadow/releases/tag/7.1.0
  * https://github.com/johnrengelman/shadow/releases/tag/7.1.1
  * https://github.com/johnrengelman/shadow/releases/tag/7.1.2
- bcpkix: 1.66 -> 1.70
  * Several improvements and fixes
  * https://www.bouncycastle.org/releasenotes.html
- jline: 3.12.1 -> 3.21.0
  * Various fixes and improvements
- jmh: 1.32 -> 1.34
  * Compiler blackhole enabled by default when using Java 17 and improved
    gradle incremental compilation
  * https://mail.openjdk.java.net/pipermail/jmh-dev/2021-August/003355.html
  * https://mail.openjdk.java.net/pipermail/jmh-dev/2021-December/003406.html
- scalaLogging: 3.9.3 -> 3.9.4
  * Support for Scala 3.0
- jose4j: 0.7.8 -> 0.7.9
  * Minor fixes
- junit: 5.7.1 -> 5.8.2
  * Minor improvements and fixes
  * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.0
  * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.1
  * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.2
- jqwik: 1.5.0 -> 1.6.3
  * Numerous improvements
  * https://github.com/jlink/jqwik/releases/tag/1.6.0
- mavenArtifact: 3.8.1 -> 3.8.4
- mockito: 3.12.4 -> 4.3.1
  * Removed deprecated methods, `DoNotMock` annotation and
    minor fixes/improvements
  * https://github.com/mockito/mockito/releases/tag/v4.0.0
  * https://github.com/mockito/mockito/releases/tag/v4.1.0
  * https://github.com/mockito/mockito/releases/tag/v4.2.0
  * https://github.com/mockito/mockito/releases/tag/v4.3.0
- scalaCollectionCompat: 2.4.4 -> 2.6.0
  * Minor fixes
  * https://github.com/scala/scala-collection-compat/releases/tag/v2.5.0
  * https://github.com/scala/scala-collection-compat/releases/tag/v2.6.0
- scalaJava8Compat: 1.0.0 -> 1.0.2
  * Minor changes
- scoverage: 1.4.1 -> 1.4.11
  * Support for newer Scala versions
- slf4j: 1.7.30 -> 1.7.32
  * Minor fixes, 1.7.35 automatically uses reload4j and 1.7.33/1.7.34
    cause build failures, so we stick with 1.7.32 for now.
- zstd: 1.5.0-4 -> 1.5.2-1
  * zstd 1.5.2
  * Small refinements and performance improvements
  * https://github.com/facebook/zstd/releases/tag/v1.5.1
  * https://github.com/facebook/zstd/releases/tag/v1.5.2

Checkstyle, spotBugs and spotless will be upgraded separately as they
either require non trivial code changes or they have regressions
that affect us.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-02-07 15:24:50 -08:00
Luke Chen ca5d6f9229
KAFKA-13563: clear FindCoordinatorFuture for non consumer group mode (#11631)
After KAFKA-10793, we clear the findCoordinatorFuture in 2 places:

1. heartbeat thread
2. AbstractCoordinator#ensureCoordinatorReady

But in non consumer group mode with group id provided (for offset commitment. So that there will be consumerCoordinator created), there will be no (1)heartbeat thread , and it only call (2)AbstractCoordinator#ensureCoordinatorReady when 1st time consumer wants to fetch committed offset position. That is, after 2nd lookupCoordinator call, we have no chance to clear the findCoordinatorFuture , and causes the offset commit never succeeded.

To avoid the race condition as KAFKA-10793 mentioned, it's not safe to clear the findCoordinatorFuture in the future listener. So, I think we can fix this issue by calling AbstractCoordinator#ensureCoordinatorReady when coordinator unknown in non consumer group case, under each ConsumerCoordinator#poll.


Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-02-06 15:07:59 -08:00
Chia-Ping Tsai f49524e4c3
MINOR: disable zookeeper.sasl.client to avoid false error (#11469)
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
2022-02-06 03:34:22 +08:00
Jason Gustafson ba0fe610ed
MINOR: Do not use optional args in `ProducerStateManager` (#11734)
We allowed `maxProducerIdExpirationMs` and `time` to be optional in the `ProducerStateManager` constructor. We generally frown on optional arguments since it is too easy to overlook them. In this case, it was especially dangerous because the recently added `maxTransactionTimeoutMs` argument used the same type as `maxProducerIdExpirationMs`.

Reviewers: David Jacot <david.jacot@gmail.com, Ismael Juma <ismael@juma.me.uk>
2022-02-05 11:00:17 -08:00
Luke Chen e6db0ca48c
KAFKA-13598: enable idempotence producer by default and validate the configs (#11691)
In v3.0, we changed the default value for `enable.idempotence` to true, but we didn't adjust the validator and the `idempotence` enabled check method. So if a user didn't explicitly enable idempotence, this feature won't be turned on. This patch addresses the problem, cleans up associated logic, and fixes tests that broke as a result of properly applying the new default. Specifically it does the following:

1. fix the `ProducerConfig#idempotenceEnabled` method, to make it correctly detect if `idempotence` is enabled or not
2. remove some unnecessary config overridden and checks due to we already default `acks`, `retries` and `enable.idempotence` configs.
3. move the config validator for the idempotent producer from `KafkaProducer` into `ProducerConfig`. The config validation should be the responsibility of `ProducerConfig` class.
4. add an `AbstractConfig#hasKeyInOriginals` method, to avoid `originals` configs get copied and only want to check the existence of the key. 
5. fix many broken tests. As mentioned, we didn't actually enable idempotence in v3.0. After this PR, there are some tests broken due to some different behavior between idempotent and non-idempotent producer.
6. add additional tests to validate configuration behavior

Reviewers: Kirk True <kirk@mustardgrain.com>, Ismael Juma <ismael@juma.me.uk>, Mickael Maison <mimaison@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
2022-02-05 10:53:27 -08:00
Matthew Wong 17dcb8097c
MINOR: Update documentation and DumpLogSegments tool for addition of `deleteHorizonMs` in batch format (#11694)
This PR updates the documentation and tooling to match https://github.com/apache/kafka/pull/10914, which added support for encoding `deleteHorizonMs` in the record batch schema. The changes include adding the new attribute and updating field names. We have also updated stale references to the old `FirstTimestamp` field in the code and comments. Finally, In the `DumpLogSegments` tool, when record batch information is printed, it will also include the value of `deleteHorizonMs` is (e.g. `OptionalLong.empty` or `OptionalLong[123456]`).

Reviewers: Vincent Jiang <84371940+vincent81jiang@users.noreply.github.com>, Kvicii <42023367+Kvicii@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
2022-02-04 16:20:37 -08:00
Colin Patrick McCabe d4fb388583
MINOR: fix control plane listener + kraft error message (#11729)
The current error message suggests that controller.listener.names is a replacement for
control.plane.listener.name. This is incorrect since these configurations have very different
functions. This PR deletes the incorrect message.

Reviewers: David Jacot <david.jacot@gmail.com>, Kvicii
2022-02-03 10:08:00 -08:00
Kvicii 21c3009ac1
KAFKA-13583; Fix FetchRequestBetweenDifferentIbpTest flaky tests (#11699)
Co-authored-by: Kvicii <Karonazaba@gmail.com>
Reviewers: David Jacot <djacot@confluent.io>
2022-02-03 10:59:12 +01:00