As part of KIP-835, LastCommittedRecordOffset was added to the
KafkaController metric type. Make sure to test that metric.
Reviewers: Jason Gustafson <jason@confluent.io>
Updates relevant tests in `ReassignPartitionsIntegrationTest` for KRaft. We skip the JBOD tests since JBOD is not supported in KRaft, and we skip the `AlterPartition` upgrade tests since they are not relevant.
Reviewers: Kvicii <Karonazaba@gmail.com>, David Arthur <mumrah@gmail.com>
Fixes the timeouts we have been seeing in `LogOffsetTest` when KRaft is enabled. The problem is the dependence on `MockTime`. In the KRaft broker, we need a steadily advancing clock for events in `KafkaEventQueue` to get executed. In the case of the timeouts, the broker was stuck with the next heartbeat event in the queue. We depended on the execution of this event in order to send the next heartbeat, complete the `initialCatchUpFuture`, and finish broker startup. This caused the test to get stuck during initialization, which is probably why the `@Timeout` wasn't working. The patch fixes the problem by using system time instead, since the test did not strongly depend on `MockTime`.
Reviewers: David Arthur <mumrah@gmail.com>
This patch removes the --metadata-version option and adds a --release-version option to the kafka-storage tool. This is not a breaking change since --metadata-version was only introduced on May 18 and has not been released yet.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>
This PR implements the first part of KIP-841. Specifically, it implements the following:
1. Adds a new metadata version.
2. Adds the InControlledShutdown field to the BrokerRegistrationRecord and BrokerRegistrationChangeRecord and bumps their versions. The newest versions are only used if the new metadata version is enabled.
3. Writes a BrokerRegistrationChangeRecord with InControlledShutdown set when a broker requests a controlled shutdown.
4. Ensures that fenced and in-controlled-shutdown replicas are neither picked as leaders nor included in the ISR.
5. Adds or extends unit tests.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>
Fixes several race conditions in the test case that caused the flaky failure.
Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Jason Gustafson <jason@confluent.io>
Co-authored-by: Kvicii <Karonazaba@gmail.com>
The active quorum controller periodically appends NoOpRecord to increase the metadata LEO. However, when a broker starts up, we wait until its metadata LEO catches up with the controller LEO. We generate a NoOpRecord every 500ms but only send a heartbeat request every 2000ms.
It's almost impossible for a broker to catch up with the controller LEO if the broker only queries every 2000ms while the controller LEO advances every 500ms, so the tests in KRaftClusterTest will fail.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, showuon <43372967+showuon@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@confluent.io>
This PR refactors the leader API access in the follower fetch path.
Added a LeaderEndPoint interface which serves all access to the leader.
Added a LocalLeaderEndPoint and a RemoteLeaderEndPoint, which implement the LeaderEndPoint interface to handle fetches from the leader in local and remote storage respectively.
Reviewers: David Jacot <djacot@confluent.io>, Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
When LogManager starts up and loads logs, we expect to catch any IOException (e.g. an out-of-space error) and mark the log dir offline. Later, we handle the offline logDir in ReplicaManager so that the cleanShutdown file is not created when all logDirs are offline. The reason the broker shut down with a cleanShutdown file after the disk filled up is that during loadLogs and log recovery we write the leader-epoch-checkpoint file, and if an IOException is thrown there, we wrap it as a KafkaStorageException and rethrow. Since we don't catch KafkaStorageException, the exception is caught elsewhere and the broker follows the clean shutdown path.
This PR fixes the issue by catching KafkaStorageException with an IOException cause during loadLogs and marking the logDir offline so that the ReplicaManager handles the offline logDirs.
Reviewers: Jun Rao <jun@confluent.io>, Alok Thatikunta <alok123thatikunta@gmail.com>
Enable kraft support in BaseQuotaTest and its extensions.
Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>
This PR fixes a bug where FeatureControlManager#replay(FeatureLevelRecord) was throwing an
exception if not all controllers in the quorum supported the feature being applied. While we do
want to validate this, it needs to be validated earlier, before the record is committed to the log.
Once the record has been committed to the log it should always be applied if the current controller
supports it.
Fix another bug where removing a feature was not supported once it had been configured. Note that
because we reserve feature level 0 for "feature not enabled", we don't need to use
Optional<VersionRange>; we can just return a range of 0-0 when the feature is not supported.
Allow the metadata version to be downgraded when UpgradeType.UNSAFE_DOWNGRADE has been set.
Previously we were unconditionally denying this even when this was set.
Add a builder for FeatureControlManager, so that we can easily add new parameters to the
constructor in the future. This will also be useful for creating FeatureControlManagers that are
initialized to a specific MetadataVersion.
Get rid of RemoveFeatureLevelRecord, since it's easier to just issue a FeatureLevelRecord with
the level set to 0.
Set metadata.max.idle.interval.ms to 0 in RaftClusterSnapshotTest for more predictability.
Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>
The kafka-dump-log command should accept files with a suffix of ".checkpoint". It should also decode and print using JSON the snapshot header and footer control records.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
Implement NoOpRecord as described in KIP-835. This is controlled by the new
metadata.max.idle.interval.ms configuration.
The KRaft controller schedules an event to write NoOpRecord to the metadata log if the metadata
version supports this feature. This event is scheduled at the interval defined in
metadata.max.idle.interval.ms. Brokers and controllers were improved to ignore the NoOpRecord when
replaying the metadata log.
This PR also adds four new metrics to the KafkaController metric group, as described in KIP-835.
Finally, there are some small fixes to leader recovery. This PR fixes a bug where metadata version
3.3-IV1 was not marked as changing the metadata. It also changes the ReplicationControlManager to
accept a metadata version supplier to determine if the leader recovery state is supported.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Added parameter `metadataDirectory` to `setMetadataDirectory()` so that `this.metadataDirectory` would not be set to itself.
Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
The test cases we have in `RequestChannelTest` for `buildResponseSend` construct the envelope request incorrectly. The request is created using the envelope context, but also a reference to the wrapped envelope request object. This patch fixes `TestUtils.buildEnvelopeRequest` so that the wrapped request is built properly. It also fixes the dependence on this incorrect construction and consolidates the tests in `RequestChannelTest` to avoid duplication.
Reviewers: dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
Fix some bugs in the KRaft unregisterBroker API and add a junit test.
1. kafka-cluster.sh unregister should fail if no broker ID is passed.
2. UnregisterBrokerRequest must be marked as a KRaft broker API so
that KRaft brokers can receive it.
3. KafkaApis.scala must forward UNREGISTER_BROKER to the controller.
Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
When the kraft broker begins controlled shutdown, it immediately disables the metadata listener. This means that metadata changes as part of the controlled shutdown do not get sent to the respective components. For partitions that the broker is follower of, that is what we want. It prevents the follower from being able to rejoin the ISR while still shutting down. But for partitions that the broker is leading, it means the leader will remain active until controlled shutdown finishes and the socket server is stopped. That delay can be as much as 5 seconds and probably even worse.
This PR revises the controlled shutdown procedure as follows:
* The broker signals to the replica manager that it is about to start the controlled shutdown.
* The broker requests a controlled shutdown to the controller.
* The controller moves leaders off from the broker, removes the broker from any ISR that it is a member of, and writes those changes to the metadata log.
* When the broker receives a partition metadata change, it checks whether it is in the ISR. If it is, it updates the partition as usual. If it is not, or if there is no leader defined--as would be the case if the broker was the last member of the ISR--it stops the fetcher/replica. This effectively stops all the partitions for which the broker was part of the ISR.
When the broker is a replica of a partition but it is not in the ISR, the controller does not do anything. The leader epoch is not bumped. In this particular case, the follower will continue to run until the replica manager shuts down. In this time, the replica could become in-sync and the leader could try to bring it back to the ISR. This remaining issue will be addressed separately.
Reviewers: Jason Gustafson <jason@confluent.io>
This test is flaky, as seen in CI: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-12184/1/tests/
The test fails because it does not wait for metadata to be propagated across brokers before killing a broker which may lead to it getting stale information. Note that a similar test was done in #12104 for a different test.
Reviewers: Kvicii Y, Ziming Deng, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This patch ensures consistent handling of null-valued topic configs between the zk and kraft controller. Prior to this patch, we returned INVALID_REQUEST in zk mode and it was not an error in kraft. After this patch, we return INVALID_CONFIG consistently for this case.
Reviewers: Jason Gustafson <jason@confluent.io>
When a partition leader receives a `Fetch` request from a replica which is not in the current replica set, the behavior today is to return a successful fetch response, but with empty data. This causes the follower to retry until metadata converges without updating any state on the leader side. It is clearer in this case to return an error, so that the metadata inconsistency is visible in logging and so that the follower backs off before retrying.
In this patch, we use `UNKNOWN_LEADER_EPOCH` when the `Fetch` request includes the current leader epoch. The way we see this is that the leader is validating the (replicaId, leaderEpoch) tuple. When the leader returns `UNKNOWN_LEADER_EPOCH`, it means that the leader does not expect the given leaderEpoch from that replica. If the request does not include a leader epoch, then we use `NOT_LEADER_OR_FOLLOWER`. We can take a similar interpretation for this case: the leader is rejecting the request because it does not think it should be the leader for that replica. But mainly these errors ensure that the follower will retry the request.
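For reference, a minimal sketch of the error-selection rule described above (hypothetical class and helper names; not the actual `Partition` code):
```java
import org.apache.kafka.common.protocol.Errors;

import java.util.Optional;
import java.util.Set;

public class ReplicaFetchValidationSketch {

    static Errors errorForReplicaFetch(Set<Integer> currentReplicaSet,
                                       int replicaId,
                                       Optional<Integer> currentLeaderEpoch) {
        if (currentReplicaSet.contains(replicaId)) {
            return Errors.NONE;
        }
        // The (replicaId, leaderEpoch) tuple is rejected: if the fetch carried a leader
        // epoch, answer UNKNOWN_LEADER_EPOCH; otherwise answer NOT_LEADER_OR_FOLLOWER.
        return currentLeaderEpoch.isPresent()
            ? Errors.UNKNOWN_LEADER_EPOCH
            : Errors.NOT_LEADER_OR_FOLLOWER;
    }

    public static void main(String[] args) {
        System.out.println(errorForReplicaFetch(Set.of(1, 2, 3), 4, Optional.of(5)));   // UNKNOWN_LEADER_EPOCH
        System.out.println(errorForReplicaFetch(Set.of(1, 2, 3), 4, Optional.empty())); // NOT_LEADER_OR_FOLLOWER
    }
}
```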
As a part of this patch, I have refactored the way that the leader updates follower fetch state. Previously, the process is a little convoluted. We send the fetch from `ReplicaManager` down to `Partition.readRecords`, then we iterate over the results and call `Partition.updateFollowerFetchState`. It is more straightforward to update state directly as a part of `readRecords`. All we need to do is pass through the `FetchParams`. This also prevents an unnecessary copy of the read results.
Reviewers: David Jacot <djacot@confluent.io>
This patch builds on #12072 and adds controller support for metadata.version. The kafka-storage tool now allows a
user to specify a specific metadata.version to bootstrap into the cluster, otherwise the latest version is used.
Upon the first leader election of the KRaft quorum, this initial metadata.version is written into the metadata log. When
writing snapshots, a FeatureLevelRecord for metadata.version will be written out ahead of other records so we can
decode things at the correct version level.
This also includes additional validation in the controller when setting feature levels. It will now check that a given
metadata.version is supportable by the quorum, not just the brokers.
Reviewers: José Armando García Sancio <jsancio@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>, Alyssa Huang <ahuang@confluent.io>
Replace the "RequestsPerSec" literal value with the pre-existing constant `RequestsPerSec`.
Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Jason Gustafson <jason@confluent.io>
Enable KRaft in `AdminClientWithPoliciesIntegrationTest` and `PlaintextAdminIntegrationTest`. Some tests are not yet enabled or do not run as expected:
- testNullConfigs, see KAFKA-13863
- testDescribeCluster and testMetadataRefresh, currently we don't get the real controller in KRaft mode so the test may not run as expected
This patch also changes the exception type raised from invalid `IncrementalAlterConfig` requests with the `SUBTRACT` and `APPEND` operations. When the configuration value type is not a list, we now raise `INVALID_CONFIG` instead of `INVALID_REQUEST`.
Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
This patch ensures that followers don't have any remote replica states left over from previous leadership.
Reviewers: Jason Gustafson <jason@confluent.io>
The following variables in `KafkaController` are used for metrics:
```
offlinePartitionCount
preferredReplicaImbalanceCount
globalTopicCount
globalPartitionCount
topicsToDeleteCount
replicasToDeleteCount
ineligibleTopicsToDeleteCount
ineligibleReplicasToDeleteCount
```
When the controller goes from active to non-active, these variables will be reset to 0. Currently, this is done explicitly in `KafkaController.onControllerResignation()` and also after every loop iteration in `KafkaController.updateMetrics()`.
The first of these is redundant and can be removed. This patch fixes this and also simplifies `updateMetrics`.
Reviewers: Jason Gustafson <jason@confluent.io>
In the AlterConfigs/IncrementalAlterConfigs zk handler, we return `INVALID_REQUEST` and `INVALID_CONFIG` inconsistently. The problem is in `LogConfig.validate`. We may either throw `ConfigException` or `InvalidConfigException`. When the first of these is thrown, we catch it and convert to `INVALID_REQUEST`. If the latter is thrown, then we return `INVALID_CONFIG`. It seems more appropriate to return `INVALID_CONFIG` consistently, which is what the KRaft implementation already does. This patch fixes this and converts a few integration tests to KRaft.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
The documentation for `num.replica.fetchers` should emphasize the fact that the count applies to each source broker individually. Also mention the tradeoff.
Reviewers: Jason Gustafson <jason@confluent.io>
Convert EndToEndClusterIdTest, ConsumerGroupCommandTest,
ListConsumerGroupTest, and LogOffsetTest to test KRaft mode.
Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>
Since the StandardAuthorizer relies on the metadata log to store its ACLs, we need to be sure that
we have the latest metadata before allowing the authorizer to be used. However, if the authorizer
is not usable for controllers in the cluster, the latest metadata cannot be fetched, because
inter-node communication cannot occur. In the initial commit which introduced StandardAuthorizer,
we punted on the loading issue by allowing the authorizer to be used immediately. This commit fixes
that by implementing early.start.listeners as specified in KIP-801. This will let superusers in
immediately, but throw the new AuthorizerNotReadyException if non-superusers try to use the
authorizer before StandardAuthorizer#completeInitialLoad is called.
For the broker, we call StandardAuthorizer#completeInitialLoad immediately after metadata catch-up
is complete, right before unfencing. For the controller, we call
StandardAuthorizer#completeInitialLoad when the node has caught up to the high water mark of the
cluster metadata partition.
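For illustration, a minimal config sketch for the behavior described above (example values only; the listener name must match one of the node's configured listeners):
```java
import java.util.Properties;

public class EarlyStartListenersConfigSketch {
    // Illustrative settings: listeners named in early.start.listeners accept connections
    // before the authorizer's initial load completes; only super.users are allowed through
    // until StandardAuthorizer#completeInitialLoad is called.
    static Properties props() {
        Properties props = new Properties();
        props.put("authorizer.class.name", "org.apache.kafka.metadata.authorizer.StandardAuthorizer");
        props.put("super.users", "User:admin");
        props.put("early.start.listeners", "CONTROLLER");
        return props;
    }

    public static void main(String[] args) {
        props().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```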
This PR refactors the SocketServer so that it creates the configured acceptors and processors in
its constructor, rather than requiring a call to SocketServer#startup. A new function,
SocketServer#enableRequestProcessing, then starts the threads and begins listening on the
configured ports. enableRequestProcessing uses an async model: we will start the acceptor and
processors associated with an endpoint as soon as that endpoint's authorizer future is completed.
Also fix a bug where the controller and listener were sharing an Authorizer when in co-located
mode, which was not intended.
Reviewers: Jason Gustafson <jason@confluent.io>
Add a DeleteTopics test for all supported versions. Convert the
DeleteTopicsRequestTest to run against both ZK and KRaft mode.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>
Currently, we only wait for the metadataCache to record the partition info. This isn't enough; we should wait until the partition's ISR is smaller than its AR.
Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <divijvaidya13@gmail.com>
This patch adds a new case class `FetchParams` which encapsulates the parameters of the fetch request. It then uses this class in `DelayedFetch` directly instead of `FetchMetadata`. The intent is to reduce the number of things we need to change whenever we need to pass through new parameters. The patch also cleans up `ReplicaManagerTest` for more consistent usage.
Reviewers: David Jacot <djacot@confluent.io>
We can append/subtract multiple config values in kraft mode using the `IncrementalAlterConfig` RPC. For example, appending to the topic config "cleanup.policy" with value="delete,compact" ends up treating "delete,compact" as a single value rather than two values. This patch fixes the problem. Additionally, it updates the zk logic to correctly handle duplicate additions.
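For illustration, a minimal client-side sketch of the append described above (the topic name and bootstrap address are placeholders):
```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class AppendMultipleConfigValuesExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // With the fix, the comma-separated value is split into two list entries
            // ("delete" and "compact") rather than stored as one literal string.
            AlterConfigOp appendBoth = new AlterConfigOp(
                new ConfigEntry("cleanup.policy", "delete,compact"), AlterConfigOp.OpType.APPEND);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(appendBoth))).all().get();
        }
    }
}
```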
Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
This patch refactors the `Partition.makeLeader` and `Partition.makeFollower` to be robust to all partition updates from the KRaft metadata log. Particularly, it ensures the following invariants:
- A partition update is accepted if the partition epoch is equal to or newer than the current one. The partition epoch is also updated by the AlterPartition path, so we accept an update from the metadata log with the same partition epoch in order to fully update the partition state.
- The leader epoch start offset is only updated when the leader epoch is bumped.
- The follower states are only updated when the leader epoch is bumped.
- Fetchers are only restarted when the leader epoch is bumped. This was already the case but this patch adds unit tests to prove/maintain it.
At the same time, the patch unifies the state change logs to be similar in both the ZK and KRaft worlds.
Reviewers: Jason Gustafson <jason@confluent.io>
This PR tries to avoid the reinitialization of the leader epoch cache
and the partition metadata if the corresponding replica is being deleted.
With this change, the asyncDelete method can run more efficiently,
which means a StopReplica request with many partitions to be deleted can be
processed more quickly.
Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
The KRaft implementation of the `CreatePartitions` RPC ignores the `validateOnly` flag in the
request and creates the partitions if the validations are successful. This patch fixes the behavior
so that no partitions are created when the `validateOnly` flag is true.
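For illustration, a minimal client-side sketch of a validate-only request (topic name and bootstrap address are placeholders):
```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.CreatePartitionsOptions;
import org.apache.kafka.clients.admin.NewPartitions;

import java.util.Map;
import java.util.Properties;

public class ValidateOnlyCreatePartitionsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (Admin admin = Admin.create(props)) {
            // validateOnly(true) asks the controller to run the validations without
            // actually adding partitions; before this fix, the KRaft controller
            // created them anyway.
            admin.createPartitions(
                Map.of("my-topic", NewPartitions.increaseTo(6)),
                new CreatePartitionsOptions().validateOnly(true)
            ).all().get();
        }
    }
}
```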
Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
These tests belong to ApiVersionsResponseTest and were accidentally copied to MetadataVersionTest while working on #12072.
Reviewer: Luke Chen <showuon@gmail.com>
Refactoring ApiVersion to MetadataVersion to support both old IBP versioning and new KRaft versioning (feature flags)
for KIP-778.
IBP versions are now encoded as enum constants and explicitly prefixed with IBP_ instead of KAFKA_. The
LegacyApiVersion vs DefaultApiVersion distinction was not necessary and has been replaced with appropriate parsing rules for extracting
the correct shortVersions/versions.
Co-authored-by: David Arthur <mumrah@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
Fix ClientQuotasRequestTest.testAlterClientQuotasBadIp so that it uses actually unresolvable hostnames.
The previous choices "ip" and "abc-123" are now resolvable.
Reviewers: David Jacot <djacot@confluent.io>, Andrew Choi <andrew.choi@uwaterloo.ca>, Divij Vaidya <divijvaidya13@gmail.com>
Since we have changed the `AlterIsr` API to `AlterPartition`, it makes sense to rename `AlterIsrManager` as well and some of the associated classes.
Reviewers: dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
Adding KRaft and ZK params to ConfigCommandIntegrationTest wherever appropriate.
Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
This patch refactors kafka.cluster.Replica, its usages, and its tests. This is part of the work in KAFKA-13790.
Reviewers: Jason Gustafson <jason@confluent.io>
Currently we validate the recovery state before checking the leader epoch in `KafkaController`. It seems more intuitive to validate the leader epoch first since the leader might be working with stale state, which is what we do in KRaft. This patch fixes this and adds a couple of additional validations to make the behavior consistent.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
If the wildcard * in a command is not wrapped in single quotes, bash expands it to the file names in the current folder. So we need to wrap it in single quotes. This patch updates the doc and the command option description.
Reviewers: dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>
This patch does some initial cleanups in the context of KAFKA-13790. Mainly, it renames `ZkVersion` field to `PartitionEpoch` in the `LeaderAndIsrRequest`, the `LeaderAndIsr` and the `Partition`.
Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>
Fix two bugs related to dynamic broker configs in KRaft. The first bug is that we are calling reloadUpdatedFilesWithoutConfigChange when a topic configuration is changed, but not when a
broker configuration is changed. This is backwards. This function must be called only for broker
configs, and never for topic configs or cluster configs.
The second bug is that there were several configurations such as max.connections which are related
to broker listeners, but which do not involve changing the registered listeners. We should support
these configurations in KRaft. This PR fixes the configuration change validation to support this case.
Reviewers: Jason Gustafson <jason@confluent.io>, Matthew de Detrich <mdedetrich@gmail.com>
The reason this test is flaky is a race condition at the beginning of the test: while the brokers are starting up, the adminClient requests broker metadata. If the adminClient only gets partial metadata, the test will fail, because these tests shut brokers down to exercise leader election.
Fix this by explicitly waiting for the metadata cache to be up to date in waitForReadyBrokers, and by creating the admin client only after waitForReadyBrokers.
Reviewers: Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>, dengziming <dengziming1993@gmail.com>
Ensure that we can set log.flush.interval.ms at the broker or cluster level via
IncrementalAlterConfigs. This was broken by KAFKA-13749, which added log.flush.interval.ms as the
second synonym rather than the first. Add a regression test to DynamicConfigChangeTest.
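For illustration, a minimal client-side sketch of setting this config at the cluster level (the bootstrap address is a placeholder):
```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ClusterLevelFlushIntervalExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (Admin admin = Admin.create(props)) {
            // An empty broker resource name targets the cluster-wide default.
            ConfigResource clusterDefault = new ConfigResource(ConfigResource.Type.BROKER, "");
            AlterConfigOp setFlushInterval = new AlterConfigOp(
                new ConfigEntry("log.flush.interval.ms", "10000"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(clusterDefault, List.of(setFlushInterval))).all().get();
        }
    }
}
```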
Create ControllerRequestContext and pass it to every controller API. This gives us a uniform way to
pass through information like the deadline (if there is one) and the Kafka principal which is
making the request (in the future we will want to log this information).
In ControllerApis, enforce a timeout for broker heartbeat requests which is equal to the heartbeat
request interval, to avoid heartbeats piling up on the controller queue. This should have been done
previously, but we overlooked it.
Add a builder for ClusterControlManager and ReplicationControlManager to avoid the need to deal
with a lot of churn (especially in test code) whenever a new constructor parameter gets added for
one of these.
In ControllerConfigurationValidator, create a separate function for when we just want to validate
that a ConfigResource is a valid target for DescribeConfigs. Previously we had been re-using the
validation code for IncrementalAlterConfigs, but this was messy.
Split out the replica placement code into a separate package and reorganize it a bit.
Reviewers: David Arthur <mumrah@gmail.com>
This PR includes the changes to feature flags that were outlined in KIP-778. Specifically, it
changes UpdateFeatures and FeatureLevelRecord to remove the maximum version level. It also adds
dry-run to the RPC so the controller can actually attempt the upgrade (rather than the client). It
introduces an upgrade type enum, which supersedes the allowDowngrade boolean. Because
FeatureLevelRecord was unused previously, we do not need to introduce a new version.
The kafka-features.sh tool was overhauled in KIP-778 and now includes the describe, upgrade,
downgrade, and disable sub-commands. Refer to
[KIP-778](https://cwiki.apache.org/confluence/display/KAFKA/KIP-778%3A+KRaft+Upgrades) for more
details on the new command structure.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>
In ZK mode, the topic "foo_bar" will conflict with "foo.bar" because of limitations in metric
names. We should implement the same collision check in KRaft mode. This PR also changes TopicCommandIntegrationTest to
support KRaft mode.
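For illustration, a small sketch of why the two names collide (assuming '.' is mapped to '_' for metric names; not the broker's actual validation code):
```java
public class TopicMetricNameCollisionSketch {
    // Metric names cannot contain '.', so topic names are sanitized by mapping '.' to '_'.
    static String metricSafeName(String topic) {
        return topic.replace('.', '_');
    }

    public static void main(String[] args) {
        // "foo.bar" and "foo_bar" sanitize to the same metric name, hence the conflict.
        System.out.println(metricSafeName("foo.bar").equals(metricSafeName("foo_bar"))); // true
    }
}
```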
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Verify that ReplicaManager.stopReplica is called if the stop replica
request doesn't result in a stale broker epoch error.
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
A validator is added to ProducerConfig.COMPRESSION_TYPE_CONFIG and KafkaConfig.CompressionTypeProp, and the corresponding test case is improved to verify that an invalid compression.type value throws a ConfigException.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
When migrating from EasyMock to Mockito, the Mockito implementation
didn't have the same semantics as the EasyMock implementation.
Without this fix, the mocking of LogSegment.size() always returned 0 because
a new AtomicInteger was created for each invocation of
LogSegment.size().
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
With KAFKA-13527 / KIP-784 we introduced a new top-level error code for
the DescribeLogDirs API for versions 3 and above. However, the change
regressed the error handling for versions less than 3 since the response
converter fails to write the non-zero error code out (rightly) for
versions lower than 3 and drops the response to the client which
eventually times out instead of receiving an empty log dirs response and
processing that as a Cluster Auth failure.
With this change, the API conditionally propagates the error code out to
the client if the request API version is 3 and above. This keeps the
semantics of the error handling the same for all versions and restores
the behavior for older versions.
See current behavior in the broker log:
```bash
[ERROR] 2022-04-08 01:22:56,406 [data-plane-kafka-request-handler-10] kafka.server.KafkaApis - [KafkaApi-0] Unexpected error handling request RequestHeader(apiKey=DESCRIBE_LOG_DIRS, apiVersion=0, clientId=sarama, correlationId=1) -- DescribeLogDirsRequestData(topics=null)
org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default errorCode at version 0
[ERROR] 2022-04-08 01:22:56,407 [data-plane-kafka-request-handler-10] kafka.server.KafkaRequestHandler - [Kafka Request Handler 10 on Broker 0], Exception when handling request
org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default errorCode at version 0
```
Reviewers: Ismael Juma <ismael@juma.me.uk>
The MetricsReporter.contextChange contract states the method should always be called before MetricsReporter.init is called. This is done correctly for reporters enabled by default (e.g. JmxReporter) but not for metrics reporters configured dynamically.
This fixes the call ordering for dynamically configured metrics reporters and updates tests to enforce the ordering.
Reviewers: David Jacot <djacot@confluent.io>
This PR makes it possible to limit the number of batches printed when inspecting log segments via the kafka-dump-log.sh script.
The idea is to take samples from the log segments without affecting a production cluster, since the current script reads whole files, which could cause performance issues.
Please see KIP-824 for details.
Reviewers: Jun Rao <junrao@gmail.com>
The test expects that the connection becomes idle before the mock time is moved forward, but the processor thread runs concurrently and may run some activity on the connection after the mock time is moved forward, thus the connection never expires.
The solution is to wait until the message is received on the socket, and only then wait until the connection is unmuted (it's not enough to wait for the unmute without waiting for the message to be received on the socket, because the channel might not have been muted yet).
Reviewers: David Jacot <djacot@confluent.io>
The current preferred read replica selection logic relies on `partition.leaderReplicaIdOpt` to determine whether the selection must be run. The issue is that `partition.leaderReplicaIdOpt` is defined for both the leader and the followers, so the logic runs all the time. The impact is not too bad because the leader is selected most of the time when the logic runs on a follower, and the leader is then filtered out. However, under certain rare conditions the selection on a follower could redirect the consumer to another follower. For instance, with the `RackAwareReplicaSelector`, the follower must have stale replica states from a previous leadership and have other followers in the same rack. Other implementations of the selection logic could be more affected.
This patch ensures that the preferred read replica selection is only run by the leader.
Reviewers: David Jacot <djacot@confluent.io>
Previously, when in KRaft mode, CreateTopics did not return the active configurations for the
topic(s) it had just created. This PR addresses that gap. We will now return these topic
configuration(s) when the user has DESCRIBE_CONFIGS permission. (In the case where the user does
not have this permission, we will omit the configurations and set TopicErrorCode. We will also omit
the number of partitions and replication factor data as well.)
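For illustration, a minimal client-side sketch of reading the returned configurations (topic name and bootstrap address are placeholders):
```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.CreateTopicsResult;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopicsReturnsConfigsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (Admin admin = Admin.create(props)) {
            CreateTopicsResult result = admin.createTopics(
                List.of(new NewTopic("my-topic", 3, (short) 1)));
            // With this PR, a KRaft controller also returns the topic's active configuration,
            // provided the caller has DESCRIBE_CONFIGS permission.
            Config config = result.config("my-topic").get();
            config.entries().forEach(entry ->
                System.out.println(entry.name() + " = " + entry.value()));
        }
    }
}
```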
For historical reasons, we use different names to refer to each topic configuration when it is set
in the broker context, as opposed to the topic context. For example, the topic configuration
"segment.ms" corresponds to the broker configuration "log.roll.ms". Additionally, some broker
configurations have synonyms. For example, the broker configuration "log.roll.hours" can be used to
set the log roll time instead of "log.roll.ms". In order to track all of this, this PR adds a
table in LogConfig.scala which maps each topic configuration to an ordered list of ConfigSynonym
classes. (This table is then passed to KafkaConfigSchema as a constructor argument.)
Some synonyms require transformations. For example, in order to convert from "log.roll.hours" to
"segment.ms", we must convert hours to milliseconds. (Note that our assumption right now is that
topic configurations do not have synonyms, only broker configurations. If this changes, we will
need to add some logic to handle it.)
This PR makes the 8-argument constructor for ConfigEntry public. We need this in order to make full
use of ConfigEntry outside of the admin namespace. This change is probably inevitable in general
since otherwise we cannot easily test the output from various admin APIs in junit tests outside the
admin package.
Testing:
This PR adds PlaintextAdminIntegrationTest#testCreateTopicsReturnsConfigs. This test validates
some of the configurations that it gets back from the call to CreateTopics, rather than just checking
if it got back a non-empty map like some of the existing tests. In order to test the
configuration override logic, testCreateDeleteTopics now sets up some custom static and dynamic
configurations.
In QuorumTestHarness, we now allow tests to configure what the ID of the controller should be. This
allows us to set dynamic configurations for the controller in testCreateDeleteTopics. We will have
a more complete fix for setting dynamic configurations on the controller later.
This PR changes ConfigurationControlManager so that it is created via a Builder. This will make it
easier to add more parameters to its constructor without having to update every piece of test code
that uses it. It will also make the test code easier to read.
Reviewers: David Arthur <mumrah@gmail.com>
Add a note in the `control.plane.listener.name` doc to mention that the value can't be identical to `inter.broker.listener.name`.
Reviewers: Luke Chen <showuon@gmail.com>
Partitions are assigned to fetcher threads based on their hash modulo the number of fetcher threads. When we resize the fetcher thread pool, we basically re-distribute all the partitions based on the new fetcher thread pool size. The issue is that the logic that resizes the fetcher thread pool updates the `fetcherThreadMap` while iterating over it. The `Map` does not give any guarantee in this case - especially when the underlying map is re-hashed - which led to not iterating over all the fetcher threads during the process and thus left some partitions in the wrong fetcher threads.
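For illustration, a small sketch of the modulo-based assignment described above (the hash shown is an assumption, not the exact implementation):
```java
public class FetcherAssignmentSketch {
    // A partition's fetcher is chosen by hashing the topic-partition and taking it
    // modulo the pool size, so resizing the pool moves most partitions to a new fetcher.
    static int fetcherId(String topic, int partition, int numFetchers) {
        return Math.abs(31 * topic.hashCode() + partition) % numFetchers;
    }

    public static void main(String[] args) {
        System.out.println(fetcherId("my-topic", 7, 4)); // assignment with 4 fetchers
        System.out.println(fetcherId("my-topic", 7, 8)); // usually different with 8 fetchers
    }
}
```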
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
In KIP-815 we replaced KafkaConsumer with AdminClient in GetOffsetShell. In the previous implementation, partitions were simply ignored if there was no offset for them; now we print -1 instead. This PR fixes the inconsistency.
Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>
With major server components like the new quorum controller being moved outside of the `core` module, it is useful to have shared dependencies moved into `server-common`. An example of this is Yammer metrics which server components still rely heavily upon. All server components should have access to the default registry used by the broker so that new metrics can be registered and metric naming conventions should be standardized. This is particularly important in KRaft where we are attempting to recreate identically named metrics in the controller context. This patch takes a step in this direction. It moves `KafkaYammerMetrics` into `server-common` and it implements
standard metric naming utilities there.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Increase wait in ZooKeeperClientTest.testReinitializeAfterAuthFailure
so that the testcase of https://github.com/apache/kafka/pull/11563
actually fails without the corresponding source code fix.
Followup of https://issues.apache.org/jira/browse/KAFKA-13461.
Co-Authored-By: Gantigmaa Selenge <gantigmaa.selenge1@uk.ibm.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
When a replica selector is configured, the partition leader computes a preferred read replica for any fetch from the consumers. When the preferred read replica is not the leader, the leader returns the preferred read replica with `FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata, MemoryRecords.EMPTY)` to the `ReplicaManager`. This causes the fetch to go into the fetch purgatory because the exit conditions are not met. It turns out that the delayed fetch is not completed until the timeout is reached because the delayed fetch ignores partitions with an unknown offset (-1). If the fetch contains only one partition, the fetch is unnecessarily delayed by the timeout (500ms by default) just to inform the consumer that it has to read from a follower.
This patch fixes the issue by completing the fetch request immediately when a preferred read replica is defined.
Reviewers: David Jacot <djacot@confluent.io>
This patch adds display names for KRaft and ZK tests. Without this, it becomes hard to understand in Jenkins test reports which test failed. With this addition, it becomes more clear which method in the test suite fails.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
In this test, we have another thread that brings a broker down and up, to test whether the consumer can still work as expected. While the broker goes down and comes back up, we tried to verify that the assignment is as expected. But rebalances keep triggering while the broker is going down and up, so it doesn't make sense to verify the assignment there. Remove that check to make the test reliable.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Use the standard org.apache.kafka.common.KafkaException instead of kafka.common.KafkaException.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@confluent.io>
Show the LeaderRecoveryState in MetadataShell.
Fix a case where we were comparing a Byte type with an enum type.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Implement auto leader rebalance for KRaft by keeping track of the set of topic partitions which have a leader that is not the preferred replica. If this set is non-empty then schedule a leader balance event for the replica control manager.
When applying PartitionRecords and PartitionChangeRecords to the ReplicationControlManager, if the elected leader is not the preferred replica then remember this topic partition in the set of imbalancedPartitions.
Anytime the quorum controller processes a ControllerWriteEvent, it schedules a rebalance operation if there are no pending rebalance operations, the feature is enabled, and there are imbalanced partitions.
This KRaft implementation only supports the configuration properties auto.leader.rebalance.enable and leader.imbalance.check.interval.seconds. The configuration property leader.imbalance.per.broker.percentage is not supported and is ignored.
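For illustration, example controller settings for this feature (values are illustrative):
```java
import java.util.Properties;

public class KRaftAutoLeaderRebalanceConfigSketch {
    static Properties props() {
        Properties props = new Properties();
        props.put("auto.leader.rebalance.enable", "true");
        props.put("leader.imbalance.check.interval.seconds", "300");
        // leader.imbalance.per.broker.percentage is ignored by the KRaft implementation.
        return props;
    }

    public static void main(String[] args) {
        props().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```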
Reviewers: Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>
Implementation of the protocol for starting and stopping leader recovery after an unclean leader election. This includes the management of state in the controllers (legacy and KRaft) and propagating this information to the brokers. This change doesn't implement log recovery after an unclean leader election.
Protocol Changes
================
For the topic partition state znode, the new field "leader_recovery_state" was added. If the field is missing the value is assumed to be RECOVERED.
ALTER_ISR was renamed to ALTER_PARTITION. The CurrentIsrVersion field was renamed to PartitionEpoch. The new field LeaderRecoveryState was added.
The new field LeaderRecoveryState was added to the LEADER_AND_ISR request. The inter-broker protocol version is used to determine which version to send to the brokers.
A new tagged field for LeaderRecoveryState was added to both the PartitionRecord and PartitionChangeRecord.
Controller
==========
For both the KRaft and legacy controller, the LeaderRecoveryState is set to RECOVERING if the leader was elected from outside the ISR, also known as an unclean leader election. The controller sets the state back to RECOVERED after receiving an ALTER_PARTITION request with version 0, or with version 1 and with the LeaderRecoveryState set to RECOVERED.
Both controllers preserve the leader recovery state even if the unclean leader goes offline and comes back online before a RECOVERED ALTER_PARTITION is sent.
The controllers reply with INVALID_REQUEST if the ALTER_PARTITION either:
1. Attempts to increase the ISR while the partition is still RECOVERING
2. Attempts to change the leader recovery state to RECOVERING from a RECOVERED state.
Topic Partition Leader
======================
The topic partition leader doesn't implement any log recovery in this change. The topic partition leader immediately marks the partition as RECOVERED and sends that state in the next ALTER_PARTITION request.
Reviewers: Jason Gustafson <jason@confluent.io>
It is possible to clean a segment partially if the offset map is filled before reaching the end of the segment. The highest offset that is reached becomes the new dirty offset after the cleaning completes. The data above this offset is nevertheless copied over to the new partially cleaned segment. Hence we need to ensure that the transaction index reflects aborted transactions from both the cleaned and uncleaned portion of the segment. Prior to this patch, this was not the case. We only collected the aborted transactions from the cleaned portion, which means that the reconstructed index could be incomplete. This can cause the aborted data to become effectively committed. It can also cause the deletion of the abort marker before the corresponding data has been removed (i.e. the aborted transaction becomes hanging).
Reviewers: Jun Rao <junrao@gmail.com>
I collected a list of the most flaky tests observed lately, checked / created their corresponding tickets, and marked them as ignored for now. Many of these failures are:
0. Failing very frequently in the past (at least in my observations).
1. not investigated for some time.
2. have a PR for review (mostly thanks to @showuon !), but not reviewed for some time.
Since 0), these tests failures are hindering our development; and from 1/2) above, people are either too busy to look after them, or honestly the tests are not considered as providing values since otherwise people should care enough to panic and try to resolve. So I think it's reasonable to disable all these tests for now. If we later learned our lesson a hard way, it would motivate us to tackle flaky tests more diligently as well.
I'm only disabling those tests that have been failing for a while; if no one has been looking into them for that long, I'm concerned that just gossiping about the flakiness would not bring people's attention to them either. So my psychological motivation is that "if people do not care about those failed tests for weeks (which is not a good thing! :P), let's teach ourselves the lesson the hard way when it indeed buries a bug that bites us, or not learn the lesson at all --- that indicates those tests are indeed not valuable". I did not disable tests that I only saw fail very recently.
Reviewers: John Roesler <vvcephei@apache.org>, Matthias J. Sax <mjsax@apache.org>, Luke Chen <showuon@gmail.com>, Randall Hauch <rhauch@gmail.com>
When a socket is closed, the corresponding channel should be retained only if there are complete buffered requests.
Reviewers: David Jacot <djacot@confluent.io>
Create KafkaConfigSchema to encapsulate the concept of determining the types of configuration keys.
This is useful in the controller because we can't import KafkaConfig, which is part of core. Also
introduce the TimelineObject class, which is a more generic version of TimelineInteger /
TimelineLong.
Reviewers: David Arthur <mumrah@gmail.com>
Issue:
Imagine a scenario where two threads T1 and T2 are inside UnifiedLog.flush() concurrently:
KafkaScheduler thread T1 -> The periodic work calls LogManager.flushDirtyLogs(), which in turn calls UnifiedLog.flush(). For example, this can happen due to log.flush.scheduler.interval.ms.
KafkaScheduler thread T2 -> A UnifiedLog.flush() call is triggered asynchronously during segment roll.
Suppose thread T1 advances the recovery point beyond the flush offset of thread T2; this could trip the check within LogSegments.values() for thread T2 when it is called from LocalLog.flush(). The exception causes the KafkaScheduler thread to die, which is not desirable.
Fix:
We fix this by ensuring that LocalLog.flush() is immune to the case where the recoveryPoint advances beyond the flush offset.
Reviewers: Jun Rao <junrao@gmail.com>
There seemed to be a little sloppiness in the integration tests in regard to admin client creation. Not only was there duplicated logic, but it wasn't always clear which listener the admin client was targeting. This made it difficult to tell in the context of authorization tests whether we were indeed testing with the right principal. As an example, we had a method in TestUtils which was using the inter-broker listener implicitly. This meant that the test was using the broker principal which had super user privilege. This was intentional, but I think it would be clearer to make the dependence on this listener explicit. This patch attempts to clean this up a bit by consolidating some of the admin creation logic and making the reliance on the listener clearer.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
There are a few integration tests for the forwarding logic which were added prior to kraft being ready for integration testing. Now that we have enabled kraft in integration tests, these tests are redundant and can be removed.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
In `KafkaServer`, `ZkConfigRepository` is just a wrapper around `zkClient`, so we don't need to create a new one.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch enables `ApiVersionsTest` to test both kraft brokers and controllers. It fixes a minor bug which caused the `Envelope` request to be exposed from `ApiVersions` requests to the kraft broker.
Reviewers: Jason Gustafson <jason@confluent.io>
The current naming of the fields in `ProducerIdsRecord` is a little confusing in regard to whether the block range was inclusive or exclusive. This patch tries to improve naming to make this clearer. In the record class, instead of `ProducerIdsEnd`, we use `NextProducerId`. We have also updated related classes such as `ProducerIdsBlock.java` with similar changes.
Reviewers: dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>
In #11649, we fixed one permission inconsistency between kraft and zk authorization for the `CreatePartitions` request. Previously kraft was requiring `CREATE` permission on the `Topic` resource when it should have required `ALTER`. A second inconsistency is that kraft was also allowing `CREATE` on the `Cluster` resource, which is not supported in zk clusters and was not documented in KIP-195: https://cwiki.apache.org/confluence/display/KAFKA/KIP-195%3A+AdminClient.createPartitions. This patch fixes this inconsistency and adds additional test coverage for both cases.
Reviewers: José Armando García Sancio <jsancio@gmail.com>
This PR follows #11629 to enable `CreateTopicsRequestWithForwardingTest` and `CreateTopicsRequestWithPolicyTest` in KRaft mode.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch adds missing `equals` and `hashCode` implementations for `RawMetaProperties`. The storage tool relies on these for detecting when two log directories have different `meta.properties` files.
Reproduce current issue:
```shell
$ sed -i 's|log.dirs=/tmp/kraft-combined-logs|log.dirs=/tmp/kraft-combined-logs,/tmp/kraft-combined-logs2|' ./config/kraft/server.properties
$ ./bin/kafka-storage.sh format -t R19xNyxMQvqQRGlkGDi2cg -c ./config/kraft/server.properties
Formatting /tmp/kraft-combined-logs
Formatting /tmp/kraft-combined-logs2
$ ./bin/kafka-storage.sh info -c ./config/kraft/server.properties
Found log directories:
/tmp/kraft-combined-logs
/tmp/kraft-combined-logs2
Found metadata: {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}
Found problem:
Metadata for /tmp/kraft-combined-logs2/meta.properties was {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}, but other directories featured {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}
```
It reports that identical metadata is not the same...
With this fix:
```shell
$ ./bin/kafka-storage.sh info -c ./config/kraft/server.properties
Found log directories:
/tmp/kraft-combined-logs
/tmp/kraft-combined-logs2
Found metadata: {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}
```
Reviewers: Igor Soarez <soarez@apple.com>, Jason Gustafson <jason@confluent.io>
This patch ensures that the committed offsets are not expired while the group is rebalancing. The issue is that we can't rely on the subscribed topics if the group is not stable.
Reviewers: David Jacot <djacot@confluent.io>
Currently, when using KRaft mode, users still have to have an Apache ZooKeeper instance if they want to use AclAuthorizer. We should have a built-in Authorizer for KRaft mode that does not depend on ZooKeeper. This PR introduces such an authorizer, called StandardAuthorizer. See KIP-801 for a full description of the new Authorizer design.
Authorizer.java: add aclCount API as described in KIP-801. StandardAuthorizer is currently the only authorizer that implements it, but eventually we may implement it for AclAuthorizer and others as well.
ControllerApis.scala: fix a bug where createPartitions was authorized using CREATE on the topic resource rather than ALTER on the topic resource as it should have been.
QuorumTestHarness: rename the controller endpoint to CONTROLLER for consistency (the brokers already called it that). This is relevant in AuthorizerIntegrationTest where we are examining endpoint names. Also add the controllerServers call.
TestUtils.scala: adapt the ACL functions to be usable from KRaft, by ensuring that they use the Authorizer from the current active controller.
BrokerMetadataPublisher.scala: add broker-side ACL application logic.
Controller.java: add ACL APIs. Also add a findAllTopicIds API in order to make junit tests that use KafkaServerTestHarness#getTopicNames and KafkaServerTestHarness#getTopicIds work smoothly.
AuthorizerIntegrationTest.scala: convert over testAuthorizationWithTopicExisting (more to come soon)
QuorumController.java: add logic for replaying ACL-based records. This means storing them in the new AclControlManager object, and integrating them into controller snapshots. It also means applying the changes in the Authorizer, if one is configured. In renounce, when reverting to a snapshot, also set newBytesSinceLastSnapshot to 0.
Reviewers: YeonCheol Jang <YeonCheolGit@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Title: KafkaConsumer cannot exit the poll method, and CPU usage and traffic on the broker side increase sharply.
Description: Local testing passed; the problem described in the JIRA is resolved.
JIRA link: https://issues.apache.org/jira/browse/KAFKA-13310
Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
After KAFKA-10793, we clear the findCoordinatorFuture in 2 places:
1. heartbeat thread
2. AbstractCoordinator#ensureCoordinatorReady
But in non-consumer-group mode with a group id provided (for offset commits, so a ConsumerCoordinator is created), there is no (1) heartbeat thread, and we only call (2) AbstractCoordinator#ensureCoordinatorReady the first time the consumer wants to fetch a committed offset position. That is, after the 2nd lookupCoordinator call, we have no chance to clear the findCoordinatorFuture, which causes the offset commit to never succeed.
To avoid the race condition mentioned in KAFKA-10793, it's not safe to clear the findCoordinatorFuture in the future listener. So we fix this issue by calling AbstractCoordinator#ensureCoordinatorReady when the coordinator is unknown in the non-consumer-group case, on each ConsumerCoordinator#poll.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
We allowed `maxProducerIdExpirationMs` and `time` to be optional in the `ProducerStateManager` constructor. We generally frown on optional arguments since it is too easy to overlook them. In this case, it was especially dangerous because the recently added `maxTransactionTimeoutMs` argument used the same type as `maxProducerIdExpirationMs`.
Reviewers: David Jacot <david.jacot@gmail.com>, Ismael Juma <ismael@juma.me.uk>
In v3.0, we changed the default value for `enable.idempotence` to true, but we didn't adjust the validator or the `idempotence`-enabled check method. So if a user didn't explicitly enable idempotence, the feature wasn't actually turned on. This patch addresses the problem, cleans up associated logic, and fixes tests that broke as a result of properly applying the new default (a sketch of the effective defaults follows the list below). Specifically it does the following:
1. fix the `ProducerConfig#idempotenceEnabled` method, to make it correctly detect if `idempotence` is enabled or not
2. remove some unnecessary config overrides and checks, since we already default the `acks`, `retries` and `enable.idempotence` configs.
3. move the config validator for the idempotent producer from `KafkaProducer` into `ProducerConfig`. The config validation should be the responsibility of `ProducerConfig` class.
4. add an `AbstractConfig#hasKeyInOriginals` method, to avoid copying the `originals` configs when we only want to check whether a key exists.
5. fix many broken tests. As mentioned, we didn't actually enable idempotence in v3.0, so after this PR some tests break due to behavioral differences between the idempotent and non-idempotent producer.
6. add additional tests to validate configuration behavior
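A small sketch of the effective defaults after the fix (illustrative; the actual resolution happens inside `ProducerConfig`):
```java
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class IdempotentDefaultsSketch {
    // With the fix, these are the effective defaults when the user sets none of them.
    static Properties effectiveDefaults() {
        Properties props = new Properties();
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all");                                  // idempotence requires acks=all
        props.put(ProducerConfig.RETRIES_CONFIG, String.valueOf(Integer.MAX_VALUE));   // and retries > 0
        return props;
    }

    public static void main(String[] args) {
        effectiveDefaults().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```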
Reviewers: Kirk True <kirk@mustardgrain.com>, Ismael Juma <ismael@juma.me.uk>, Mickael Maison <mimaison@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
This PR updates the documentation and tooling to match https://github.com/apache/kafka/pull/10914, which added support for encoding `deleteHorizonMs` in the record batch schema. The changes include adding the new attribute and updating field names. We have also updated stale references to the old `FirstTimestamp` field in the code and comments. Finally, in the `DumpLogSegments` tool, when record batch information is printed, it also includes the value of `deleteHorizonMs` (e.g. `OptionalLong.empty` or `OptionalLong[123456]`).
Reviewers: Vincent Jiang <84371940+vincent81jiang@users.noreply.github.com>, Kvicii <42023367+Kvicii@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
The current error message suggests that controller.listener.names is a replacement for
control.plane.listener.name. This is incorrect since these configurations have very different
functions. This PR deletes the incorrect message.
Reviewers: David Jacot <david.jacot@gmail.com>, Kvicii