Commit Graph

10332 Commits

Author SHA1 Message Date
bozhao12 75223b668d
MINOR: A fewer method javadoc and typo fix (#12253)
Fixes an unneeded parameter doc in `MemoryRecordsBuilder` and a typo in `LazyDownConversionRecordsSend`.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>

Co-authored-by: zhaobo <zhaobo@kuaishou.com>
2022-06-06 12:25:05 -07:00
Guozhang Wang 5d593287c7 HOTFIX: add space to avoid checkstyle failure 2022-06-06 11:34:59 -07:00
Divij Vaidya 601051354b
MINOR: Correctly mark some tests as integration tests (#12223)
Also fix package name of `ListOffsetsIntegrationTest`.

Reviewers: dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-06-06 11:18:24 -07:00
Guozhang Wang 2047fc3715
HOTFIX: only try to clear discover-coordinator future upon commit (#12244)
This is another way of fixing KAFKA-13563 other than #11631.

Instead of letting the consumer to always try to discover coordinator in pool with either mode (subscribe / assign), we defer the clearance of discover future upon committing async only. More specifically, under manual assign mode, there are only three places where we need the coordinator:

* commitAsync (both by the consumer itself or triggered by caller), this is where we want to fix.
* commitSync, which we already try to re-discovery coordinator.
* committed (both by the consumer itself based on reset policy, or triggered by caller), which we already try to re-discovery coordinator.

The benefits are that for manual assign mode that does not try to trigger any of the above three, then we never would be discovering coordinator. The original fix in #11631 would let the consumer to discover coordinator even if none of the above operations are required.

Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-06-06 11:05:41 -07:00
Federico Valeri af71375d6d
KAFKA-13933: Fix stuck SSL unit tests in case of authentication failure (#12159)
When there is an authentication error after the initial TCP connection, the selector never becomes READY, and these tests wait forever waiting for this state.

This will happen while using an JDK like OpenJDK build that does not support the required cipher suites.

Reviewers: Luke Chen <showuon@gmail.com>,  Tom Bentley <tbentley@redhat.com>, Divij Vaidya <diviv@amazon.com>
2022-06-05 15:47:09 +08:00
Richard Joerger 6ff2bf03a9
KAFKA-13875 Adjusted the output the topic describe output to include TopicID & se… (#12170)
Reviewers: Luke Chen <showuon@gmail.com>
2022-06-05 15:13:02 +08:00
JK-Wang 04f23688e7
MINOR: Fix plugin path in quickstart.html (#12252)
Reviewers: Luke Chen <showuon@gmail.com>
2022-06-05 15:08:37 +08:00
Okada Haruki a39d447677
MINOR: fix streams tutorial (#12251)
Reviewers: Luke Chen <showuon@gmail.com>
2022-06-04 13:23:11 +08:00
dengziming 0b3ab4687e
KIP-835: metadata.max.idle.interval.ms shoud be much bigger than broker.heartbeat.interval.ms (#12238)
The active quorum controller will append NoOpRecord periodically to increase metadata LEO, however, when a broker startup, we will wait until its metadata LEO catches up with the controller LEO, we generate NoOpRecord every 500ms and send heartbeat request every 2000ms.

It's almost impossible for a broker to catch up with the controller LEO if the broker sends a query request every 2000ms but the controller LEO increases every 500ms, so the tests in KRaftClusterTest will fail.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, showuon <43372967+showuon@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@confluent.io>
2022-06-03 11:39:30 -07:00
Rittika Adhikari 3467036e01
KAFKA-13803: Refactor Leader API Access (#12005)
This PR refactors the leader API access in the follower fetch path.

Added a LeaderEndPoint interface which serves all access to the leader.

Added a LocalLeaderEndPoint and a RemoteLeaderEndPoint which implements the LeaderEndPoint interface to handle fetches from leader in local & remote storage respectively.

Reviewers: David Jacot <djacot@confluent.io>, Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
2022-06-03 09:12:06 -07:00
Aneesh Garg 47bb93cfd7
MINOR: Replace ACL_AUTHORIZER attribute with ZK_ACL_AUTHORIZER (#12247)
Replace ACL_AUTHORIZER attribute with ZK_ACL_AUTHORIZER in system tests. Required after the changes merged with https://github.com/apache/kafka/pull/12190.

Reviewers: David Jacot <djacot@confluent.io>
2022-06-03 17:50:49 +02:00
Kvicii 7e71483aed
MINOR: fix doc (#12243)
Reviewers: Luke Chen <showuon@gmail.com>
2022-06-03 15:56:13 +08:00
Bruno Cadonna 5424324722
KAFKA-13930: Add 3.2.0 to core upgrade and compatibility system tests (#12210)
Apache Kafka 3.2.0 was recently released. Now we need
to test upgrades and compatibility with 3.2 in core system tests.

Reviewer: Jason Gustafson <jason@confluent.io>
2022-06-03 09:13:10 +02:00
Richard Joerger 09f5d17e69
KAFKA-13718: kafka-topics describe topic with default config will show `segment.bytes` overridden config (#12246)
Reviewers: Luke Chen <showuon@gmail.com>
2022-06-03 10:46:05 +08:00
Bruno Cadonna 0aea498b9a
MINOR: Pin ducktape version to < 0.9 (#12242)
With newer ducktape versions than < 0.9 system tests
may run into authentication issues with the AK system test
infrastructure.

The version will be bumped up once we have infrastructure
in place for newer paramiko versions brought in by ducktape
0.9.

Reviewers: Lucas Bradstreet <lucas@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Kvicii <Karonazaba@gmail.com>
2022-06-02 20:21:23 +02:00
RivenSun d8d92f0f80
MINOR: Update the kafka-reassign-partitions script command in documentation (#12237)
Reviewers: Luke Chen <showuon@gmail.com>
2022-06-02 21:30:22 +08:00
Mickael Maison e4f4e50b7b
MINOR: Small fixes in docs/upgrade.html (#12239)
Reviewers: David Jacot <djacot@confluent.io>
2022-06-02 12:05:30 +02:00
Chris Egerton a110f1fe85
KAFKA-10000: Add new preflight connector config validation logic (#11776)
Reviewers: Mickael Maison <mickael.maison@gmail.com>

, Tom Bentley <tbentley@redhat.com>
2022-06-02 11:57:50 +02:00
Chris Egerton 61df3034fe
KAFKA-12657: Increase timeouts in Connect integration tests (#12191)
As an initial step to address the notoriously flaky BlockingConnectorTest test suite, we can try increasing test timeouts.

This approach may not be sufficient, and even if it is, it's still suboptimal. Although it may address flakiness on Jenkins, it will make genuine failures harder to detect when testing local changes. Additionally, if the workload on Jenkins continues to increase, we'll probably have to bump these timeouts in the future again at some point.

Potential next steps, for this PR and beyond:

    Stop leaking threads that block during test runs
    Instead of artificially reducing the REST request timeout at the beginning of every test, reduce it selectively right before issuing a REST request that is expected to time out, and then immediately reset it.
    Eliminate artificial reduction of the REST request timeout entirely, as it may be negatively impacting other Connect integration tests that are being run concurrently.
    Test repeatedly on Jenkins, ideally at least 50 times
    Gather information on the number of CPU cores available to each Jenkins node and the distribution of how many threads are allocated over a given time period (maybe a day?); this is especially relevant since local testing indicates that these tests all do much better when parallelism is reduced, which shouldn't be too surprising considering that each Connect integration test spins up separate threads for at least one Zookeeper node, one Kafka broker, one Connect worker, and usually at least one connector and one task.

I'd like to test these changes as a first step before investigating any of the above (except maybe items 1 and 2, which should be fairly straightforward). To trigger new runs I plan on pushing empty commits or, if those do not trigger new Jenkins runs, dummy commits. If this is objectionable let me know and hopefully we can find a suitable alternative.

Reviewers: Kvicii <Karonazaba@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-06-02 10:21:07 +02:00
Luke Chen fa33fb4d3c
KAFKA-13773: catch kafkaStorageException to avoid broker shutdown directly (#12136)
When logManager startup and loadLogs, we expect to catch any IOException (ex: out of space error) and turn the log dir into offline. Later, we'll handle the offline logDir in ReplicaManage, so that the cleanShutdown file won't be created when all logDirs are offline. The reason why the broker shutdown with cleanShutdown file after full disk is because during loadLogs and do log recovery, we'll write leader-epoch-checkpoint fil. And if any IOException thrown, we'll wrap it as KafkaStorageException and rethrow. And since we don't catch KafkaStorageException, so the exception is caught in the other place and go with clean shutdown path.

This PR is to fix the issue by catching the KafkaStorageException with IOException cause exceptions during loadLogs, and mark the logDir as offline to let the ReplicaManager handle the offline logDirs.

Reviewers: Jun Rao <jun@confluent.io>, Alok Thatikunta <alok123thatikunta@gmail.com>
2022-06-02 14:15:51 +08:00
Jason Gustafson 0f9f7e6c78
MINOR: Enable kraft support in quota integration tests (#12217)
Enable kraft support in BaseQuotaTest and its extensions.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>
2022-06-01 16:56:10 -07:00
Colin Patrick McCabe 65b4374203
MINOR: implement BrokerRegistrationChangeRecord (#12195)
Implement BrokerRegistrationChangeRecord as specified in KIP-746. This is a more flexible record than the
single-purpose Fence / Unfence records.

Reviewers: José Armando García Sancio <jsancio@gmail.com>, dengziming <dengziming1993@gmail.com>
2022-06-01 16:33:01 -07:00
Colin Patrick McCabe 0ca9cd4d2d
MINOR: Several fixes and improvements for FeatureControlManager (#12207)
This PR fixes a bug where FeatureControlManager#replay(FeatureLevelRecord) was throwing an
exception if not all controllers in the quorum supported the feature being applied. While we do
want to validate this, it needs to be validated earlier, before the record is committed to the log.
Once the record has been committed to the log it should always be applied if the current controller
supports it.

Fix another bug where removing a feature was not supported once it had been configured. Note that
because we reserve feature level 0 for "feature not enabled", we don't need to use
Optional<VersionRange>; we can just return a range of 0-0 when the feature is not supported.

Allow the metadata version to be downgraded when UpgradeType.UNSAFE_DOWNGRADE has been set.
Previously we were unconditionally denying this even when this was set.

Add a builder for FeatureControlManager, so that we can easily add new parameters to the
constructor in the future. This will also be useful for creating FeatureControlManagers that are
initialized to a specific MetadataVersion.

Get rid of RemoveFeatureLevelRecord, since it's easier to just issue a FeatureLevelRecord with
the level set to 0.

Set metadata.max.idle.interval.ms to 0 in RaftClusterSnapshotTest for more predictability.

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>
2022-06-01 16:09:38 -07:00
dengziming 1d6e3d6cb3
KAFKA-13845: Add support for reading KRaft snapshots in kafka-dump-log (#12084)
The kafka-dump-log command should accept files with a suffix of ".checkpoint". It should also decode and print using JSON the snapshot header and footer control records.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-06-01 14:49:00 -07:00
José Armando García Sancio 7d1b0926fa
KAFKA-13883: Implement NoOpRecord and metadata metrics (#12183)
Implement NoOpRecord as described in KIP-835. This is controlled by the new
metadata.max.idle.interval.ms configuration.

The KRaft controller schedules an event to write NoOpRecord to the metadata log if the metadata
version supports this feature. This event is scheduled at the interval defined in
metadata.max.idle.interval.ms. Brokers and controllers were improved to ignore the NoOpRecord when
replaying the metadata log.

This PR also addsffour new metrics to the KafkaController metric group, as described KIP-835.

Finally, there are some small fixes to leader recovery. This PR fixes a bug where metadata version
3.3-IV1 was not marked as changing the metadata. It also changes the ReplicaControlManager to
accept a metadata version supplier to determine if the leader recovery state is supported.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-06-01 10:48:24 -07:00
David Jacot 283a9955cf
MINOR: inline metrics in RecordAccumulator (#12227)
Reviewers: Kvicii <Karonazaba@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-06-01 09:48:31 +02:00
Colin Patrick McCabe 4c9eeef5b2
MINOR: add timeouts to streams integration tests (#12216)
Reviewers: David Arthur <mumrah@gmail.com>
2022-05-31 14:22:13 -07:00
Amir M. Saeid 8a05884343
MINOR: Fix typo in processor api developer guide (#12203)
The reference to `changlogConfig` should be `changelogConfig`.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-30 15:57:46 -07:00
Clara Fang 31a84dd72e
KAFKA-13946; Add missing parameter to kraft test kit `ControllerNode.setMetadataDirectory()` (#12225)
Added parameter `metadataDirectory` to `setMetadataDirectory()` so that `this.metadataDirectory` would not be set to itself.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-05-30 15:55:07 -07:00
Jason Gustafson 645c1ba526
MINOR: Fix buildResponseSend test cases for envelope responses (#12185)
The test cases we have in `RequestChannelTest` for `buildResponseSend` construct the envelope request incorrectly. The request is created using the envelope context, but also a reference to the wrapped envelope request object. This patch fixes `TestUtils.buildEnvelopeRequest` so that the wrapped request is built properly. It also fixes the dependence on this incorrect construction and consolidates the tests in `RequestChannelTest` to avoid duplication.

Reviewers: dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
2022-05-30 11:34:36 -07:00
Jason Gustafson f980820e2b
MINOR: Send kraft raft/controller logs to controller log in systests (#12222)
Currently the only place we see controller/raft logging in system tests is `server-start-stdout-stderr.log` where they are mixed with all other logs. It is more convenient to send them to `controller.log` as we do for zk tests.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, David Jacot <djacot@confluent.io>
2022-05-30 09:21:41 -07:00
David Jacot 6b93652a54
MINOR: Improve code style in FenceProducersHandler (#12208)
Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-28 09:15:22 -07:00
bozhao12 620ada9888
MINOR: Fix typo in ClusterTestExtensionsTest (#12218)
Reviewers: Kvicii <Karonazaba@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-05-27 13:38:54 -07:00
Jason Gustafson 24261b93bc
KAFKA-13941; Reenable ARM in Jenkinsfile (#12221)
This patch reenables the ARM build in Jenkins since https://issues.apache.org/jira/browse/INFRA-23305 has been resolved.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-05-27 13:22:07 -07:00
Colin Patrick McCabe 7143267f71
MINOR: Fix some bugs with UNREGISTER_BROKER
Fix some bugs in the KRaft unregisterBroker API and add a junit test.

1. kafka-cluster-tool.sh unregister should fail if no broker ID is passed.

2. UnregisterBrokerRequest must be marked as a KRaft broker API so 
that KRaft brokers can receive it.

3. KafkaApis.scala must forward UNREGISTER_BROKER to the controller.

Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>
2022-05-26 14:07:29 -07:00
David Arthur 4efdc1a310
MINOR: Consolidate FinalizedFeatureCache into MetadataCache (#12214)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-05-26 16:25:58 -04:00
David Arthur 8fe73648e3
MINOR: Disable ARM builds (#12220)
Also add an additional stage to ARM and PowerPC builds which will fail-fast if the agent is not available.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-26 16:09:39 -04:00
Jason Gustafson 43160bc476
MINOR: Add timeout to LogOffsetTest (#12213)
Reviewers: Kvicii <Karonazaba@gmail.com>, David Arthur <mumrah@gmail.com>
2022-05-26 16:07:54 -04:00
Jason Gustafson 02fc6e7d3c
MINOR: Collect metadata log dir in kraft system tests (#12215)
It is useful to collect the directory for `__cluster_metadata` in system tests. We use a separate directory from user partitions, so it must be configured separately. 

Reviewers: David Arthur <mumrah@gmail.com>
2022-05-25 17:36:58 -07:00
dengziming c22d320a5c
KAFKA-12902: Add unit32 type in generator (#10830)
Add uint32 support in the KRPC generator.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-05-25 16:25:16 -07:00
David Jacot 76477ffd2d
KAFKA-13858; Kraft should not shutdown metadata listener until controller shutdown is finished (#12187)
When the kraft broker begins controlled shutdown, it immediately disables the metadata listener. This means that metadata changes as part of the controlled shutdown do not get sent to the respective components. For partitions that the broker is follower of, that is what we want. It prevents the follower from being able to rejoin the ISR while still shutting down. But for partitions that the broker is leading, it means the leader will remain active until controlled shutdown finishes and the socket server is stopped. That delay can be as much as 5 seconds and probably even worse.

This PR revises the controlled shutdown procedure as follow:
* The broker signals to the replica manager that it is about to start the controlled shutdown.
* The broker requests a controlled shutdown to the controller.
* The controller moves leaders off from the broker, removes the broker from any ISR that it is a member of, and writes those changes to the metadata log.
* When the broker receives a partition metadata change, it looks if it is in the ISR. If it is, it updates the partition as usual. If it is not or if there is no leader defined--as would be the case if the broker was the last member of the ISR--it stops the fetcher/replica. This basically stops all the partitions for which the broker was part of their ISR.

When the broker is a replica of a partition but it is not in the ISR, the controller does not do anything. The leader epoch is not bumped. In this particular case, the follower will continue to run until the replica manager shuts down. In this time, the replica could become in-sync and the leader could try to bring it back to the ISR. This remaining issue will be addressed separately.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-25 16:09:01 -07:00
Lucas Bradstreet 46630a0610
MINOR: fix number of nodes used in test_compatible_brokers_eos_v2_enabled (#12211)
Reviewers: David Jacot <djacot@confluent.io>
2022-05-25 20:03:06 +02:00
dengziming 54d60ced86
KAFKA-13833: Remove the min_version_level from the finalized version range written to ZooKeeper (#12062)
Reviewers: David Arthur <mumrah@gmail.com>
2022-05-25 14:02:34 -04:00
nicolasguyomar 6efde847ca
MINOR: Replace left single quote with single quote in Connect worker's log message (#12201)
Minor change to use ' and not LEFT SINGLE QUOTATION MARK in this log message, as it's the only place we are using such a quote and it can break ingestion pipelines

Reviewers: Kvicii <Karonazaba@gmail.com>, Divij Vaidya <diviv@amazon.com>, Konstantine Karantasis <k.karantasis@gmail.com>
2022-05-25 10:40:46 -07:00
Lucas Bradstreet f7502f430a
MINOR: fix Connect system test runs with JDK 10+ (#12202)
When running our Connect system tests with JDK 10+, we hit the error 
    AttributeError: 'ClusterNode' object has no attribute 'version'
because util.py attempts to check the version variable for non-Kafka service objects.

Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
2022-05-25 10:25:00 -07:00
Bruno Cadonna 286bae4251
KAFKA-10199: Implement adding standby tasks to the state updater (#12200)
This PR adds adding of standby tasks to the default implementation of the state updater.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-05-24 16:59:14 -07:00
Sayantanu Dey 9dc332f5ca
KAFKA-13217: Reconsider skipping the LeaveGroup on close() or add an overload that does so (#12035)
This is for KIP-812:

* added leaveGroup on a new close function in kafka stream
* added logic to resolve future returned by remove member call in close method
* added max check on remainingTime value in close function


Reviewers: David Jacot <david.jacot@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2022-05-23 10:07:19 -07:00
Jason Gustafson b5699b5ccd
KAFKA-13923; Generalize authorizer system test for kraft (#12190)
Change `ZookeeperAuthorizerTest` to `AuthorizerTest` and add support for KRaft's `StandardAuthorizer` implementation.

Reviewers: David Jacot <djacot@confluent.io>
2022-05-23 09:47:14 -07:00
andymg3 4878653016
MINOR: Use parameterized logging in StandardAuthorizer and StandardAuthorizerData (#12192)
This updates StandardAuthorizer and StandardAuthorizerData to use parameterized logging per the SLF4J recommendation (see https://www.slf4j.org/faq.html). This also removes a couple if statements that explicitly check if trace is enabled, but the logger should handle not publishing the message and not constructing the String if trace is not enabled.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-05-21 18:14:02 -07:00
Divij Vaidya f6ba10ef9c
MINOR: Fix flaky test TopicCommandIntegrationTest.testDescribeAtMinIsrPartitions(String).quorum=kraft (#12189)
Flaky test as failed in CI https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-12184/1/tests/

The test fails because it does not wait for metadata to be propagated across brokers before killing a broker which may lead to it getting stale information. Note that a similar test was done in #12104 for a different test.

Reviewers: Kvicii Y, Ziming Deng, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-05-21 10:33:44 -07:00