687 Commits

Author SHA1 Message Date
Yash Mayya 8a19f2da27
Update expected task configs for FileStream source and sink connectors in ConnectRestApiTest (#12576)
Reviewer: Chris Egerton <chrise@aiven.io>
2022-08-31 16:34:00 -04:00
Colin Patrick McCabe 28d5a05943
KAFKA-14187: kafka-features.sh: add support for --metadata (#12571)
This PR adds support to kafka-features.sh for the --metadata flag, as specified in KIP-778.  This
flag makes it possible to upgrade to a new metadata version without consulting a table mapping
version names to short integers. Change --feature to use a key=value format.

FeatureCommandTest.scala: make most tests here true unit tests (that don't start brokers) in order
to improve test run time, and allow us to test more cases. For the integration test part, test both
KRaft and ZK-based clusters. Add support for mocking feature operations in MockAdminClient.java.

upgrade.html: add a section describing how the metadata.version should be upgraded in KRaft
clusters.

Add kraft_upgrade_test.py to test upgrades between KRaft versions.

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
2022-08-30 16:56:03 -07:00
Alan Sheinberg 481fefb4f9
MINOR: Adds KRaft versions of most streams system tests (#12458)
Migrates Streams system tests to use either KRaft brokers or both KRaft and ZK in a testing matrix.

This skips tests which use various forms of Kafka versioning since those seem to have issues with KRaft at the moment. Running these tests with KRaft will require a followup PR.

Reviewers: Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>
2022-08-26 16:11:19 -05:00
José Armando García Sancio 6ace67b2de
MINOR; Bump trunk to 3.4.0-SNAPSHOT (#12463)
Version bumps in trunk after the creation of the 3.3 branch.

Reviewers: David Arthur <mumrah@gmail.com>
2022-08-01 09:54:12 -07:00
Hao Li 5e4ae06d12
MINOR: fix flaky test test_standby_tasks_rebalance (#12428)
* Description
In this test, when the third proc joins, other rebalance scenarios sometimes occur. For example, a follow-up JoinGroup request can happen before one of the procs has received its SyncGroup response, and the tasks previously assigned to that proc are then lost during the new JoinGroup request. This can result in standby tasks being assigned as 3, 1, 2. This PR relaxes the expected assignment of 2, 2, 2 to a range of [1-3].

* Some background from Guozhang:
I talked to @hao Li offline and also inspected the code a bit, and tl;dr is that I think the code logic is correct (i.e. we do not really have a bug), but we need to relax the test verification a little bit. The general idea behind the subscription info is that:

When a client joins the group, its subscription will try to encode all its current assigned active and standby tasks, which would be used as prev active and standby tasks by the assignor in order to achieve some stickiness.

When a client drops all its active/standby tasks due to errors, it does not actually report an empty set in its subscription; instead, it checks its local state directory (you can see that from TaskManager#getTaskOffsetSums, which populates the taskOffsetSum). For an active task, the offset would be “-2”, a.k.a. LATEST_OFFSET; for a standby task, the offset is an actual number.

So in this case, proc2, which drops all its active and standby tasks, would still report all tasks that have some local state. Since it previously owned all six tasks (three as active and three as standby), it would report all six as standbys, and when that happens the resulting assignment, as @hao Li verified, is indeed the uneven one.

So I think the actual “issue” here happens when proc2 is a bit late sending the sync-group request: the previous rebalance has already completed and a follow-up rebalance has already been triggered. In that case, the resulting uneven assignment is indeed expected. Such a scenario, though not common, is still legitimate, since in practice all kinds of timing skew across instances can happen. So I think we should just relax our verification here, i.e. just make sure that each instance has at least one standby replica at the end, rather than exactly “2, 2, 2”.
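
The relaxed verification described above can be sketched as a range check. This is a minimal illustration, not the code from the actual system test: the function name `standby_counts_ok` and its parameters are hypothetical, and the bounds 1 and 3 are taken from the "[1-3]" range mentioned in the description.

```python
def standby_counts_ok(counts, min_tasks=1, max_tasks=3):
    """Return True if every instance holds between min_tasks and max_tasks standby tasks.

    Hypothetical helper: `counts` is the number of standby tasks per instance.
    """
    return all(min_tasks <= count <= max_tasks for count in counts)


# The even assignment and the uneven one from the description both pass.
print(standby_counts_ok([2, 2, 2]))
print(standby_counts_ok([3, 1, 2]))
# An instance left without any standby task is still rejected.
print(standby_counts_ok([4, 0, 2]))
```

A strict equality check against `[2, 2, 2]` would fail for the legitimate `[3, 1, 2]` outcome, which is the flakiness the relaxed check is meant to avoid.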

Reviewers: Suhas Satish <ssatish@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-07-21 12:12:29 -07:00
Alyssa Huang 8e9869a777
MINOR: Run MessageFormatChangeTest in ZK mode only (#12395)
KRaft mode will not support writing messages with an older message format (2.8) since the min supported IBP is 3.0 for KRaft. Testing support for reading older message formats will be covered by https://issues.apache.org/jira/browse/KAFKA-14056.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-13 08:46:04 +02:00
Bruno Cadonna 4d53dd9972
KAFKA-13930: Add 3.2.0 Streams upgrade system tests (#12209)
Apache Kafka 3.2.0 was recently released. Now we need
to test upgrades from 3.2 to trunk in our system tests.

Reviewer: Bill Bejeck <bbejeck@apache.org>
2022-06-21 16:33:40 +02:00
Ron Dagostino b04937dc65
MINOR: Fix force kill of KRaft colocated controllers in system tests (#11238)
I noticed that a system test using a KRaft cluster with 3 brokers but only 1 co-located controller did not force-kill the second and third broker after shutting down the first broker (the one with the controller).  The issue was a floating point rounding error.  This patch adjusts for the rounding error and also makes the logic work for an even number of controllers.  A local run of `tests/kafkatest/sanity_checks/test_bounce.py` succeeded (and I manually increased the cluster size for the 1 co-located controller case and observed the correct kill behavior: the second and third brokers were force-killed as expected).
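
The commit message does not show the offending expression, so the following is only a sketch of how this class of bug can arise in Python and how integer arithmetic avoids it. The helper names `majority` and `majority_lost` are hypothetical and are not taken from the Kafka system test code.

```python
def majority(num_controllers):
    """Smallest number of controllers that forms a quorum majority (integer arithmetic only)."""
    return num_controllers // 2 + 1


def majority_lost(num_controllers, num_stopped):
    """Return True once too few controllers remain to form a majority."""
    remaining = num_controllers - num_stopped
    return remaining < majority(num_controllers)


# Python 3 rounds halves to the nearest even integer, so rounding n / 2
# gives inconsistent results for odd n:
print(round(0.5), round(1.5), round(2.5))  # 0 2 2

# With a single co-located controller, stopping it loses the majority,
# so the remaining brokers would need to be force-killed.
print(majority_lost(1, 1))
# With three controllers, stopping one still leaves a majority.
print(majority_lost(3, 1))
# The same logic holds for an even number of controllers.
print(majority(4))
```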

Reviewers: Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>, David Jacot <djacot@confluent.io>
2022-06-15 16:45:00 +02:00
Aneesh Garg 47bb93cfd7
MINOR: Replace ACL_AUTHORIZER attribute with ZK_ACL_AUTHORIZER (#12247)
Replace ACL_AUTHORIZER attribute with ZK_ACL_AUTHORIZER in system tests. Required after the changes merged with https://github.com/apache/kafka/pull/12190.

Reviewers: David Jacot <djacot@confluent.io>
2022-06-03 17:50:49 +02:00
Bruno Cadonna 5424324722
KAFKA-13930: Add 3.2.0 to core upgrade and compatibility system tests (#12210)
Apache Kafka 3.2.0 was recently released. Now we need
to test upgrades and compatibility with 3.2 in core system tests.

Reviewer: Jason Gustafson <jason@confluent.io>
2022-06-03 09:13:10 +02:00
Bruno Cadonna 0aea498b9a
MINOR: Pin ducktape version to < 0.9 (#12242)
With ducktape versions 0.9 or newer, system tests
may run into authentication issues with the AK system test
infrastructure.

The version will be bumped up once we have infrastructure
in place for newer paramiko versions brought in by ducktape
0.9.

Reviewers: Lucas Bradstreet <lucas@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Kvicii <Karonazaba@gmail.com>
2022-06-02 20:21:23 +02:00
Jason Gustafson f980820e2b
MINOR: Send kraft raft/controller logs to controller log in systests (#12222)
Currently the only place we see controller/raft logging in system tests is `server-start-stdout-stderr.log` where they are mixed with all other logs. It is more convenient to send them to `controller.log` as we do for zk tests.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, David Jacot <djacot@confluent.io>
2022-05-30 09:21:41 -07:00
Jason Gustafson 02fc6e7d3c
MINOR: Collect metadata log dir in kraft system tests (#12215)
It is useful to collect the directory for `__cluster_metadata` in system tests. We use a separate directory from user partitions, so it must be configured separately. 

Reviewers: David Arthur <mumrah@gmail.com>
2022-05-25 17:36:58 -07:00
Lucas Bradstreet 46630a0610
MINOR: fix number of nodes used in test_compatible_brokers_eos_v2_enabled (#12211)
Reviewers: David Jacot <djacot@confluent.io>
2022-05-25 20:03:06 +02:00
Lucas Bradstreet f7502f430a
MINOR: fix Connect system test runs with JDK 10+ (#12202)
When running our Connect system tests with JDK 10+, we hit the error 
    AttributeError: 'ClusterNode' object has no attribute 'version'
because util.py attempts to check the version variable for non-Kafka service objects.
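
One common way to guard against this kind of `AttributeError` is to read the attribute with a default. The sketch below is an assumption about the general technique, not the actual change made to `util.py`; the `ClusterNode` class here is a bare stand-in and `node_version` is a hypothetical helper.

```python
class ClusterNode:
    """Bare stand-in for a node object that carries no `version` attribute."""


class KafkaNode(ClusterNode):
    """Stand-in for a node of a Kafka service, which does carry a version."""

    def __init__(self, version):
        self.version = version


def node_version(node, default=None):
    """Return node.version if it exists, otherwise `default`, without raising."""
    return getattr(node, "version", default)


print(node_version(KafkaNode("3.2.0")))
print(node_version(ClusterNode()))
```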

Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
2022-05-25 10:25:00 -07:00
Jason Gustafson b5699b5ccd
KAFKA-13923; Generalize authorizer system test for kraft (#12190)
Change `ZookeeperAuthorizerTest` to `AuthorizerTest` and add support for KRaft's `StandardAuthorizer` implementation.

Reviewers: David Jacot <djacot@confluent.io>
2022-05-23 09:47:14 -07:00
Alex Sorokoumov 78dd40123c
MINOR: Add upgrade tests for FK joins (#12122)
Follow up PR for KAFKA-13769.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-05-13 17:21:27 -07:00
Tom Bentley 467bce04ae
MINOR: Update release versions for upgrade tests with 3.1.1 release (#12156)
Updates release versions in files that are used for upgrade test with the 3.1.1 release version.

Reviewers: Bruno Cadonna <bruno@confluent.io>
2022-05-13 09:32:41 +01:00
Bruno Cadonna 020ff2fe0e
MINOR: Update release versions for upgrade tests with 3.2.0 release (#12143)
Updates release versions in files that are used for upgrade test with the 3.2.0 release version.  

Reviewer: David Jacot <djacot@confluent.io>
2022-05-10 14:47:46 +02:00
Jason Gustafson f0a09ea003
MINOR: Fix event output inconsistencies in TransactionalMessageCopier (#12098)
This patch fixes some strangeness and inconsistency in the messages written by `TransactionalMessageCopier` to stdout. Here is a sample of two messages.

Progress message:
```
{"consumed":33000,"stage":"ProcessLoop","totalProcessed":33000,"progress":"copier-0","time":"2022/04/24 05:40:31:649","remaining":333}
```
The `transactionalId` is set to the value of the `progress` key.

And a shutdown message:
```
{"consumed":33333,"shutdown_complete":"copier-0","totalProcessed":33333,"time":"2022/04/24 05:40:31:937","remaining":0}
```
The `transactionalId` this time is set to the `shutdown_complete` key and there is no `stage` key.

In this patch, we change the following:

1. Use a separate key for the `transactionalId`.
2. Drop the `progress` and `shutdown_complete` keys.
3. Use `stage=ShutdownComplete` in the shutdown message.
4. Modify `transactional_message_copier.py` system test service accordingly.
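
A consumer of the new output could then read every event the same way. The sketch below assumes the separate key is literally named `transactionalId` (the commit message does not spell out the key name), and the sample line is constructed from the description above rather than captured from a real run. `parse_copier_event` is a hypothetical helper, not part of `transactional_message_copier.py`.

```python
import json


def parse_copier_event(line):
    """Parse one JSON event line and return (transactional id, stage, remaining)."""
    event = json.loads(line)
    # Assumed key name: "transactionalId". "stage" and "remaining" appear in the samples above.
    return event.get("transactionalId"), event.get("stage"), event.get("remaining")


# Constructed example of a shutdown message in the new format (not real tool output).
sample = (
    '{"transactionalId":"copier-0","stage":"ShutdownComplete","consumed":33333,'
    '"totalProcessed":33333,"time":"2022/04/24 05:40:31:937","remaining":0}'
)

transactional_id, stage, remaining = parse_copier_event(sample)
print(transactional_id, stage, remaining)
```

With a dedicated key and a `stage` value in every message, the caller no longer has to probe for `progress` or `shutdown_complete` to find the transactional id.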

Reviewers: David Arthur <mumrah@gmail.com>
2022-04-29 10:02:25 -07:00
Luke Chen f28a2ee918
MINOR: revert back to 60s session timeout for static membership test (#11881)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-04-21 11:51:31 -07:00
David Jacot 6d36487b68
MINOR: Fix TestDowngrade.test_upgrade_and_downgrade (#12027)
The second validation does not verify the second bounce because the verified producer and the verified consumer are stopped in `self.run_validation`. This means that the second `run_validation` just spits out the same information as the first one. Instead, we should just run the validation at the end.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-04-18 14:22:33 -07:00
Konstantine Karantasis dd62ef2eda
KAFKA-13748: Do not include file stream connectors in Connect's CLASSPATH and plugin.path by default (#11908)
With this change we stop including, by default, the non-production-grade connectors that are meant for demos and quick starts in the CLASSPATH and plugin.path of Connect deployments. The package of these connectors will still be shipped with the Apache Kafka distribution and will be available for explicit inclusion.

The changes have been tested through the system tests and the existing unit and integration tests. 

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Randall Hauch <rhauch@gmail.com>
2022-03-30 13:15:42 -07:00
Bruno Cadonna 4c8685e701
MINOR: Bump trunk to 3.3.0-SNAPSHOT (#11925)
Version bumps on trunk following the creation of the 3.2 release branch.

Reviewer: David Jacot <djacot@confluent.io>
2022-03-21 21:37:05 +01:00
Justine Olshan 7afdb069bf
KAFKA-13750; Client Compatibility KafkaTest uses invalid idempotency configs (#11909)
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-03-17 18:00:27 +01:00
Mickael Maison 1783fb14df
MINOR: Bump latest 3.0 version to 3.0.1 (#11885)
Reviewers: Matthias J. Sax <mjsax@apache.org>
2022-03-16 11:43:37 +01:00
Stanislav Vodetskyi 7e683852b4
MINOR: unpin ducktape dependency to always use the newest version (py3 edition) (#11884)
Ensures we always have the latest published ducktape version.
This way whenever we release a new one, we won't have to cherry pick a bunch of commits across a bunch of branches.
2022-03-11 17:48:19 +05:30
Levani Kokhreidze 87eb0cf03c
KAFKA-6718: Update SubscriptionInfoData with clientTags (#10802)
Adds ClientTags to SubscriptionInfoData.

Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-03-11 16:29:05 +08:00
Kamal Chandraprakash 496aa1f84b
MINOR: Provide valid examples in README page. (#10259)
- `testMetadataUpdateWaitTime` method is removed from MetadataTest class.
-  Removed the travis CI documentation.

Reviewers: Luke Chen <showuon@gmail.com>
2022-02-21 14:48:24 +08:00
Michal T 6d7e6d6f87
MINOR: Install missing 'tc' utility - iproute2 for systemtests (#11764)
Signed-off-by: Michal T <mtoth@redhat.com>

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-02-16 12:56:06 +01:00
Michal T 44fcba980f
MINOR: Fix typo in system tests Dockerfile (#11740)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-02-08 18:03:57 +01:00
David Jacot 7215c90c5e
MINOR: Add 3.0 and 3.1 to streams system tests (#11716)
Reviewers: Bill Bejeck <bill@confluent.io>
2022-01-28 10:06:31 +01:00
David Jacot 110fae2f59
MINOR: Add 3.0 and 3.1 to broker and client compatibility tests (#11701)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2022-01-25 16:22:48 +01:00
David Jacot 34208e8429
MINOR: Update files with 3.1.0 (#11698)
Reviewers: Bill Bejeck <bbejeck@apache.org>
2022-01-21 21:30:56 +01:00
Ron Dagostino 1785e1223e
KAFKA-13582: TestVerifiableProducer.test_multiple_kraft_security_protocols fails (#11664)
KRaft brokers always use the first controller listener, so if there is no colocated KRaft controller on the node, be sure to publish only one controller listener in `controller.listener.names`, even when the inter-controller listener name differs.  System tests were failing because a second entry was unnecessarily published in `controller.listener.names` for a broker-only config without a corresponding mapping in `listener.security.protocol.map`.  Removing the unnecessary entry in `controller.listener.names` solves the problem.

Reviewers: David Jacot <djacot@confluent.io>
2022-01-10 20:54:26 +01:00
Chia-Ping Tsai b6e7f6a4df
MINOR: replace Thread.isAlive by Thread.is_alive for Python code (#11545)
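
For context, `Thread.isAlive` was a deprecated alias that Python 3.9 removed, while `Thread.is_alive` is the supported spelling. A minimal illustration using only the standard library (the thread here is a placeholder, not code from the system tests):

```python
import threading

# Start a trivial worker and wait for it to finish.
worker = threading.Thread(target=lambda: None)
worker.start()
worker.join()

# is_alive() is the supported method name; after join() the thread has finished.
print(worker.is_alive())  # False
```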
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2021-11-29 18:49:14 +08:00
Bruno Cadonna 4fed0001ec
MINOR: Fix system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance (#11532)
Log messages were changed in the AssignorConfiguration (#11490) that are
also used for verification in system test
StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance.

This commit fixes the test and adds comments to the log messages
that point to the test that needs to be updated in case of
changes to the log messages.

Reviewers: John Roesler <vvcephei@apache.org>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2021-11-25 10:48:09 +01:00
David Jacot 3aef0a5ceb
MINOR: Bump trunk to 3.2.0-SNAPSHOT (#11458)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2021-11-02 13:38:54 +01:00
David Jacot 38a3ddb562
MINOR: Add a replication system test which simulates a slow replica (#11395)
This patch adds a new system test which exercises the shrinking/expansion process of the partition leader. It does so by introducing a network partition which isolates a broker from the other brokers in the cluster but not from KRaft Controller/ZK.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-10-20 08:19:36 +02:00
Luke Chen 1af1c80e2d
MINOR: replace deprecated exactly_once_beta with exactly_once_v2 (#10884)
Replace deprecated exactly_once_beta with exactly_once_v2 in system tests.

Follow-up for #10870: some system tests were still using the deprecated exactly_once_beta. This PR updates them.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-09-27 17:02:48 +02:00
David Jacot f650a14d56
KAFKA-13312; 'NetworkDegradeTest#test_rate' should wait until iperf server is listening (#11344)
Reviewers: Jason Gustafson <jason@confluent.io>
2021-09-21 10:26:46 +02:00
David Jacot 493280735b
MINOR: Bump latest 2.8 version to 2.8.1 (#11341)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2021-09-20 09:23:15 +02:00
Jason Gustafson 25b0857bdb
KAFKA-13234; Transaction system test should clear URPs after broker restarts (#11267)
Clearing under-replicated-partitions helps ensure that partitions do not become unavailable longer than necessary as brokers are rolled. This prevents flakiness due to transaction timeouts.

Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2021-09-01 08:37:05 -07:00
David Jacot c4e1e23857
KAFKA-13231; `TransactionalMessageCopier.start_node` should wait until the process is fully started (#11264)
This patch ensures that the transaction message copier is fully started in `start_node`. Without this, it is possible that `stop_node` is called before the process is started which results in not stopping it at all.
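
The idea of blocking in `start_node` until the process is really up can be sketched with a small polling helper. Everything below is a hypothetical stand-in: `wait_until` here is a self-contained function written for this illustration (not the helper used by the system tests), and `FakeCopier` only simulates a process that needs a few polls before it reports as started.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.05, err_msg="condition not met"):
    """Poll `condition` until it returns True, or raise TimeoutError after timeout_sec."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


class FakeCopier:
    """Simulated process that reports as started only after a number of polls."""

    def __init__(self, polls_until_started):
        self._remaining = polls_until_started

    def is_started(self):
        self._remaining -= 1
        return self._remaining <= 0


copier = FakeCopier(polls_until_started=3)
# Returning from this call means the (simulated) process is up, so a later stop cannot miss it.
wait_until(copier.is_started, timeout_sec=2, err_msg="copier did not start in time")
print("copier started")
```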

Reviewers: Jason Gustafson <jason@confluent.io>
2021-08-27 08:28:14 +02:00
John Roesler 45ecaa19f8
MINOR: Set session timeout back to 10s for Streams system tests (#11236)
We increased the default session timeout to 30s in KIP-735:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout

Since then, we are observing sporadic system test failures
due to rebalances taking longer than the test timeout.
Rather than increase the test wait times, we can just override
the session timeout to a value more appropriate in the testing
domain.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-20 11:27:54 -05:00
Zara Lim 9bc45d4e03
MINOR: Increase the Kafka shutdown timeout to 120 (#11183)
The streams static membership test has failed several times due to hitting the Kafka shutdown timeout, but the logs were showing that the shutdown did actually succeed after the 60 second timeout.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>
2021-08-05 15:26:10 -07:00
Kamal Chandraprakash a103c95a31
KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests. (#10602)
Also adjusted the acceptable recovery lag to stabilize Streams tests.

Reviewers: Justine Olshan <jolshan@confluent.io>, Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>
2021-08-04 17:31:10 -05:00
Matthias J. Sax a7d9a8ac36
MINOR: Remove older brokers from upgrade test (#11117)
As of version 2.2.1, Kafka Streams uses message headers and
thus requires broker version 0.11.0 or newer.

Reviewers: John Roesler <john@confluent.io>, Ismael Juma <ismael@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-07-26 14:09:47 -07:00
Cheng Tan 8ed271e1fd
KAFKA-13026: Idempotent producer (KAFKA-10619) follow-up testings (#11002)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2021-07-26 21:45:59 +01:00
Niket dc512cc038
KAFKA-13015: Ducktape System Tests for Metadata Snapshots (#11053)
This PR implements system tests in ducktape to test the ability of brokers and controllers to generate
and consume snapshots and catch up with the metadata log.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@gmail.com>
2021-07-23 16:28:21 -07:00