Commit Graph

973 Commits

Author SHA1 Message Date
Colin Patrick McCabe 132e0970fb
KAFKA-17018: update MetadataVersion for the Kafka release 3.9 (#16841)
- Mark 3.9-IV0 as stable. Metadata version 3.9-IV0 should return Fetch version 17.

- Move ELR to 4.0-IV0. Remove 3.9-IV1 since it's no longer needed.

- Create a new 4.0-IV1 MV for KIP-848.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-08-12 16:30:43 -07:00
Justine Olshan df04887ba5
MINOR: Reduce log levels for transactions_mixed_versions_test with 3.2 due to bug in that version (#16787)
7496e62434 fixed an error that caused an exception to be thrown on broker startup when debug logs were on. This made it to every version except 3.2. 

The Kraft upgrade tests totally turn off debug logs, but I think we only need to remove them for the broken version.

Note: this bug is also present in 3.1, but there is no logging on startup like in subsequent versions.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <david.jacot@gmail.com>
2024-08-12 09:51:04 -07:00
TengYao Chi da14b5a61d
KAFKA-16390: add `group.coordinator.rebalance.protocols=classic,consumer` to broker configs when system tests need the new coordinator (#16715)
Fix an issue that cause system test failing when using AsyncKafkaConsumer.
A configuration option, group.coordinator.rebalance.protocols, was introduced to specify the rebalance protocols used by the group coordinator. By default, the rebalance protocol is set to classic. When the new group coordinator is enabled, the rebalance protocols are set to classic,consumer.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Kirk True <kirk@kirktrue.pro>, Justine Olshan <jolshan@confluent.io>
2024-08-02 16:07:45 -07:00
Josep Prat 89214d033e
KAFKA-17214: Add 3.8.0 version to core and client system tests (#16726)
Reviewers: Greg Harris <greg.harris@aiven.io>
2024-07-30 19:39:10 +02:00
Josep Prat e299a006c8
KAFKA-17214: Add 3.8.0 version to streams system tests (#16728)
* KAFKA-17214: Add 3.8.0 version to streams system tests

Reviewers: Bill Bejeck <bbejeck@gmail.com>
2024-07-30 19:04:38 +02:00
Josep Prat dcb331c623
MINOR: Add 3.8.0 to system tests (#16714)
Reviewers:  Manikumar Reddy <manikumar.reddy@gmail.com>
2024-07-30 09:19:48 +02:00
Colin P. McCabe 0ec520a2af Bump trunk to 4.0.0-SNAPSHOT 2024-07-29 15:51:54 -07:00
Xuan-Zhang Gong 9a9eb18beb
MINOR: add docs of "ducker-ak down -f" to e2e README (#16560)
Reviewers: Arnav Dadarya <ardada2468@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-18 08:34:35 +08:00
David Arthur 8aee314a46
KAFKA-16667 Avoid stale read in KRaftMigrationDriver (#15918)
When becoming the active KRaftMigrationDriver, there is another race condition similar to KAFKA-16171. This time, the race is due to a stale read from ZK. After writing to /controller and /controller_epoch, it is possible that a read on /migration is not linearized with the writes that were just made. In other words, we get a stale read on /migration. This leads to an inability to sync metadata to ZK due to incorrect zkVersion on the migration ZNode.

The non-linearizability of reads is in fact documented behavior for ZK, so we need to handle it.

To fix the stale read, this patch adds a write to /migration after updating /controller and /controller_epoch. This allows us to learn the correct zkVersion for the migration ZNode before leaving the BECOME_CONTROLLER state.

This patch also adds a check on the current leader epoch when running certain events in KRaftMigrationDriver. Historically, we did not include this check because it is not necessary for correctness. Writes to ZK are gated on the /controller_epoch zkVersion, and RPCs sent to brokers are gated on the controller epoch. However, during a time of rapid failover, there is a lot of processing happening on the controller (i.e., full metadata sync to ZK and full UMRs sent to brokers), so it is best to avoid running events we know will fail.

There is also a small fix in here to improve the logging of ZK operations. The log message are changed to past tense to reflect the fact that they have already happened by the time the log message is created.

Reviewers: Igor Soarez <soarez@apple.com>
2024-07-15 09:32:06 -04:00
Xuan-Zhang Gong 0ada8fac68
KAFKA-17096 Fix kafka_log4j_appender.py (#16559)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-12 22:35:55 +08:00
Justine Olshan e118811da3
MINOR: Fix transactions_upgrade_test to run transactions during upgrade (#16462)
Move when the upgrade happens so we actually upgrade while transactions are running

Reviewers: David Jacot <djacot@confluent.io>
2024-07-11 10:30:15 -07:00
Igor Soarez 1bec3811ad
KAFKA-17083: Update LATEST_STABLE_METADATA_VERSION in system tests (#16533)
LATEST_PRODUCTION version in MetadataVersion.java was updated in
both #16347 and #16400, but it was left unchanged in the system
tests.

Reviewers: Josep Prat <josep.prat@aiven.io>
2024-07-05 21:29:35 +01:00
David Jacot e7a75805fe
KAFKA-17050: Revert `group.version` (#16482)
This patch partially reverts `group.version` in trunk. I kept the `GroupVersion` class but removed it from `Features` so it is not advertised. I also kept all the changes in the test framework. I removed the logic to require `group.version=1` to enable the new consumer rebalance protocol. The new protocol is enabled based on the static configuration.

For the context, I prefer to revert it in trunk now so we don't forget to revert it in the 3.9 release. I will bring it back for the 4.0 release.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-02 07:21:18 -07:00
Igor Soarez 285698e0cf
MINOR: Add 3.7.1 to system tests (#16483)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-01 19:25:31 +08:00
Justine Olshan a599b89fe0
Revert "KAFKA-16275: Update kraft_upgrade_test.py to support KIP-848’s group protocol config (#16409) (#16441)
This reverts commit e95e91a.

With the change to include the group.version flag, these tests fail due to trying to set the feature for the old version.

It is unclear if these tests originally worked as intended and given the upgrade is not expected for 3.8, we will just revert from 3.8.

Reviewers: David Jacot <djacot@confluent.io>
2024-06-25 10:15:58 -07:00
vamossagar12 ceec218351
KAFKA-16949: Fixing test_dynamic_logging in system test connect_distributed_test (#15915)
Reviewers: Chris Egerton <chrise@aiven.io>
2024-06-25 12:36:04 -04:00
Luke Chen 191b6476d7
KAFKA-16988: add 1 more node for test_exactly_once_source system test (#16379)
Reviewers: Igor Soarez <soarez@apple.com>
2024-06-18 13:03:55 +01:00
Gaurav Narula 39b514d350
MINOR: use 2 logdirs in ZK migration system tests (#15394)
Zookeeper migration system tests currently override the config to
use only one log directory.

This PR removes the override so that the system tests run with 2 log
directories following the work done as part of KIP-858.

Reviewers: Igor Soarez <soarez@apple.com>, Proven Provenzano <pprovenzano@confluent.io>
2024-06-18 11:49:25 +01:00
Krishna Agarwal bcf781230e
KAFKA-16932: Add documentation for the native docker image (#16338)
This PR contains the the following documentation changes for the native docker image:

in the docker/README.md: How to build, release and promote the native docker image.
in the tests/README.md: How to run system tests by bringing up kafka in the native mode.
added docker/native/README.md
added html changes for the kafka-site
added native docker image support in the docker compose files examples.

Testing:
Tested all the docker compose files with both the docker images - jvm and native
Tested the html changes locally with the kafka-site

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Vedarth Sharma <vesharma@confluent.io>
2024-06-16 14:01:13 +05:30
A. Sophie Blee-Goldman 4333af5c9f
KAFKA-15045: (KIP-924 pt. 25) Rename old internal StickyTaskAssignor to LegacyStickyTaskAssignor (#16322)
To avoid confusion in 3.8/until we fully remove all the old task assignors and internal config, we should rename the old internal assignor classes like the StickyTaskAssignor so that they won't be mixed up with the new version of the assignor (which is also named StickyTaskAssignor)

Reviewers: Bruno Cadonna <cadonna@apache.org>, Josep Prat <josep.prat@aiven.io>
2024-06-13 11:27:50 -07:00
David Jacot 190dd79457
KAFKA-16860; [2/2] Introduce group.version feature flag (#16149)
This patch updates the system tests to correctly enable the new consumer protocol/coordinator in the tests requiring them.

I went with the simplest approach for now. Long term, I think that we should refactor the tests to better handle features and non-production features.

I got a successful run of the consumer system tests with this patch combined with https://github.com/apache/kafka/pull/16120: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1717155071--dajac--KAFKA-16860-2--29028ae0dd/2024-05-31--001./2024-05-31--001./report.html.

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-05-31 12:49:26 -07:00
Josep Prat 7e81cc5e68
MINOR: Bump trunk to 3.9.0-SNAPSHOT (#16150)
Signed-off-by: Josep Prat <josep.prat@aiven.io>

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-31 16:41:44 +02:00
Krishna Agarwal bb6a042e99
KAFKA-16827: Integrate kafka native-image with system tests (#16046)
This PR does following things

System tests should bring up Kafka broker in the native mode
System tests should run on Kafka broker in native mode
Extract out native build command so that it can be reused.
Allow system tests to run on Native Kafka broker using Docker mechanism

To run system tests by bringing up Kafka in native mode:
Pass kafka_mode as native in the ducktape globals:--globals '{\"kafka_mode\":\"native\"}'

Running system tests by bringing up kafka in native mode via docker mechanism
_DUCKTAPE_OPTIONS="--globals '{\"kafka_mode\":\"native\"}'" TC_PATHS="tests/kafkatest/tests/"  bash tests/docker/run_tests.sh

To only bring up ducker nodes to cater native kafka
bash tests/docker/ducker-ak up -m native

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-05-30 22:24:23 +05:30
Lucas Brutschy 1dcdccf736
MINOR: fix streams_broker_compatibility test (#16015)
The system test was broken in ccf4bd5
which failed to import the matrix symbol. The test was failing silently, not
discovering any tests.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2024-05-21 16:07:49 +02:00
Justine Olshan 3e15ab98ec
KAFKA-16992: InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not enabled when upgrading from kafka (#15971)
We weren't enabling discoverBrokerVersions to check the supported versions in the AddPartitionsToTxnManager. This means that any verification request (or any AddPartitionsToTxnRequest version) from a newer broker would fail when sending to an older broker.

The bulk of this change is adding additional transactions system tests for old versions.
One test upgrades the cluster completely. This didn't catch the issue but could be useful.

The other test forces a new broker to send a verification request to an older one. Without the discoverBrokerVersions change, all tests between mixed brokers failed. (We introduced a new request version in 3.8 -- which is a separate version from the one that caused the bug for 3.5 -> 3.6) With the addition, the tests all passed.

I also manually ran a test for 3.5 -> 3.6 since the issue there was slightly different and was caused by the unstableLatestVersion flag being enabled. This change should fix this as well. 👍

Reviewers:  David Jacot <djacot@confluent.io>
2024-05-17 21:35:28 -07:00
Lianet Magrans a4952572dc
MINOR: Change sys test describe topic parsing to improve extensibility (#15941)
Minor change to how the describe topic output is parsed in system tests, to ensure that the output is preserved, even if only some fields are relevant to the test for now (which is what the test used to do before recent changes)

Initial problem: System tests were parsing the describe topic output in kafka.py assuming all fields would include a value. The describe API was recently changed, breaking this logic, because it included new fields for which there may not be values (ex. LastKnownElr).

Initial fix: The initial fix for this was to drop all fields from the output except for the ones currently used in the test, where in reality only the fields without values are the problematic ones.

Proposed improvement: A more extensible approach would be to drop only the fields that have no values and preserve the full output, which is what the test did before the initial fix mentioned above. This allows to easily extend the test to include more fields as needed, which could follow as the describe API and tests evolves (it will only require to add the fields to the returned value when needed, without having to change how the fields object is stripped).

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-05-15 11:09:44 +02:00
Lucas Brutschy 3b43edd7a1
MINOR: Remove dev_version parameter from streams tests (#15874)
In two tests, we are using the current snapshot version as a test parameter
`to_version`, but as the only option. We can hardcode it. This
simplifies testing downstream, since the test parameters do not change
with every version. In particular, some tests downstream are blacklisted
because they do not work with ARM. These lists need to be updated every
time `DEV_VERSION` is bumped.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2024-05-08 09:43:26 +02:00
Lianet Magrans 636e65aa6b
KAFKA-16465: Fix consumer sys test revocation validation (#15778)
This fixes a consumer system test that was failing for the new protocol. The failure was because the test was expecting the eager behaviour of partitions being revoked on every rebalance, and it was wrongfully applying it to the runs with the new protocol too.
This same situation was previously identified and fixed in other parts of the sys test with #15661.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-04-29 11:56:36 +02:00
Kirk True 21faf874c0
KAFKA-16565: IncrementalAssignmentConsumerEventHandler throws error when attempting to remove a partition that isn't assigned (#15737)
Checking that the TopicPartition is in assignment before attempting to remove it.

Also added some logging and refactoring.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
2024-04-26 10:21:22 +02:00
Kirk True dcdf812880
KAFKA-16609: Update parse_describe_topic to support new topic describe output (#15799)
The format of the 'describe topic' output was changed as part of KAFKA-15585 which required an update in the parsing logic used by system tests.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-04-25 11:12:42 +02:00
Justine Olshan a7ceacdd35
KAFKA-16571: reassign_partitions_test.bounce_brokers should wait for messages to be sent to every partition (#15739)
Added the check before the reassignment occurs and we start bouncing brokers.

Reviewers: David Mao <dmao@confluent.io>, David Jacot <djacot@confluent.io>
2024-04-24 17:30:20 -07:00
vamossagar12 f22ad6645b
KAFKA-16272: Adding new coordinator related changes for connect_distributed.py (#15594)
Summary of the changes:

Parameterizes the tests to use new coordinator and pass in consumer group protocol. This would be applicable to sink connectors only.
Enhances the sink connector creation code in system tests to accept a new optional parameter for consumer group protocol to be used.
Sets the consumer group protocol via consumer.override. override config when the new group coordinator is enabled.
Note about testing: There are 288 tests that need to be run and running on my local takes a lot of time. I will try to post the test results once I have a full run.

Reviewers: Kirk True <ktrue@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>, Philip Nee <pnee@confluent.io>
2024-04-19 17:29:50 +02:00
Philip Nee b87cd66dab
KAFKA-16579: Revert Consumer Rolling Upgrade (#15753)
Consumer Rolling Upgrade is meant to test the protocol upgrade for the old protocol. Therefore, I am removing old changes.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-04-19 11:36:27 +02:00
Lianet Magrans 9ea0503e23
KAFKA-16566: Fix consumer static membership system test with new protocol (#15738)
Updating consumer system test that was failing with the new protocol, related to static membership behaviour. The behaviour regarding static consumers that join with conflicting group instance id is slightly different between the classic and new consumer protocol, so the expectations in the tests needed to be updated.

If static members join with same instance id:

Classic protocol: all members join the group with the same group instance id, and then the first one will eventually fail (receives a HB error with FencedInstanceIdException)

Consumer protocol: new member with an instance Id already in use is not able to join, and first member remains active (new member with same instance Id receives an UnreleasedInstanceIdException in the response to the HB to join the group)

This PR is keeping the single parametrized test that existed before, given that what's being tested and part of the test itself apply to all protocols. This is just updating the expectations that are different, based on the protocol parameter.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>
2024-04-19 11:34:05 +02:00
Philip Nee dc9fbe453c
KAFKA-16389: ConsumerEventHandler does not support incremental assignment changes causing failure in system test (#15661)
The current AssignmentValidationTest only tests EAGER assignment protocol and does not support incremental assignment like CooperativeStickyAssignor and consumer protocol. Therefore in the ConsumerEventHandler, I subclassed the existing handler overridden the assigned and revoke event handling methods, to permit incremental changes to the current assignments.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>
2024-04-10 18:52:05 +02:00
Gaurav Narula bdd85405e3
KAFKA-16293: Test log directory failure in Kraft (#15409)
Enables log directory failure system test for all Kraft modes in addition to ZK mode.

Reviewers: Luke Chen <showuon@gmail.com>, Igor Soarez <soarez@apple.com>, Proven Provenzano <pprovenzano@confluent.io>
2024-04-06 16:01:25 +08:00
Manikumar Reddy fd9c7d2932
MINOR: Add 3.6.2 to system tests (#15665)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-05 19:36:23 +05:30
Kirk True c7ef80bb6c
KAFKA-16439: Update replication_replica_failure_test.py to support KIP-848’s group protocol config (#15629)
Added a new optional group_protocol parameter to the test methods, then passed that down to the setup_consumer method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Walker Carlson <wcarlson@apache.org>
2024-04-03 12:13:26 -05:00
Kirk True 6bb9caced0
KAFKA-16440: Update security_test.py to support KIP-848’s group protocol config (#15628)
Added a new optional group_protocol parameter to the test methods, then passed that down to the setup_consumer method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Walker Carlson <wcarlson@apache.org>
2024-04-03 12:13:14 -05:00
Kirk True 6569a354e6
KAFKA-16438: Update consumer_test.py’s static tests to support KIP-848’s group protocol config (#15627)
Migrated the following tests for the new consumer:

- test_fencing_static_consumer
- test_static_consumer_bounce
- test_static_consumer_persisted_after_rejoin

Reviewers: Walker Carlson <wcarlson@apache.org>
2024-04-03 12:13:03 -05:00
Kirk True e95e91a062
KAFKA-16275: Update kraft_upgrade_test.py to support KIP-848’s group protocol config (#15626)
Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Walker Carlson <wcarlson@apache.org>
2024-04-03 12:12:51 -05:00
Philip Nee ace2152d22
KAFKA-16272: Update connect_distributed_test.py to support KIP-848’s group protocol config (#15576)
Update connect_distributed_test.py to support KIP-848’s group protocol config. Not all tests are updated because only a subset of it is using the consumer directly.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>
2024-03-27 12:10:29 +01:00
Kirk True 159d25a7df
KAFKA-16276: Update transactions_test.py to support KIP-848’s group protocol config (#15567)
Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-22 14:58:40 +01:00
Kirk True f66610095c
KAFKA-16274: Update replica_scale_test.py to support KIP-848’s group protocol config (#15577)
Added a new optional group_protocol parameter to the test methods, then passed that down to the methods involved.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-22 14:57:37 +01:00
Philip Nee fb03da0df4
KAFKA-16271: Upgrade consumer_rolling_upgrade_test.py (#15578)
Upgrading the test to use the consumer group protocol. The two tests are failing due to Mismatch Assignment

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-22 11:17:49 +01:00
Alyssa Huang 7e85d7d32e
MINOR: KRaft upgrade tests should only use latest stable mv (#15566)
This should help us avoid testing MVs before they are usable (stable).
We revert back from testing 3.8 in this case since 3.7 is the current stable version.

Reviewers: Proven Provenzano <pprovenzano@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-03-21 11:06:40 -07:00
Philip Nee c5804e7649
KAFKA-16273: Update consumer_bench_test.py to use consumer group protocol (#15548)
Adding this as part of the greater effort to modify the system tests to incorporate the use of consumer group protocol from KIP-848. Following is the test results and the tests using protocol = consumer are expected to fail:

================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.11.4
session_id:       2024-03-16--002
run time:         76 minutes 36.150 seconds
tests run:        28
passed:           25
flaky:            0
failed:           3
ignored:          0
================================================================================

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>
2024-03-21 16:23:51 +01:00
Kirk True 386347bbca
KAFKA-16270: Update snapshot_test.py to support KIP-848’s group protocol config (#15538)
Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:21:41 +01:00
Kirk True 57557df3ed
KAFKA-16269: Update reassign_partitions_test.py to support KIP-848’s group protocol config (#15540)
Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:20:46 +01:00
Kirk True 8359f947a7
KAFKA-16268: Update fetch_from_follower_test.py to support KIP-848’s group protocol config (#15539)
Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:19:56 +01:00
Kirk True 7945d322f6
KAFKA-16267: Update consumer_group_command_test.py to support KIP-848’s group protocol config (#15537)
* KAFKA-16267: Update consumer_group_command_test.py to support KIP-848’s group protocol config

Added a new optional group_protocol parameter to the test methods, then passed that down to the setup_consumer method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Note: this requires #15330.

* Update consumer_group_command_test.py

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:19:01 +01:00
Kirk True af0ec247cc
KAFKA-16231: Update consumer_test.py to support KIP-848’s group protocol config (#15330)
Added a new optional group_protocol parameter to the test methods, then passed that down to the setup_consumer method.

Unfortunately, because the new consumer can only be used with the new coordinator, this required a new @matrix block instead of adding the group_protocol=["classic", "consumer"] to the existing blocks 😢

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-03-15 14:17:22 +01:00
Stanislav Kozlovski 652537f28e
MINOR: Add 3.7.0 to core and client's upgrade compatibility tests (#15452)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-07 22:20:54 +08:00
Matthias J. Sax ccf4bd5f46
MINOR: Add 3.7 to Kafka Streams system tests (#15443)
Reviewers: Bruno Cadonna <bruno@confluent.io>
2024-03-06 12:02:58 -08:00
Stanislav Kozlovski 3075a3f144
MINOR: Add 3.7.0 to system tests (#15436)
As per the release instructions, this patch bumps the versions in the associated files here as part of the 3.7 release
2024-02-27 09:58:48 +01:00
Ahmed Sobeh 6eea40882d
KAFKA-15349: ducker-ak should fail fast when gradlew systemTestLibs fails (#15391)
In this modification, if ./gradlew systemTestLibs fails, the script will output an error message and terminate execution using the die function. This ensures that the script fails fast and prompts the user to address the error before continuing.

Reviewers: Luke Chen <showuon@gmail.com>
2024-02-20 11:19:53 +08:00
Kirk True 7a07aefd86
KAFKA-16230: Update verifiable_consumer.py to support KIP-848’s group protocol config (#15328)
The Python VerifiableConsumer now passes in the --group-protocol and --group-remote-assignor command line arguments to VerifiableConsumer if the node is running 3.7.0+ and using the new consumer group.protocol.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2024-02-16 11:53:53 +01:00
Mickael Maison 0bf830fc9c
KAFKA-14576: Move ConsoleConsumer to tools (#15274)
Reviewers: Josep Prat <josep.prat@aiven.io>, Omnia Ibrahim <o.g.h.ibrahim@gmail.com>
2024-02-13 19:24:07 +01:00
Mickael Maison 092dc7fc46
KAFKA-16238: Fix ConnectRestApiTest system test (#15346)
Reviewers: Yash Mayya <yash.mayya@gmail.com>
2024-02-09 13:36:56 +01:00
Matthias J. Sax 93712eca15
KAFKA-15594: Add version 3.6 to Kafka Streams system tests (#15151)
Reviewers: Walker Carlson <wcarlson@confluent.io>
2024-01-26 14:59:24 -08:00
Mickael Maison 27122634ab
MINOR: Add 3.5.2 and 3.6.1 to system tests (#14932)
Reviewers: Matthias J. Sax <mjsax@apache.org>
2024-01-12 15:55:10 +01:00
David Jacot a8203f9c7a
KAFKA-14505; [4/N] Wire transaction verification (#15142)
This patch wires the transaction verification in the new group coordinator. It basically calls the verification path before scheduling the write operation. If the verification fails, the error is returned to the caller.

Note that the patch uses `appendForGroup`. I suppose that we will move away from using it when https://github.com/apache/kafka/pull/15087 is merged.

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-01-11 04:58:57 -08:00
Matthias J. Sax 5ec9899d1a
MINOR: bump dev version for system tests to 3.8 (#15157)
Reviewers: Josep Prat <josep.prat@aiven.io>
2024-01-09 23:08:36 -08:00
Stanislav Kozlovski b352cc6b4e
MINOR: Bump trunk to 3.8.0-SNAPSHOT (#14993)
This patch bumps the next release version to 3.8.0-SNAPSHOT.

Following the Release Process, I created the 3.7 branch and am following the steps to bump these versions:

Modify the version in trunk to bump to the next one (eg. "0.10.1.0-SNAPSHOT") in the following files:

docs/js/templateData.js
gradle.properties
kafka-merge-pr.py
streams/quickstart/java/pom.xml
streams/quickstart/java/src/main/resources/archetype-resources/pom.xml
streams/quickstart/pom.xml
tests/kafkatest/__init__.py
2023-12-14 09:07:18 +01:00
Jeff Kim ba49006561
MINOR: disable test_transactions with new group coordinator
https://issues.apache.org/jira/browse/KAFKA-14505 is not done yet so we need to disable the system test. Added a comment in the jira to re-enable once it's implemented.

Reviewers: Justine Olshan <jolshan@confluent.io>
2023-12-01 08:47:12 -08:00
Ritika Reddy 150b0e8290
KAFKA-15578: Migrating other system tests to use the group coordinator (#14582)
This patch converts a few more system tests to using the new group coordinator. This is only applied to KRaft clusters.

Reviewers: David Jacot <djacot@confluent.io>
2023-11-22 01:52:30 -08:00
José Armando García Sancio 809694a9f6
MINOR; Fix cluster size for migration tests (#14726)
Use smaller cluster sizes instead of the default cluster size

Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2023-11-09 15:47:15 -08:00
José Armando García Sancio 35317d8f7b
MINOR; Fix KRaft metadata version system tests (#14722)
The latest metadata version is now 3.7. Fix the KRaft upgrade
test to upgrade to that version instead of 3.6.

Change the vagrant setup and gradle dependencies to use 3.3.2 instead of 3.3.1.

Reviewers: David Arthur <mumrah@gmail.com>
2023-11-09 12:03:12 -08:00
David Arthur 37715862d7 KAFKA-15704: Set missing ZkMigrationReady field on ControllerRegistrationRequest
This field was missed by the initial KIP-919 PR(s). The result is that migrations can't begin since
the controllers will never become ready. This patch fixes that as well as pulls over some fixes
from the 3.6 branch.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-10-27 14:16:24 -07:00
Ritika Reddy 68a5072f54
KAFKA-15578: System Tests for running old protocol with new coordinator (#14524)
This patch adds configs to facilitate the testing with the new group coordinator (KIP-848) in kraft mode. Only one test files is converted at the moment. The others will follow.

Reviewers: Ian McDonald <imcdonald@confluent.io>, David Jacot <djacot@confluent.io>
2023-10-27 10:33:40 -07:00
Mickael Maison 8b9f6d17f2
KAFKA-15093: Add 3.5 Streams upgrade system tests (#14602)
Reviewers: Matthias J. Sax <mjsax@apache.org>
2023-10-23 13:26:50 +02:00
shuoer86 27a155c80a
MINOR: Fix typos in build.gradle, tests and trogdor (#14574)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, hudeqi <1217150961@qq.com>
2023-10-23 12:30:57 +02:00
Mickael Maison 9c77c17c4e
KAFKA-15664: Add 3.4 Streams upgrade system tests (#14601)
Reviewers: Luke Chen <showuon@gmail.com>,  Matthias J. Sax <mjsax@apache.org>
2023-10-23 10:33:59 +02:00
Matthias J. Sax 4371214fbe
KAFKA-15378: fix streams upgrade system test (#14539)
Fixing bad test setup. We tried to fix an upgrade bug for FK-joins in 3.1 release, but it later turned out that the PR was not sufficient to fix it. We finally fixed in 3.4 release.

This PR updates the system test matrix to only test working versions with FK-joins, limited to available test versions.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Hao Li <hli@confluent.io>, Mickael Maison <mickael.maison@gmail.com>
2023-10-20 16:20:00 -07:00
Chris Egerton 091eb9b349
KAFKA-15428: Cluster-wide dynamic log adjustments for Connect (#14538)
Reviewers: Greg Harris <greg.harris@aiven.io>, Yang Yang <yayang@uber.com>, Yash Mayya <yash.mayya@gmail.com>
2023-10-20 09:52:37 -04:00
Satish Duggana cc951e3f81
KAFKA-15593: Add 3.6 to core upgrade and compatibility tests (#14527)
Reviewers:  Christo Lolov <lolovc@amazon.com>, Josep Prat <josep.prat@aiven.io>
2023-10-13 20:51:34 +05:30
Satish Duggana 354c9ca0ce
MINOR Added 3.6.0 to system tests (#14488)
Reviewers: Luke Chen <showuon@gmail.com>
2023-10-08 05:29:06 +05:30
Matthias J. Sax cdf726fd35
HOTIFX: fix Kafka versions for system tests (#14490)
Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2023-10-05 10:23:23 -07:00
Luke Chen 6f9681e10f
MINOR: fix kraft upgrade system test (#14424)
We should use DEV_BRANCH instead of DEV_VERSION in this case, otherwise, error will be thrown:

RunnerClient: kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.6.0-SNAPSHOT.metadata_quorum=ISOLATED_KRAFT: FAIL: RemoteCommandError({'ssh_config': {'host': 'ducker10', 'hostname': 'ducker10', 'user': 'ducker', 'port': 22, 'password': '', 'identityfile': '/home/ducker/.ssh/id_rsa', 'connecttimeout': None}, 'hostname': 'ducker10', 'ssh_hostname': 'ducker10', 'user': 'ducker', 'externally_routable_ip': 'ducker10', '_logger': <Logger kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.6.0-SNAPSHOT.metadata_quorum=ISOLATED_KRAFT-2 (DEBUG)>, 'os': 'linux', '_ssh_client': <paramiko.client.SSHClient object at 0xffffb35d5820>, '_sftp_client': <paramiko.sftp_client.SFTPClient object at 0xffffb35f8ca0>, '_custom_ssh_exception_checks': None}, '/opt/kafka-3.6.0-SNAPSHOT/bin/kafka-storage.sh format --ignore-formatted --config /mnt/kafka/kafka.properties --cluster-id I2eXt9rvSnyhct8BYmW6-w', 127, b'bash: line 1: /opt/kafka-3.6.0-SNAPSHOT/bin/kafka-storage.sh: No such file or directory\n')

Reviewers: Satish Duggana <satishd@apache.org>
2023-09-23 11:20:22 +08:00
Ruslan Krivoshein b72d92919f
KAFKA-14581: Moving GetOffsetShell to tools (#13562)
This PR moves GetOffsetShell from core module to tools module with rewriting from Scala to Java.

Reviewers: Federico Valeri fedevaleri@gmail.com, Ziming Deng dengziming1993@gmail.com, Mickael Maison mimaison@apache.org.
2023-09-11 10:30:22 +08:00
Luke Chen c9715a3485
MINOR: Use "add-exports" only when jdk >= 16 in minikdc (#14232)
Use "add-exports" only when jdk >= 16 in minikdc

Reviewers: Greg Harris <greg.harris@aiven.io>
2023-08-25 11:52:37 +08:00
Satish Duggana 9e3b1f9b9b
MINOR Bump trunk to 3.7.0-SNAPSHOT (#14286)
Reviewers: Divij Vaidya <diviv@amazon.com>
2023-08-25 05:03:38 +05:30
David Arthur 418b8a6e59
KAFKA-14538 Metadata transactions in MetadataLoader (#14208)
This PR contains three main changes:

- Support for transactions in MetadataLoader
- Abort in-progress transaction during controller failover
- Utilize transactions for ZK to KRaft migration

A new MetadataBatchLoader class is added to decouple the loading of record batches from the
publishing of metadata in MetadataLoader. Since a transaction can span across multiple batches (or
multiple transactions could exist within one batch), some buffering of metadata updates was needed
before publishing out to the MetadataPublishers. MetadataBatchLoader accumulates changes into a
MetadataDelta, and uses a callback to publish to the publishers when needed.

One small oddity with this approach is that since we can "splitting" batches in some cases, the
number of bytes returned in the LogDeltaManifest has new semantics. The number of bytes included in
a batch is now only included in the last metadata update that is published as a result of a batch.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-08-21 16:02:14 -07:00
Ron Dagostino 6008af7468
MINOR: Enable delegation token system test for KRaft (#14268)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-08-21 13:02:32 -04:00
Greg Harris 82ae77f945
KAFKA-15226: Add connect-plugin-path and plugin.discovery system test (#14230)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-08-18 15:28:43 -07:00
Lucas Brutschy ee036ed9ef
KAFKA-15319: Upgrade rocksdb to fix CVE-2022-37434 (#14216)
Rocksdbjni<7.9.2 is vulnerable to CVE-2022-37434 due to zlib 1.2.12

Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2023-08-18 18:31:27 +02:00
Greg Harris a9efca0bf6
KAFKA-14759: Move Mock, Schema, and Verifiable connectors to new test-plugins module (#13302)
Reviewers: Hector Geraldino <hgeraldino@gmail.com>, Chris Egerton <chrise@aiven.io>
2023-08-16 10:30:24 -07:00
Maros Orsak ac6a536c7c
MINOR: Fix MiniKdc Java 17 issue in system tests (#14011)
Kafka system tests with Java version 17 are failing on this issue:

```python
TimeoutError("MiniKdc didn't finish startup",)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/ducktape/tests/runner_client.py", line 186, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.6/site-packages/ducktape/tests/runner_client.py", line 246, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.6/site-packages/ducktape/mark/_mark.py", line 433, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/sanity_checks/test_verifiable_producer.py", line 74, in test_simple_run
    self.kafka.start()
  File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 635, in start
    self.start_minikdc_if_necessary(add_principals)
  File "/opt/kafka-dev/tests/kafkatest/services/kafka/kafka.py", line 596, in start_minikdc_if_necessary
    self.minikdc.start()
  File "/usr/local/lib/python3.6/site-packages/ducktape/services/service.py", line 265, in start
    self.start_node(node, **kwargs)
  File "/opt/kafka-dev/tests/kafkatest/services/security/minikdc.py", line 114, in start_node
    monitor.wait_until("MiniKdc Running", timeout_sec=60, backoff_sec=1, err_msg="MiniKdc didn't finish startup")
  File "/usr/local/lib/python3.6/site-packages/ducktape/cluster/remoteaccount.py", line 754, in wait_until
    allow_fail=True) == 0, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: MiniKdc didn't finish startup
```

Specifically, when one runs the test cases and looks at the logs of the MiniKdc:
```java
Exception in thread "main" java.lang.IllegalAccessException: class kafka.security.minikdc.MiniKdc cannot access class sun.security.krb5.Config (in module java.security.jgss) because module java.security.jgss does not export sun.security.krb5 to unnamed module @24959ca4
    at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
    at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
    at java.base/java.lang.reflect.Method.invoke(Method.java:560)
    at kafka.security.minikdc.MiniKdc.refreshJvmKerberosConfig(MiniKdc.scala:268)
    at kafka.security.minikdc.MiniKdc.initJvmKerberosConfig(MiniKdc.scala:245)
    at kafka.security.minikdc.MiniKdc.start(MiniKdc.scala:123)
    at kafka.security.minikdc.MiniKdc$.start(MiniKdc.scala:375)
    at kafka.security.minikdc.MiniKdc$.main(MiniKdc.scala:366)
    at kafka.security.minikdc.MiniKdc.main(MiniKdc.scala)
```

This error is caused by the fact that sun.security module is no longer supported in Java 16 and higher. Related to the [1]. 
There are two ways how to solve it, and I present one of them. The second way is to export the ENV variable during the deployment of the containers using Ducktape in [2].

[1] - https://openjdk.org/jeps/396
[2] - https://github.com/apache/kafka/blob/trunk/tests/docker/ducker-ak#L308

Reviewers: Ismael Juma <ismael@juma.me.uk>, Luke Chen <showuon@gmail.com>
2023-08-07 06:19:55 -07:00
Alyssa Huang e5861eeaae
[MINOR] Add latest versions to kraft upgrade kafkatest (#14084)
Reviewers: Ron Dagostino <rndgstn@gmail.com>
2023-07-27 16:12:25 -04:00
Divij Vaidya 353141ed92
KAFKA-15251: Add 3.5.1 to system tests (#14069)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-07-27 12:33:34 +02:00
Federico Valeri bb677c4959
KAFKA-14583: Move ReplicaVerificationTool to tools (#14059)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-07-26 12:04:34 +02:00
Greg Harris 125dbb9286
KAFKA-14760: Move ThroughputThrottler from tools to clients, remove tools dependency from connect-runtime (#13313)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-07-20 12:58:48 -07:00
Mickael Maison b584e91036
KAFKA-15093: Add 3.4.0 and 3.5.0 to core upgrade and compatibility system tests (#13859)
Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov  <christololov@gmail.com>
2023-07-12 10:36:57 +02:00
Mickael Maison 354db26b95
MINOR: Add 3.5.0 and 3.4.1 to system tests (#13849)
Reviewers: Luke Chen <showuon@gmail.com>
2023-07-12 10:11:44 +02:00
Yi-Sheng Lien b8f3776f24
KAFKA-15155: Follow PEP 8 best practice in Python to check if a container is empty (#13974)
Reviewers: Divij Vaidya <diviv@amazon.com>
2023-07-11 11:01:50 +02:00
hudeqi 8be601d051
MINOR: Move TROGDOR.md to trogdor module (#13979)
Reviewers: Divij Vaidya <diviv@amazon.com>

---------

Co-authored-by: Deqi Hu <deqi.hu@shopee.com>
2023-07-10 18:11:21 +02:00
DL1231 4149e31cad
KAFKA-15153: Use Python 'is' instead of '==' to compare for None (#13964)
Reviewers: Divij Vaidya <diviv@amazon.com>

Co-authored-by: d00791190 <dinglan6@huawei.com>
2023-07-06 16:59:13 +02:00
Manyanda Chitimbo f32ebeab17
MINOR: Bump requests (python package) from 2.24.0 to 2.31.0 in /tests (#13903)
Update "requests" lib used in system tests to version "2.31.0" to fix CVE-2023-32681: Unintended leak of Proxy-Authorization header in requests

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-06-27 21:17:22 +02:00
David Arthur d27ba5bfba
KAFKA-15010 ZK migration failover support (#13758)
This patch adds snapshot reconciliation during ZK to KRaft migration. This reconciliation happens whenever a snapshot is loaded by KRaft, or during a controller failover. Prior to this patch, it was possible to miss metadata updates coming from KRaft when dual-writing to ZK.

Internally this adds a new state SYNC_KRAFT_TO_ZK to the KRaftMigrationDriver state machine. The controller passes through this state after the initial ZK migration and each time a controller becomes active. 

Logging during dual-write was enhanced to include a count of write operations happening.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-06-01 10:25:46 -04:00
Matthias J. Sax b40a7fc037
HOTFIX: fix broken Streams upgrade system test (#13654)
Reviewers: Victoria Xia <victoria.xia@confluent.io>, John Roesler <john@confluent.io>
2023-05-08 14:24:11 -07:00
David Arthur 0822ce0ed1
KAFKA-14840: Support for snapshots during ZK migration (#13461)
This patch adds support for handling metadata snapshots while in dual-write mode. Prior to this change, if the active
controller loaded a snapshot, it would get out of sync with the ZK state.

In order to reconcile the snapshot state with ZK, several methods were added to scan through the metadata in ZK to
compute differences with the MetadataImage. Since this introduced a lot of code, I opted to split out a lot of methods
from ZkMigrationClient into their own client interfaces, such as TopicMigrationClient, ConfigMigrationClient, and
AclMigrationClient. Each of these has some iterator method that lets the caller examine the ZK state in a single pass
and without using too much memory.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Luke Chen <showuon@gmail.com>
2023-05-05 01:35:26 -07:00
David Arthur c1b5c75d92
KAFKA-14805 KRaft controller supports pre-migration mode (#13407)
This patch adds the concept of pre-migration mode to the KRaft controller. While in this mode, 
the controller will only allow certain write operations. The purpose of this is to disallow metadata 
changes when the controller is waiting for the ZK migration records to be committed.

The following ControllerWriteEvent operations are permitted in pre-migration mode

* completeActivation
* maybeFenceReplicas
* writeNoOpRecord
* processBrokerHeartbeat
* registerBroker (only for migrating ZK brokers)
* unregisterBroker

Raft events and other controller events do not follow the same code path as ControllerWriteEvent, 
so they are not affected by this new behavior.

This patch also add a new metric as defined in KIP-868: kafka.controller:type=KafkaController,name=ZkMigrationState

In order to support upgrades from 3.4.0, this patch also redefines the enum value of value 1 to mean 
MIGRATION rather than PRE_MIGRATION.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-04-26 10:20:30 -04:00
Chia-Ping Tsai 2271e748a1
MINOR: fix zookeeper_migration_test.py (#13620)
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
2023-04-24 17:21:19 +08:00
Mickael Maison dc1ede8d89
MINOR: Bump trunk to 3.6.0-SNAPSHOT (#13570)
Reviewers: David Jacot <djacot@confluent.io>
2023-04-14 14:17:07 +02:00
Gantigmaa Selenge 951894d2ff
MINOR: Install missing iputils-ping for system tests (#13500)
Some system tests from kafkatest.tests.core.network_degrade_test are failing due to missing utility iputils-ping.

[DEBUG - 2023-02-04 01:34:56,322 - network_degrade_test - test_latency -
 lineno:67]: Ping output: bash: line 1: ping: command not found

Reviewers: Luke Chen <showuon@gmail.com>
2023-04-13 09:30:56 +08:00
vamossagar12 c14f56b484
KAFKA-14586: Moving StreamResetter to tools (#13127)
Moves StreamResetter to tools project.

Reviewers: Federico Valeri <fedevaleri@gmail.com>, Christo Lolov <lolovc@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2023-03-28 14:43:22 +02:00
Shay Elkin 797c28cb7c
MINOR: Rename remote_controller_quorum to isolated_controller_quorum (#13448)
Similar to https://github.com/apache/kafka/pull/13439:

ddd652c standardized on "isolated" as the name for all the isolated
modes, and renamed remote_controller_quorum to
kafkatest.services.kafka.quorum.remote_kraft to
isolated_controller_quorum. This broke
SecurityTest.test_quorum_ssl_endpoint_validation_failure, which should
be fixed by this simple rename.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-03-24 10:40:27 -07:00
Shay Elkin e07cc127e1
MINOR: Fix remote_kraft -> isolated_kraft in kafkatest (#13439)
ddd652c6 standardized on "isolated" as the name for all the isolated
modes, and renamed kafkatest.services.kafka.quorum.remote_kraft to
isolated_kraft. However, the tests using remote_kraft weren't
updated, and are broken as a result. This is a simple search and
replace to fix those.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-03-23 09:33:59 -07:00
Colin Patrick McCabe ddd652c672
MINOR: Standardize KRaft logging, thread names, and terminology (#13390)
Standardize KRaft thread names.

- Always use kebab case. That is, "my-thread-name".

- Thread prefixes are just strings, not Option[String] or Optional<String>.
  If you don't want a prefix, use the empty string.

- Thread prefixes end in a dash (except the empty prefix). Then you can
  calculate thread names as $prefix + "my-thread-name"

- Broker-only components get "broker-$id-" as a thread name prefix. For example, "broker-1-"

- Controller-only components get "controller-$id-" as a thread name prefix. For example, "controller-1-"

- Shared components get "kafka-$id-" as a thread name prefix. For example, "kafka-0-"

- Always pass a prefix to KafkaEventQueue, so that threads have names like
  "broker-0-metadata-loader-event-handler" rather than "event-handler". Prior to this PR, we had
  several threads just named "EventHandler" which was not helpful for debugging.

- QuorumController thread name is "quorum-controller-123-event-handler"

- Don't set a thread prefix for replication threads started by ReplicaManager. They run only on the
  broker, and already include the broker ID.

Standardize KRaft slf4j log prefixes.

- Names should be of the form "[ComponentName id=$id] ". So for a ControllerServer with ID 123, we
  will have "[ControllerServer id=123] "

- For the QuorumController class, use the prefix "[QuorumController id=$id] " rather than
  "[Controller <nodeId] ", to make it clearer that this is a KRaft controller.

- In BrokerLifecycleManager, add isZkBroker=true to the log prefix for the migration case.

Standardize KRaft terminology.

- All synonyms of combined mode (colocated, coresident, etc.) should be replaced by "combined"

- All synonyms of isolated mode (remote, non-colocated, distributed, etc.) should be replaced by
  "isolated".
2023-03-16 15:33:03 -07:00
Ron Dagostino e3817cac89
KAFKA-14351: Controller Mutation Quota for KRaft (#13116)
Implement KIP-599 controller mutation quotas for the KRaft controller. These quotas apply to create
topics, create partitions, and delete topic operations. They are specified in terms of number of
partitions.

The approach taken here is to reuse the ControllerMutationQuotaManager that is also used in ZK
mode. The quotas are implemented as Sensor objects and Sensor.checkQuotas enforces the quota,
whereas Sensor.record notes that new partitions have been modified. While ControllerApis handles
fetching the Sensor objects, we must make the final callback to check the quotas from within
QuorumController. The reason is because only QuorumController knows the final number of partitions
that must be modified. (As one example, up-to-date information about the number of partitions that
will be deleted when a topic is deleted is really only available in QuorumController.)

For quota enforcement, the logic is already in place. The KRaft controller is expected to set the
throttle time in the response that is embedded in EnvelopeResponse, but it does not actually apply
the throttle because there is no client connection to throttle. Instead, the broker that forwarded
the request is expected to return the throttle value from the controller and to throttle the client
connection. It also applies its own request quota, so the enforced/returned quota is the maximum of
the two.

This PR also installs a DynamicConfigPublisher in ControllerServer. This allows dynamic
configurations to be published on the controller. Previously, they could be set, but they were not
applied. Note that we still don't have a good way to set node-level configurations for isolatied
controllers. However, this will allow us to set cluster configs (aka default node configs) and have
them take effect on the controllers.

In a similar vein, this PR separates out the dynamic client quota publisher logic used on the
broker into DynamicClientQuotaPublisher. We can now install this on both BrokerServer and
ControllerServer. This makes dynamically configuring quotas (such as controller mutation quotas)
possible.

Also add a ducktape test, controller_mutation_quota_test.py.

Reviewers: David Jacot <djacot@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>
2023-03-07 11:25:34 -08:00
Federico Valeri 07e2f6cd4d
KAFKA-14578: Move ConsumerPerformance to tools (#13215)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Alexandre Dupriez <alexandre.dupriez@gmail.com>
2023-03-06 18:16:55 +01:00
vamossagar12 bb3111f472
KAFKA-14580: Moving EndToEndLatency from core to tools module (#13095)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, Ismael Juma <mlists@juma.me.uk>
2023-03-02 12:05:22 +01:00
Jakub Scholz 56c84853ec
MINOR: Remove unused ZooKeeper log level configuration from `connect-log4j.properties` (#13216)
Signed-off-by: Jakub Scholz <www@scholzj.com>

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-02-22 11:57:38 +01:00
Chia-Ping Tsai 69f0481342
MINOR: the package of JmxTool is incorrect when running quota_test.py (#13233)
Reviewers: Federico Valeri <fedevaleri@gmail.com>, David Jacot <djacot@confluent.io>
2023-02-13 01:31:55 +08:00
Federico Valeri 50e0e3c257
KAFKA-14582: Move JmxTool to tools (#13136)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-02-02 11:23:26 +01:00
José Armando García Sancio 896573f9bc
KAFKA-14279: Add 3.3.x streams system tests (#13077)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-01-09 23:37:05 -08:00
Akhilesh C db49070760
KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller. (#12998)
This patch introduces a preliminary state machine that can be used by KRaft
controller to drive online migration from Zk to KRaft.

MigrationState -- Defines the states we can have while migration from Zk to
KRaft.

KRaftMigrationDriver -- Defines the state transitions, and events to handle
actions like controller change, metadata change, broker change and have
interfaces through which it claims Zk controllership, performs zk writes and
sends RPCs to ZkBrokers.

MigrationClient -- Interface that defines the functions used to claim and
relinquish Zk controllership, read to and write from Zk.

Co-authored-by: David Arthur <mumrah@gmail.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-01-09 10:44:11 -08:00
José Armando García Sancio f668c8e44b
KAFKA-14279; Add 3.3.x to core compatibility tests (#13076)
Now that 3.3.x exist, add system tests for upgrade, downgrade and client
compatibility.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-01-05 11:28:12 -08:00
Matthias J. Sax 9e71f9cc7d
MINOR: fix expected version in streams upgrade test (#13022)
Co-authored-by: Lucas Brutschy <lucasbru@users.noreply.github.com>

Reviewers: Suhas Satish <ssatish@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>,  John Roesler <john@confluent.io>
2022-12-27 10:09:27 -08:00
Simon Woodman 5f265710f1
MINOR: Fix typo (#13044)
fix of Kakfa to Kafka

Reviewers: Luke Chen <showuon@gmail.com>
2022-12-23 20:40:30 +08:00
Lucas Brutschy c8675d4723
KAFKA-14343: Upgrade tests for state updater (#12801)
A test that verifies the upgrade from a version of Streams with
state updater disabled to a version with state updater enabled
and vice versa, so that we can offer a save upgrade path.

 - upgrade test from a version of Streams with state updater
disabled to a version with state updater enabled
 - downgrade test from a version of Streams with state updater
 enabled to a version with state updater disabled

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-12-20 09:35:59 +01:00
Colin Patrick McCabe 29c09e2ca1
MINOR: ControllerServer should use the new metadata loader and snapshot generator (#12983)
This PR introduces the new metadata loader and snapshot generator. For the time being, they are
only used by the controller, but a PR for the broker will come soon.

The new metadata loader supports adding and removing publishers dynamically. (In contrast, the old
loader only supported adding a single publisher.) It also passes along more information about each
new image that is published. This information can be found in the LogDeltaManifest and
SnapshotManifest classes.

The new snapshot generator replaces the previous logic for generating snapshots in
QuorumController.java and associated classes. The new generator is intended to be shared between
the broker and the controller, so it is decoupled from both.

There are a few small changes to the old snapshot generator in this PR. Specifically, we move the
batch processing time and batch size metrics out of BrokerMetadataListener.scala and into
BrokerServerMetrics.scala.

Finally, fix a case where we are using 'is' rather than '==' for a numeric comparison in
snapshot_test.py.

Reviewers: David Arthur <mumrah@gmail.com>
2022-12-15 16:53:07 -08:00
A. Sophie Blee-Goldman c1a54671e8
MINOR: Bump trunk to 3.5.0-SNAPSHOT (#12960)
Version bumps in trunk after the creation of the 3.4 branch.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-12-07 18:29:20 -08:00
Bruno Cadonna 18629f6816
MINOR: Fix log message used in version probing system test (#12931)
PR #12684 introduced a better format for timestamps in log
messages. Unfortunately, we missed that one of the modified
log messages is used by a system test for validation.

This PR adapts the system test to look for the modified
log message.

Reviewers: Divij Vaidya <diviv@amazon.com>, Matthias J. Sax <mjsax@apache.org>
2022-12-05 13:15:36 +01:00
Jonathan Albrecht b56e71faee
MINOR: Update unit/integration tests to work with the IBM Semeru JDK (#12343)
The IBM Semeru JDK use the OpenJDK security providers instead of the IBM security providers so test for the OpenJDK classes first where possible and test for Semeru in the java.runtime.name system property otherwise.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-12-01 16:22:00 +01:00
Stanislav Vodetskyi b2b9ecdd61
MINOR: try-finally around super call in http.py (#12924)
Reviewers: Daniel Gospodinow <danielgospodinow@gmail.com>, Ian McDonald <imcdonald@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2022-12-01 15:16:45 +05:30
Lucas Brutschy 4560978ed7
KAFKA-14309: FK join upgrades not tested with DEV_VERSION (#12760)
The streams upgrade system inserted FK join code for every version of the
the StreamsUpgradeTest except for the latest. Also, the original code
never switched on the `test.run_fk_join` flag for the target version of
the upgrade.

The effect was that FK join upgrades were not tested at all, since
no FK join code was executed after the bounce in the system test.

We introduce `extra_properties` in the system tests, that can be used
to pass any property to the upgrade driver, which is supposed to be
reused by system tests for switching on and off flags (e.g. for the
state restoration code).

Reviewers: Alex Sorokoumov <asorokoumov@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-11-07 15:46:51 -08:00
Jason Gustafson 150a0758cb
MINOR: Change system test console consumer default log level (#12819)
For tests which use the console consumer service, we are currently enabling TRACE logging by default. I have seen some system tests where this produces GBs of logging. A better default is probably DEBUG.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2022-11-07 13:42:36 -08:00
srishti-saraswat 57aefa9c82
MINOR: Migrate connect system tests to KRaft (#12621)
Adds the `metadata_quorum` parameter to the `@matrix(...)` annotation to many existing tests, so that they are run with both zookeeper and remote_kraft nodes.

Reviewers: Randall Hauch <rhauch@gmail.com>, Greg Harris <gharris1727@gmail.com>
2022-10-27 11:19:14 -05:00
José Armando García Sancio 5c5dcb7a96
MINOR; Use 3.3.1 release for system test (#12714)
The following files are available in https://s3-us-west-2.amazonaws.com/kafka-packages/:

kafka-streams-3.3.1-test.jar
kafka_2.12-3.3.1.tgz
kafka_2.13-3.3.1.tgz

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-10-04 16:19:24 -07:00
David Arthur c1f23b6c9a
MINOR: Fix delegation token system test (#12693)
KIP-373 added a "token requester" field to the output of kafka-delegation-tokens.sh. The system test was failing since it was not expecting this new field. This patch adds support for this field and improves the error output if we can't parse.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Manikumar Reddy <manikumar.reddy@gmail.com>
2022-10-01 19:22:46 -07:00
Nikolay 51b079dca7
KAFKA-12878: Support --bootstrap-server in kafka-streams-application-reset tool (#12632)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-19 13:20:41 -04:00
Manikumar Reddy 3e8e082fab MINOR: Bump latest 2.8 version to 2.8.2 2022-09-19 17:18:47 +05:30
Tom Bentley 352c71ffb5
MINOR: Update release versions for upgrade tests with 3.0.2, 3.1.2, 3.2.3 release (#12661)
Updates release versions in files that are used for upgrade test with the 3.0.2, 3.1.2, 3.2.3 release version.
2022-09-19 17:13:40 +05:30
Jason Gustafson 921885d31f
MINOR; Remove redundant version system test (#12612)
This patch removes test_kafka_version.py, which contains two tests at the moment. The first test verifies we can start a 0.8.2 cluster. The second verifies we can start a cluster with one node on 0.8.2 and another on the latest. These test are covered in greater depth by upgrade_test.py and downgrade_test.py.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-09-08 18:13:59 -07:00
Chris Egerton 897bf4741c
KAFKA-14143: Exactly-once source connector system tests (#11783)
Also includes a minor quality-of-life improvement to clarify why some internal REST requests to workers may fail while that worker is still starting up.

Reviewers: Tom Bentley <tbentley@redhat.com>, Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
2022-09-08 15:13:43 -04:00
Yash Mayya 8a19f2da27
Update expected task configs for FileStream source and sink connectors in ConnectRestApiTest (#12576)
Reviewer: Chris Egerton <chrise@aiven.io>
2022-08-31 16:34:00 -04:00
Colin Patrick McCabe 28d5a05943
KAFKA-14187: kafka-features.sh: add support for --metadata (#12571)
This PR adds support to kafka-features.sh for the --metadata flag, as specified in KIP-778.  This
flag makes it possible to upgrade to a new metadata version without consulting a table mapping
version names to short integers. Change --feature to use a key=value format.

FeatureCommandTest.scala: make most tests here true unit tests (that don't start brokers) in order
to improve test run time, and allow us to test more cases. For the integration test part, test both
KRaft and ZK-based clusters. Add support for mocking feature operations in MockAdminClient.java.

upgrade.html: add a section describing how the metadata.version should be upgraded in KRaft
clusters.

Add kraft_upgrade_test.py to test upgrades between KRaft versions.

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
2022-08-30 16:56:03 -07:00
Alan Sheinberg 481fefb4f9
MINOR: Adds KRaft versions of most streams system tests (#12458)
Migrates Streams sustem tests to either use kraft brokers or to use both kraft and zk in a testing matrix.

This skips tests which use various forms of Kafka versioning since those seem to have issues with KRaft at the moment. Running these tests with KRaft will require a followup PR.

Reviewers: Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>
2022-08-26 16:11:19 -05:00
José Armando García Sancio 6ace67b2de
MINOR; Bump trunk to 3.4.0-SNAPSHOT (#12463)
Version bumps in trunk after the creation of the 3.3 branch.

Reviewers: David Arthur <mumrah@gmail.com>
2022-08-01 09:54:12 -07:00
Hao Li 5e4ae06d12
MINOR: fix flaky test test_standby_tasks_rebalance (#12428)
* Description
In this test, when third proc join, sometimes there are other rebalance scenarios such as followup joingroup request happens before syncgroup response was received by one of the proc and the previously assigned tasks for that proc is then lost during new joingroup request. This can result in standby tasks assigned as 3, 1, 2. This PR relax the expected assignment of 2, 2, 2 to a range of [1-3].

* Some backgroud from Guozhang:
I talked to @hao Li offline and also inspected the code a bit, and tl;dr is that I think the code logic is correct (i.e. we do not really have a bug), but we need to relax the test verification a little bit. The general idea behind the subscription info is that:

When a client joins the group, its subscription will try to encode all its current assigned active and standby tasks, which would be used as prev active and standby tasks by the assignor in order to achieve some stickiness.

When a client drops all its active/standby tasks due to errors, it does not actually report all empty from its subscription, instead it tries to check its local state directory (you can see that from TaskManager#getTaskOffsetSums which populates the taskOffsetSum. For active task, its offset would be “-2” a.k.a. LATEST_OFFSET, for standby task, its offset is an actual numerical number.

So in this case, the proc2 which drops all its active and standby tasks, would still report all tasks that have some local state still, and since it was previously owning all six tasks (three as active, and three as standby), it would report all six as standbys, and when that happens the resulted assignment as @hao Li verified, is indeed the un-even one.

So I think the actual “issue“ happens here, is when proc2 is a bit late sending the sync-group request, when the previous rebalance has already completed, and a follow-up rebalance has already triggered, in that case, the resulted un-even assignment is indeed expected. Such a scenario, though not common, is still legitimate since in practice all kinds of timing skewness across instances can happen. So I think we should just relax our verification here, i.e. just making sure that each instance has at least one standby replica at the end, not exactly evenly as “2, 2, 2”.

Reviewers: Suhas Satish <ssatish@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-07-21 12:12:29 -07:00
Alyssa Huang 8e9869a777
MINOR: Run MessageFormatChangeTest in ZK mode only (#12395)
KRaft mode will not support writing messages with an older message format (2.8) since the min supported IBP is 3.0 for KRaft. Testing support for reading older message formats will be covered by https://issues.apache.org/jira/browse/KAFKA-14056.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-13 08:46:04 +02:00
Bruno Cadonna 4d53dd9972
KAFKA-13930: Add 3.2.0 Streams upgrade system tests (#12209)
* KAFKA-13930: Add 3.2.0 Streams upgrade system tests

Apache Kafka 3.2.0 was recently released. Now we need
to test upgrades from 3.2 to trunk in our system tests.

Reviewer: Bill Bejeck <bbejeck@apache.org>
2022-06-21 16:33:40 +02:00
Ron Dagostino b04937dc65
MINOR: Fix force kill of KRaft colocated controllers in system tests (#11238)
I noticed that a system test using a KRaft cluster with 3 brokers but only 1 co-located controller did not force-kill the second and third broker after shutting down the first broker (the one with the controller).  The issue was a floating point rounding error.  This patch adjusts for the rounding error and also makes the logic work for an even number of controllers.  A local run of `tests/kafkatest/sanity_checks/test_bounce.py` succeeded (and I manually increased the cluster size for the 1 co-located controller case and observed the correct kill behavior: the second and third brokers were force-killed as expected).

Reviewers: Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>, David Jacot <djacot@confluent.io>
2022-06-15 16:45:00 +02:00
Aneesh Garg 47bb93cfd7
MINOR: Replace ACL_AUTHORIZER attribute with ZK_ACL_AUTHORIZER (#12247)
Replace ACL_AUTHORIZER attribute with ZK_ACL_AUTHORIZER in system tests. Required after the changes merged with https://github.com/apache/kafka/pull/12190.

Reviewers: David Jacot <djacot@confluent.io>
2022-06-03 17:50:49 +02:00
Bruno Cadonna 5424324722
KAFKA-13930: Add 3.2.0 to core upgrade and compatibility system tests (#12210)
Apache Kafka 3.2.0 was recently released. Now we need
to test upgrades and compatibility with 3.2 in core system tests.

Reviewer: Jason Gustafson <jason@confluent.io>
2022-06-03 09:13:10 +02:00
Bruno Cadonna 0aea498b9a
MINOR: Pin ducktape version to < 0.9 (#12242)
With newer ducktape versions than < 0.9 system tests
may run into authentication issues with the AK system test
infrastructure.

The version will be bumped up once we have infrastructure
in place for newer paramiko versions brought in by ducktape
0.9.

Reviewers: Lucas Bradstreet <lucas@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Kvicii <Karonazaba@gmail.com>
2022-06-02 20:21:23 +02:00
Jason Gustafson f980820e2b
MINOR: Send kraft raft/controller logs to controller log in systests (#12222)
Currently the only place we see controller/raft logging in system tests is `server-start-stdout-stderr.log` where they are mixed with all other logs. It is more convenient to send them to `controller.log` as we do for zk tests.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, David Jacot <djacot@confluent.io>
2022-05-30 09:21:41 -07:00
Jason Gustafson 02fc6e7d3c
MINOR: Collect metadata log dir in kraft system tests (#12215)
It is useful to collect the directory for `__cluster_metadata` in system tests. We use a separate directory from user partitions, so it must be configured separately. 

Reviewers: David Arthur <mumrah@gmail.com>
2022-05-25 17:36:58 -07:00
Lucas Bradstreet 46630a0610
MINOR: fix number of nodes used in test_compatible_brokers_eos_v2_enabled (#12211)
Reviewers: David Jacot <djacot@confluent.io>
2022-05-25 20:03:06 +02:00
Lucas Bradstreet f7502f430a
MINOR: fix Connect system test runs with JDK 10+ (#12202)
When running our Connect system tests with JDK 10+, we hit the error 
    AttributeError: 'ClusterNode' object has no attribute 'version'
because util.py attempts to check the version variable for non-Kafka service objects.

Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
2022-05-25 10:25:00 -07:00
Jason Gustafson b5699b5ccd
KAFKA-13923; Generalize authorizer system test for kraft (#12190)
Change `ZookeeperAuthorizerTest` to `AuthorizerTest` and add support for KRaft's `StandardAuthorizer` implementation.

Reviewers: David Jacot <djacot@confluent.io>
2022-05-23 09:47:14 -07:00
Alex Sorokoumov 78dd40123c
MINOR: Add upgrade tests for FK joins (#12122)
Follow up PR for KAFKA-13769.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-05-13 17:21:27 -07:00
Tom Bentley 467bce04ae
MINOR: Update release versions for upgrade tests with 3.1.1 release (#12156)
Updates release versions in files that are used for upgrade test with the 3.1.1 release version.

Reviewers: Bruno Cadonna <bruno@confluent.io>
2022-05-13 09:32:41 +01:00
Bruno Cadonna 020ff2fe0e
MINOR: Update release versions for upgrade tests with 3.2.0 release (#12143)
Updates release versions in files that are used for upgrade test with the 3.2.0 release version.  

Reviewer: David Jacot <djacot@confluent.io>
2022-05-10 14:47:46 +02:00
Jason Gustafson f0a09ea003
MINOR: Fix event output inconsistencies in TransactionalMessageCopier (#12098)
This patch fixes some strangeness and inconsistency in the messages written by `TransactionalMessageCopier` to stdout. Here is a sample of two messages.

Progress message:
```
{"consumed":33000,"stage":"ProcessLoop","totalProcessed":33000,"progress":"copier-0","time":"2022/04/24 05:40:31:649","remaining":333}
```
The `transactionalId` is set to the value of the `progress` key.

And a shutdown message:
```
{"consumed":33333,"shutdown_complete":"copier-0","totalProcessed":33333,"time":"2022/04/24 05:40:31:937","remaining":0}
```
The `transactionalId` this time is set to the `shutdown_complete` key and there is no `stage` key.

In this patch, we change the following:

1. Use a separate key for the `transactionalId`.
2. Drop the `progress` and `shutdown_complete` keys.
3. Use `stage=ShutdownComplete` in the shutdown message.
4. Modify `transactional_message_copier.py` system test service accordingly.

Reviewers: David Arthur <mumrah@gmail.com>
2022-04-29 10:02:25 -07:00
Luke Chen f28a2ee918
MINOR: revert back to 60s session timeout for static membership test (#11881)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-04-21 11:51:31 -07:00
David Jacot 6d36487b68
MINOR: Fix TestDowngrade.test_upgrade_and_downgrade (#12027)
The second validation does not verify the second bounce because the verified producer and the verified consumer are stopped in `self.run_validation`. This means that the second `run_validation` just spit out the same information as the first one. Instead, we should just run the validation at the end.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-04-18 14:22:33 -07:00
Konstantine Karantasis dd62ef2eda
KAFKA-13748: Do not include file stream connectors in Connect's CLASSPATH and plugin.path by default (#11908)
With this change we stop including the non-production grade connectors that are meant to be used for demos and quick starts by default in the CLASSPATH and plugin.path of Connect deployments. The package of these connector will still be shipped with the Apache Kafka distribution and will be available for explicit inclusion. 

The changes have been tested through the system tests and the existing unit and integration tests. 

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Randall Hauch <rhauch@gmail.com>
2022-03-30 13:15:42 -07:00
Bruno Cadonna 4c8685e701
MINOR: Bump trunk to 3.3.0-SNAPSHOT (#11925)
Version bumps on trunk following the creation of the 3.2 release branch.

Reviewer: David Jacot <djacot@confluent.io>
2022-03-21 21:37:05 +01:00
Justine Olshan 7afdb069bf
KAFKA-13750; Client Compatability KafkaTest uses invalid idempotency configs (#11909)
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-03-17 18:00:27 +01:00
Mickael Maison 1783fb14df
MINOR: Bump latest 3.0 version to 3.0.1 (#11885)
Reviewers: Matthias J. Sax <mjsax@apache.org>
2022-03-16 11:43:37 +01:00
Stanislav Vodetskyi 7e683852b4
MINOR: unpin ducktape dependency to always use the newest version (py3 edition) (#11884)
Ensures we always have the latest published ducktape version.
This way whenever we release a new one, we won't have to cherry pick a bunch of commits across a bunch of branches.
2022-03-11 17:48:19 +05:30
Levani Kokhreidze 87eb0cf03c
KAFKA-6718: Update SubscriptionInfoData with clientTags (#10802)
adds ClientTags to SubscriptionInfoData

Reviewer: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-03-11 16:29:05 +08:00
Kamal Chandraprakash 496aa1f84b
MINOR: Provide valid examples in README page. (#10259)
* MINOR: Provide valid examples in README page.

- `testMetadataUpdateWaitTime` method is removed from MetadataTest class.
-  Removed the travis CI documentation.

Reviewers: Luke Chen <showuon@gmail.com>
2022-02-21 14:48:24 +08:00
Michal T 6d7e6d6f87
MINOR: Install missing 'tc' utility - iproute2 for systemtests (#11764)
Signed-off-by: Michal T <mtoth@redhat.com>

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-02-16 12:56:06 +01:00
Michal T 44fcba980f
MINOR: Fix typo in system tests Dockerfile (#11740)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-02-08 18:03:57 +01:00
David Jacot 7215c90c5e
MINOR: Add 3.0 and 3.1 to streams system tests (#11716)
Reviewers: Bill Bejeck <bill@confluent.io>
2022-01-28 10:06:31 +01:00
David Jacot 110fae2f59
MINOR: Add 3.0 and 3.1 to broker and client compatibility tests (#11701)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2022-01-25 16:22:48 +01:00
David Jacot 34208e8429
MINOR: Update files with 3.1.0 (#11698)
Reviewers: Bill Bejeck <bbejeck@apache.org>
2022-01-21 21:30:56 +01:00
Ron Dagostino 1785e1223e
KAFKA-13582: TestVerifiableProducer.test_multiple_kraft_security_protocols fails (#11664)
KRaft brokers always use the first controller listener, so if there is not also a colocated KRaft controller on the node be sure to only publish one controller listener in `controller.listener.names` even when the inter-controller listener name differs.  System tests were failing due to unnecessarily publishing a second entry in `controller.listener.names` for a broker-only config and not also publishing a mapping for it in `listener.security.protocol.map`.  Removing the unnecessary entry in `controller.listener.names` solves the problem.

Reviewers: David Jacot <djacot@confluent.io>
2022-01-10 20:54:26 +01:00
Chia-Ping Tsai b6e7f6a4df
MINOR: replace Thread.isAlive by Thread.is_alive for Python code (#11545)
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2021-11-29 18:49:14 +08:00
Bruno Cadonna 4fed0001ec
MINOR: Fix system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance (#11532)
Log messages were changed in the AssignorConfiguration (#11490) that are
also used for verification in system test
StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance.

This commit fixes the test and adds comments to the log messages
that point to the test that needs to be updated in case of
changes to the log messages.

Reviewers: John Roesler <vvcephei@apache.org>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2021-11-25 10:48:09 +01:00
David Jacot 3aef0a5ceb
MINOR: Bump trunk to 3.2.0-SNAPSHOT (#11458)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2021-11-02 13:38:54 +01:00
David Jacot 38a3ddb562
MINOR: Add a replication system test which simulates a slow replica (#11395)
This patch adds a new system test which exercises the shrining/expansion process of the partition leader. It does so by introducing a network partition which isolates a broker from the other brokers in the cluster but not from KRaft Controller/ZK.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-10-20 08:19:36 +02:00
Luke Chen 1af1c80e2d
MINOR: replace deprecated exactly_once_beta into exactly_once_v2 (#10884)
Replace deprecated exactly_once_beta with exactly_once_v2 in system tests.

Follow up for #10870, found out there are still some system tests using the deprecated exactly_once_beta. This PR updates them.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-09-27 17:02:48 +02:00
David Jacot f650a14d56
KAFKA-13312; 'NetworkDegradeTest#test_rate' should wait until iperf server is listening (#11344)
Reviewers: Jason Gustafson <jason@confluent.io>
2021-09-21 10:26:46 +02:00
David Jacot 493280735b
MINOR: Bump latest 2.8 version to 2.8.1 (#11341)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2021-09-20 09:23:15 +02:00
Jason Gustafson 25b0857bdb
KAFKA-13234; Transaction system test should clear URPs after broker restarts (#11267)
Clearing under-replicated-partitions helps ensure that partitions do not become unavailable longer than necessary as brokers are rolled. This prevents flakiness due to transaction timeouts.

Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2021-09-01 08:37:05 -07:00
David Jacot c4e1e23857
KAFKA-13231; `TransactionalMessageCopier.start_node` should wait until the process if fully started (#11264)
This patch ensures that the transaction message copier is fully started in `start_node`. Without this, it is possible that `stop_node` is called before the process is started which results in not stopping it at all.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-08-27 08:28:14 +02:00
John Roesler 45ecaa19f8
MINOR: Set session timeout back to 10s for Streams system tests (#11236)
We increased the default session timeout to 30s in KIP-735:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout

Since then, we are observing sporadic system test failures
due to rebalances taking longer than the test timeout.
Rather than increase the test wait times, we can just override
the session timeout to a value more appropriate in the testing
domain.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-20 11:27:54 -05:00
Zara Lim 9bc45d4e03
MINOR: Increase the Kafka shutdown timeout to 120 (#11183)
The streams static membership test has failed several times due to hitting the Kafka shutdown timeout, but the logs were showing that the shutdown did actually succeed after the 60 second timeout.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>
2021-08-05 15:26:10 -07:00
Kamal Chandraprakash a103c95a31
KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests. (#10602)
Also adjusted the acceptable recovery lag to stabilize Streams tests.

Reviewers: Justine Olshan <jolshan@confluent.io>, Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>
2021-08-04 17:31:10 -05:00
Matthias J. Sax a7d9a8ac36
MINOR: Remove older brokers from upgrade test (#11117)
As of version 2.2.1 , Kafka Streams uses message headers and
thus requires broker version 0.11.0 or newer.

Reviewers: John Roesler <john@confluent.io>, Ismael Juma <ismael@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-07-26 14:09:47 -07:00
Cheng Tan 8ed271e1fd
KAFKA-13026: Idempotent producer (KAFKA-10619) follow-up testings (#11002)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2021-07-26 21:45:59 +01:00
Niket dc512cc038
KAFKA-13015: Ducktape System Tests for Metadata Snapshots (#11053)
This PR implements system tests in ducktape to test the ability of brokers and controllers to generate
and consume snapshots and catch up with the metadata log.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@gmail.com>
2021-07-23 16:28:21 -07:00
Ryan Dielhenn 04fd555475
MINOR: Enable KRaft in transactions_test.py #11121
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-07-23 16:01:54 -07:00
Ismael Juma f34bb28ab6
KAFKA-13116: Fix message_format_change_test and compatibility_test_new_broker_test failures (#11108)
These failures were caused by a46b82bea9. Details for each test:

* message_format_change_test: use IBP 2.8 so that we can write in older message
formats.
* compatibility_test_new_broker_test_failures: fix down-conversion path to handle
empty record batches correctly. The record scan in the old code ensured that
empty record batches were never down-converted, which hid this bug.
* upgrade_test: set the IBP 2.8 when message format is < 0.11 to ensure we are
actually writing with the old message format even though the test was passing
without the change.

Verified with ducker that some variants of these tests failed without these changes
and passed with them. Also added a unit test for the down-conversion bug fix.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-07-23 13:43:31 -07:00
Luke Chen f959e6c583
KAFKA-13129: replace describe topic via zk with describe users (#11115)
Replace the unsupported describe topic via zk with describe users to fix the system tests.
For the upgrade_test case where TLS support is not required, use list_acls instead.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-07-23 05:33:43 -07:00
Bruno Cadonna 9b3687e0ac
HOTFIX: Modify system test config to reduce time to stable task assignment. (#11090)
Currently, we verify the startup of a Streams client by checking the transition
from REBALANCING to RUNNING and if the client processed some records
in the EOS system test. However, if the Streams client only
has standby tasks assigned as it can happen if the client is catching 
up by using warm-up replicas, the client will never process
records within the timeout of the startup verification. Hence, the test 
will fail although everything is fine. This commit fixes this by reducing
the time to the next probing rebalance and by increasing the number of 
max warm-up replicas. In such a way, the catch up of the client and the 
following processing of records should still be within the startup verification 
timeout of the client.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-21 07:58:14 +02:00
Ron Dagostino 1e78dcda69
MINOR: Fix ZooKeeperAuthorizerTest for KRaft (#11095)
This patch fixes the ZooKeeperAuthorizerTest for KRaft. The system test was not
configuring/reconfiguring/restarting the remote controller quorum with the correct security settings.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-07-20 16:35:14 -07:00
Colin Patrick McCabe bfc57aa4dd
MINOR: enable reassign_partitions_test.py for kraft (#11064)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-07-19 09:08:55 -07:00
CHUN-HAO TANG 98bd590718
MINOR: Replace unused variable with underscore (#11037)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-07-17 16:36:52 +08:00
Ron Dagostino 762d11c13f
MINOR: ducktape should start brokers in parallel and support co-located kraft
This patch adds a sanity-check bounce system test for the case where we have 3
co-located KRaft controllers and fixes the system test code so that this case
will pass by starting brokers in parallel by default instead of serially. We
now also send SIGKILL to any running KRaft broker or controller nodes for the
co-located case when a majority of co-located controllers have been stopped --
otherwise they do not shutdown, and we spin for the 60 second timeout. Finally,
this patch adds the ability to specify that certain brokers should not be
started when starting the cluster, and then we can start those nodes at a later
time via the add_broker() method call; this is going to be helpful for KRaft
snapshot system testing.

We were not testing the 3 co-located KRaft controller case previously, and it
would not pass because the first Kafka node would never be considered started.
We were starting the Kafka nodes serially, and we decide that a node has
successfully started when it logs a particular message. This message is not
logged until the broker has identified the controller (i.e. the leader of the
KRaft quorum). There cannot be a leader until a majority of the KRaft quorum
has started, so with 3 co-located controllers the first node could never be
considered "started" by the system test.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-07-16 16:28:09 -07:00
Bruno Cadonna 332db13047
HOTFIX: Fix verification of version probing (#10943)
Fixes and improves version probing in system test test_version_probing_upgrade().
2021-07-12 18:50:25 +02:00
Colin Patrick McCabe 5a88a59ddd
MINOR: Hint about "docker system prune" when ducker-ak build fails (#10995)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Jason Gustafson <jason@confluent.io>
2021-07-08 09:58:46 -07:00
Stanislav Vodetskyi 058589b03d
KAFKA-13041: Enable connecting VS Code remote debugger (#10915)
The changes in this PR enable connecting VS Code's remote debugger to a system test running locally with ducker-ak.
Changes include:
- added zip_safe=False to setup.py - this enables installing kafkatest module together with source code when running `python setup.py  develop/install`.
- install [debugpy](https://github.com/microsoft/debugpy) on ducker nodes
- expose 5678 (default debugpy port) on ducker01 node - ducker01 is the one that actually executes tests, so that's where you'd connect to.
- added `-d|--debug` option to `ducker-ak test` command - if used, tests will run via `python3.7 -m debugpy` command, which would listen on 5678 and pause until debugger is connected.
- changed the logic of the `ducker-ak test` command so that ducktape args are collected separately after `--` - otherwise any argument we add to the `test` command in the future might potentially
shadow a similar ducktape argument. 
	- we don't really check that `ducktape_args` are args while `test_name_args` are actual test names, so the difference between the two is minimal actually - most importantly we do check that `test_name_args` is not empty, but we are ok if `ducktape_args` is.

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2021-07-08 20:35:14 +05:30
Konstantine Karantasis d2a05d71c0
Bump trunk to 3.1.0-SNAPSHOT (#10981)
Typical version bumps on trunk following the creation of the 3.0 release branch.

Reviewer: Randall Hauch <rhauch@gmail.com>
2021-07-06 14:28:13 -07:00
kpatelatwork 527ba111c7
KAFKA-4793: Connect API to restart connector and tasks (KIP-745) (#10822)
Implements KIP-745 https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks to change connector REST API to restart a connector and its tasks as a whole.

Testing strategy 
- [x]  Unit tests added for all possible combinations of onlyFailed and includeTasks
- [x]  Integration tests added for all possible combinations of onlyFailed and includeTasks
- [x]  System tests for happy path 

Reviewers: Randall Hauch <rhauch@gmail.com>, Diego Erdody <erdody@gmail.com>, Konstantine Karantasis <k.karantasis@gmail.com>
2021-06-30 21:13:07 -07:00
Ron Dagostino 4f5b4c868e
KAFKA-12756: Update ZooKeeper to v3.6.3 (#10918)
Update the ZooKeeper version to v3.6.3. This requires adding dropwizard
as a new dependency.

Also, add Kafka v2.8.0 to the ducktape system test image.

Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2021-06-30 11:21:33 -07:00