933 Commits

Author SHA1 Message Date
Federico Valeri bb677c4959
KAFKA-14583: Move ReplicaVerificationTool to tools (#14059)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-07-26 12:04:34 +02:00
Greg Harris 125dbb9286
KAFKA-14760: Move ThroughputThrottler from tools to clients, remove tools dependency from connect-runtime (#13313)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-07-20 12:58:48 -07:00
Mickael Maison b584e91036
KAFKA-15093: Add 3.4.0 and 3.5.0 to core upgrade and compatibility system tests (#13859)
Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <christololov@gmail.com>
2023-07-12 10:36:57 +02:00
Mickael Maison 354db26b95
MINOR: Add 3.5.0 and 3.4.1 to system tests (#13849)
Reviewers: Luke Chen <showuon@gmail.com>
2023-07-12 10:11:44 +02:00
Yi-Sheng Lien b8f3776f24
KAFKA-15155: Follow PEP 8 best practice in Python to check if a container is empty (#13974)
Reviewers: Divij Vaidya <diviv@amazon.com>
2023-07-11 11:01:50 +02:00
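
A short illustration of the PEP 8 guideline referenced by KAFKA-15155: empty sequences are falsy, so a truthiness check is preferred over a length comparison. This is only a sketch; the function and parameter names are hypothetical and not taken from the Kafka system tests.

```python
def has_pending(pending_messages):
    """Return True if the container holds at least one item."""
    # Preferred by PEP 8: empty sequences and collections are falsy.
    return bool(pending_messages)


def has_pending_verbose(pending_messages):
    """Equivalent length-based check that PEP 8 discourages for sequences."""
    return len(pending_messages) != 0
```

Both helpers agree for lists, tuples, and strings; the first form is simply the idiomatic one.
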
hudeqi 8be601d051
MINOR: Move TROGDOR.md to trogdor module (#13979)
Reviewers: Divij Vaidya <diviv@amazon.com>

Co-authored-by: Deqi Hu <deqi.hu@shopee.com>
2023-07-10 18:11:21 +02:00
DL1231 4149e31cad
KAFKA-15153: Use Python 'is' instead of '==' to compare for None (#13964)
Reviewers: Divij Vaidya <diviv@amazon.com>

Co-authored-by: d00791190 <dinglan6@huawei.com>
2023-07-06 16:59:13 +02:00
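
To show why KAFKA-15153 prefers `is` over `==` when comparing with None: `==` calls the object's `__eq__`, which a class may override, while `is` checks identity with the None singleton. A minimal sketch, in which the `AlwaysEqual` class and both helpers are hypothetical:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


def is_unset(value):
    # Identity check: True only for the None singleton itself.
    return value is None


def is_unset_fragile(value):
    # Equality check: delegates to value.__eq__, which may be overridden.
    return value == None  # noqa: E711 (kept deliberately to show the pitfall)
```

With an `AlwaysEqual` instance, the equality-based helper wrongly reports the value as unset, while the identity-based helper does not.
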
Manyanda Chitimbo f32ebeab17
MINOR: Bump requests (python package) from 2.24.0 to 2.31.0 in /tests (#13903)
Update "requests" lib used in system tests to version "2.31.0" to fix CVE-2023-32681: Unintended leak of Proxy-Authorization header in requests

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-06-27 21:17:22 +02:00
David Arthur d27ba5bfba
KAFKA-15010 ZK migration failover support (#13758)
This patch adds snapshot reconciliation during ZK to KRaft migration. This reconciliation happens whenever a snapshot is loaded by KRaft, or during a controller failover. Prior to this patch, it was possible to miss metadata updates coming from KRaft when dual-writing to ZK.

Internally this adds a new state SYNC_KRAFT_TO_ZK to the KRaftMigrationDriver state machine. The controller passes through this state after the initial ZK migration and each time a controller becomes active. 

Logging during dual-write was enhanced to include a count of write operations happening.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-06-01 10:25:46 -04:00
Matthias J. Sax b40a7fc037
HOTFIX: fix broken Streams upgrade system test (#13654)
Reviewers: Victoria Xia <victoria.xia@confluent.io>, John Roesler <john@confluent.io>
2023-05-08 14:24:11 -07:00
David Arthur 0822ce0ed1
KAFKA-14840: Support for snapshots during ZK migration (#13461)
This patch adds support for handling metadata snapshots while in dual-write mode. Prior to this change, if the active
controller loaded a snapshot, it would get out of sync with the ZK state.

In order to reconcile the snapshot state with ZK, several methods were added to scan through the metadata in ZK to
compute differences with the MetadataImage. Since this introduced a lot of code, I opted to split out a lot of methods
from ZkMigrationClient into their own client interfaces, such as TopicMigrationClient, ConfigMigrationClient, and
AclMigrationClient. Each of these has some iterator method that lets the caller examine the ZK state in a single pass
and without using too much memory.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Luke Chen <showuon@gmail.com>
2023-05-05 01:35:26 -07:00
David Arthur c1b5c75d92
KAFKA-14805 KRaft controller supports pre-migration mode (#13407)
This patch adds the concept of pre-migration mode to the KRaft controller. While in this mode, 
the controller will only allow certain write operations. The purpose of this is to disallow metadata 
changes when the controller is waiting for the ZK migration records to be committed.

The following ControllerWriteEvent operations are permitted in pre-migration mode

* completeActivation
* maybeFenceReplicas
* writeNoOpRecord
* processBrokerHeartbeat
* registerBroker (only for migrating ZK brokers)
* unregisterBroker

Raft events and other controller events do not follow the same code path as ControllerWriteEvent, 
so they are not affected by this new behavior.

This patch also adds a new metric as defined in KIP-868: kafka.controller:type=KafkaController,name=ZkMigrationState

In order to support upgrades from 3.4.0, this patch also redefines enum value 1 to mean
MIGRATION rather than PRE_MIGRATION.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-04-26 10:20:30 -04:00
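
The pre-migration allowlist described in KAFKA-14805 can be pictured roughly as follows. This is a Python sketch of the idea only, not the controller's Java implementation: the operation names come from the commit message, while the function, its parameters, and the `createTopics` name used in the usage note are hypothetical.

```python
# Operation names copied from the commit message; everything else is hypothetical.
PRE_MIGRATION_ALLOWED = frozenset({
    "completeActivation",
    "maybeFenceReplicas",
    "writeNoOpRecord",
    "processBrokerHeartbeat",
    "registerBroker",
    "unregisterBroker",
})


def is_write_allowed(operation, in_pre_migration, is_migrating_zk_broker=False):
    """Decide whether a ControllerWriteEvent-style operation may run."""
    if not in_pre_migration:
        return True
    if operation == "registerBroker":
        # Only ZK brokers that are being migrated may register in this mode.
        return is_migrating_zk_broker
    return operation in PRE_MIGRATION_ALLOWED
```

For instance, a hypothetical `createTopics` write would be rejected while in pre-migration mode and accepted otherwise.
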
Chia-Ping Tsai 2271e748a1
MINOR: fix zookeeper_migration_test.py (#13620)
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
2023-04-24 17:21:19 +08:00
Mickael Maison dc1ede8d89
MINOR: Bump trunk to 3.6.0-SNAPSHOT (#13570)
Reviewers: David Jacot <djacot@confluent.io>
2023-04-14 14:17:07 +02:00
Gantigmaa Selenge 951894d2ff
MINOR: Install missing iputils-ping for system tests (#13500)
Some system tests from kafkatest.tests.core.network_degrade_test are failing due to missing utility iputils-ping.

[DEBUG - 2023-02-04 01:34:56,322 - network_degrade_test - test_latency -
 lineno:67]: Ping output: bash: line 1: ping: command not found

Reviewers: Luke Chen <showuon@gmail.com>
2023-04-13 09:30:56 +08:00
vamossagar12 c14f56b484
KAFKA-14586: Moving StreamResetter to tools (#13127)
Moves StreamResetter to tools project.

Reviewers: Federico Valeri <fedevaleri@gmail.com>, Christo Lolov <lolovc@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2023-03-28 14:43:22 +02:00
Shay Elkin 797c28cb7c
MINOR: Rename remote_controller_quorum to isolated_controller_quorum (#13448)
Similar to https://github.com/apache/kafka/pull/13439:

ddd652c standardized on "isolated" as the name for all the isolated
modes, and renamed remote_controller_quorum to
isolated_controller_quorum. This broke
SecurityTest.test_quorum_ssl_endpoint_validation_failure, which should
be fixed by this simple rename.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-03-24 10:40:27 -07:00
Shay Elkin e07cc127e1
MINOR: Fix remote_kraft -> isolated_kraft in kafkatest (#13439)
ddd652c6 standardized on "isolated" as the name for all the isolated
modes, and renamed kafkatest.services.kafka.quorum.remote_kraft to
isolated_kraft. However, the tests using remote_kraft weren't
updated, and are broken as a result. This is a simple search and
replace to fix those.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-03-23 09:33:59 -07:00
Colin Patrick McCabe ddd652c672
MINOR: Standardize KRaft logging, thread names, and terminology (#13390)
Standardize KRaft thread names.

- Always use kebab case. That is, "my-thread-name".

- Thread prefixes are just strings, not Option[String] or Optional<String>.
  If you don't want a prefix, use the empty string.

- Thread prefixes end in a dash (except the empty prefix). Then you can
  calculate thread names as $prefix + "my-thread-name"

- Broker-only components get "broker-$id-" as a thread name prefix. For example, "broker-1-"

- Controller-only components get "controller-$id-" as a thread name prefix. For example, "controller-1-"

- Shared components get "kafka-$id-" as a thread name prefix. For example, "kafka-0-"

- Always pass a prefix to KafkaEventQueue, so that threads have names like
  "broker-0-metadata-loader-event-handler" rather than "event-handler". Prior to this PR, we had
  several threads just named "EventHandler" which was not helpful for debugging.

- QuorumController thread name is "quorum-controller-123-event-handler"

- Don't set a thread prefix for replication threads started by ReplicaManager. They run only on the
  broker, and already include the broker ID.

Standardize KRaft slf4j log prefixes.

- Names should be of the form "[ComponentName id=$id] ". So for a ControllerServer with ID 123, we
  will have "[ControllerServer id=123] "

- For the QuorumController class, use the prefix "[QuorumController id=$id] " rather than
  "[Controller <nodeId>] ", to make it clearer that this is a KRaft controller.

- In BrokerLifecycleManager, add isZkBroker=true to the log prefix for the migration case.

Standardize KRaft terminology.

- All synonyms of combined mode (colocated, coresident, etc.) should be replaced by "combined"

- All synonyms of isolated mode (remote, non-colocated, distributed, etc.) should be replaced by
  "isolated".
2023-03-16 15:33:03 -07:00
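
The thread-name and log-prefix conventions listed in ddd652c672 can be sketched as plain string helpers. This is an illustrative Python sketch, not the Kafka code; the helper names are hypothetical, and only the resulting string formats are taken from the commit message.

```python
def thread_prefix(role, node_id):
    """Build a thread-name prefix such as "broker-1-" from a role and node ID."""
    # Roles named in the commit message: "broker", "controller", "kafka" (shared).
    return f"{role}-{node_id}-"


def thread_name(prefix, base_name):
    """Prefixes are plain strings; the empty string means "no prefix"."""
    return prefix + base_name


def log_prefix(component_name, node_id):
    """Build an slf4j-style log prefix such as "[ControllerServer id=123] "."""
    return f"[{component_name} id={node_id}] "
```

Because every non-empty prefix ends in a dash, the thread name is always a simple concatenation, e.g. `thread_name(thread_prefix("broker", 0), "metadata-loader-event-handler")`.
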
Ron Dagostino e3817cac89
KAFKA-14351: Controller Mutation Quota for KRaft (#13116)
Implement KIP-599 controller mutation quotas for the KRaft controller. These quotas apply to create
topics, create partitions, and delete topic operations. They are specified in terms of number of
partitions.

The approach taken here is to reuse the ControllerMutationQuotaManager that is also used in ZK
mode. The quotas are implemented as Sensor objects and Sensor.checkQuotas enforces the quota,
whereas Sensor.record notes that new partitions have been modified. While ControllerApis handles
fetching the Sensor objects, we must make the final callback to check the quotas from within
QuorumController. The reason is that only QuorumController knows the final number of partitions
that must be modified. (As one example, up-to-date information about the number of partitions that
will be deleted when a topic is deleted is really only available in QuorumController.)

For quota enforcement, the logic is already in place. The KRaft controller is expected to set the
throttle time in the response that is embedded in EnvelopeResponse, but it does not actually apply
the throttle because there is no client connection to throttle. Instead, the broker that forwarded
the request is expected to return the throttle value from the controller and to throttle the client
connection. It also applies its own request quota, so the enforced/returned quota is the maximum of
the two.

This PR also installs a DynamicConfigPublisher in ControllerServer. This allows dynamic
configurations to be published on the controller. Previously, they could be set, but they were not
applied. Note that we still don't have a good way to set node-level configurations for isolated
controllers. However, this will allow us to set cluster configs (aka default node configs) and have
them take effect on the controllers.

In a similar vein, this PR separates out the dynamic client quota publisher logic used on the
broker into DynamicClientQuotaPublisher. We can now install this on both BrokerServer and
ControllerServer. This makes dynamically configuring quotas (such as controller mutation quotas)
possible.

Also add a ducktape test, controller_mutation_quota_test.py.

Reviewers: David Jacot <djacot@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>
2023-03-07 11:25:34 -08:00
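
The throttling rule described in KAFKA-14351 (the forwarding broker enforces the larger of the controller's throttle and its own request-quota throttle) reduces to a single `max`. A minimal sketch with a hypothetical function name and millisecond units assumed:

```python
def effective_throttle_ms(controller_throttle_ms, broker_request_throttle_ms):
    """Throttle the forwarding broker applies to the client connection."""
    # The controller only reports its throttle time; the broker enforces the
    # larger of that value and its own request-quota throttle.
    return max(controller_throttle_ms, broker_request_throttle_ms)
```
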
Federico Valeri 07e2f6cd4d
KAFKA-14578: Move ConsumerPerformance to tools (#13215)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Alexandre Dupriez <alexandre.dupriez@gmail.com>
2023-03-06 18:16:55 +01:00
vamossagar12 bb3111f472
KAFKA-14580: Moving EndToEndLatency from core to tools module (#13095)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, Ismael Juma <mlists@juma.me.uk>
2023-03-02 12:05:22 +01:00
Jakub Scholz 56c84853ec
MINOR: Remove unused ZooKeeper log level configuration from `connect-log4j.properties` (#13216)
Signed-off-by: Jakub Scholz <www@scholzj.com>

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-02-22 11:57:38 +01:00
Chia-Ping Tsai 69f0481342
MINOR: the package of JmxTool is incorrect when running quota_test.py (#13233)
Reviewers: Federico Valeri <fedevaleri@gmail.com>, David Jacot <djacot@confluent.io>
2023-02-13 01:31:55 +08:00
Federico Valeri 50e0e3c257
KAFKA-14582: Move JmxTool to tools (#13136)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-02-02 11:23:26 +01:00
José Armando García Sancio 896573f9bc
KAFKA-14279: Add 3.3.x streams system tests (#13077)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-01-09 23:37:05 -08:00
Akhilesh C db49070760
KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller. (#12998)
This patch introduces a preliminary state machine that can be used by the KRaft
controller to drive online migration from Zk to KRaft.

MigrationState -- Defines the states we can have while migrating from Zk to
KRaft.

KRaftMigrationDriver -- Defines the state transitions, and events to handle
actions like controller change, metadata change, broker change and have
interfaces through which it claims Zk controllership, performs zk writes and
sends RPCs to ZkBrokers.

MigrationClient -- Interface that defines the functions used to claim and
relinquish Zk controllership, read from and write to Zk.

Co-authored-by: David Arthur <mumrah@gmail.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-01-09 10:44:11 -08:00
José Armando García Sancio f668c8e44b
KAFKA-14279; Add 3.3.x to core compatibility tests (#13076)
Now that 3.3.x exists, add system tests for upgrade, downgrade and client
compatibility.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-01-05 11:28:12 -08:00
Matthias J. Sax 9e71f9cc7d
MINOR: fix expected version in streams upgrade test (#13022)
Co-authored-by: Lucas Brutschy <lucasbru@users.noreply.github.com>

Reviewers: Suhas Satish <ssatish@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>, John Roesler <john@confluent.io>
2022-12-27 10:09:27 -08:00
Simon Woodman 5f265710f1
MINOR: Fix typo (#13044)
fix of Kakfa to Kafka

Reviewers: Luke Chen <showuon@gmail.com>
2022-12-23 20:40:30 +08:00
Lucas Brutschy c8675d4723
KAFKA-14343: Upgrade tests for state updater (#12801)
A test that verifies the upgrade from a version of Streams with
state updater disabled to a version with state updater enabled
and vice versa, so that we can offer a safe upgrade path.

 - upgrade test from a version of Streams with state updater
disabled to a version with state updater enabled
 - downgrade test from a version of Streams with state updater
 enabled to a version with state updater disabled

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-12-20 09:35:59 +01:00
Colin Patrick McCabe 29c09e2ca1
MINOR: ControllerServer should use the new metadata loader and snapshot generator (#12983)
This PR introduces the new metadata loader and snapshot generator. For the time being, they are
only used by the controller, but a PR for the broker will come soon.

The new metadata loader supports adding and removing publishers dynamically. (In contrast, the old
loader only supported adding a single publisher.) It also passes along more information about each
new image that is published. This information can be found in the LogDeltaManifest and
SnapshotManifest classes.

The new snapshot generator replaces the previous logic for generating snapshots in
QuorumController.java and associated classes. The new generator is intended to be shared between
the broker and the controller, so it is decoupled from both.

There are a few small changes to the old snapshot generator in this PR. Specifically, we move the
batch processing time and batch size metrics out of BrokerMetadataListener.scala and into
BrokerServerMetrics.scala.

Finally, fix a case where we are using 'is' rather than '==' for a numeric comparison in
snapshot_test.py.

Reviewers: David Arthur <mumrah@gmail.com>
2022-12-15 16:53:07 -08:00
A. Sophie Blee-Goldman c1a54671e8
MINOR: Bump trunk to 3.5.0-SNAPSHOT (#12960)
Version bumps in trunk after the creation of the 3.4 branch.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-12-07 18:29:20 -08:00
Bruno Cadonna 18629f6816
MINOR: Fix log message used in version probing system test (#12931)
PR #12684 introduced a better format for timestamps in log
messages. Unfortunately, we missed that one of the modified
log messages is used by a system test for validation.

This PR adapts the system test to look for the modified
log message.

Reviewers: Divij Vaidya <diviv@amazon.com>, Matthias J. Sax <mjsax@apache.org>
2022-12-05 13:15:36 +01:00
Jonathan Albrecht b56e71faee
MINOR: Update unit/integration tests to work with the IBM Semeru JDK (#12343)
The IBM Semeru JDK uses the OpenJDK security providers instead of the IBM security providers, so test for the OpenJDK classes first where possible, and otherwise test for Semeru in the java.runtime.name system property.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-12-01 16:22:00 +01:00
Stanislav Vodetskyi b2b9ecdd61
MINOR: try-finally around super call in http.py (#12924)
Reviewers: Daniel Gospodinow <danielgospodinow@gmail.com>, Ian McDonald <imcdonald@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2022-12-01 15:16:45 +05:30
Lucas Brutschy 4560978ed7
KAFKA-14309: FK join upgrades not tested with DEV_VERSION (#12760)
The streams upgrade system test inserted FK join code for every version of
the StreamsUpgradeTest except for the latest. Also, the original code
never switched on the `test.run_fk_join` flag for the target version of
the upgrade.

The effect was that FK join upgrades were not tested at all, since
no FK join code was executed after the bounce in the system test.

We introduce `extra_properties` in the system tests, that can be used
to pass any property to the upgrade driver, which is supposed to be
reused by system tests for switching on and off flags (e.g. for the
state restoration code).

Reviewers: Alex Sorokoumov <asorokoumov@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-11-07 15:46:51 -08:00
Jason Gustafson 150a0758cb
MINOR: Change system test console consumer default log level (#12819)
For tests which use the console consumer service, we are currently enabling TRACE logging by default. I have seen some system tests where this produces GBs of logging. A better default is probably DEBUG.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2022-11-07 13:42:36 -08:00
srishti-saraswat 57aefa9c82
MINOR: Migrate connect system tests to KRaft (#12621)
Adds the `metadata_quorum` parameter to the `@matrix(...)` annotation to many existing tests, so that they are run with both zookeeper and remote_kraft nodes.

Reviewers: Randall Hauch <rhauch@gmail.com>, Greg Harris <gharris1727@gmail.com>
2022-10-27 11:19:14 -05:00
José Armando García Sancio 5c5dcb7a96
MINOR; Use 3.3.1 release for system test (#12714)
The following files are available in https://s3-us-west-2.amazonaws.com/kafka-packages/:

kafka-streams-3.3.1-test.jar
kafka_2.12-3.3.1.tgz
kafka_2.13-3.3.1.tgz

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-10-04 16:19:24 -07:00
David Arthur c1f23b6c9a
MINOR: Fix delegation token system test (#12693)
KIP-373 added a "token requester" field to the output of kafka-delegation-tokens.sh. The system test was failing since it was not expecting this new field. This patch adds support for this field and improves the error output if we can't parse.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Manikumar Reddy <manikumar.reddy@gmail.com>
2022-10-01 19:22:46 -07:00
Nikolay 51b079dca7
KAFKA-12878: Support --bootstrap-server in kafka-streams-application-reset tool (#12632)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-19 13:20:41 -04:00
Manikumar Reddy 3e8e082fab
MINOR: Bump latest 2.8 version to 2.8.2
2022-09-19 17:18:47 +05:30
Tom Bentley 352c71ffb5
MINOR: Update release versions for upgrade tests with 3.0.2, 3.1.2, 3.2.3 release (#12661)
Updates release versions in files that are used for upgrade test with the 3.0.2, 3.1.2, 3.2.3 release version.
2022-09-19 17:13:40 +05:30
Jason Gustafson 921885d31f
MINOR; Remove redundant version system test (#12612)
This patch removes test_kafka_version.py, which contains two tests at the moment. The first test verifies we can start a 0.8.2 cluster. The second verifies we can start a cluster with one node on 0.8.2 and another on the latest. These tests are covered in greater depth by upgrade_test.py and downgrade_test.py.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-09-08 18:13:59 -07:00
Chris Egerton 897bf4741c
KAFKA-14143: Exactly-once source connector system tests (#11783)
Also includes a minor quality-of-life improvement to clarify why some internal REST requests to workers may fail while that worker is still starting up.

Reviewers: Tom Bentley <tbentley@redhat.com>, Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
2022-09-08 15:13:43 -04:00
Yash Mayya 8a19f2da27
Update expected task configs for FileStream source and sink connectors in ConnectRestApiTest (#12576)
Reviewer: Chris Egerton <chrise@aiven.io>
2022-08-31 16:34:00 -04:00
Colin Patrick McCabe 28d5a05943
KAFKA-14187: kafka-features.sh: add support for --metadata (#12571)
This PR adds support to kafka-features.sh for the --metadata flag, as specified in KIP-778.  This
flag makes it possible to upgrade to a new metadata version without consulting a table mapping
version names to short integers. Change --feature to use a key=value format.

FeatureCommandTest.scala: make most tests here true unit tests (that don't start brokers) in order
to improve test run time, and allow us to test more cases. For the integration test part, test both
KRaft and ZK-based clusters. Add support for mocking feature operations in MockAdminClient.java.

upgrade.html: add a section describing how the metadata.version should be upgraded in KRaft
clusters.

Add kraft_upgrade_test.py to test upgrades between KRaft versions.

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
2022-08-30 16:56:03 -07:00
Alan Sheinberg 481fefb4f9
MINOR: Adds KRaft versions of most streams system tests (#12458)
Migrates Streams system tests to either use KRaft brokers or to use both KRaft and ZK in a testing matrix.

This skips tests which use various forms of Kafka versioning since those seem to have issues with KRaft at the moment. Running these tests with KRaft will require a followup PR.

Reviewers: Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>
2022-08-26 16:11:19 -05:00
José Armando García Sancio 6ace67b2de
MINOR; Bump trunk to 3.4.0-SNAPSHOT (#12463)
Version bumps in trunk after the creation of the 3.3 branch.

Reviewers: David Arthur <mumrah@gmail.com>
2022-08-01 09:54:12 -07:00
Hao Li 5e4ae06d12
MINOR: fix flaky test test_standby_tasks_rebalance (#12428)
* Description
In this test, when the third proc joins, other rebalance scenarios can sometimes occur. For example, a follow-up JoinGroup request may happen before one of the procs has received its SyncGroup response, and the tasks previously assigned to that proc are then lost during the new JoinGroup request. This can result in standby tasks being assigned as 3, 1, 2. This PR relaxes the expected assignment of 2, 2, 2 to a range of [1-3].

* Some background from Guozhang:
I talked to @hao Li offline and also inspected the code a bit. The tl;dr is that I think the code logic is correct (i.e. we do not really have a bug), but we need to relax the test verification a little bit. The general idea behind the subscription info is that:

When a client joins the group, its subscription will try to encode all its current assigned active and standby tasks, which would be used as prev active and standby tasks by the assignor in order to achieve some stickiness.

When a client drops all its active/standby tasks due to errors, it does not actually report an empty subscription. Instead, it checks its local state directory (you can see that in TaskManager#getTaskOffsetSums, which populates the taskOffsetSum). For an active task, the offset would be “-2”, a.k.a. LATEST_OFFSET; for a standby task, the offset is an actual numerical value.

So in this case proc2, which drops all its active and standby tasks, would still report all tasks that still have some local state. Since it previously owned all six tasks (three as active, and three as standby), it would report all six as standbys, and when that happens the resulting assignment, as @hao Li verified, is indeed the uneven one.

So I think the actual “issue” here happens when proc2 is a bit late sending the sync-group request: the previous rebalance has already completed and a follow-up rebalance has already been triggered. In that case, the resulting uneven assignment is indeed expected. Such a scenario, though not common, is still legitimate, since in practice all kinds of timing skew across instances can happen. So I think we should just relax our verification here, i.e. just make sure that each instance has at least one standby replica at the end, rather than exactly “2, 2, 2”.

Reviewers: Suhas Satish <ssatish@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-07-21 12:12:29 -07:00
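
The relaxed check described in #12428 (accept any per-instance standby count in the range [1-3] instead of exactly 2, 2, 2) could look roughly like this. The function name and defaults are hypothetical; the actual system test code may differ.

```python
def standby_assignment_ok(standby_counts, low=1, high=3):
    """Accept any assignment where every instance holds between low and high standbys."""
    return all(low <= count <= high for count in standby_counts)
```

Under this check both `[2, 2, 2]` and the uneven `[3, 1, 2]` pass, while an instance with no standby task still fails.
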
Alyssa Huang 8e9869a777
MINOR: Run MessageFormatChangeTest in ZK mode only (#12395)
KRaft mode will not support writing messages with an older message format (2.8) since the min supported IBP is 3.0 for KRaft. Testing support for reading older message formats will be covered by https://issues.apache.org/jira/browse/KAFKA-14056.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-13 08:46:04 +02:00
Bruno Cadonna 4d53dd9972
KAFKA-13930: Add 3.2.0 Streams upgrade system tests (#12209)
Apache Kafka 3.2.0 was recently released. Now we need
to test upgrades from 3.2 to trunk in our system tests.

Reviewer: Bill Bejeck <bbejeck@apache.org>
2022-06-21 16:33:40 +02:00
Ron Dagostino b04937dc65
MINOR: Fix force kill of KRaft colocated controllers in system tests (#11238)
I noticed that a system test using a KRaft cluster with 3 brokers but only 1 co-located controller did not force-kill the second and third broker after shutting down the first broker (the one with the controller).  The issue was a floating point rounding error.  This patch adjusts for the rounding error and also makes the logic work for an even number of controllers.  A local run of `tests/kafkatest/sanity_checks/test_bounce.py` succeeded (and I manually increased the cluster size for the 1 co-located controller case and observed the correct kill behavior: the second and third brokers were force-killed as expected).

Reviewers: Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>, David Jacot <djacot@confluent.io>
2022-06-15 16:45:00 +02:00
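
The commit message for #11238 does not show the faulty expression, so the following is only an assumed illustration of how a floating-point rounding step can miscount a controller majority in Python, and how integer arithmetic avoids it for both odd and even counts. Both function names are hypothetical.

```python
def majority_rounded(num_controllers):
    # Fragile: Python 3 rounds halves to the nearest even integer,
    # so round(0.5) == 0 and round(2.5) == 2.
    return round(num_controllers / 2)


def majority(num_controllers):
    # Integer arithmetic: strictly more than half, for odd and even counts.
    return num_controllers // 2 + 1
```

With a single controller, the rounded variant yields 0 while the integer variant yields 1, which is the kind of off-by-one that would skip a required force-kill.
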
Aneesh Garg 47bb93cfd7
MINOR: Replace ACL_AUTHORIZER attribute with ZK_ACL_AUTHORIZER (#12247)
Replace ACL_AUTHORIZER attribute with ZK_ACL_AUTHORIZER in system tests. Required after the changes merged with https://github.com/apache/kafka/pull/12190.

Reviewers: David Jacot <djacot@confluent.io>
2022-06-03 17:50:49 +02:00
Bruno Cadonna 5424324722
KAFKA-13930: Add 3.2.0 to core upgrade and compatibility system tests (#12210)
Apache Kafka 3.2.0 was recently released. Now we need
to test upgrades and compatibility with 3.2 in core system tests.

Reviewer: Jason Gustafson <jason@confluent.io>
2022-06-03 09:13:10 +02:00
Bruno Cadonna 0aea498b9a
MINOR: Pin ducktape version to < 0.9 (#12242)
With ducktape versions 0.9 and newer, system tests
may run into authentication issues with the AK system test
infrastructure.

The version will be bumped up once we have infrastructure
in place for newer paramiko versions brought in by ducktape
0.9.

Reviewers: Lucas Bradstreet <lucas@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Kvicii <Karonazaba@gmail.com>
2022-06-02 20:21:23 +02:00
Jason Gustafson f980820e2b
MINOR: Send kraft raft/controller logs to controller log in systests (#12222)
Currently the only place we see controller/raft logging in system tests is `server-start-stdout-stderr.log` where they are mixed with all other logs. It is more convenient to send them to `controller.log` as we do for zk tests.

Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, David Jacot <djacot@confluent.io>
2022-05-30 09:21:41 -07:00
Jason Gustafson 02fc6e7d3c
MINOR: Collect metadata log dir in kraft system tests (#12215)
It is useful to collect the directory for `__cluster_metadata` in system tests. We use a separate directory from user partitions, so it must be configured separately. 

Reviewers: David Arthur <mumrah@gmail.com>
2022-05-25 17:36:58 -07:00
Lucas Bradstreet 46630a0610
MINOR: fix number of nodes used in test_compatible_brokers_eos_v2_enabled (#12211)
Reviewers: David Jacot <djacot@confluent.io>
2022-05-25 20:03:06 +02:00
Lucas Bradstreet f7502f430a
MINOR: fix Connect system test runs with JDK 10+ (#12202)
When running our Connect system tests with JDK 10+, we hit the error 
    AttributeError: 'ClusterNode' object has no attribute 'version'
because util.py attempts to check the version variable for non-Kafka service objects.

Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
2022-05-25 10:25:00 -07:00
Jason Gustafson b5699b5ccd
KAFKA-13923; Generalize authorizer system test for kraft (#12190)
Change `ZookeeperAuthorizerTest` to `AuthorizerTest` and add support for KRaft's `StandardAuthorizer` implementation.

Reviewers: David Jacot <djacot@confluent.io>
2022-05-23 09:47:14 -07:00
Alex Sorokoumov 78dd40123c
MINOR: Add upgrade tests for FK joins (#12122)
Follow up PR for KAFKA-13769.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-05-13 17:21:27 -07:00
Tom Bentley 467bce04ae
MINOR: Update release versions for upgrade tests with 3.1.1 release (#12156)
Updates release versions in files that are used for upgrade test with the 3.1.1 release version.

Reviewers: Bruno Cadonna <bruno@confluent.io>
2022-05-13 09:32:41 +01:00
Bruno Cadonna 020ff2fe0e
MINOR: Update release versions for upgrade tests with 3.2.0 release (#12143)
Updates release versions in files that are used for upgrade test with the 3.2.0 release version.  

Reviewer: David Jacot <djacot@confluent.io>
2022-05-10 14:47:46 +02:00
Jason Gustafson f0a09ea003
MINOR: Fix event output inconsistencies in TransactionalMessageCopier (#12098)
This patch fixes some strangeness and inconsistency in the messages written by `TransactionalMessageCopier` to stdout. Here is a sample of two messages.

Progress message:
```
{"consumed":33000,"stage":"ProcessLoop","totalProcessed":33000,"progress":"copier-0","time":"2022/04/24 05:40:31:649","remaining":333}
```
Here the `transactionalId` is stored as the value of the `progress` key.

And a shutdown message:
```
{"consumed":33333,"shutdown_complete":"copier-0","totalProcessed":33333,"time":"2022/04/24 05:40:31:937","remaining":0}
```
This time the `transactionalId` is stored as the value of the `shutdown_complete` key, and there is no `stage` key.

In this patch, we change the following:

1. Use a separate key for the `transactionalId`.
2. Drop the `progress` and `shutdown_complete` keys.
3. Use `stage=ShutdownComplete` in the shutdown message.
4. Modify `transactional_message_copier.py` system test service accordingly.

Reviewers: David Arthur <mumrah@gmail.com>
2022-04-29 10:02:25 -07:00
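
A consumer of the reworked `TransactionalMessageCopier` output from #12098 might parse each line as sketched below. The commit message only states that a separate key is used for the transactional ID, so the key name `transactionalId` is an assumption, as are the helper names; `stage` and `remaining` appear in the sample messages above.

```python
import json


def parse_copier_event(line):
    """Parse one stdout line and return (transactional_id, stage, remaining)."""
    event = json.loads(line)
    # "transactionalId" is an assumed key name; the commit message only says
    # that a separate key is used for the transactional ID.
    return event["transactionalId"], event["stage"], event["remaining"]


def is_shutdown_complete(line):
    """True when the event reports the ShutdownComplete stage."""
    _, stage, _ = parse_copier_event(line)
    return stage == "ShutdownComplete"
```

Having one `stage` key for every message means the caller no longer needs to probe for `progress` or `shutdown_complete` keys to tell the two message types apart.
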
Luke Chen f28a2ee918
MINOR: revert back to 60s session timeout for static membership test (#11881)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-04-21 11:51:31 -07:00
David Jacot 6d36487b68
MINOR: Fix TestDowngrade.test_upgrade_and_downgrade (#12027)
The second validation does not verify the second bounce because the verified producer and the verified consumer are stopped in `self.run_validation`. This means that the second `run_validation` just spits out the same information as the first one. Instead, we should just run the validation at the end.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-04-18 14:22:33 -07:00
Konstantine Karantasis dd62ef2eda
KAFKA-13748: Do not include file stream connectors in Connect's CLASSPATH and plugin.path by default (#11908)
With this change we stop including the non-production-grade connectors that are meant to be used for demos and quick starts by default in the CLASSPATH and plugin.path of Connect deployments. The package of these connectors will still be shipped with the Apache Kafka distribution and will be available for explicit inclusion.

The changes have been tested through the system tests and the existing unit and integration tests. 

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Randall Hauch <rhauch@gmail.com>
2022-03-30 13:15:42 -07:00
Bruno Cadonna 4c8685e701
MINOR: Bump trunk to 3.3.0-SNAPSHOT (#11925)
Version bumps on trunk following the creation of the 3.2 release branch.

Reviewer: David Jacot <djacot@confluent.io>
2022-03-21 21:37:05 +01:00
Justine Olshan 7afdb069bf
KAFKA-13750; Client Compatability KafkaTest uses invalid idempotency configs (#11909)
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-03-17 18:00:27 +01:00
Mickael Maison 1783fb14df
MINOR: Bump latest 3.0 version to 3.0.1 (#11885)
Reviewers: Matthias J. Sax <mjsax@apache.org>
2022-03-16 11:43:37 +01:00
Stanislav Vodetskyi 7e683852b4
MINOR: unpin ducktape dependency to always use the newest version (py3 edition) (#11884)
Ensures we always have the latest published ducktape version.
This way whenever we release a new one, we won't have to cherry pick a bunch of commits across a bunch of branches.
2022-03-11 17:48:19 +05:30
Levani Kokhreidze 87eb0cf03c
KAFKA-6718: Update SubscriptionInfoData with clientTags (#10802)
adds ClientTags to SubscriptionInfoData

Reviewer: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2022-03-11 16:29:05 +08:00
Kamal Chandraprakash 496aa1f84b
MINOR: Provide valid examples in README page. (#10259)
* MINOR: Provide valid examples in README page.

- `testMetadataUpdateWaitTime` method is removed from MetadataTest class.
-  Removed the travis CI documentation.

Reviewers: Luke Chen <showuon@gmail.com>
2022-02-21 14:48:24 +08:00
Michal T 6d7e6d6f87
MINOR: Install missing 'tc' utility - iproute2 for systemtests (#11764)
Signed-off-by: Michal T <mtoth@redhat.com>

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-02-16 12:56:06 +01:00
Michal T 44fcba980f
MINOR: Fix typo in system tests Dockerfile (#11740)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-02-08 18:03:57 +01:00
David Jacot 7215c90c5e
MINOR: Add 3.0 and 3.1 to streams system tests (#11716)
Reviewers: Bill Bejeck <bill@confluent.io>
2022-01-28 10:06:31 +01:00
David Jacot 110fae2f59
MINOR: Add 3.0 and 3.1 to broker and client compatibility tests (#11701)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2022-01-25 16:22:48 +01:00
David Jacot 34208e8429
MINOR: Update files with 3.1.0 (#11698)
Reviewers: Bill Bejeck <bbejeck@apache.org>
2022-01-21 21:30:56 +01:00
Ron Dagostino 1785e1223e
KAFKA-13582: TestVerifiableProducer.test_multiple_kraft_security_protocols fails (#11664)
KRaft brokers always use the first controller listener, so if there is not also a colocated KRaft controller on the node be sure to only publish one controller listener in `controller.listener.names` even when the inter-controller listener name differs.  System tests were failing due to unnecessarily publishing a second entry in `controller.listener.names` for a broker-only config and not also publishing a mapping for it in `listener.security.protocol.map`.  Removing the unnecessary entry in `controller.listener.names` solves the problem.
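
The rule described above can be sketched as a small helper. This is an illustration only, not the system test code; the function and parameter names are hypothetical, and publishing both listeners on a co-located node is inferred from the description rather than stated by it.

```python
def controller_listener_names(controller_listener, intercontroller_listener,
                              colocated_controller):
    """Build the value for controller.listener.names on one node."""
    names = [controller_listener]
    # A broker-only node publishes just the first controller listener,
    # even when the inter-controller listener name differs.
    if colocated_controller and intercontroller_listener != controller_listener:
        names.append(intercontroller_listener)
    return ",".join(names)
```

For a broker-only node, `controller_listener_names("CONTROLLER", "CONTROLLER_SSL", False)` yields a single entry, so no extra mapping is needed in `listener.security.protocol.map`.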

Reviewers: David Jacot <djacot@confluent.io>
2022-01-10 20:54:26 +01:00
Chia-Ping Tsai b6e7f6a4df
MINOR: replace Thread.isAlive by Thread.is_alive for Python code (#11545)
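
For context, `Thread.isAlive()` was deprecated in Python 3.8 and removed in 3.9, while `Thread.is_alive()` works on all supported versions. A minimal sketch of the replacement call (the variable names are illustrative, not taken from the patch):

```python
import threading

stop = threading.Event()
worker = threading.Thread(target=stop.wait)  # blocks until stop is set
worker.start()

alive_while_running = worker.is_alive()  # thread is still waiting on the event

stop.set()
worker.join()
alive_after_join = worker.is_alive()  # thread has finished
```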
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2021-11-29 18:49:14 +08:00
Bruno Cadonna 4fed0001ec
MINOR: Fix system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance (#11532)
Log messages were changed in the AssignorConfiguration (#11490) that are
also used for verification in system test
StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance.

This commit fixes the test and adds comments to the log messages
that point to the test that needs to be updated in case of
changes to the log messages.

Reviewers: John Roesler <vvcephei@apache.org>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2021-11-25 10:48:09 +01:00
David Jacot 3aef0a5ceb
MINOR: Bump trunk to 3.2.0-SNAPSHOT (#11458)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2021-11-02 13:38:54 +01:00
David Jacot 38a3ddb562
MINOR: Add a replication system test which simulates a slow replica (#11395)
This patch adds a new system test which exercises the shrinking/expansion process of the partition leader. It does so by introducing a network partition which isolates a broker from the other brokers in the cluster but not from KRaft Controller/ZK.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-10-20 08:19:36 +02:00
Luke Chen 1af1c80e2d
MINOR: replace deprecated exactly_once_beta into exactly_once_v2 (#10884)
Replace deprecated exactly_once_beta with exactly_once_v2 in system tests.

Follow up for #10870, found out there are still some system tests using the deprecated exactly_once_beta. This PR updates them.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2021-09-27 17:02:48 +02:00
David Jacot f650a14d56
KAFKA-13312; 'NetworkDegradeTest#test_rate' should wait until iperf server is listening (#11344)
Reviewers: Jason Gustafson <jason@confluent.io>
2021-09-21 10:26:46 +02:00
David Jacot 493280735b
MINOR: Bump latest 2.8 version to 2.8.1 (#11341)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2021-09-20 09:23:15 +02:00
Jason Gustafson 25b0857bdb
KAFKA-13234; Transaction system test should clear URPs after broker restarts (#11267)
Clearing under-replicated-partitions helps ensure that partitions do not become unavailable longer than necessary as brokers are rolled. This prevents flakiness due to transaction timeouts.

Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2021-09-01 08:37:05 -07:00
David Jacot c4e1e23857
KAFKA-13231; `TransactionalMessageCopier.start_node` should wait until the process is fully started (#11264)
This patch ensures that the transaction message copier is fully started in `start_node`. Without this, it is possible that `stop_node` is called before the process is started which results in not stopping it at all.
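
The general pattern is to poll for a readiness signal before returning from `start_node`. The sketch below is a self-contained illustration with a hand-rolled `wait_until` and a fake process; it is not the service's actual code, and `FakeCopier` and its `is_started` method are hypothetical stand-ins.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg="condition not met"):
    """Poll condition() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


class FakeCopier:
    """Stand-in for a process that reports started after a few polls."""

    def __init__(self, polls_until_started):
        self.polls_left = polls_until_started

    def is_started(self):
        self.polls_left -= 1
        return self.polls_left <= 0


def start_node(copier, timeout_sec=5):
    """Return only once the copier reports that it has started."""
    wait_until(copier.is_started, timeout_sec, backoff_sec=0.01,
               err_msg="copier did not start in time")
    return True
```

Because `start_node` blocks until the process is up, a later `stop_node` always has a running process to stop.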

Reviewers: Jason Gustafson <jason@confluent.io>
2021-08-27 08:28:14 +02:00
John Roesler 45ecaa19f8
MINOR: Set session timeout back to 10s for Streams system tests (#11236)
We increased the default session timeout to 30s in KIP-735:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout

Since then, we are observing sporadic system test failures
due to rebalances taking longer than the test timeout.
Rather than increase the test wait times, we can just override
the session timeout to a value more appropriate in the testing
domain.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-20 11:27:54 -05:00
Zara Lim 9bc45d4e03
MINOR: Increase the Kafka shutdown timeout to 120 (#11183)
The streams static membership test has failed several times due to hitting the Kafka shutdown timeout, but the logs were showing that the shutdown did actually succeed after the 60 second timeout.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>
2021-08-05 15:26:10 -07:00
Kamal Chandraprakash a103c95a31
KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests. (#10602)
Also adjusted the acceptable recovery lag to stabilize Streams tests.

Reviewers: Justine Olshan <jolshan@confluent.io>, Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>
2021-08-04 17:31:10 -05:00
Matthias J. Sax a7d9a8ac36
MINOR: Remove older brokers from upgrade test (#11117)
As of version 2.2.1, Kafka Streams uses message headers and
thus requires broker version 0.11.0 or newer.

Reviewers: John Roesler <john@confluent.io>, Ismael Juma <ismael@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-07-26 14:09:47 -07:00
Cheng Tan 8ed271e1fd
KAFKA-13026: Idempotent producer (KAFKA-10619) follow-up testings (#11002)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2021-07-26 21:45:59 +01:00
Niket dc512cc038
KAFKA-13015: Ducktape System Tests for Metadata Snapshots (#11053)
This PR implements system tests in ducktape to test the ability of brokers and controllers to generate
and consume snapshots and catch up with the metadata log.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@gmail.com>
2021-07-23 16:28:21 -07:00
Ryan Dielhenn 04fd555475
MINOR: Enable KRaft in transactions_test.py (#11121)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-07-23 16:01:54 -07:00
Ismael Juma f34bb28ab6
KAFKA-13116: Fix message_format_change_test and compatibility_test_new_broker_test failures (#11108)
These failures were caused by a46b82bea9. Details for each test:

* message_format_change_test: use IBP 2.8 so that we can write in older message
formats.
* compatibility_test_new_broker_test: fix down-conversion path to handle
empty record batches correctly. The record scan in the old code ensured that
empty record batches were never down-converted, which hid this bug.
* upgrade_test: set the IBP 2.8 when message format is < 0.11 to ensure we are
actually writing with the old message format even though the test was passing
without the change.

Verified with ducker that some variants of these tests failed without these changes
and passed with them. Also added a unit test for the down-conversion bug fix.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-07-23 13:43:31 -07:00
Luke Chen f959e6c583
KAFKA-13129: replace describe topic via zk with describe users (#11115)
Replace the unsupported describe topic via zk with describe users to fix the system tests.
For the upgrade_test case where TLS support is not required, use list_acls instead.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-07-23 05:33:43 -07:00
Bruno Cadonna 9b3687e0ac
HOTFIX: Modify system test config to reduce time to stable task assignment. (#11090)
Currently, we verify the startup of a Streams client by checking the transition
from REBALANCING to RUNNING and if the client processed some records
in the EOS system test. However, if the Streams client only
has standby tasks assigned, as can happen if the client is catching
up by using warm-up replicas, the client will never process
records within the timeout of the startup verification. Hence, the test 
will fail although everything is fine. This commit fixes this by reducing
the time to the next probing rebalance and by increasing the number of 
max warm-up replicas. In such a way, the catch up of the client and the 
following processing of records should still be within the startup verification 
timeout of the client.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-07-21 07:58:14 +02:00
Ron Dagostino 1e78dcda69
MINOR: Fix ZooKeeperAuthorizerTest for KRaft (#11095)
This patch fixes the ZooKeeperAuthorizerTest for KRaft. The system test was not
configuring/reconfiguring/restarting the remote controller quorum with the correct security settings.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-07-20 16:35:14 -07:00
Colin Patrick McCabe bfc57aa4dd
MINOR: enable reassign_partitions_test.py for kraft (#11064)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-07-19 09:08:55 -07:00
CHUN-HAO TANG 98bd590718
MINOR: Replace unused variable with underscore (#11037)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-07-17 16:36:52 +08:00
Ron Dagostino 762d11c13f
MINOR: ducktape should start brokers in parallel and support co-located kraft
This patch adds a sanity-check bounce system test for the case where we have 3
co-located KRaft controllers and fixes the system test code so that this case
will pass by starting brokers in parallel by default instead of serially. We
now also send SIGKILL to any running KRaft broker or controller nodes for the
co-located case when a majority of co-located controllers have been stopped --
otherwise they do not shutdown, and we spin for the 60 second timeout. Finally,
this patch adds the ability to specify that certain brokers should not be
started when starting the cluster, and then we can start those nodes at a later
time via the add_broker() method call; this is going to be helpful for KRaft
snapshot system testing.

We were not testing the 3 co-located KRaft controller case previously, and it
would not pass because the first Kafka node would never be considered started.
We were starting the Kafka nodes serially, and we decide that a node has
successfully started when it logs a particular message. This message is not
logged until the broker has identified the controller (i.e. the leader of the
KRaft quorum). There cannot be a leader until a majority of the KRaft quorum
has started, so with 3 co-located controllers the first node could never be
considered "started" by the system test.
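
The deadlock can be illustrated with a toy model. This sketch is not the ducktape service code; it only assumes, as described above, that a node counts as started once a majority of the quorum is running. The function names are hypothetical.

```python
def majority(quorum_size):
    """Smallest number of nodes that forms a majority of the quorum."""
    return quorum_size // 2 + 1


def start_serially(quorum_size):
    """Launch one node, wait for it to report started, then launch the next."""
    running = 0
    for _ in range(quorum_size):
        running += 1  # this node's process has been launched
        if running < majority(quorum_size):
            # No leader can be elected yet, so this node never logs its
            # started message and the serial loop would wait forever.
            return False
    return True


def start_in_parallel(quorum_size):
    """Launch every node first, then wait for all of them to report started."""
    running = quorum_size
    return running >= majority(quorum_size)
```

With three co-located controllers, `start_serially(3)` stalls on the first node, while `start_in_parallel(3)` has a majority available as soon as all processes are launched.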

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-07-16 16:28:09 -07:00
Bruno Cadonna 332db13047
HOTFIX: Fix verification of version probing (#10943)
Fixes and improves version probing in system test test_version_probing_upgrade().
2021-07-12 18:50:25 +02:00
Colin Patrick McCabe 5a88a59ddd
MINOR: Hint about "docker system prune" when ducker-ak build fails (#10995)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Jason Gustafson <jason@confluent.io>
2021-07-08 09:58:46 -07:00
Stanislav Vodetskyi 058589b03d
KAFKA-13041: Enable connecting VS Code remote debugger (#10915)
The changes in this PR enable connecting VS Code's remote debugger to a system test running locally with ducker-ak.
Changes include:
- added zip_safe=False to setup.py - this enables installing kafkatest module together with source code when running `python setup.py  develop/install`.
- install [debugpy](https://github.com/microsoft/debugpy) on ducker nodes
- expose 5678 (default debugpy port) on ducker01 node - ducker01 is the one that actually executes tests, so that's where you'd connect to.
- added `-d|--debug` option to `ducker-ak test` command - if used, tests will run via `python3.7 -m debugpy` command, which would listen on 5678 and pause until debugger is connected.
- changed the logic of the `ducker-ak test` command so that ducktape args are collected separately after `--` - otherwise any argument we add to the `test` command in the future might potentially shadow a similar ducktape argument.
	- we don't really check that `ducktape_args` are args while `test_name_args` are actual test names, so the difference between the two is minimal actually - most importantly we do check that `test_name_args` is not empty, but we are ok if `ducktape_args` is.
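
The argument handling in the last bullet can be sketched as follows. `ducker-ak` itself is a shell script, so this Python rendering only illustrates the described behaviour; the function name and the example arguments are hypothetical.

```python
def split_test_args(argv):
    """Split arguments into (test_name_args, ducktape_args) around '--'."""
    if "--" in argv:
        idx = argv.index("--")
        test_name_args, ducktape_args = argv[:idx], argv[idx + 1:]
    else:
        test_name_args, ducktape_args = list(argv), []
    # Only the test names are required; ducktape args may be empty.
    if not test_name_args:
        raise ValueError("at least one test name is required")
    return test_name_args, ducktape_args
```

For example, `split_test_args(["my_test.py", "--", "--some-ducktape-flag"])` returns `(["my_test.py"], ["--some-ducktape-flag"])`, where both values are placeholders.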

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2021-07-08 20:35:14 +05:30
Konstantine Karantasis d2a05d71c0
Bump trunk to 3.1.0-SNAPSHOT (#10981)
Typical version bumps on trunk following the creation of the 3.0 release branch.

Reviewer: Randall Hauch <rhauch@gmail.com>
2021-07-06 14:28:13 -07:00
kpatelatwork 527ba111c7
KAFKA-4793: Connect API to restart connector and tasks (KIP-745) (#10822)
Implements KIP-745 https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks to change connector REST API to restart a connector and its tasks as a whole.

Testing strategy 
- [x]  Unit tests added for all possible combinations of onlyFailed and includeTasks
- [x]  Integration tests added for all possible combinations of onlyFailed and includeTasks
- [x]  System tests for happy path 

Reviewers: Randall Hauch <rhauch@gmail.com>, Diego Erdody <erdody@gmail.com>, Konstantine Karantasis <k.karantasis@gmail.com>
2021-06-30 21:13:07 -07:00
Ron Dagostino 4f5b4c868e
KAFKA-12756: Update ZooKeeper to v3.6.3 (#10918)
Update the ZooKeeper version to v3.6.3. This requires adding dropwizard
as a new dependency.

Also, add Kafka v2.8.0 to the ducktape system test image.

Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2021-06-30 11:21:33 -07:00
Chia-Ping Tsai 01c2345658
MINOR: fix round_trip_fault_test.py - don't assign replicas to nonexistent brokers (#10908)
The broker id starts with 1 (https://github.com/apache/kafka/blob/trunk/tests/kafkatest/services/kafka/kafka.py#L207) so round_trip_fault_test.py fails because it assigns replica to nonexistent broker.

The interesting part is that the failure happens only in KRaft mode. KRaft mode checks that the broker ids exist (https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L950). By contrast, ZK mode has no such check, and min.insync.replicas is set to 1, so this test works in ZK mode even though one replica is always offline.
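
A check of this kind can be sketched in a few lines. This is an illustration, not the controller's or the test's code; the helper names are hypothetical, and the only fact taken from the message is that broker ids start at 1.

```python
def broker_ids(num_brokers, first_id=1):
    """Ids of the brokers in a cluster whose numbering starts at first_id."""
    return set(range(first_id, first_id + num_brokers))


def unknown_replicas(assignment, existing_ids):
    """Return the replica ids in the assignment that match no existing broker."""
    assigned = {replica for replicas in assignment.values() for replica in replicas}
    return sorted(assigned - existing_ids)
```

With three brokers, an assignment such as `{0: [0, 1, 2]}` references the nonexistent broker 0, whereas `{0: [1, 2, 3]}` references only existing brokers.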

Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-06-19 23:54:02 +08:00
Ron Dagostino ebef7d0c21
MINOR: TestSecurityRollingUpgrade system test fixes (#10886)
The TestSecurityRollingUpgrade.test_disable_separate_interbroker_listener() system test had a design flaw: it was migrating inter-broker communication from a SASL_SSL listener to an SSL listener in one roll while immediately removing the SASL_SSL listener in that roll. This requires two rolls because the existing SASL_SSL listener must remain available throughout the first roll so that unrolled brokers can continue to communicate with rolled brokers throughout. This patch adds the second roll to this test and removes the original SASL_SSL listener on that second roll instead of the first one. The test was not failing all the time -- it was flaky.

The TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two() system test did not explicitly identify the SASL mechanism to enable on a third port when that port used SASL but the client security protocol was not SASL-based. This resulted in an empty sasl.enabled.mechanisms config, which applied to that third port. When the cluster was then rolled to use this third port for inter-broker communication, rolled brokers could be unable to communicate with unrolled brokers (as above, this made the test flaky).

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-06-18 15:50:21 +08:00
John Roesler 987391958d
MINOR: enable EOS during smoke test IT (#10870)
This IT has been failing on trunk recently. Enabling EOS during the integration test
makes it easier to be sure that the test's assumptions are really true during verification
and should make the test more reliable.

I also noticed that in the actual system test file, we are using the deprecated property
name "beta" instead of "v2".

Reviewers: Boyang Chen <boyang@apache.org>
2021-06-13 21:35:02 -05:00
Chia-Ping Tsai 398800a4f3
MINOR: fix client_compatibility_features_test.py - DescribeAcls is already supported by KRaft (#10860)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-06-10 22:02:17 +08:00
A. Sophie Blee-Goldman 48379bd6e5
KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure (#10609)
This PR includes adding the NamedTopology to the Subscription/AssignmentInfo, and to the StateDirectory so it can place NamedTopology tasks within the hierarchical structure with task directories under the NamedTopology parent dir.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-06-07 15:38:12 -07:00
Chia-Ping Tsai 0bf4b47f58
MINOR: upgrade pip from 20.2.2 to 21.1.1 (#10661)
The following error happens on my mac m1 when building docker image for system tests.

Collecting pynacl
  Using cached PyNaCl-1.4.0.tar.gz (3.4 MB)
  Installing build dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-k867aac0/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel 'cffi>=1.4.1; python_implementation != '"'"'PyPy'"'"''
       cwd: None
  Complete output (14 lines):
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/usr/local/lib/python3.8/dist-packages/pip/__main__.py", line 23, in <module>
      from pip._internal.cli.main import main as _main  # isort:skip # noqa
    File "/usr/local/lib/python3.8/dist-packages/pip/_internal/cli/main.py", line 5, in <module>
      import locale
    File "/usr/lib/python3.8/locale.py", line 16, in <module>
      import re
    File "/usr/lib/python3.8/re.py", line 145, in <module>
      class RegexFlag(enum.IntFlag):
  AttributeError: module 'enum' has no attribute 'IntFlag'
  ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-k867aac0/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel 'cffi>=1.4.1; python_implementation != '"'"'PyPy'"'"'' Check the logs for full command output.

There was a related issue: pypa/pip#9689 and it is already fixed by pypa/pip#9689 (included in pip 21.1.1). I tested pip 21.1.1 and it works well on a Mac M1.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-05-29 14:49:25 +08:00
Mickael Maison 7f91d2935f
MINOR: Updating files with release 2.7.1 (#10660)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>,  Matthias J. Sax <mjsax@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2021-05-20 10:43:15 +01:00
Ron Dagostino 5b0c58ed53
MINOR: Support using the ZK authorizer with KRaft (#10550)
This patch adds support for running the ZooKeeper-based
kafka.security.authorizer.AclAuthorizer with KRaft clusters. Set the
authorizer.class.name config as well as the zookeeper.connect config while also
setting the typical KRaft configs (node.id, process.roles, etc.), and the
cluster will use KRaft for metadata and ZooKeeper for ACL storage. A system
test that exercises the authorizer is included.
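
A minimal sketch of such a configuration, rendered from Python for illustration. Only the config keys and the authorizer class name come from the description above; the helper names and example values are hypothetical, and the other KRaft settings a real node needs are omitted.

```python
def kraft_with_zk_authorizer(node_id, process_roles, zookeeper_connect):
    """Combine typical KRaft settings with the ZooKeeper-based authorizer."""
    return {
        "node.id": str(node_id),
        "process.roles": process_roles,
        "authorizer.class.name": "kafka.security.authorizer.AclAuthorizer",
        # ZooKeeper is used here only for ACL storage, not for metadata.
        "zookeeper.connect": zookeeper_connect,
    }


def to_properties(config):
    """Render a config dict as sorted key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))
```

For example, `to_properties(kraft_with_zk_authorizer(1, "broker", "zk1:2181"))` produces four lines, where `zk1:2181` is a placeholder address.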

This patch also changes "Raft" to "KRaft" in several system test files. It also
fixes a bug where system test admin clients were unable to connect to a cluster
with broker credentials via the SSL security protocol when the broker was using
that for inter-broker communication and SASL for client communication.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2021-05-19 10:32:56 -07:00
Colin Patrick McCabe 9e5b77fb96
KAFKA-12788: improve KRaft replica placement (#10494)
Implement a striped replica placement algorithm for KRaft. This also
means implementing rack awareness.  Previously, KRaft just chose
replicas randomly in a non-rack-aware fashion.  Also, allow replicas to
be placed on fenced brokers if there are no other choices.  This was
specified in KIP-631 but previously not implemented.
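
To give a feel for what striping across racks means, here is a deliberately simplified sketch. It is not the algorithm implemented in this patch or specified in KIP-631: it ignores fencing and randomization, and the function name and data layout are assumptions made for illustration.

```python
def place_replicas(brokers_by_rack, num_partitions, replication_factor):
    """Assign replicas by walking a broker list interleaved across racks."""
    total = sum(len(brokers) for brokers in brokers_by_rack.values())
    if replication_factor > total:
        raise ValueError("replication factor exceeds the number of brokers")
    # Interleave brokers so that neighbouring entries sit on different racks.
    striped = []
    depth = max(len(brokers) for brokers in brokers_by_rack.values())
    for i in range(depth):
        for rack in sorted(brokers_by_rack):
            brokers = brokers_by_rack[rack]
            if i < len(brokers):
                striped.append(brokers[i])
    # Each partition starts one position further along the striped list.
    return [
        [striped[(p + r) % len(striped)] for r in range(replication_factor)]
        for p in range(num_partitions)
    ]
```

With racks `{"a": [1, 2], "b": [3, 4], "c": [5, 6]}`, two partitions and replication factor 3, the sketch yields `[[1, 3, 5], [3, 5, 2]]`, so each partition has one replica per rack.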

Reviewers: Jun Rao <junrao@gmail.com>
2021-05-17 16:49:47 -07:00
Ron Dagostino 12377bd3c6
MINOR: Add missing @cluster annotation to StreamsNamedRepartitionTopicTest (#10697)
The StreamsNamedRepartitionTopicTest system test did not have the @cluster annotation and was therefore taking up the entire cluster. For example, we see this in the log output:

kafkatest.tests.streams.streams_named_repartition_topic_test.StreamsNamedRepartitionTopicTest.test_upgrade_topology_with_named_repartition_topic is using entire cluster. It's possible this test has no associated cluster metadata.

This PR adds the missing annotation.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2021-05-17 17:33:43 -04:00
Ron Dagostino 55b24ce9d6
MINOR: fix system test TestSecurityRollingUpgrade (#10694)
Ensure security protocol and sasl mechanism are updated in the cached SecurityConfig during rolling system tests. Also explicitly indicate which SASL mechanisms we wish to expose during the tests.

Reviewers: David Arthur <mumrah@gmail.com>
2021-05-17 13:46:44 -04:00
Chia-Ping Tsai 29c55fdbbc
MINOR: set replication.factor to 1 to make StreamsBrokerCompatibilityService work with old broker (#10673)
Reviewers: Matthias J. Sax <mjsax@conflunet.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-05-14 13:51:31 +08:00
Chia-Ping Tsai d881d11388
MINOR: fix streams_broker_compatibility_test.py (#10632)
The log message was changed, so the system test can't capture the expected message.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-05 11:12:00 -07:00
Ron Dagostino 1f4207c7c1
MINOR: system test spelling/pydoc/dead code fixes (#10604)
Reviewers: Kamal Chandraprakash <kamal@nmsworks.co.in>, Chia-Ping Tsai <chia7712@gmail.com>
2021-05-01 23:22:46 +08:00
A. Sophie Blee-Goldman 3bfc9fe486
MINOR: Bump latest 2.6 version to 2.6.2 (#10582)
Bump the version for system tests to 2.6.2
2021-04-21 12:50:30 -07:00
Ismael Juma 976e78e405
KAFKA-12590: Remove deprecated kafka.security.auth.Authorizer, SimpleAclAuthorizer and related classes in 3.0 (#10450)
These were deprecated in Apache Kafka 2.4 (released in December 2019) to be replaced
by `org.apache.kafka.server.authorizer.Authorizer` and `AclAuthorizer`.

As part of KIP-500, we will implement a new `Authorizer` implementation that relies
on a topic (potentially a KRaft topic) instead of `ZooKeeper`, so we should take the chance
to remove related tech debt in 3.0.

Details on the issues affecting the old Authorizer interface can be found in the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-504+-+Add+new+Java+Authorizer+Interface

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ron Dagostino <rdagostino@confluent.io>
2021-04-03 08:23:26 -07:00
John Roesler 4ed7f2cd01
KAFKA-12593: Fix Apache License headers (#10452)
* Standardize license headers in scala, python, and gradle files.
* Relocate copyright attribution to the NOTICE.
* Add a license header check to `spotless` for scala files.

Reviewers: Ewen Cheslack-Postava <ewencp@apache.org>, Matthias J. Sax <mjsax@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org>
2021-04-01 10:38:37 -05:00
Ismael Juma 16b2d4f3a7
MINOR: Self-managed -> KRaft (Kafka Raft) (#10414)
`Self-managed` is also used in the context of Cloud vs on-prem and it can
be confusing.

`KRaft` is a cute combination of `Kafka Raft` and it's pronounced like `craft`
(as in `craftsmanship`).

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jose Sancio <jsancio@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Ron Dagostino <rdagostino@confluent.io>
2021-03-29 15:39:10 -07:00
Ismael Juma 7c7e8078e4
MINOR: Use self-managed mode instead of KIP-500 and nozk (#10362)
KIP-500 is not particularly descriptive. I also tweaked the readme text a bit.

Tested that the readme for self-managed still works after these changes.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rdagostino@confluent.io>, Jason Gustafson <jason@confluent.io>
2021-03-19 16:42:37 -07:00
Justine Olshan fdd11a034c
KAFKA-12318: system tests need to fetch Topic IDs via Admin Client instead of via ZooKeeper (#10286)
Change the ducktape system tests to support both ZK and raft topic IDs. Clarifies that
the IBP check applies to the ZK code path.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rdagostino@confluent.io>
2021-03-19 11:41:50 -07:00
Ron Dagostino 9adfac2803
MINOR: fix failing ZooKeeper system tests (#10297)
ZooKeeper-related system tests in zookeeper_security_upgrade_test.py and
zookeeper_tls_test.py broke due to #10199. That patch changed the logic of
SecurityConfig.enabled_sasl_mechanisms() to only add the inter-broker SASL
mechanism when the inter-broker protocol was SASL_{PLAINTEXT,SSL}. The
inter-broker protocol is left to default to PLAINTEXT for the SecurityConfig
instance associated with ZooKeeper since that value doesn't apply to ZooKeeper,
so the default inter-broker SASL mechanism of GSSAPI was not being added into
the set returned by enabled_sasl_mechanisms(). This is actually correct --
GSSAPI shouldn't be added since inter-broker communication is a Kafka concept
and doesn't apply to ZooKeeper. GSSAPI should be added when ZooKeeper uses it,
though -- which is the case in these tests. So the prior patch referred to
above uncovered a bug: we were relying on the default inter-broker SASL
mechanism to signal that Kerberos was being used by ZooKeeper even though the
inter-broker protocol has nothing to do with that determination in such cases.
This patch explicitly includes GSSAPI in the list of enabled SASL mechanisms
when SASL is enabled for use by ZooKeeper.
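
The resulting decision logic can be sketched as a standalone function. This is not the real `SecurityConfig.enabled_sasl_mechanisms()`; the parameter names, the `zk_sasl` flag, and the choice to always include the client mechanism are assumptions made for illustration.

```python
SASL_PROTOCOLS = {"SASL_PLAINTEXT", "SASL_SSL"}


def enabled_sasl_mechanisms(client_mechanism, interbroker_protocol="PLAINTEXT",
                            interbroker_mechanism="GSSAPI", zk_sasl=False):
    """Return the set of SASL mechanisms that should be enabled."""
    mechanisms = {client_mechanism}
    # The inter-broker mechanism counts only when that protocol uses SASL.
    if interbroker_protocol in SASL_PROTOCOLS:
        mechanisms.add(interbroker_mechanism)
    # ZooKeeper's use of SASL is signalled explicitly, not via the
    # inter-broker defaults.
    if zk_sasl:
        mechanisms.add("GSSAPI")
    return mechanisms
```

The point of the explicit flag is that GSSAPI appears because ZooKeeper needs it, not as a side effect of an inter-broker default that does not apply to ZooKeeper.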

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-03-17 10:58:42 -07:00
Chia-Ping Tsai 3288db5ed1
MINOR: fix client_compatibility_features_test.py (#10292)
Reviewers: Colin Patrick McCabe <cmccabe@confluent.io>, Ron Dagostino <rdagostino@confluent.io>
2021-03-18 01:27:06 +08:00
Ron Dagostino b96fc7892f
KAFKA-12455: Fix OffsetValidationTest.test_broker_rolling_bounce failure with Raft (#10322)
This test was failing when used with a Raft-based metadata quorum but succeeding with a
ZooKeeper-based quorum. This patch increases the consumers' session timeouts to 30 seconds,
which fixes the Raft case and also eliminates flakiness that has historically existed in the
ZooKeeper case.

This patch also fixes a minor logging bug in RaftReplicaManager.endMetadataChangeDeferral() that
was discovered during the debugging of this issue, and it adds an extra logging statement in RaftReplicaManager.handleMetadataRecords() when a single metadata batch is applied to mirror
the same logging statement that occurs when deferred metadata changes are applied.

In the Raft system test case the consumer was sometimes receiving a METADATA response with just
1 alive broker, and then when that broker rolled the consumer wouldn't know about any alive nodes.
It would have to wait until the broker returned before it could reconnect, and by that time the group
coordinator on the second broker would have timed-out the client and initiated a group rebalance. The
test explicitly checks that no rebalances occur, so the test would fail. It turns out that the reason why
the ZooKeeper configuration wasn't seeing rebalances was just plain luck. The brokers' metadata
caches in the ZooKeeper configuration show 1 alive broker even more frequently than the Raft
configuration does. If we tweak the metadata.max.age.ms value on the consumers we can easily
get the ZooKeeper test to fail, and in fact this system test has historically been flaky for the
ZooKeeper configuration. We can get the test to pass by setting session.timeout.ms=30000 (which
is longer than the roll time of any broker), or we can increase the broker count so that the client
never sees a METADATA response with just a single alive broker and therefore never loses contact
with the cluster for an extended period of time. We have plenty of system tests with 3+ brokers, so
we choose to keep this test with 2 brokers and increase the session timeout.
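
The timing argument reduces to a single comparison, sketched below. The roll duration and the shorter timeout are illustrative values, not measurements from the test; only the 30000 ms setting comes from the text above.

```python
def session_expires(unreachable_ms, session_timeout_ms):
    """True if the coordinator would time the consumer out while it is cut off."""
    return unreachable_ms > session_timeout_ms


ILLUSTRATIVE_ROLL_MS = 20_000  # assumed broker roll time, not measured
ILLUSTRATIVE_SHORT_TIMEOUT_MS = 10_000  # assumed shorter session timeout

rebalance_with_short_timeout = session_expires(ILLUSTRATIVE_ROLL_MS,
                                               ILLUSTRATIVE_SHORT_TIMEOUT_MS)
rebalance_with_30s_timeout = session_expires(ILLUSTRATIVE_ROLL_MS, 30_000)
```

As long as the session timeout exceeds the time the consumer is cut off during a roll, the group coordinator does not expire the member and no rebalance is triggered.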

Reviewers: Ismael Juma <ismael@juma.me.uk>
2021-03-16 13:57:29 -07:00
Ron Dagostino b92d606379
MINOR: disable round_trip_fault_test system tests for Raft quorums (#10249)
The KIP-500 early access release will not support creating a partition with a manual
partition assignment that includes a broker that is not currently online. This patch disables
system tests for Raft-based metadata quorums where the test depends on this functionality
to pass.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-03-09 13:57:15 -08:00
Ron Dagostino 0fc53652e1
MINOR: fix failing system test delegation_token_test (#10237)
Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>
2021-03-09 13:55:29 -08:00
Boyang Chen 17851da667
KAFKA-12381: remove live broker checks for forwarding topic creation (#10240)
Removed broker number checks for invalid replication factor when doing the forwarding, in order to reduce false alarms for clients.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-03-05 15:55:14 -08:00
Ron Dagostino 29b4a3d1fe
MINOR: Disable transactional/idempotent system tests for Raft quorums (#10224) 2021-03-02 12:57:12 -05:00
Ron Dagostino 5d37901500
KAFKA-12374: Add missing config sasl.mechanism.controller.protocol (#10199)
Fix some cases where we were erroneously using the configuration of the inter broker
listener instead of the controller listener.  Add the sasl.mechanism.controller.protocol
configuration key specified by KIP-631.  Add some ducktape tests.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>, Boyang Chen <boyang@confluent.io>
2021-02-26 16:56:11 -08:00
Ron Dagostino 02226fa090
MINOR: disable test_produce_bench_transactions for Raft metadata quorum (#10222)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-02-26 13:54:21 -08:00
Jason Gustafson 74dfe80bb8
KAFKA-12365; Disable APIs not supported by KIP-500 broker/controller (#10194)
This patch updates request `listeners` tags to be in line with what the KIP-500 broker/controller support today. We will re-enable these APIs as needed once we have added the support.

I have also updated `ControllerApis` to use `ApiVersionManager` and simplified the envelope handling logic.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
2021-02-25 19:38:21 -08:00
Ron Dagostino bd04f7557a
MINOR: fix syntax error in upgrade_test.py (#10210)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-02-25 12:14:38 -08:00
Matthias J. Sax e2a0d0c90e
MINOR: bump release version to 3.0.0-SNAPSHOT (#10186)
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2021-02-24 17:49:18 -08:00
Guozhang Wang 059c9b3fcf
MINOR: Fix the generation extraction util (#10204)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2021-02-24 12:23:24 -08:00
Ron Dagostino 9e799cb23c
MINOR: fix some ducktape test issues (#10181)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-02-22 15:09:25 -08:00
Ron Dagostino 0711d15582
MINOR: Test the new KIP-500 quorum mode in ducktape (#10105)
Add the necessary test annotations to test the new KIP-500 quorum broker mode
in many of our ducktape tests. This mode is tested in addition to the classic
Apache ZooKeeper mode.

This PR also adds a new sanity_checks/bounce_test.py system test that runs
through a simple produce/bounce/produce series of events.

Finally, this PR adds @cluster annotations to dozens of system tests that were
missing them. The lack of this annotation was causing these tests to grab the
entire cluster of nodes.  Adding the @cluster annotation dramatically reduced
the time needed to run these tests.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2021-02-22 13:57:17 -08:00
Justine Olshan a524a751c1 MINOR: Added missing import (KafkaVersion) to kafka.py (#10154)
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2021-02-19 12:10:12 +08:00
Ron Dagostino a30f92bf59
MINOR: Add KIP-500 BrokerServer and ControllerServer (#10113)
This PR adds the KIP-500 BrokerServer and ControllerServer classes and 
makes some related changes to get them working.  Note that the ControllerServer 
does not instantiate a QuorumController object yet, since that will be added in
PR #10070.

* Add BrokerServer and ControllerServer

* Change ApiVersions#computeMaxUsableProduceMagic so that it can handle
endpoints which do not support PRODUCE (such as KIP-500 controller nodes)

* KafkaAdminClientTest: fix some lingering references to decommissionBroker
that should be references to unregisterBroker.

* Make some changes to allow SocketServer to be used by ControllerServer as
well as by the broker.

* We now return a random active Broker ID as the Controller ID in
MetadataResponse for the Raft-based case as per KIP-590.

* Add the RaftControllerNodeProvider

* Add EnvelopeUtils

* Add MetaLogRaftShim

* In ducktape, in config_property.py: use a KIP-500 compatible cluster ID.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>
2021-02-17 21:35:13 -08:00
Ron Dagostino faaef2c2df
MINOR: Support Raft-based metadata quorums in system tests (#10093)
We need to be able to run system tests with Raft-based metadata quorums -- both
co-located brokers and controllers as well as remote controllers -- in addition to the
ZooKeeper-based mode we run today. This PR adds this capability to KafkaService in a
backwards-compatible manner as follows.

If no changes are made to existing system tests then they function as they always do --
they instantiate ZooKeeper, and Kafka will use ZooKeeper. On the other hand, if we want
to use a Raft-based metadata quorum we can do so by introducing a metadata_quorum
argument to the test method and using @matrix to set it to the quorums we want to use for
the various runs of the test. We then also have to skip creating a ZooKeeperService when
the quorum is Raft-based.

This PR does not update any tests -- those will come later after all the KIP-500 code is
merged.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2021-02-11 09:44:17 -08:00
Dániel Urbán 202ff6336f
KAFKA-5235: GetOffsetShell: Support for multiple topics and consumer configuration override (KIP-635) (#9430)
This patch implements KIP-635 which mainly adds support for querying offsets of multiple topics/partitions.

Reviewers: David Jacot <djacot@confluent.io>
2021-02-11 12:06:21 +01:00
John Roesler 1f240ce179 bump to 2.9 development version 2021-02-07 09:25:36 -06:00
Stanislav Vodetskyi 91d6c55da4
MINOR: Upgrade ducktape to version 0.8.1 (#9933)
ducktape 0.8.1 was updated to include the following changes/fixes from 0.7.x branch:
* Junit reporting support
* fix for an issue where unicode characters in exception message would cause test runner to hang on py27.

Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
2021-01-22 20:23:55 -08:00
Justine Olshan 86b9fdef2b
KAFKA-10869: Gate topic IDs behind IBP 2.8 (KIP-516) (#9814)
Topics processed by the controller and newly created topics will only be given topic IDs if the inter-broker protocol version on the controller is greater than or equal to 2.8. This PR also adds a Kafka config to specify whether the IBP is greater than or equal to 2.8. System tests have been modified to include topic ID checks for upgrade/downgrade tests. This PR also adds a new integration test file for requests/responses that are not gated by IBP (e.g., metadata).

Reviewers: dengziming <dengziming1993@gmail.com>, Lucas Bradstreet <lucas@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2021-01-20 22:32:06 +00:00
John Roesler be88f5a1aa
MINOR: Fix StreamsOptimizedTest (#9911)
We have seen recent system test timeouts associated with this test.
Analysis revealed an excessive amount of time spent searching
for test conditions in the logs.

This change addresses the issue by dropping some unnecessary
checks and using a more efficient log search mechanism.

Reviewers: Bill Bejeck <bbejeck@apache.org>, Guozhang Wang <guozhang@apache.org>
2021-01-19 14:57:34 -06:00
Mickael Maison 966e9dd6a2
MINOR: Updating files with release 2.6.1 (#9844)
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Matthias J. Sax <mjsax@apache.org>
2021-01-14 12:24:18 +00:00
Bill Bejeck bf694b2943
MINOR: Add 2.7.0 release to broker and client compat tests (#9774)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@confluent.io>
2021-01-05 09:45:00 -05:00
Chia-Ping Tsai ac7b5d3389
KAFKA-10893 Increase target_messages_per_sec of ReplicaScaleTest to reduce the run time (#9797)
Reviewers: David Arthur <mumrah@gmail.com>
2021-01-05 00:23:34 +08:00
Bill Bejeck b6891f6729
MINOR: Kafka Streams updates for 2.7.0 release (#9773)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-12-22 14:34:59 -08:00
Bill Bejeck 300909d9e6
MINOR: Updating files with latest release 2.7.0 (#9772)
Changes to trunk for the 2.7.0 release. Updating dependencies.gradle, Dockerfile, and vagrant/bash.sh

Reviewers: Matthias J. Sax <mjsax@apache.org>
2020-12-21 11:52:49 -05:00
Chia-Ping Tsai 6e15937feb
KAFKA-10289; Fix failed connect_distributed_test.py (ConnectDistributedTest.test_bounce) (#9673)
In Python 3, `filter` returns an iterator rather than a `list`, so the result can be traversed only once. Hence, in the following loop every task after the first sees an empty result and validation fails.

```python
        src_messages = self.source.committed_messages() # return iterator
        sink_messages = self.sink.flushed_messages() # return iterator
        for task in range(num_tasks):
            # only first task can "see" the result. following tasks see empty result
            src_seqnos = [msg['seqno'] for msg in src_messages if msg['task'] == task]
```

Reference: https://portingguide.readthedocs.io/en/latest/iterators.html#new-behavior-of-map-and-filter.

Reviewers: Jason Gustafson <jason@confluent.io>
2020-12-09 13:38:17 -08:00
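The one-shot iterator behaviour described in #9673 can be reproduced outside the test suite. The following is a minimal sketch, not the code from the patch: the `messages` list, the `seqnos_per_task` helper, and the trivial filter predicate are all hypothetical stand-ins for the values returned by the source and sink services.

```python
# Hypothetical stand-in for the committed messages collected by the test.
messages = [
    {"task": 0, "seqno": 1},
    {"task": 1, "seqno": 2},
    {"task": 0, "seqno": 3},
]


def seqnos_per_task(msgs, num_tasks):
    """Collect seqnos per task by scanning msgs once for every task."""
    return [[m["seqno"] for m in msgs if m["task"] == task]
            for task in range(num_tasks)]


# In Python 3, filter() returns a one-shot iterator: the scan for task 0
# exhausts it, so the scan for task 1 sees nothing.
lazy = filter(lambda m: "seqno" in m, messages)
broken = seqnos_per_task(lazy, num_tasks=2)

# Materializing the iterator into a list allows repeated traversal.
eager = list(filter(lambda m: "seqno" in m, messages))
fixed = seqnos_per_task(eager, num_tasks=2)

print(broken)
print(fixed)
```

Wrapping the `filter` result in `list(...)` is one common remedy; whether the patch did exactly this is not stated in the message above.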
Chia-Ping Tsai 1cf9ce95ad
MINOR: add "flush=True" to all print in system tests (#9711)
This makes the behavior of print match Python 2.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-12-09 11:19:06 -08:00
Chia-Ping Tsai abb8ff61cc
MINOR: Align the UID inside/outside container (#9652)
Reviewers: Jason Gustafson <jason@confluent.io>
2020-12-03 10:39:58 +08:00
Bruno Cadonna 60139d5b25
MINOR: fix reading SSH output in Streams system tests (#9665)
SSH outputs in system tests originating from paramiko are bytes. However, the logger in the system tests does not accept bytes and instead throws an exception. That means the bytes returned as SSH output from paramiko need to be converted to a type that the logger (or other objects) can process.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-12-01 10:28:04 -08:00
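The conversion described in #9665 amounts to decoding bytes before handing them on. The sketch below assumes UTF-8 output; the `to_text` helper, the logger name, and the sample line are hypothetical and are not taken from the patch.

```python
import logging


def to_text(output, encoding="utf-8"):
    """Return SSH output as str, decoding it when it arrives as bytes."""
    if isinstance(output, bytes):
        # errors="replace" keeps the test running on undecodable bytes.
        return output.decode(encoding, errors="replace")
    return output


logger = logging.getLogger("streams_system_test_sketch")

raw_line = b"12345\n"  # hypothetical bytes line as returned over SSH
pid_line = to_text(raw_line).strip()
logger.info("Collected PID: %s", pid_line)
```

Passing values that are already `str` through unchanged lets the same helper be used at call sites that may receive either type.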
Luke Chen 9412fc1151
MINOR: Update vagrant/tests readme (#9650)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2020-11-28 13:06:48 +08:00
Tom Bentley 91679f247a
KAFKA-10692: Add delegation.token.secret.key, deprecate ...master.key (#9623)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2020-11-19 15:26:25 +00:00
Walker Carlson 5899f5fc4a
KAFKA-9331: Add a streams specific uncaught exception handler (#9487)
This PR introduces a Streams-specific uncaught exception handler that currently has the option to close the client or the application. If the new handler is set in addition to the old handler (the Java thread handler), the old handler will be ignored and an error will be logged.
The application shutdown is achieved through the rebalance protocol.

Reviewers: Bruno Cadonna <cadonna@confluent.io>, Leah Thomas <lthomas@confluent.io>, John Roesler <john@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2020-11-17 22:55:09 -08:00
feyman2016 3e2d1fc8aa
Add system test coverage for group coordinator migration (#9588)
This newly added system test is to verify that with the fix in #9270 , the member.id update caused by static member rejoin would be persisted correctly.

Reviewers: Boyang Chen <boyang@confluent.io>
2020-11-12 19:36:27 -08:00
Gardner Vickers f978d0551b
MINOR: Increase the amount of time available to the `test_verifiable_producer` (#9201)
Increase the amount of time available to the `test_verifiable_producer` test to log in and get the process name for the verifiable producer from 5 seconds to 10 seconds.

We were seeing some test failures due to the assertion failing because the verifiable producer would complete before we could log in, list the processes, and parse out the producer version. Previously, we were giving this operation 5 seconds to run; this PR bumps it up to 10 seconds.

I verified locally that this does not flake, but even at 5 seconds I wasn't seeing any flakes. Ultimately we should find a better strategy than racing to query the producer process (as outlined in the existing comments). 

Reviewers: Jason Gustafson <jason@confluent.io>
2020-11-12 13:09:15 -08:00
David Mao ee1aa07036
MINOR: Fix group_mode_transactions_test (#9538)
KIP-431 (#9099) changed the format of console consumer output to `Partition:$PARTITION\t$VALUE` whereas previously the output format was `$VALUE\t$PARTITION`. This PR updates the message verifier to accommodate the updated console consumer output format.
2020-10-31 13:43:13 +01:00
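The two console consumer output formats named in #9538 can be told apart with a pair of small parsers. This is a sketch under the assumption that each line holds exactly one tab-separated record as written in the message above; the function names are hypothetical and this is not the verifier code from the patch.

```python
def parse_new_format(line):
    """Parse 'Partition:<partition><TAB><value>' into (partition, value)."""
    prefix, value = line.rstrip("\n").split("\t", 1)
    label, partition = prefix.split(":", 1)
    if label != "Partition":
        raise ValueError("unexpected prefix: %r" % prefix)
    return int(partition), value


def parse_old_format(line):
    """Parse '<value><TAB><partition>' into (partition, value)."""
    value, partition = line.rstrip("\n").rsplit("\t", 1)
    return int(partition), value


print(parse_new_format("Partition:3\t42\n"))
print(parse_old_format("42\t3\n"))
```

Both parsers return the same `(partition, value)` tuple, so a verifier written against that shape does not need to know which format produced the line.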
Bruno Cadonna a85b944011
MINOR: Fix verification in StreamsUpgradeTest.test_version_probing_upgrade (#9530)
The system test StreamsUpgradeTest.test_version_probing_upgrade tries to verify the wrong version for version probing.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2020-10-29 16:10:30 -07:00
Manikumar Reddy 36493efa59 MINOR: fix error in quota_test.py system tests
quota_test.py tests are failing with below error.

```
23:24:42 [INFO:2020-10-24 17:54:42,366]: RunnerClient: kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=user.override_quota=False: FAIL: not enough arguments for format string
23:24:42 Traceback (most recent call last):
23:24:42   File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/tests/runner_client.py", line 134, in run
23:24:42     data = self.run_test()
23:24:42   File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/tests/runner_client.py", line 192, in run_test
23:24:42     return self.test_context.function(self.test)
23:24:42   File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/mark/_mark.py", line 429, in wrapper
23:24:42     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
23:24:42   File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/quota_test.py", line 141, in test_quota
23:24:42     self.quota_config = QuotaConfig(quota_type, override_quota, self.kafka)
23:24:42   File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/quota_test.py", line 60, in __init__
23:24:42     self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', None])
23:24:42   File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/quota_test.py", line 83, in configure_quota
23:24:42     (kafka.kafka_configs_cmd_with_optional_security_settings(node, force_use_zk_conection), producer_byte_rate, consumer_byte_rate)
23:24:42 TypeError: not enough arguments for format string
23:24:42
```

Ran the tests locally.

Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: David Jacot <djacot@confluent.io>, Ron Dagostino <rndgstn@gmail.com>

Closes #9496 from omkreddy/quota-tests
2020-10-25 14:45:05 +05:30
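The `TypeError` in the traceback above is what `%`-formatting raises when a template has more placeholders than the tuple has values. The actual template in quota_test.py is not shown in the log, so the sketch below uses a hypothetical one and only reproduces the class of error.

```python
# Hypothetical template with three placeholders.
template = "cmd=%s producer_byte_rate=%d consumer_byte_rate=%d"


def render(args):
    """Apply %-formatting; return the error text instead of raising."""
    try:
        return template % args
    except TypeError as exc:
        return "TypeError: %s" % exc


# Two values for three placeholders reproduces the error message.
too_few = render(("kafka-configs", 1000))

# Supplying one value per placeholder formats cleanly.
complete = render(("kafka-configs", 1000, 2000))

print(too_few)
print(complete)
```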
Nikolay Izhikov c8c1baf4e1 KAFKA-10592: Fix vagrant for a system tests with python3
Fix vagrant for system tests with python3.

Author: Nikolay Izhikov <nizhikov@apache.org>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #9480 from nizhikov/KAFKA-10592
2020-10-25 01:09:16 +05:30
Ron Dagostino c7f19aba37
MINOR: fix system tests sending ACLs through ZooKeeper (#9458)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2020-10-20 13:13:50 +01:00
Ron Dagostino 1636481c5f
MINOR: fix error in quota_test.py system tests (#9443) 2020-10-15 17:08:42 +01:00
Ron Dagostino 147a19036e
MINOR: ACLs for secured cluster system tests (#9378)
This PR adds missing broker ACLs required to create topics and SCRAM credentials when ACLs are enabled for a system test. This PR also adds support for using PLAINTEXT as the inter broker security protocol when using SCRAM from the client in a system test with a secured cluster-- without this it would always be necessary to set both the inter-broker and client mechanisms to a SCRAM mechanism. Also contains some refactoring to make assumptions clearer.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2020-10-09 15:34:53 +01:00
bill 4d3036bb4e Updating trunk versions after cutting branch for 2.7 2020-10-08 07:47:36 -04:00
Nikolay 4e65030e05
KAFKA-10402: Upgrade system tests to python3 (#9196)
Kafka system tests currently use Python 2, which is outdated and no longer supported.
This PR upgrades them to Python 3.

Reviewers: Ivan Daschinskiy, Mickael Maison <mickael.maison@gmail.com>, Magnus Edenhill <magnus@edenhill.se>, Guozhang Wang <wangguoz@gmail.com>
2020-10-07 09:41:30 -07:00
Nikolay bc7674fe1b
KAFKA-10505: Fix parsing of generation log string. (#9312)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2020-09-23 14:24:02 -07:00
Bruno Cadonna a46c07ec8d
KAFKA-10292: Set min.insync.replicas to 1 of __consumer_offsets (#9286)
The test StreamsBrokerBounceTest.test_all_brokers_bounce() fails on
2.5 because in the last stage of the test there is only one broker
left and the offset commit cannot succeed because the
min.insync.replicas of __consumer_offsets is set to 2 and acks is
set to all. This causes a time out and extends the closing of the
Kafka Streams client to beyond the duration passed to the close
method of the client.

This affects especially the 2.5 branch since there Kafka Streams
commits offsets for each task, i.e., close() needs to wait for the
timeout for each task. In 2.6 and trunk the offset commit is done
per thread, so close() only needs to wait for one timeout per
stream thread.

I opened this PR on trunk, since the test could also become
flaky on trunk and we want to avoid diverging system tests across
branches.

A more complete solution would be to improve the test by defining
better success criteria.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-09-15 11:12:37 -07:00
Ron Dagostino ebd64b5d55
KAFKA-10131: Remove use_zk_connection flag from ducktape (#9274)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2020-09-14 15:56:21 -07:00
Chia-Ping Tsai ee68b999c4
KAFKA-10463: Install `git` explicitly in Dockerfile (#9257)
`openjdk:8` includes `git` by default, but `openjdk:11` does not. Install `git` explicitly to make it easier to
test with newer openjdk versions.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2020-09-14 15:22:00 -07:00
Ron Dagostino e8524ccd8f
KAFKA-10259: KIP-554 Broker-side SCRAM Config API (#9032)
Implement the KIP-554 API to create, describe, and alter SCRAM user configurations via the AdminClient.  Add ducktape tests, and modify JUnit tests to test and use the new API where appropriate.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-09-04 13:05:01 -07:00
Justine Olshan a027b9a934
MINOR: Fix typo in ducker-ak test example
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2020-08-22 00:30:22 +05:30
Sanjana Kaundinya b7856df21b
MINOR: Include security configs for topic delete in system tests (#9142)
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-08-19 13:37:27 +01:00
Andrew Egelhofer f6c26eaa04 MINOR: Use new version of ducktape
ducktape diff: https://github.com/confluentinc/ducktape/compare/v0.7.8...v0.7.9

- bcrypt (a dependency of ducktape) dropped Python2.7 support.
ducktape-0.7.9 now pins bcrypt to a Python2.7-supported version.

Author: Andrew Egelhofer <aegelhofer@confluent.io>

Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #9192 from andrewegel/trunk
2020-08-18 07:11:24 +05:30
Sanjana Kaundinya f7a4fe7c14
MINOR: fix the way total consumed is calculated for verifiable consumer (#9143)
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-08-16 11:15:41 +01:00
John Roesler 7159c6ddd0
MINOR: bump 2.5 versions to 2.5.1 (#9165)
Reviewers: Bill Bejeck <bbejeck@apache.org>
2020-08-11 15:18:33 -05:00
Randall Hauch 1112fd4723
KAFKA-10341: Add 2.6.0 to system tests and streams upgrade tests (#9116)
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-08-04 18:04:52 -05:00
Bruno Cadonna ac3a51d013
MINOR: Remove staticmethod tag to be able to use logger of instance (#9086)
A system test failed with the following error: global name 'self' is not defined

The reason was that `self` was accessed to log a message in a static method. This commit makes the method an instance method.

Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-07-27 09:38:14 -07:00
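The failure described in #9086 is easy to reproduce in isolation. In the sketch below the class names, the `report` method, and the logger name are hypothetical; only the pattern (using `self` inside a `@staticmethod`) comes from the commit message. Python 3 words the error as "name 'self' is not defined", while the message above quotes the Python 2 wording.

```python
import logging


class Broken:
    logger = logging.getLogger("sketch")

    @staticmethod
    def report(msg):
        # A static method receives no `self`, so this name lookup fails.
        self.logger.info(msg)
        return msg


class Fixed:
    def __init__(self):
        self.logger = logging.getLogger("sketch")

    def report(self, msg):
        # As an instance method, `self` is bound and the logger is reachable.
        self.logger.info(msg)
        return msg


try:
    Broken.report("hello")
    error = None
except NameError as exc:
    error = str(exc)

result = Fixed().report("hello")
print(error)
print(result)
```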
Chia-Ping Tsai 0d5c967073
KAFKA-10300 fix flaky core/group_mode_transactions_test.py (#9059)
The root cause is the same as in #9026, so I copied the approach of #9026 to fix core/group_mode_transactions_test.py.

Reviewers: Jun Rao <junrao@gmail.com>
2020-07-23 15:59:57 -07:00
Jason Gustafson 67f5b5de77
KAFKA-10274; Consistent timeouts in transactions_test (#9026)
KAFKA-10235 fixed a consistency issue with the transaction timeout and the progress timeout. Since the test case relies on transaction timeouts, we need to wait at least as long as the timeout in order to ensure progress. However, having a low transaction timeout makes the test prone to the issue identified in KAFKA-9802, in which the coordinator timed out the transaction while the producer was awaiting a Produce response.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>,  Boyang Chen <boyang@confluent.io>, Jun Rao <junrao@gmail.com>
2020-07-22 12:06:47 -07:00
Manikumar Reddy c38825ab97 KAFKA-9432:(follow-up) Set `configKeys` to null in `describeConfigs()` to make it backward compatible with older Kafka versions.
- After #8312, older brokers return empty configs when called with the latest `adminClient.describeConfigs`. Old brokers receive empty configNames in the `AdminManager.describeConfigs()` method and do not handle empty configKeys, so they filter out all the configs.
- Update ClientCompatibilityTest to verify describe configs
- Add test case to test describe configs with empty configuration Keys

Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #9046 from omkreddy/KAFKA-9432
2020-07-21 17:32:11 +05:30
Greg Harris f4944ee460
KAFKA-10295: Wait for connector recovery in test_bounce (#9043)
Signed-off-by: Greg Harris <gregh@confluent.io>
2020-07-20 08:50:05 -05:00
Greg Harris 5a2a7c6348
KAFKA-10286: Connect system tests should wait for workers to join group (#9040)
Currently, the system tests `connect_distributed_test` and `connect_rest_test` only wait for the REST API to come up.
The startup of the worker includes an asynchronous process for joining the worker group and syncing with other workers.
There are some situations in which this sync takes an unusually long time, and the test continues without all workers up.
This leads to flaky test failures, as worker joins are not given sufficient time to time out and retry without waiting explicitly.

This changes the `ConnectDistributedTest` to wait for the Joined group message to be printed to the logs before continuing with tests. I've activated this behavior by default, as it's a superset of the checks that were performed by default before.

This log message is present in every version of DistributedHerder that I could find, in slightly different forms, but always with `Joined group` at the beginning of the log message. This change should be safe to backport to any branch.

Signed-off-by: Greg Harris <gregh@confluent.io>
Author: Greg Harris <gregh@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
2020-07-20 08:48:02 -05:00
Manikumar Reddy b02fa53419 MINOR: Enable broker/client compatibility tests for 2.5.0 release
- Add missing broker/client compatibility tests for 2.5.0 release

Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #9041 from omkreddy/compat
2020-07-20 18:20:48 +05:30
Jason Gustafson 6d2c7802da
MINOR: Fix flaky system test assertion after static member fencing (#9033)
The test case `OffsetValidationTest.test_fencing_static_consumer` fails periodically due to this error:
```
Traceback (most recent call last):
  File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/tests/runner_client.py", line 134, in run
    data = self.run_test()
  File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/tests/runner_client.py", line 192, in run_test
    return self.test_context.function(self.test)
  File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/mark/_mark.py", line 429, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/tests/kafkatest/tests/client/consumer_test.py", line 257, in test_fencing_static_consumer
    assert len(consumer.dead_nodes()) == num_conflict_consumers
AssertionError
```
When a consumer stops, there is some latency between when the shutdown is observed by the service and when the node is added to the dead nodes. This patch fixes the problem by giving some time for the assertion to be satisfied.

Reviewers: Boyang Chen <boyang@confluent.io>
2020-07-17 11:27:33 -07:00
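Giving an assertion time to become true, as #9033 describes, is usually done by polling with a deadline. The sketch below defines its own local `wait_until` helper and a `FakeConsumer` class; both are hypothetical stand-ins written for illustration and are not the helper or service used by the patch.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.01,
               err_msg="condition not met"):
    """Poll condition() until it returns True or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


class FakeConsumer:
    """Hypothetical service whose dead-node list lags behind the shutdown."""

    def __init__(self, dead_after_polls):
        self._polls = 0
        self._dead_after_polls = dead_after_polls

    def dead_nodes(self):
        self._polls += 1
        return ["node-1"] if self._polls >= self._dead_after_polls else []


consumer = FakeConsumer(dead_after_polls=3)
num_conflict_consumers = 1

# An immediate assert would fail on the first poll; polling tolerates the lag.
wait_until(lambda: len(consumer.dead_nodes()) == num_conflict_consumers,
           timeout_sec=2,
           err_msg="dead nodes never matched the expected count")
```

The timeout bounds how long a genuinely broken run can hang, so the check still fails when the expected state is never reached.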
vinoth chandar 796fae25c3
KAFKA-10174: Prefer --bootstrap-server for configs command in ducker tests (#8948)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2020-07-16 09:01:46 -07:00
Chia-Ping Tsai 598a0d16fa
KAFKA-10257 system test kafkatest.tests.core.security_rolling_upgrade_test fails (#9021)
security_rolling_upgrade_test may change the security listener and then restart Kafka servers. has_sasl and has_ssl get out of date due to the cached _security_config. This PR offers a simple fix: always check for changes to the port mapping and then update the sasl/ssl flags.

Reviewers:  Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
2020-07-15 11:33:49 -07:00
Chia-Ping Tsai e099b58df5
KAFKA-10235 Fix flaky transactions_test.py (#8981)
Reduce the transaction timeout to clean up unstable offsets more quickly. In hard_bounce mode, the transactional client is killed ungracefully. Hence, it produces unstable offsets which prevent TransactionalMessageCopier from receiving the position of the group.

Reviewers: Jun Rao <junrao@gmail.com>
2020-07-09 09:33:07 -07:00
Chia-Ping Tsai 80cab851ee
KAFKA-10225 Increase default zk timeout for system tests (#8974)
Increase ZK connection and session timeout in system tests to match the defaults.

Reviewers: Jun Rao <junrao@gmail.com>
2020-07-08 13:19:40 -07:00
Chia-Ping Tsai 6953161125
KAFKA-10191 fix flaky StreamsOptimizedTest (#8913)
Call KafkaStreams#cleanUp to reset local state before starting the application for the second run.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>
2020-07-07 12:48:36 -05:00
John Roesler 34f749db30
MINOR: prune the metadata upgrade test matrix (#8971)
Most of the values in the metadata upgrade test matrix are just testing
the upgrade/downgrade path between two previous releases. This is
unnecessary. We run the tests for all supported branches, so what we
should test is the up-/down-gradability of released versions with respect
to the current branch.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-07-06 18:52:51 -05:00
Chia-Ping Tsai 72042f26af
KAFKA-10209: Fix connect_rest_test.py after the introduction of new connector configs (#8944)
There are two new configs introduced by 371f14c3c1 and 1c4eb1a575 so we have to update the expected configs in the connect_rest_test.py system test too.

Reviewer: Konstantine Karantasis <konstantine@confluent.io>
2020-07-03 10:38:42 -07:00
John Roesler 3b2ae7b95a
KAFKA-10173: Use SmokeTest for upgrade system tests (#8938)
Replaces the previous upgrade test's trivial Streams app
with the commonly used SmokeTest, exercising many more
features. Also adjust the test matrix to test upgrading
from each released version since 2.2 to the current branch.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-07-02 18:14:46 -05:00
Chia-Ping Tsai 6094af8974 KAFKA-10214: Fix zookeeper_tls_test.py system test
After 3661f981fff2653aaf1d5ee0b6dde3410b5498db, security_config is cached. Hence, later changes to the security flags do not affect the security_config used by later tests.

issue: https://issues.apache.org/jira/browse/KAFKA-10214

Author: Chia-Ping Tsai <chia7712@gmail.com>

Reviewers: Ron Dagostino <rdagostino@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #8949 from chia7712/KAFKA-10214
2020-07-01 17:08:54 +05:30
Bruno Cadonna f3a9ce4a69
MINOR: Do not swallow exception when collecting PIDs (#8914)
During Streams' system tests the PIDs of the Streams
clients are collected. The method that collects the PIDs
swallows any exception that might be thrown by the
ssh_capture() function. Swallowing exceptions
might make the investigation of failures harder,
because no information about what happened is recorded.

Reviewers: John Roesler <vvcephei@apache.org>
2020-06-30 12:18:23 -05:00
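The difference between swallowing and surfacing an exception, as discussed in #8914, can be sketched as follows. The function names and the `capture` callable (standing in for `ssh_capture()`) are hypothetical, and the patch may have handled the error differently; this only contrasts the two patterns.

```python
import logging

logger = logging.getLogger("pid_sketch")


def pids_swallowing(capture):
    """Swallowing pattern: any failure silently becomes an empty list."""
    try:
        return [int(line) for line in capture()]
    except Exception:
        return []


def pids_reporting(capture):
    """Reporting pattern: log the failure with its traceback, then re-raise."""
    try:
        return [int(line) for line in capture()]
    except Exception:
        logger.exception("Could not collect PIDs")
        raise


def failing_capture():
    """Hypothetical stand-in for a remote command that fails."""
    raise RuntimeError("ssh connection lost")


# The swallowing variant hides the failure entirely.
hidden = pids_swallowing(failing_capture)

# The reporting variant leaves evidence and propagates the error.
try:
    pids_reporting(failing_capture)
    surfaced = None
except RuntimeError as exc:
    surfaced = str(exc)
```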
Nikolay 3661f981ff
KAFKA-10180: Fix security_config caching in system tests (#8917)
Reviewers: Jun Rao <junrao@gmail.com>
2020-06-27 09:27:49 -07:00
vinoth chandar 54dbd041bc
KAFKA-10138: Prefer --bootstrap-server for reassign_partitions command in ducktape tests (#8898)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2020-06-19 12:35:49 -07:00
Ego d3d65dd5dd
MINOR: Upgrade ducktape to 0.7.8 (#8879)
Newer version of ducktape that updates some dependencies and adds some features. You can see that diff here:

https://github.com/confluentinc/ducktape/compare/v0.7.7...v0.7.8

Reviewer: Konstantine Karantasis <konstantine@confluent.io>
2020-06-17 21:53:22 -07:00
Nikolay 8b22b81596
KAFKA-9320: Enable TLSv1.3 by default (KIP-573) (#8695)
1. Enables `TLSv1.3` by default with Java 11 or newer.
2. Add unit tests that cover the various TLSv1.2 and TLSv1.3 combinations.
3. Extend `benchmark_test.py` and `replication_test.py` to run with 'TLSv1.2'
or 'TLSv1.3'.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-06-02 15:34:43 -07:00
Randall Hauch 19e40788e7
Bump trunk to 2.7.0-SNAPSHOT (#8746) 2020-06-01 21:23:09 -05:00
Jason Gustafson d9fe30dab0
KAFKA-9802; Increase transaction timeout in system tests to reduce flakiness (#8736)
We have been seeing increased flakiness in transaction system tests. I believe the cause might be due to KIP-537, which increased the default zk session timeout from 6s to 18s and the default replica lag timeout from 10s to 30s. In the system test, we use the default transaction timeout of 10s. However, since the system test involves hard failures, the Produce request could be blocking for as long as the max of these two in order to wait for an ISR shrink. Hence this patch increases the timeout to 30s.

Note this patch also includes a minor logging fix in `Partition`. Previously we would see messages like the following:
```
[Broker id=3] Leader output-topic-0 starts at leader epoch 0 from offset 0 with high watermark 0 ISR 3,2,1 addingReplicas  removingReplicas .Previous leader epoch was -1.
```
This patch fixes the log to print as the following:
```
[Broker id=3] Leader output-topic-0 starts at leader epoch 0 from offset 0 with high watermark 0 ISR [3,2,1] addingReplicas []  removingReplicas []. Previous leader epoch was -1.
```

Reviewers: Bob Barrett <bob.barrett@confluent.io>, Ismael Juma <github@juma.me.uk>
2020-05-27 20:54:09 -07:00
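The timeout reasoning in #8736 is a small piece of arithmetic. The sketch below uses only the values quoted in the commit message; the variable names are hypothetical.

```python
# Values quoted in the commit message above.
zk_session_timeout_s = 18        # default after KIP-537
replica_lag_timeout_s = 30       # default after KIP-537
old_transaction_timeout_s = 10   # timeout the system test used before
new_transaction_timeout_s = 30   # timeout after this patch

# A Produce request may block for the larger of the two while waiting
# for an ISR shrink after a hard failure.
worst_case_produce_block_s = max(zk_session_timeout_s, replica_lag_timeout_s)

# The transaction timeout is adequate only if it covers that worst case.
old_timeout_is_safe = old_transaction_timeout_s >= worst_case_produce_block_s
new_timeout_is_safe = new_transaction_timeout_s >= worst_case_produce_block_s

print(worst_case_produce_block_s, old_timeout_is_safe, new_timeout_is_safe)
```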
Nikolay 2951b6dd99
KAFKA-10050: kafka_log4j_appender.py fixed for JDK11 (#8731)
kafka_log4j_appender.py was broken on JDK11 by befd80b38.
`fix_opts_for_new_jvm` requires `node.version` to be set, we
add the relevant code to the test.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-05-27 20:49:57 -07:00
John Roesler 2cff1fab3f
KAFKA-6145: KIP-441: Fix assignor config passthough (#8716)
Also fixes a system test by configuring the HATA to perform a one-shot balanced assignment

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>
2020-05-27 13:50:12 -05:00
Magnus Edenhill 4aa4786a81
MINOR: Deploy VerifiableClient in constructor to avoid test timeouts (#8651)
Previous to this fix a plugged-in verifiable client, such as
confluent-kafka-python, would be deployed on the node in the background
worker thread as the client was started. Since this could be time consuming
(e.g., 10+ seconds) and since the main test thread would continue to
operate, it was common for the current test to time out waiting
for e.g. the verifiable producer to produce messages while it was in fact
still deploying.

The fix here is to deploy the verifiable client on the node when
the verifiable client is instantiated, which is thus a blocking
operation on the main test thread, avoiding any test-based timeouts.
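
A minimal sketch of the idea, assuming a hypothetical `VerifiableClientSketch` class with caller-supplied `deploy` and `run` callables (none of these names come from the kafkatest code): deployment happens synchronously in the constructor, and the background worker only runs the client.

```python
import threading


class VerifiableClientSketch:
    """Hypothetical stand-in for a service wrapping a plugged-in verifiable client."""

    def __init__(self, deploy, run):
        self._run = run
        self._worker = None
        # Deploy while the object is being constructed. This blocks the main
        # test thread, so no test timeout is ticking while deployment runs.
        deploy()
        self.deployed = True

    def start(self):
        # The background worker only runs the already-deployed client.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def wait(self, timeout=None):
        self._worker.join(timeout)


events = []
client = VerifiableClientSketch(
    deploy=lambda: events.append("deployed"),
    run=lambda: events.append("running"),
)
deployed_before_start = list(events)  # deployment finished before start() is called
client.start()
client.wait(timeout=5)
```

The trade-off is that construction becomes slower, but any wait that follows `start()` measures only the client's own work.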

Reviewers: Jason Gustafson <jason@confluent.io>
2020-05-21 09:59:32 -07:00
Boyang Chen fad8db67bb
MINOR: add option to rebuild source for system tests (#6656)
Reviewers: Jason Gustafson <jason@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-05-13 17:54:43 -07:00
A. Sophie Blee-Goldman 58f7a97314
KAFKA-9821: consolidate Streams rebalance triggering mechanisms (#8596)
Persist followup rebalance in assignment and consolidate rebalance triggering mechanisms

Reviewers: John Roesler <vvcephei@apache.org>
2020-05-12 15:57:18 -05:00
Bruno Cadonna c19a3be198
KAFKA-6145: Set HighAvailabilityTaskAssignor as default in streams_upgrade_test.py (#8613)
Generalize the verification in the upgrade test so that it
does not rely on the task assignor's behavior.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>
2020-05-07 21:10:44 -05:00
Lucas Bradstreet 4e1d6a3d04 MINOR: add support for kafka 2.4 and 2.5 to downgrade test
The downgrade test does not currently support 2.4 and 2.5. When you enable them, it fails as a result of consumer group static membership. This PR makes the downgrade test work with all of our released versions again.

Author: Lucas Bradstreet <lucas@confluent.io>

Reviewers: Boyang Chen, Gwen Shapira

Closes #8518 from lbradstreet/downgrade-test-2.4-2.5
2020-04-28 18:01:24 -07:00
John Roesler 5bb3415c77
KAFKA-6145: KIP-441: Add TaskAssignor class config (#8541)
* add a config to set the TaskAssignor
* set the default assignor to HighAvailabilityTaskAssignor
* fix broken tests (with some TODOs in the system tests)

Implements: KIP-441
Reviewers: Bruno Cadonna <bruno@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2020-04-28 15:57:11 -05:00
John Roesler 88eae49a8d
MINOR: document how to escape json parameters to ducktape tests (#8546)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-04-27 18:22:00 -05:00
Bruno Cadonna 362d199dbe
HOTFIX: Fix broker bounce system tests (#8532)
Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-04-24 08:49:47 -07:00
Lucas Bradstreet b161358998
MINOR: Downgrade test should wait for ISR rejoin between rolls (#8495)
I added a change to the upgrade test a while back that would make it wait for
ISR rejoin before rolls. This prevents incompatible brokers charging through a
bad roll and disguising a downgrade problem.

We now also check for protocol errors in the broker logs.

Reviewers: Boyang Chen <boyang@confluent.io>, Ismael Juma <ismael@juma.me.uk>
2020-04-23 00:15:43 -07:00
Boyang Chen df41713d64
KAFKA-9779: Add Stream system test for 2.5 release (#8378)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-04-15 15:59:03 -07:00
Matthias J. Sax 17f9879261
KAFKA-9832: extend Kafka Streams EOS system test (#8440)
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-04-15 13:13:23 -07:00
Rajini Sivaram 8820055744
KAFKA-9797; Fix TestSecurityRollingUpgrade.test_enable_separate_interbroker_listener (#8403)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2020-04-15 13:04:11 +01:00
Ewen Cheslack-Postava cadb3499ff
MINOR: Upgrade ducktape to 0.7.7 (#8487)
This fixes a version pinning issue where a transitive dependency had a
major version upgrade that a dependency did not account for, breaking
the build.

Reviewers: Andrew Egelhofer <aegelhofer@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-04-14 16:36:52 -07:00
Boyang Chen ea47a885b1
MINOR: remove stream simple benchmark suite (#8353)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2020-04-14 09:49:03 -07:00
Matthias J. Sax 20e4a74c35
KAFKA-9832: Extend Streams system tests for EOS-beta (#8443)
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-04-10 11:55:01 -07:00
Boyang Chen 7f640f13b4
KAFKA-9776: Downgrade TxnCommit API v3 when broker doesn't support (#8375)
Revert the decision to have the sendOffsetsToTransaction(groupMetadata) API fail against older brokers, so that applications are easier to adapt between versions. This PR silently downgrades the TxnOffsetCommit API when the broker does not support version 3.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-04-02 21:48:37 -07:00
Ron Dagostino f0ad03069a
MINOR: System test ZooKeeper upgrades (#8384)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2020-04-02 23:23:48 +05:30
Michal T 86a3ebe537
MINOR: Fix typo in version 2.4.1 of kafka folder in Dockerfile (#8393) 2020-04-01 17:56:47 -07:00
Matthias J. Sax 6ad5407350
KAFKA-9719: Streams with EOS-beta should fail fast for older brokers (#8367)
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-30 15:21:27 -07:00
Bill Bejeck c725c2338b
MINOR: Update dependencies.gradle, Dockerfile, version.py, and bash.sh for 2.4.1 upgrade (#8387)
These files were missed in the 2.4.1 release

Reviewers: Ismael Juma <ismael@confluent.io>
2020-03-30 12:55:35 -04:00
Nikolay befd80b38d
KAFKA-9573: Fix JVM options to run early versions of Kafka on the latest JVMs (#8138)
Startup scripts for early versions of Kafka contain JVM options that have since been removed, such as `-XX:+PrintGCDateStamps` or `-XX:+UseParNewGC`.
When system tests run on a JVM that doesn't support these options, we should set up
environment variables with the correct options.
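
One way to picture the fix is a filter that drops the removed flags from an options string before it is exported. The sketch below is an assumption for illustration: `strip_removed_jvm_opts` and `REMOVED_JVM_OPTS` are hypothetical names, not the helper added by this commit, and the set holds only the two flags named above.

```python
# Hypothetical helper; not the function introduced by this commit.
# The set contains only the two flags named in the commit message.
REMOVED_JVM_OPTS = {"-XX:+PrintGCDateStamps", "-XX:+UseParNewGC"}


def strip_removed_jvm_opts(opts, removed=REMOVED_JVM_OPTS):
    """Return the space-separated JVM options with the removed flags dropped."""
    return " ".join(opt for opt in opts.split() if opt not in removed)


old_opts = "-Xmx1G -XX:+PrintGCDateStamps -XX:+UseParNewGC -XX:+UseG1GC"
new_opts = strip_removed_jvm_opts(old_opts)
print(new_opts)  # -Xmx1G -XX:+UseG1GC
```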

Reviewers: Guozhang Wang <guozhang@confluent.io>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>
2020-03-25 10:31:07 -07:00
A. Sophie Blee-Goldman e1cbefef60
HOTFIX: fix log message in version probing system test (#8341)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-03-24 21:46:37 -07:00
Rajini Sivaram 6b419933a0
KAFKA-9662: Wait for consumer offset reset in throttle test to avoid losing early messages (#8227) 2020-03-06 14:50:22 -05:00
A. Sophie Blee-Goldman 674360f5b3
KAFKA-6145: Encode task positions in SubscriptionInfo (#8121)
* Replace Prev/Standby task lists with a representation of the current position
  of all tasks, where each task is encoded as the sum of the positions of all the
  changelogs in that task.
* Only the protocol change is implemented, not actual positions, and the
  assignor is updated to translate the new protocol back to lists of Prev/Standby
  tasks so that the current assignment protocol still functions without modification.

Implements: KIP-441

Reviewers: John Roesler <vvcephei@apache.org>, Bruno Cadonna <bruno@confluent.io>
2020-03-06 09:19:04 -06:00
A. Sophie Blee-Goldman a1f2ece323
KAFKA-9525: add enforceRebalance method to Consumer API (#8087)
As described in KIP-568.

Waiting on acceptance of the KIP to write the tests, on the off chance something changes. But rest assured unit tests are coming.

Will also kick off existing Streams system tests which leverage this new API (eg version probing, sometimes broker bounce)

Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-29 18:44:22 -08:00
Brian Bushree 72a5aa8b07
MINOR: add wait_for_assigned_partitions to console-consumer (#8192)
what/why
the throttling_test was broken by this PR (#7785) since it depends on the consumer having partitions-assigned before starting the producer

this PR provides the ability to wait for partitions to be assigned in the console consumer before considering it started.

caveat
this does not support starting up the JmxTool inside the console-consumer for custom metrics while using this wait_until_partitions_assigned flag since the code assumes one JmxTool running per node.

I think a proper fix for this would be to make JmxTool its own standalone single-node service

alternatives
we could use the EndToEnd test suite which uses the verifiable producer/consumer under the hood but I found that there were more changes necessary to get this working unfortunately (specifically doesn't seem like this test suite plays nicely with the ProducerPerformanceService)

Reviewers: Matthew Wong <mwong@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
2020-02-29 19:43:51 -05:00
Matthew Wong 294b62963b
throttle consumer timeout increase (#8188)
The test_throttled_reassignment test fails because the consumer that is used to validate reassignment does not start on time to consume all messages. This does not seem like an issue with the throttling of the reassignment, since increasing the timeout allowed the test to pass multiple consecutive runs locally.

This test seemed to rely on the default JmxTool for the console consumer that was removed in this commit: 179d0d7
The console consumer would check to see if it had partitions assigned to it before beginning to consume. Although the test occasionally failed with the JmxTool, it began to fail much more after the removal.

Error messages of failures followed the below format with varying numbers of missed messages. They are the first messages by the producer.

535 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 515 more. Total Acked: 192792, Total Consumed: 192259. We validated that the first 535 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
In the scope of the test, this error suggests that the test is falling into the race condition described in produce_consume_validate.py, which has the timeout to prevent the consumer from missing initial messages.

This can serve as a temporary fix until the logic of consumer startup is addressed further.

Reviewers: Jason Gustafson <jason@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
2020-02-27 17:46:55 -05:00
Ron Dagostino 9d53ad794d KAFKA-9567: Docs, system tests for ZooKeeper 3.5.7
These changes depend on [KIP-515: Enable ZK client to use the new TLS supported authentication](https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Enable+ZK+client+to+use+the+new+TLS+supported+authentication), which was only added to 2.5.0. The upgrade to ZooKeeper 3.5.7 was merged to both 2.5.0 and 2.4.1 via https://issues.apache.org/jira/browse/KAFKA-9515, but this change must only be merged to 2.5.0 (it will break the system tests if merged to 2.4.1).

Author: Ron Dagostino <rdagostino@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Andrew Choi <li_andchoi@microsoft.com>

Closes #8132 from rondagostino/KAFKA-9567
2020-02-25 19:59:55 +05:30
Nikolay f364281431
KAFKA-9319: Fix generation of CA certificate for system tests. (#8106)
Newer versions of Java have added checks to ensure that trust anchors are CA certificates and contain proper extensions. This PR adds Basic Constraints extension with the CA field set to true for system tests.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2020-02-17 09:49:35 +00:00
Boyang Chen 07db26c20f
KAFKA-9417: New Integration Test for KIP-447 (#8000)
This change mainly has 2 components:

1. extend the existing transactions_test.py to also try out new sendTxnOffsets(groupMetadata) API to make sure we are not introducing any regression or compatibility issue
  a. We shrink the time window to 10 seconds for the txn timeout scheduler on broker so that we could trigger expiration earlier than later

2. create a completely new system test class called group_mode_transactions_test which is more complicated than the existing system test, as we are taking rebalance into consideration and using multiple partitions instead of one. For further breakdown:
  a. The message count was done on partition level, instead of global, as we need to visualize the per partition order throughout the test. For this sake, we extend ConsoleConsumer to print out the data partition as well to help message copier interpret the per partition data.
  b. The progress count includes the time for completing the pending txn offset expiration
  c. More visibility and feature improvements on TransactionMessageCopier to better work under either standalone or group mode.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-12 12:34:12 -08:00
Ron Dagostino 342f13a838 KAFKA-8843: KIP-515: Zookeeper TLS support
Signed-off-by: Ron Dagostino <rdagostino@confluent.io>

Author: Ron Dagostino <rdagostino@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #8003 from rondagostino/KAFKA-8843
2020-02-08 21:16:48 +05:30
Guozhang Wang 4090f9a2b0
KAFKA-9113: Clean up task management and state management (#7997)
This PR is a collaboration between Guozhang Wang and John Roesler. It is a significant tech debt cleanup on task management and state management, and is broken down into the sub-tasks listed below:

Extract embedded clients (producer and consumer) into RecordCollector from StreamTask.
guozhangwang#2
guozhangwang#5

Consolidate the standby updating and active restoring logic into ChangelogReader and extract out of StreamThread.
guozhangwang#3
guozhangwang#4

Introduce Task state life cycle (created, restoring, running, suspended, closing), and refactor the task operations based on the current state.
guozhangwang#6
guozhangwang#7

Consolidate AssignedTasks into TaskManager and simplify the logic of changelog management and task management (since they are already moved in step 2) and 3)).
guozhangwang#8
guozhangwang#9

Also simplified the StreamThread logic a bit as the embedded clients / changelog restoration logic has been moved into step 1) and 2).
guozhangwang#10

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>
2020-02-04 21:06:39 -08:00
David Arthur 7e776b0462
Bump trunk to 2.6.0-SNAPSHOT (#8026) 2020-02-03 13:04:56 -05:00
vinoth chandar 71c5729a41 KAFKA-6144: Add KeyQueryMetadata APIs to KafkaStreams (#7960)
Deprecate existing metadata query APIs in favor of new
ones that include standby hosts as well as partition
information.

Closes: #7960
Implements: KIP-535
Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com>
Reviewed-by: John Roesler <vvcephei@apache.org>
2020-01-15 09:39:02 -06:00
Brian Bushree 422bc1f0fa MINOR: Disable JmxTool in kafkatest console-consumer by default (#7785)
Do not initialize `JmxTool` by default when running console consumer. In order to support this, we remove `has_partitions_assigned` and its only usage in an assertion inside `ProduceConsumeValidateTest`, which did not seem to contribute much to the validation.

Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-01-09 16:53:36 -08:00
A. Sophie Blee-Goldman 3453e9e2ee HOTFIX: fix system test race condition (#7836)
In some system tests a Streams app is started and then prints a message to stdout, which the system test waits for to confirm the node has successfully been brought up. It then greps for certain log messages in a retriable loop.

But waiting on the Streams app to start/print to stdout does not mean the log file has been created yet, so the grep may return an error. Although this occurs in a retriable loop it is assumed that grep will not fail, and the result is piped to wc and then blindly converted to an int in the python function, which fails since the error message is a string (throws ValueError)

We should catch the ValueError and return a 0 so it can try again rather than immediately crash
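
A minimal sketch of that guard, assuming a hypothetical `parse_count` helper that receives the raw text produced by the `grep ... | wc` pipeline (the real function name and call site are not shown in this message):

```python
def parse_count(raw_output):
    """Convert the text printed by a `grep ... | wc -l` style command to an int.

    If the log file does not exist yet, the captured text may be an error
    message rather than a number. In that case report 0 matches so the
    surrounding retry loop simply tries again.
    """
    try:
        return int(raw_output.strip())
    except ValueError:
        return 0


print(parse_count("3\n"))                                # 3
print(parse_count("grep: No such file or directory\n"))  # 0
```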

Reviewers: Bill Bejeck <bbejeck@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
2019-12-31 18:44:31 -08:00
Lucas Bradstreet 8fd7cd6a43 MINOR: upgrade system test should check for ISR rejoin on each roll (#7827)
The upgrade system test correctly rolls by upgrading the broker and 
leaving the IBP, and then rolling again with the latest IBP version.
Unfortunately, this is not sufficient to pick up many problems in our IBP
gating as we charge through the rolls and after the second roll all of
the brokers will rejoin the ISR and the test will be treated as a
success.

This test adds two new checks:
1. We wait for the ISR to stabilize for all partitions. This is best
practice during rolls, and is enough to tell us if a broker hasn't
rejoined after each roll.
2. We check the broker logs for some common protocol errors. This is a
fail safe as it's possible for the test to be successful even if some
protocols are incompatible and the ISR is rejoined.
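
The two checks can be sketched as a polling helper plus a log scan. Everything named here is an assumption for illustration: `poll_until`, `find_protocol_errors`, and the error patterns are hypothetical, since the message does not list the exact errors the test looks for.

```python
import re
import time

# Hypothetical patterns; the commit message does not name the exact errors.
PROTOCOL_ERROR_PATTERNS = [r"UnsupportedVersionException", r"IllegalArgumentException"]


def poll_until(condition, timeout_sec, backoff_sec=0.1):
    """Call condition() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(backoff_sec)


def find_protocol_errors(log_lines, patterns=PROTOCOL_ERROR_PATTERNS):
    """Return the log lines that match any of the given error patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in log_lines if any(c.search(line) for c in compiled)]


# Check 1: after each roll, wait until every partition's ISR is back to full size.
replicas = {"topic-0": [1, 2, 3]}
isr = {"topic-0": [1, 2, 3]}
isr_stable = poll_until(
    lambda: all(sorted(isr[p]) == sorted(replicas[p]) for p in replicas),
    timeout_sec=1,
)

# Check 2: fail the test if the broker logs contain known protocol errors.
logs = ["INFO Kafka started", "ERROR UnsupportedVersionException: bad request"]
errors = find_protocol_errors(logs)
```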

Reviewers: Nikhil Bhatia <nikhil@confluent.io>, Jason Gustafson <jason@confluent.io>
2019-12-30 11:02:30 -08:00