
429 Commits

Author SHA1 Message Date
Stanislav Vodetskyi 64f267b6c1 MINOR: Upgrade ducktape to version 0.7.11 (#9932)
ducktape 0.7.11 fixes a bug where a unicode exception message would cause the test runner to hang and never finish.
This change should be backported to all branches that use ducktape 0.7.10.

Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
2021-01-22 20:33:57 -08:00
Stanislav Vodetskyi 3a99aa37a2 MINOR: Pin ducktape to version 0.7.10
Ducktape version 0.7.10 pinned paramiko to version 2.3.2 to deal with random SSHExceptions that Confluent had been seeing since ducktape was updated to a later version of paramiko.

The idea is that the ducktape 0.7.10 change can be backported as far back as possible, while 2.7 and trunk can update to ducktape 0.8.0 and Python 3 separately.

Tested:
In progress, but unlikely to affect anything, since the only difference between ducktape 0.7.9 and 0.7.10 is the paramiko version downgrade.

Author: Stanislav Vodetskyi <stan@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #9490 from stan-confluent/ducktape-710-26

(cherry picked from commit 1cbc4da0c9)
Signed-off-by: Manikumar Reddy <manikumar.reddy@gmail.com>
2020-10-24 21:53:20 +05:30
Andrew Egelhofer a5d883c774 MINOR: Use new version of ducktape
ducktape diff: https://github.com/confluentinc/ducktape/compare/v0.7.8...v0.7.9

- bcrypt (a dependency of ducktape) dropped Python 2.7 support.
ducktape 0.7.9 now pins bcrypt to a version that still supports Python 2.7.

Author: Andrew Egelhofer <aegelhofer@confluent.io>

Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #9192 from andrewegel/trunk
2020-08-18 07:25:21 +05:30
Ego b2052faf75 MINOR: Upgrade ducktape to 0.7.8 (#8879)
A newer version of ducktape that updates some dependencies and adds some features. The diff is here:

https://github.com/confluentinc/ducktape/compare/v0.7.7...v0.7.8

Reviewer: Konstantine Karantasis <konstantine@confluent.io>
2020-06-17 21:56:43 -07:00
Bruno Cadonna 8019272d14 MINOR: Fix Streams EOS system tests by adding clean-up of state dir (#7693)
Recently, system tests test_rebalance_[simple|complex] failed
repeatedly with a verification error. The cause was most probably
the missing clean-up of a state directory of one of the processors.

A node is cleaned up when a service on that node is started and when
a test is torn down.

If the clean-up flag clean_node_enabled of an EOS Streams service is
unset, the clean-up of the node is skipped.

The clean-up flag of processor1 in the EOS tests should stay set before
its first start, so that the node is cleaned before the service is started.
Afterwards, for the multiple restarts of processor1, the clean-up flag should
be unset so that the local state is reused.

After the multiple restarts are done, the clean-up flag of processor1 should
again be set to trigger node clean-up during the test teardown.
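
The flag lifecycle described above can be sketched in Python. This is a hypothetical simulation: `FakeStreamsService`, its methods, and the in-memory `state` set stand in for a real service and its state directory; only the `clean_node_enabled` name comes from the text.

```python
class FakeStreamsService:
    """Hypothetical stand-in for an EOS Streams service running on one node."""

    def __init__(self):
        self.clean_node_enabled = True  # when set, the node is wiped on start and teardown
        self.state = set()              # simulated local state directory

    def clean_node(self):
        # Clean-up is skipped entirely when the flag is unset.
        if self.clean_node_enabled:
            self.state.clear()

    def start(self):
        self.clean_node()
        self.state.add("store")  # the running processor writes local state

    def teardown(self):
        self.clean_node()


def run_eos_like_test(service, restarts=3):
    """Mimic the EOS test flow; return the state each restart saw and the final state."""
    service.start()                      # first start: flag set, node cleaned first
    service.clean_node_enabled = False   # unset so restarts reuse local state
    seen_on_restart = []
    for _ in range(restarts):
        seen_on_restart.append(set(service.state))
        service.start()
    service.clean_node_enabled = True    # set again so teardown cleans the node
    service.teardown()
    return seen_on_restart, service.state
```

Each restart sees the state left by the previous run, and the node is empty after teardown; forgetting to re-set the flag before teardown is exactly what leaves a dirty node behind.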

A dirty node can lead to test failures when tests from Streams EOS tests are
scheduled on the same node, because the state store would not start empty
since it reads the local state that was not cleaned up.

Reviewers: Matthias J. Sax <mjsax@apache.org>, Andrew Choi <andchoi@linkedin.com>, Bill Bejeck <bbejeck@gmail.com>
2020-06-01 10:49:48 -05:00
Ewen Cheslack-Postava adc845a75b MINOR: Upgrade ducktape to 0.7.7 (#8487)
This fixes a version pinning issue where a transitive dependency had a
major version upgrade that a dependency did not account for, breaking
the build.

Reviewers: Andrew Egelhofer <aegelhofer@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-04-14 16:51:51 -07:00
Brian Bushree 7cdd1ce178 MINOR: Backport kafkatest per-broker overrides and extra JVM args (#8347)
Backport of #7297 and #7715 to allow per-node broker overrides and extra JVM args

Co-authored-by: David Arthur <mumrah@gmail.com>

Reviewers: Konstantine Karantasis <konstantine@confluent.io>
2020-03-25 21:59:34 -07:00
Konstantine Karantasis f4fda8c009 KAFKA-8417: Remove redundant network definition --net=host when starting testing docker containers (#6797)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2020-03-23 18:11:25 -07:00
Greg Harris 6462c2a7fe KAFKA-7489: Backport removal of 0.9 compatibility checks from ConnectDistributedTest (#8035)
(#7023) exposed an incompatibility between Kafka <=0.9 and Connect >0.9,
in which the broker does not recognize an ApiVersions request.
For trunk and 2.4, this test case was removed rather than the issue being addressed.
This effectively backports the other half of (#7023) which was left out of (#7791).

Signed-off-by: Greg Harris <gregh@confluent.io>

Author: Greg Harris <gregh@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Andrew Choi <andchoi@linkedin.com>
2020-02-05 09:50:44 -06:00
A. Sophie Blee-Goldman bc0a649d89 HOTFIX: bump system test versions (#7835)
Bumped kafkatest/version.py to 2.2.3-SNAPSHOT
Updated versions following instructions in streams_upgrade_test.py

Reviewers: Bill Bejeck <bbejeck@gmail.com>
2019-12-17 16:45:04 -05:00
Randall Hauch 77eecb2c87 MINOR: Increase the timeout in one of Connect's distributed system tests (#7790)
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Nigel Liang <nigel@nigelliang.com>, John Roesler <john@confluent.io>
2019-12-06 15:33:05 -06:00
Randall Hauch fc2cccaff6 KAFKA-7489: Fix ConnectDistributedTest system test to use KafkaVersion (backport) (#7791)
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Nigel Liang <nigel@nigelliang.com>, John Roesler <john@confluent.io>
2019-12-06 13:45:34 -06:00
Randall Hauch 61c8228f31 Update versions to 2.2.3-SNAPSHOT 2019-12-01 14:52:03 -06:00
Randall Hauch b80eee19b8 Merge tag '2.2.2-rc2' into 2.2
2.2.2-rc2
2019-12-01 14:45:59 -06:00
A. Sophie Blee-Goldman faa4654504 HOTFIX: fix bug in VP test where it greps for the wrong log message (#7643)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2019-11-04 19:01:17 -08:00
Randall Hauch 1d348535a0 Bump version to 2.2.2 2019-10-22 23:40:27 -05:00
A. Sophie Blee-Goldman 79d0f55ba7 MINOR: followup to Version Probing improvements (2.2) (#7448)
Small follow-up to trunk PR #7426

While debugging the 2.3 VP PR we realized we should remove the leader-tracking from the VP system test altogether. We'd already merged the corresponding trunk PR so I made a quick new PR for trunk.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2019-10-04 15:58:38 -07:00
A. Sophie Blee-Goldman d3bf3cd3d1 KAFKA-8649: send latest commonly supported version in assignment (#7426)
Same change as PR #7423, but targeted at 2.2.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2019-10-02 16:01:54 -07:00
Arjun Satish 5d2a6dc531 KAFKA-8774: Regex can be found anywhere in config value (#7197)
Fixed AbstractHerder so that it correctly identifies task configs that contain variables for externalized secrets. The original method incorrectly used `matcher.matches()` instead of `matcher.find()`. The former requires the entire string to match the regex, whereas the latter can find the pattern anywhere within the input string, which is what this use case requires.

Added unit tests to cover various cases of a config with externalized secrets, and updated system tests to cover the case where a config value contains other characters besides the secret variable, which requires the regex pattern to be found anywhere in the string (as opposed to a complete match).
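
The distinction maps directly onto Python's `re` module, where `re.fullmatch` behaves like Java's `Matcher.matches()` and `re.search` like `Matcher.find()`. A minimal Python sketch follows; the pattern is an assumption modeled on the `${provider:[path:]key}` variable syntax from KIP-297, not copied from AbstractHerder, and the sample JDBC URL is made up.

```python
import re

# Assumed pattern for ${provider:path:key} or ${provider:key} config variables.
VARIABLE_PATTERN = re.compile(r"\$\{([^}]*?):(([^}]*?):)?([^}]*?)\}")


def has_variable_full(value):
    # Analogous to Matcher.matches(): the entire value must be a single variable.
    return VARIABLE_PATTERN.fullmatch(value) is not None


def has_variable_anywhere(value):
    # Analogous to Matcher.find(): a variable may appear anywhere in the value.
    return VARIABLE_PATTERN.search(value) is not None
```

For a value such as `jdbc:mysql://db:3306?password=${file:/secrets.properties:db.password}`, the full-match check misses the embedded variable while the search finds it, which is the bug described above.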

Author: Arjun Satish <arjun@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
2019-08-13 09:53:26 -05:00
Stanislav Kozlovski d797428cd0 MINOR: Fix merge conflict in version.py (#6931)
Fix a regression introduced in commit c450bfc291

Reviewers: Ismael Juma <ismael@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2019-06-14 08:39:34 -07:00
Jason Gustafson 91d229df60 MINOR: Fix race condition on shutdown of verifiable producer
We've seen `ReplicaVerificationToolTest.test_replica_lags` fail occasionally due to errors such as the following:
```
RemoteCommandError: ubuntuworker7: Command 'kill -15 2896' returned non-zero exit status 1. Remote error message: bash: line 0: kill: (2896) - No such process
```
The problem seems to be a shutdown race condition when using `max_messages` with the producer. The process may already be gone, which causes the signal to fail.
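
A local-process sketch of tolerating that race in Python: the system test actually runs `kill -15` over SSH through ducktape, so `stop_process` here is a hypothetical illustration of the idea (treat "No such process" as already stopped), not necessarily the change made in this commit.

```python
import os
import signal


def stop_process(pid):
    """Send SIGTERM to pid; return True if signalled, False if it had already exited."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        # ESRCH: the process finished on its own (e.g. after max_messages),
        # so there is nothing left to stop and this is not an error.
        return False
    return True
```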

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Gwen Shapira

Closes #6906 from hachikuji/fix-failing-replicat-verification-test
2019-06-07 17:06:11 -07:00
Jason Gustafson be1bfadde7 MINOR: Lower producer throughput in flaky upgrade system test
We see the upgrade test failing from time to time. I looked into it and found that the root cause is basically that the test throughput can be too high for the 0.9 producer to make progress. Eventually it reaches a point where it has a huge backlog of timed out requests in the accumulator which all have to be expired. We see a long run of messages like this in the output:

```
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null}
```
This can continue for a long time (I have observed up to 1 min) and prevents the producer from successfully writing any new data. While it is busy expiring the batches, no data is getting delivered to the consumer, which causes it to eventually raise a timeout.
```
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
```
The fix here is to reduce the throughput, which seems reasonable since the purpose of the test is to verify the upgrade, which does not demand heavy load. Note that I investigated several failing instances of this test going back to 1.0 and saw a similar pattern, so there does not appear to be a regression.

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Gwen Shapira

Closes #6907 from hachikuji/lower-throughput-for-upgrade-test
2019-06-07 17:06:03 -07:00
Lucas Bradstreet ed47e09967 KAFKA-8499: ensure java is in PATH for ducker system tests (#6898)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2019-06-07 14:24:41 -07:00
Vahid Hashemian 79a42ae793 Update versions to 2.2.2-SNAPSHOT 2019-06-01 22:55:42 -07:00
Vahid Hashemian ed80742bed Merge tag '2.2.1-rc1' into 2.2
2.2.1-rc1
2019-06-01 22:50:50 -07:00
Matthias J. Sax c450bfc291 KAFKA-8155: Add 2.1.1 release to system tests (#6596)
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2019-05-30 12:58:32 -07:00
Alex Diachenko 46f59e7331 KAFKA-8418: Wait until REST resources are loaded when starting a Connect Worker. (#6840)
Author: Alex Diachenko <sansanichfb@gmail.com>
Reviewers: Arjun Satish <arjun@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
2019-05-30 14:03:40 -05:00
Alex Diachenko 0779740f9b MINOR: Fix red herring when ConnectDistributedTest.test_bounce fails. (#6838)
Author: Alex Diachenko <sansanichfb@gmail.com>
Reviewer: Randall Hauch <rhauch@gmail.com>
2019-05-29 17:35:24 -05:00
Matthias J. Sax 9d559da39e MINOR: fix Streams version-probing system test (#6763)
A version probing rebalance triggers a second rebalance. If the second rebalance happens quickly, we see the log about successful group assignment twice.

Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-05-20 14:17:05 -07:00
Jason Gustafson 7b8db0d790 MINOR: Increase security test timeouts for transient failures (#6760)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2019-05-18 16:53:15 -07:00
Vahid Hashemian 55783d3133 Bump version to 2.2.1 2019-05-13 09:10:09 -07:00
Magesh Nandakumar b5b2c5af2f KAFKA-8352 : Fix Connect System test failure 404 Not Found (#6713)
Corrects the system tests to check for either a 404 or a 409 error and to sleep until the Connect REST API becomes available. This accounts for a previous change to how REST extensions are initialized (#6651), which made it possible for Connect to return a 404 if the resources have not yet started. The integration tests were already looking for 409.
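
A retry loop in that spirit, sketched in Python under stated assumptions: `fetch_status` is a hypothetical callable that issues the REST request and returns the HTTP status code, and the timeout and backoff defaults are illustrative rather than the values used by the Connect system tests.

```python
import time

# 404: REST resources not started yet; 409: conflict (e.g. while the worker rebalances).
RETRIABLE_STATUSES = {404, 409}


def wait_for_rest_api(fetch_status, timeout_sec=30.0, backoff_sec=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Retry while the API answers 404/409; return the first other status code."""
    deadline = clock() + timeout_sec
    while True:
        status = fetch_status()
        if status not in RETRIABLE_STATUSES:
            # Success or a real error: let the caller decide what to do with it.
            return status
        if clock() >= deadline:
            raise TimeoutError(f"REST API still returning {status} after {timeout_sec}s")
        sleep(backoff_sec)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.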

Author: Magesh Nandakumar <magesh.n.kumar@gmail.com>
Reviewer: Randall Hauch <rhauch@gmail.com>
2019-05-10 18:21:07 -05:00
Matthias J. Sax 7704e01a60 Bump version to 2.2.1-SNAPSHOT 2019-03-22 13:26:11 -07:00
Matthias J. Sax 05fcfde8f6 Bump version to 2.2.0 2019-03-09 11:44:15 -08:00
Rajini Sivaram cec1dfab3f KAFKA-8070: Increase consumer startup timeout in system tests (#6405)
For consumers using SSL, this timeout includes the time to create and copy keystores and truststores, and sometimes it takes longer than 10s to complete the security setup before starting the consumer process.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2019-03-08 16:59:35 +00:00
Guozhang Wang 5cc14b68a8 HOTFIX: add ignore import to streams_upgrade_test 2019-03-01 11:22:37 -08:00
Guozhang Wang 92f1412dfc MINOR: disable Streams system test for broker upgrade/downgrade (#6341)
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2019-02-28 00:22:18 -08:00
Bill Bejeck 034968b1ac MINOR: Add all topics created check streams broker bounce test (2.2) (#6244)
The StreamsBrokerBounceTest.test_broker_type_bounce test experienced what looked like a transient failure. After looking over this test and the failure, it seems vulnerable to a timing error where Streams starts before the Kafka service has created all topics.

Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
2019-02-20 12:47:05 -05:00
Konstantine Karantasis 257fd87fd7 KAFKA-7834: Extend collected logs in system test services to include heap dumps
* Enable heap dumps on OOM with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file.bin> in the major services in system tests
* Collect the heap dump from the predefined location as part of the result logs for each service
* Change Connect service to delete the whole root directory instead of individual expected files
* Tested by running the full suite of system tests
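
The first bullet comes down to two HotSpot options appended to each service's JVM arguments. A hypothetical Python helper (the name `heap_dump_jvm_opts` and its parameters are illustrative, not part of kafkatest) could assemble them like this:

```python
import shlex


def heap_dump_jvm_opts(heap_dump_path, extra_opts=""):
    """Return a JVM options string that writes a heap dump to heap_dump_path on OOM."""
    # Quote the path so a path containing spaces survives being passed through a shell.
    flags = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=" + shlex.quote(heap_dump_path)
    return (extra_opts + " " + flags).strip()
```

Keeping the dump at a fixed, predefined path is what makes the second bullet possible: the result collector always knows where to look.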

Author: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #6158 from kkonstantine/KAFKA-7834

(cherry picked from commit 83c435f3ba)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
2019-02-04 16:46:44 -08:00
Konstantine Karantasis 0dbb064963 MINOR: Upgrade ducktape to 0.7.5 (#6197)
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
2019-01-25 11:14:19 -08:00
Colin Patrick McCabe a79d6dcdb6 KAFKA-7793: Improve the Trogdor command line. (#6133)
* Allow the Trogdor agent to be started in "exec mode", where it simply
runs a single task and exits after it is complete.

* For AgentClient and CoordinatorClient, allow the user to pass the path
to a file containing JSON, instead of specifying the JSON object in the
command-line text itself.  This means that we can get rid of the bash
scripts whose only function was to load task specs into a bash string
and run a Trogdor command.

* Print dates and times in a human-readable way, rather than as numbers
of milliseconds.

* When listing tasks or workers, output human-readable tables of
information.

* Allow the user to filter on task ID name, task ID pattern, or task
state.

* Support a --json flag to provide raw JSON output if desired.

Reviewed-by: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
2019-01-24 09:26:51 -08:00
Kan Li f8e8d62f56 MINOR: ducker-ak: add down -f, avoid using a terminal in ducker test
When using ./ducker-ak test on Jenkins, the script complains that there is no TTY.  To fix this, we should skip passing -t to docker exec.  We do not need a pseudo-TTY to run the tests.  Similarly, we should skip passing -i, since we do not need to keep stdin open.
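
ducker-ak itself is a bash script; the Python sketch below only illustrates the argument change, with `docker_exec_argv` and the container name `ducker01` as hypothetical examples. Without `-i` and `-t`, `docker exec` neither keeps stdin open nor allocates a pseudo-TTY, so it no longer needs a terminal attached.

```python
def docker_exec_argv(container, command, interactive=False):
    """Build a `docker exec` argument list; request stdin and a TTY only when asked."""
    argv = ["docker", "exec"]
    if interactive:
        argv.append("-it")  # -i keeps stdin open, -t allocates a pseudo-TTY
    argv.append(container)
    argv.extend(command)
    return argv
```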

The down command should have a force option, specified as -f or --force.

Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
2019-01-23 13:39:47 -08:00
Chris Egerton 743607af5a KAFKA-5117: Stop resolving externalized configs in Connect REST API
[KIP-297](https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations#KIP-297:ExternalizingSecretsforConnectConfigurations-PublicInterfaces) introduced the `ConfigProvider` mechanism, which was primarily intended for externalizing secrets provided in connector configurations. However, when querying the Connect REST API for the configuration of a connector or its tasks, those secrets are still exposed. The changes here prevent the Connect REST API from ever exposing resolved configurations in order to address that. rhauch has given a more thorough writeup of the thinking behind this in [KAFKA-5117](https://issues.apache.org/jira/browse/KAFKA-5117).

Tested and verified manually. If these changes are approved, unit tests can be added to prevent a regression.

Author: Chris Egerton <chrise@confluent.io>

Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #6129 from C0urante/hide-provided-connect-configs
2019-01-23 11:00:23 -08:00
Xi Yang cc33511e9a MINOR: Support choosing different JVMs when running integration tests
+ Add a parameter to ducker-ak to control the OpenJDK base image.
+ Fix a few issues with using OpenJDK:11 as the base image.

Author: Xi Yang <xi@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #6071 from yangxi/ducktape-jdk
2019-01-11 15:11:55 -08:00
Bill Bejeck 3746bf2c84 MINOR: Need to have same wait as verify timeout broker upgrade downgrade (#6127)
When I originally refactored the streams_upgrade_test#upgrade_downgrade_brokers test, I removed the wait call, which would wait up to 24 minutes for the SmokeTestDriver class to publish and verify all of its records.

Since most of the tests run in two minutes or less, I set the monitor_log timeout to three minutes. However, the SmokeTestDriver#verify method allows up to six minutes to consume all records before verifying, so the monitor_log timeout needs to be greater than six minutes. I've set the timeout to eight minutes.

Additionally, the steps needed to update the streams_upgrade_test should be documented, as there are several components that need to be updated, so I've documented those steps in a large comment on the test.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2019-01-11 11:35:33 -08:00
Bill Bejeck b1b792d9a8 MINOR: Add 2.1 version metadata upgrade (#6111)
Updated the test_metadata_upgrade test. To enable using the 2.1 version, I needed to add a config change to the StreamsUpgradeTestJobRunnerService to ensure that ducktape passes the proper args when starting the StreamsUpgradeTest.

For testing, I ran the test_metadata_upgrade test and all versions now pass http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2019-01-09--001.1547049873--bbejeck--MINOR_add_2_1_version_metadata_upgrade--a450c68/report.html

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-01-09 15:19:00 -08:00
Bill Bejeck 515e680c71 MINOR: Put states in proper order, increase timeout for starting (#6105)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-01-08 13:48:38 -08:00
Jason Gustafson f9a22f42a8 KAFKA-7773; Add end to end system test relying on verifiable consumer (#6070)
This commit creates an EndToEndTest base class which relies on the verifiable consumer. This will ultimately replace ProduceConsumeValidateTest which depends on the console consumer. The advantage is that the verifiable consumer exposes more information to use for validation. It also allows for a nicer shutdown pattern. Rather than relying on the console consumer idle timeout, which requires a minimum wait time, we can halt consumption after we have reached the last acked offsets. This should be more reliable and faster. The downside is that the verifiable consumer only works with the new consumer, so we cannot yet convert the upgrade tests. This commit converts only the replication tests and a flaky security test to use EndToEndTest.
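
The halting condition can be sketched as follows: record the highest offset the producer has seen acked for each partition, and stop consuming once the consumer has consumed at least that offset on every partition. `OffsetTracker` and the `(topic, partition)` tuple keys are illustrative assumptions, not the EndToEndTest API.

```python
class OffsetTracker:
    """Tracks the last acked (producer) and last consumed (consumer) offset per partition."""

    def __init__(self):
        self.last_acked = {}
        self.last_consumed = {}

    def on_acked(self, tp, offset):
        self.last_acked[tp] = max(offset, self.last_acked.get(tp, -1))

    def on_consumed(self, tp, offset):
        self.last_consumed[tp] = max(offset, self.last_consumed.get(tp, -1))

    def caught_up(self):
        # Meant to be checked after the producer has stopped: every acked offset
        # has been consumed, so consumption can halt without an idle timeout.
        return all(self.last_consumed.get(tp, -1) >= offset
                   for tp, offset in self.last_acked.items())
```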
2019-01-08 14:14:51 +00:00
Bill Bejeck 404bdef08d MINOR: Remove sleep calls and ignore annotation from streams upgrade test (#6046)
The StreamsUpgradeTest::test_upgrade_downgrade_brokers used sleep calls in the test which led to flaky test performance and as a result, we placed an @ignore annotation on the test. This PR uses log events instead of the sleep calls hence we can now remove the @ignore setting.
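
Waiting on a log event rather than sleeping can be sketched against a local file: poll for new lines and return as soon as one matches, failing after a timeout. Ducktape has its own helpers for monitoring logs on remote nodes; `wait_for_log_line` here is a hypothetical standalone illustration, and the path and log line in the usage note are sample data.

```python
import re
import time


def wait_for_log_line(path, pattern, timeout_sec=60.0, poll_sec=0.2):
    """Return the first complete line in `path` matching `pattern`, polling until timeout."""
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout_sec
    with open(path, "r") as log:
        buffered = ""  # holds a partially written line until its newline arrives
        while True:
            chunk = log.readline()
            if chunk:
                buffered += chunk
                if buffered.endswith("\n"):
                    line, buffered = buffered, ""
                    if regex.search(line):
                        return line.rstrip("\n")
                continue
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no line matching {pattern!r} in {path} after {timeout_sec}s")
            time.sleep(poll_sec)
```

For example, `wait_for_log_line("/tmp/streams.log", r"State transition from .* to RUNNING", timeout_sec=120)` returns as soon as that transition is logged instead of always waiting a fixed interval.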

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-01-06 23:03:54 -08:00
John Roesler ef9204dc58 MINOR: improve resilience of Streams test producers (#6028)
Some Streams system tests have failed during the setup phase
due to the producer having retries disabled and getting some
transient error from the broker.

This patch adds a retries parameter to the VerifiableProducer
(default unchanged), and sets retries to 10 for Streams tests.

It also sets acks equal to the number of brokers for Streams tests.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2019-01-04 13:44:15 -08:00