kafka

Commit Graph

Author	SHA1	Message	Date
Levani Kokhreidze	87eb0cf03c	KAFKA-6718: Update SubscriptionInfoData with clientTags (#10802 ) adds ClientTags to SubscriptionInfoData Reviewer: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-03-11 16:29:05 +08:00
Kamal Chandraprakash	496aa1f84b	MINOR: Provide valid examples in README page. (#10259 ) * MINOR: Provide valid examples in README page. - `testMetadataUpdateWaitTime` method is removed from MetadataTest class. - Removed the travis CI documentation. Reviewers: Luke Chen <showuon@gmail.com>	2022-02-21 14:48:24 +08:00
Michal T	6d7e6d6f87	MINOR: Install missing 'tc' utility - iproute2 for systemtests (#11764 ) Signed-off-by: Michal T <mtoth@redhat.com> Reviewers: Mickael Maison <mickael.maison@gmail.com>	2022-02-16 12:56:06 +01:00
Michal T	44fcba980f	MINOR: Fix typo in system tests Dockerfile (#11740 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2022-02-08 18:03:57 +01:00
David Jacot	7215c90c5e	MINOR: Add 3.0 and 3.1 to streams system tests (#11716 ) Reviewers: Bill Bejeck <bill@confluent.io>	2022-01-28 10:06:31 +01:00
David Jacot	110fae2f59	MINOR: Add 3.0 and 3.1 to broker and client compatibility tests (#11701 ) Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2022-01-25 16:22:48 +01:00
David Jacot	34208e8429	MINOR: Update files with 3.1.0 (#11698 ) Reviewers: Bill Bejeck <bbejeck@apache.org>	2022-01-21 21:30:56 +01:00
Ron Dagostino	1785e1223e	KAFKA-13582: TestVerifiableProducer.test_multiple_kraft_security_protocols fails (#11664 ) KRaft brokers always use the first controller listener, so if there is not also a colocated KRaft controller on the node be sure to only publish one controller listener in `controller.listener.names` even when the inter-controller listener name differs. System tests were failing due to unnecessarily publishing a second entry in `controller.listener.names` for a broker-only config and not also publishing a mapping for it in `listener.security.protocol.map`. Removing the unnecessary entry in `controller.listener.names` solves the problem. Reviewers: David Jacot <djacot@confluent.io>	2022-01-10 20:54:26 +01:00
Chia-Ping Tsai	b6e7f6a4df	MINOR: replace Thread.isAlive by Thread.is_alive for Python code (#11545 ) Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2021-11-29 18:49:14 +08:00
Bruno Cadonna	4fed0001ec	MINOR: Fix system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance (#11532 ) Log messages were changed in the AssignorConfiguration (#11490) that are also used for verification in system test StreamsCooperativeRebalanceUpgradeTest.test_upgrade_to_cooperative_rebalance. This commit fixes the test and adds comments to the log messages that point to the test that needs to be updated in case of changes to the log messages. Reviewers: John Roesler <vvcephei@apache.org>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2021-11-25 10:48:09 +01:00
David Jacot	3aef0a5ceb	MINOR: Bump trunk to 3.2.0-SNAPSHOT (#11458 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	2021-11-02 13:38:54 +01:00
David Jacot	38a3ddb562	MINOR: Add a replication system test which simulates a slow replica (#11395 ) This patch adds a new system test which exercises the shrinking/expansion process of the partition leader. It does so by introducing a network partition which isolates a broker from the other brokers in the cluster but not from KRaft Controller/ZK. Reviewers: Jason Gustafson <jason@confluent.io>	2021-10-20 08:19:36 +02:00
Luke Chen	1af1c80e2d	MINOR: replace deprecated exactly_once_beta with exactly_once_v2 (#10884 ) Replace deprecated exactly_once_beta with exactly_once_v2 in system tests. Follow up for #10870, found out there are still some system tests using the deprecated exactly_once_beta. This PR updates them. Reviewers: Bruno Cadonna <cadonna@apache.org>	2021-09-27 17:02:48 +02:00
David Jacot	f650a14d56	KAFKA-13312; 'NetworkDegradeTest#test_rate' should wait until iperf server is listening (#11344 ) Reviewers: Jason Gustafson <jason@confluent.io>	2021-09-21 10:26:46 +02:00
David Jacot	493280735b	MINOR: Bump latest 2.8 version to 2.8.1 (#11341 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2021-09-20 09:23:15 +02:00
Jason Gustafson	25b0857bdb	KAFKA-13234; Transaction system test should clear URPs after broker restarts (#11267 ) Clearing under-replicated-partitions helps ensure that partitions do not become unavailable longer than necessary as brokers are rolled. This prevents flakiness due to transaction timeouts. Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>	2021-09-01 08:37:05 -07:00
David Jacot	c4e1e23857	KAFKA-13231; `TransactionalMessageCopier.start_node` should wait until the process is fully started (#11264 ) This patch ensures that the transaction message copier is fully started in `start_node`. Without this, it is possible that `stop_node` is called before the process is started which results in not stopping it at all. Reviewers: Jason Gustafson <jason@confluent.io>	2021-08-27 08:28:14 +02:00
John Roesler	45ecaa19f8	MINOR: Set session timeout back to 10s for Streams system tests (#11236 ) We increased the default session timeout to 30s in KIP-735: https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout Since then, we are observing sporadic system test failures due to rebalances taking longer than the test timeout. Rather than increase the test wait times, we can just override the session timeout to a value more appropriate in the testing domain. Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>	2021-08-20 11:27:54 -05:00
Zara Lim	9bc45d4e03	MINOR: Increase the Kafka shutdown timeout to 120 (#11183 ) The streams static membership test has failed several times due to hitting the Kafka shutdown timeout, but the logs were showing that the shutdown did actually succeed after the 60 second timeout. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Walker Carlson <wcarlson@confluent.io>	2021-08-05 15:26:10 -07:00
Kamal Chandraprakash	a103c95a31	KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests. (#10602 ) Also adjusted the acceptable recovery lag to stabilize Streams tests. Reviewers: Justine Olshan <jolshan@confluent.io>, Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>	2021-08-04 17:31:10 -05:00
Matthias J. Sax	a7d9a8ac36	MINOR: Remove older brokers from upgrade test (#11117 ) As of version 2.2.1 , Kafka Streams uses message headers and thus requires broker version 0.11.0 or newer. Reviewers: John Roesler <john@confluent.io>, Ismael Juma <ismael@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>	2021-07-26 14:09:47 -07:00
Cheng Tan	8ed271e1fd	KAFKA-13026: Idempotent producer (KAFKA-10619) follow-up testings (#11002 ) Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2021-07-26 21:45:59 +01:00
Niket	dc512cc038	KAFKA-13015: Ducktape System Tests for Metadata Snapshots (#11053 ) This PR implements system tests in ducktape to test the ability of brokers and controllers to generate and consume snapshots and catch up with the metadata log. Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@gmail.com>	2021-07-23 16:28:21 -07:00
Ryan Dielhenn	04fd555475	MINOR: Enable KRaft in transactions_test.py #11121 Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-07-23 16:01:54 -07:00
Ismael Juma	f34bb28ab6	KAFKA-13116: Fix message_format_change_test and compatibility_test_new_broker_test failures (#11108 ) These failures were caused by `a46b82bea9`. Details for each test: * message_format_change_test: use IBP 2.8 so that we can write in older message formats. * compatibility_test_new_broker_test_failures: fix down-conversion path to handle empty record batches correctly. The record scan in the old code ensured that empty record batches were never down-converted, which hid this bug. * upgrade_test: set the IBP 2.8 when message format is < 0.11 to ensure we are actually writing with the old message format even though the test was passing without the change. Verified with ducker that some variants of these tests failed without these changes and passed with them. Also added a unit test for the down-conversion bug fix. Reviewers: Jason Gustafson <jason@confluent.io>	2021-07-23 13:43:31 -07:00
Luke Chen	f959e6c583	KAFKA-13129: replace describe topic via zk with describe users (#11115 ) Replace the unsupported describe topic via zk with describe users to fix the system tests. For the upgrade_test case where TLS support is not required, use list_acls instead. Reviewers: Ismael Juma <ismael@juma.me.uk>	2021-07-23 05:33:43 -07:00
Bruno Cadonna	9b3687e0ac	HOTFIX: Modify system test config to reduce time to stable task assignment. (#11090 ) Currently, we verify the startup of a Streams client by checking the transition from REBALANCING to RUNNING and if the client processed some records in the EOS system test. However, if the Streams client only has standby tasks assigned as it can happen if the client is catching up by using warm-up replicas, the client will never process records within the timeout of the startup verification. Hence, the test will fail although everything is fine. This commit fixes this by reducing the time to the next probing rebalance and by increasing the number of max warm-up replicas. In such a way, the catch up of the client and the following processing of records should still be within the startup verification timeout of the client. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-07-21 07:58:14 +02:00
Ron Dagostino	1e78dcda69	MINOR: Fix ZooKeeperAuthorizerTest for KRaft (#11095 ) This patch fixes the ZooKeeperAuthorizerTest for KRaft. The system test was not configuring/reconfiguring/restarting the remote controller quorum with the correct security settings. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-07-20 16:35:14 -07:00
Colin Patrick McCabe	bfc57aa4dd	MINOR: enable reassign_partitions_test.py for kraft (#11064 ) Reviewers: Ismael Juma <ismael@juma.me.uk>	2021-07-19 09:08:55 -07:00
CHUN-HAO TANG	98bd590718	MINOR: Replace unused variable with underscore (#11037 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2021-07-17 16:36:52 +08:00
Ron Dagostino	762d11c13f	MINOR: ducktape should start brokers in parallel and support co-located kraft This patch adds a sanity-check bounce system test for the case where we have 3 co-located KRaft controllers and fixes the system test code so that this case will pass by starting brokers in parallel by default instead of serially. We now also send SIGKILL to any running KRaft broker or controller nodes for the co-located case when a majority of co-located controllers have been stopped -- otherwise they do not shutdown, and we spin for the 60 second timeout. Finally, this patch adds the ability to specify that certain brokers should not be started when starting the cluster, and then we can start those nodes at a later time via the add_broker() method call; this is going to be helpful for KRaft snapshot system testing. We were not testing the 3 co-located KRaft controller case previously, and it would not pass because the first Kafka node would never be considered started. We were starting the Kafka nodes serially, and we decide that a node has successfully started when it logs a particular message. This message is not logged until the broker has identified the controller (i.e. the leader of the KRaft quorum). There cannot be a leader until a majority of the KRaft quorum has started, so with 3 co-located controllers the first node could never be considered "started" by the system test. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-07-16 16:28:09 -07:00
Bruno Cadonna	332db13047	HOTFIX: Fix verification of version probing (#10943 ) Fixes and improves version probing in system test test_version_probing_upgrade().	2021-07-12 18:50:25 +02:00
Colin Patrick McCabe	5a88a59ddd	MINOR: Hint about "docker system prune" when ducker-ak build fails (#10995 ) Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Jason Gustafson <jason@confluent.io>	2021-07-08 09:58:46 -07:00
Stanislav Vodetskyi	058589b03d	KAFKA-13041: Enable connecting VS Code remote debugger (#10915 ) The changes in this PR enable connecting VS Code's remote debugger to a system test running locally with ducker-ak. Changes include: - added zip_safe=False to setup.py - this enables installing kafkatest module together with source code when running `python setup.py develop/install`. - install [debugpy](https://github.com/microsoft/debugpy) on ducker nodes - expose 5678 (default debugpy port) on ducker01 node - ducker01 is the one that actually executes tests, so that's where you'd connect to. - added `-d|--debug` option to `ducker-ak test` command - if used, tests will run via `python3.7 -m debugpy` command, which would listen on 5678 and pause until debugger is connected. - changed the logic of the `ducker-ak test` command so that ducktape args are collected separately after `--` - otherwise any argument we add to the `test` command in the future might potentially shadow a similar ducktape argument. - we don't really check that `ducktape_args` are args while `test_name_args` are actual test names, so the difference between the two is minimal actually - most importantly we do check that `test_name_args` is not empty, but we are ok if `ducktape_args` is. Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>	2021-07-08 20:35:14 +05:30
Konstantine Karantasis	d2a05d71c0	Bump trunk to 3.1.0-SNAPSHOT (#10981 ) Typical version bumps on trunk following the creation of the 3.0 release branch. Reviewer: Randall Hauch <rhauch@gmail.com>	2021-07-06 14:28:13 -07:00
kpatelatwork	527ba111c7	KAFKA-4793: Connect API to restart connector and tasks (KIP-745) (#10822 ) Implements KIP-745 https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks to change connector REST API to restart a connector and its tasks as a whole. Testing strategy - [x] Unit tests added for all possible combinations of onlyFailed and includeTasks - [x] Integration tests added for all possible combinations of onlyFailed and includeTasks - [x] System tests for happy path Reviewers: Randall Hauch <rhauch@gmail.com>, Diego Erdody <erdody@gmail.com>, Konstantine Karantasis <k.karantasis@gmail.com>	2021-06-30 21:13:07 -07:00
Ron Dagostino	4f5b4c868e	KAFKA-12756: Update ZooKeeper to v3.6.3 (#10918 ) Update the ZooKeeper version to v3.6.3. This requires adding dropwizard as a new dependency. Also, add Kafka v2.8.0 to the ducktape system test image. Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>	2021-06-30 11:21:33 -07:00
Chia-Ping Tsai	01c2345658	MINOR: fix round_trip_fault_test.py - don't assign replicas to nonexistent brokers (#10908 ) The broker id starts with 1 (https://github.com/apache/kafka/blob/trunk/tests/kafkatest/services/kafka/kafka.py#L207) so round_trip_fault_test.py fails because it assigns a replica to a nonexistent broker. Interestingly, the failure happens only in KRaft mode. KRaft mode checks that the broker ids exist (https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/ReplicationControlManager.java#L950). By contrast, ZK mode has no such check and min.insync.replicas is set to 1, so this test works in ZK mode even though one replica is always offline. Reviewers: Ismael Juma <ismael@juma.me.uk>	2021-06-19 23:54:02 +08:00
Ron Dagostino	ebef7d0c21	MINOR: TestSecurityRollingUpgrade system test fixes (#10886 ) The TestSecurityRollingUpgrade.test_disable_separate_interbroker_listener() system test had a design flaw: it was migrating inter-broker communication from a SASL_SSL listener to an SSL listener in one roll while immediately removing the SASL_SSL listener in that roll. This requires two rolls because the existing SASL_SSL listener must remain available throughout the first roll so that unrolled brokers can continue to communicate with rolled brokers throughout. This patch adds the second roll to this test and removes the original SASL_SSL listener on that second roll instead of the first one. The test was not failing all the time -- it was flaky. The TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two() system test was not explicitly identifying the SASL mechanism to enable on a third port when that port was using SASL but the client security protocol was not SASL-based. This was resulting in an empty sasl.enabled.mechanisms config, which applied to that third port, and then when the cluster was rolled to take advantage of this third port for inter-broker communication the potential for an inability to communicate with other, unrolled brokers existed (similar to above, this resulted in a flaky test). Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2021-06-18 15:50:21 +08:00
John Roesler	987391958d	MINOR: enable EOS during smoke test IT (#10870 ) This IT has been failing on trunk recently. Enabling EOS during the integration test makes it easier to be sure that the test's assumptions are really true during verification and should make the test more reliable. I also noticed that in the actual system test file, we are using the deprecated property name "beta" instead of "v2". Reviewers: Boyang Chen <boyang@apache.org>	2021-06-13 21:35:02 -05:00
Chia-Ping Tsai	398800a4f3	MINOR: fix client_compatibility_features_test.py - DescribeAcls is already supported by KRaft (#10860 ) Reviewers: Ismael Juma <ismael@juma.me.uk>	2021-06-10 22:02:17 +08:00
A. Sophie Blee-Goldman	48379bd6e5	KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure (#10609 ) This PR includes adding the NamedTopology to the Subscription/AssignmentInfo, and to the StateDirectory so it can place NamedTopology tasks within the hierarchical structure with task directories under the NamedTopology parent dir. Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2021-06-07 15:38:12 -07:00
Chia-Ping Tsai	0bf4b47f58	MINOR: upgrade pip from 20.2.2 to 21.1.1 (#10661 ) The following error happens on my mac m1 when building docker image for system tests. Collecting pynacl Using cached PyNaCl-1.4.0.tar.gz (3.4 MB) Installing build dependencies ... error ERROR: Command errored out with exit status 1: command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-k867aac0/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel 'cffi>=1.4.1; python_implementation != '"'"'PyPy'"'"'' cwd: None Complete output (14 lines): Traceback (most recent call last): File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/usr/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/usr/local/lib/python3.8/dist-packages/pip/__main__.py", line 23, in <module> from pip._internal.cli.main import main as _main # isort:skip # noqa File "/usr/local/lib/python3.8/dist-packages/pip/_internal/cli/main.py", line 5, in <module> import locale File "/usr/lib/python3.8/locale.py", line 16, in <module> import re File "/usr/lib/python3.8/re.py", line 145, in <module> class RegexFlag(enum.IntFlag): AttributeError: module 'enum' has no attribute 'IntFlag' ---------------------------------------- ERROR: Command errored out with exit status 1: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-k867aac0/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel 'cffi>=1.4.1; python_implementation != '"'"'PyPy'"'"'' Check the logs for full command output. There was a related issue: pypa/pip#9689 and it is already fixed by pypa/pip#9689 (included in pip 21.1.1). I tested pip 21.1.1 and it works well on mac m1. Reviewers: Ismael Juma <ismael@juma.me.uk>	2021-05-29 14:49:25 +08:00
Mickael Maison	7f91d2935f	MINOR: Updating files with release 2.7.1 (#10660 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>	2021-05-20 10:43:15 +01:00
Ron Dagostino	5b0c58ed53	MINOR: Support using the ZK authorizer with KRaft (#10550 ) This patch adds support for running the ZooKeeper-based kafka.security.authorizer.AclAuthorizer with KRaft clusters. Set the authorizer.class.name config as well as the zookeeper.connect config while also setting the typical KRaft configs (node.id, process.roles, etc.), and the cluster will use KRaft for metadata and ZooKeeper for ACL storage. A system test that exercises the authorizer is included. This patch also changes "Raft" to "KRaft" in several system test files. It also fixes a bug where system test admin clients were unable to connect to a cluster with broker credentials via the SSL security protocol when the broker was using that for inter-broker communication and SASL for client communication. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>	2021-05-19 10:32:56 -07:00
Colin Patrick McCabe	9e5b77fb96	KAFKA-12788: improve KRaft replica placement (#10494 ) Implement a striped replica placement algorithm for KRaft. This also means implementing rack awareness. Previously, KRaft just chose replicas randomly in a non-rack-aware fashion. Also, allow replicas to be placed on fenced brokers if there are no other choices. This was specified in KIP-631 but previously not implemented. Reviewers: Jun Rao <junrao@gmail.com>	2021-05-17 16:49:47 -07:00
Ron Dagostino	12377bd3c6	MINOR: Add missing @cluster annotation to StreamsNamedRepartitionTopicTest (#10697 ) The StreamsNamedRepartitionTopicTest system test did not have the @cluster annotation and was therefore taking up the entire cluster. For example, we see this in the log output: kafkatest.tests.streams.streams_named_repartition_topic_test.StreamsNamedRepartitionTopicTest.test_upgrade_topology_with_named_repartition_topic is using entire cluster. It's possible this test has no associated cluster metadata. This PR adds the missing annotation. Reviewers: Bill Bejeck <bbejeck@apache.org>	2021-05-17 17:33:43 -04:00
Ron Dagostino	55b24ce9d6	MINOR: fix system test TestSecurityRollingUpgrade (#10694 ) Ensure security protocol and sasl mechanism are updated in the cached SecurityConfig during rolling system tests. Also explicitly indicate which SASL mechanisms we wish to expose during the tests. Reviewers: David Arthur <mumrah@gmail.com>	2021-05-17 13:46:44 -04:00
Chia-Ping Tsai	29c55fdbbc	MINOR: set replication.factor to 1 to make StreamsBrokerCompatibilityService work with old broker (#10673 ) Reviewers: Matthias J. Sax <mjsax@conflunet.io>, A. Sophie Blee-Goldman <sophie@confluent.io>	2021-05-14 13:51:31 +08:00
Chia-Ping Tsai	d881d11388	MINOR: fix streams_broker_compatibility_test.py (#10632 ) The log message was changed and so the system test can't capture expected message. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2021-05-05 11:12:00 -07:00
Ron Dagostino	1f4207c7c1	MINOR: system test spelling/pydoc/dead code fixes (#10604 ) Reviewers: Kamal Chandraprakash <kamal@nmsworks.co.in>, Chia-Ping Tsai <chia7712@gmail.com>	2021-05-01 23:22:46 +08:00
A. Sophie Blee-Goldman	3bfc9fe486	MINOR: Bump latest 2.6 version to 2.6.2 (#10582 ) Bump the version for system tests to 2.6.2	2021-04-21 12:50:30 -07:00
Ismael Juma	976e78e405	KAFKA-12590: Remove deprecated kafka.security.auth.Authorizer, SimpleAclAuthorizer and related classes in 3.0 (#10450 ) These were deprecated in Apache Kafka 2.4 (released in December 2019) to be replaced by `org.apache.kafka.server.authorizer.Authorizer` and `AclAuthorizer`. As part of KIP-500, we will implement a new `Authorizer` implementation that relies on a topic (potentially a KRaft topic) instead of `ZooKeeper`, so we should take the chance to remove related tech debt in 3.0. Details on the issues affecting the old Authorizer interface can be found in the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-504+-+Add+new+Java+Authorizer+Interface Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ron Dagostino <rdagostino@confluent.io>	2021-04-03 08:23:26 -07:00
John Roesler	4ed7f2cd01	KAFKA-12593: Fix Apache License headers (#10452 ) * Standardize license headers in scala, python, and gradle files. * Relocate copyright attribution to the NOTICE. * Add a license header check to `spotless` for scala files. Reviewers: Ewen Cheslack-Postava <ewencp@apache.org>, Matthias J. Sax <mjsax@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org>	2021-04-01 10:38:37 -05:00
Ismael Juma	16b2d4f3a7	MINOR: Self-managed -> KRaft (Kafka Raft) (#10414 ) `Self-managed` is also used in the context of Cloud vs on-prem and it can be confusing. `KRaft` is a cute combination of `Kafka Raft` and it's pronounced like `craft` (as in `craftsmanship`). Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jose Sancio <jsancio@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Ron Dagostino <rdagostino@confluent.io>	2021-03-29 15:39:10 -07:00
Ismael Juma	7c7e8078e4	MINOR: Use self-managed mode instead of KIP-500 and nozk (#10362 ) KIP-500 is not particularly descriptive. I also tweaked the readme text a bit. Tested that the readme for self-managed still works after these changes. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rdagostino@confluent.io>, Jason Gustafson <jason@confluent.io>	2021-03-19 16:42:37 -07:00
Justine Olshan	fdd11a034c	KAFKA-12318: system tests need to fetch Topic IDs via Admin Client instead of via ZooKeeper (#10286 ) Change the ducktape system tests to support both ZK and raft topic IDs. Clarifies that the IBP check applies to the ZK code path. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rdagostino@confluent.io>	2021-03-19 11:41:50 -07:00
Ron Dagostino	9adfac2803	MINOR: fix failing ZooKeeper system tests (#10297 ) ZooKeeper-related system tests in zookeeper_security_upgrade_test.py and zookeeper_tls_test.py broke due to #10199. That patch changed the logic of SecurityConfig.enabled_sasl_mechanisms() to only add the inter-broker SASL mechanism when the inter-broker protocol was SASL_{PLAINTEXT,SSL}. The inter-broker protocol is left to default to PLAINTEXT for the SecurityConfig instance associated with Zookeeper since that value doesn't apply to ZooKeeper, so the default inter-broker SASL mechanism of GSSAPI was not being added into the set returned by enabled_sasl_mechanisms(). This is actually correct -- GSSAPI shouldn't be added since inter-broker communication is a Kafka concept and doesn't apply to ZooKeeper. GSSAPI should be added when ZooKeeper uses it, though -- which is the case in these tests. So the prior patch referred to above uncovered a bug: we were relying on the default inter-broker SASL mechanism to signal that Kerberos was being used by ZooKeeper even though the inter-broker protocol has nothing to do with that determination in such cases. This patch explicitly includes GSSAPI in the list of enabled SASL mechanisms when SASL is enabled for use by ZooKeeper. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-03-17 10:58:42 -07:00
Chia-Ping Tsai	3288db5ed1	MINOR: fix client_compatibility_features_test.py (#10292 ) Reviewers: Colin Patrick McCabe <cmccabe@confluent.io>, Ron Dagostino <rdagostino@confluent.io>	2021-03-18 01:27:06 +08:00
Ron Dagostino	b96fc7892f	KAFKA-12455: Fix OffsetValidationTest.test_broker_rolling_bounce failure with Raft (#10322 ) This test was failing when used with a Raft-based metadata quorum but succeeding with a ZooKeeper-based quorum. This patch increases the consumers' session timeouts to 30 seconds, which fixes the Raft case and also eliminates flakiness that has historically existed in the Zookeeper case. This patch also fixes a minor logging bug in RaftReplicaManager.endMetadataChangeDeferral() that was discovered during the debugging of this issue, and it adds an extra logging statement in RaftReplicaManager.handleMetadataRecords() when a single metadata batch is applied to mirror the same logging statement that occurs when deferred metadata changes are applied. In the Raft system test case the consumer was sometimes receiving a METADATA response with just 1 alive broker, and then when that broker rolled the consumer wouldn't know about any alive nodes. It would have to wait until the broker returned before it could reconnect, and by that time the group coordinator on the second broker would have timed-out the client and initiated a group rebalance. The test explicitly checks that no rebalances occur, so the test would fail. It turns out that the reason why the ZooKeeper configuration wasn't seeing rebalances was just plain luck. The brokers' metadata caches in the ZooKeeper configuration show 1 alive broker even more frequently than the Raft configuration does. If we tweak the metadata.max.age.ms value on the consumers we can easily get the ZooKeeper test to fail, and in fact this system test has historically been flaky for the ZooKeeper configuration. We can get the test to pass by setting session.timeout.ms=30000 (which is longer than the roll time of any broker), or we can increase the broker count so that the client never sees a METADATA response with just a single alive broker and therefore never loses contact with the cluster for an extended period of time. We have plenty of system tests with 3+ brokers, so we choose to keep this test with 2 brokers and increase the session timeout. Reviewers: Ismael Juma <ismael@juma.me.uk>	2021-03-16 13:57:29 -07:00
Ron Dagostino	b92d606379	MINOR: disable round_trip_fault_test system tests for Raft quorums (#10249 ) The KIP-500 early access release will not support creating a partition with a manual partition assignment that includes a broker that is not currently online. This patch disables system tests for Raft-based metadata quorums where the test depends on this functionality to pass. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-03-09 13:57:15 -08:00
Ron Dagostino	0fc53652e1	MINOR: fix failing system test delegation_token_test (#10237 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>	2021-03-09 13:55:29 -08:00
Boyang Chen	17851da667	KAFKA-12381: remove live broker checks for forwarding topic creation (#10240 ) Removed broker number checks for invalid replication factor when doing the forwarding, in order to reduce false alarms for clients. Reviewers: Jason Gustafson <jason@confluent.io>	2021-03-05 15:55:14 -08:00
Ron Dagostino	29b4a3d1fe	MINOR: Disable transactional/idempotent system tests for Raft quorums (#10224 )	2021-03-02 12:57:12 -05:00
Ron Dagostino	5d37901500	KAFKA-12374: Add missing config sasl.mechanism.controller.protocol (#10199 ) Fix some cases where we were erroneously using the configuration of the inter broker listener instead of the controller listener. Add the sasl.mechanism.controller.protocol configuration key specified by KIP-631. Add some ducktape tests. Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>, Boyang Chen <boyang@confluent.io>	2021-02-26 16:56:11 -08:00
Ron Dagostino	02226fa090	MINOR: disable test_produce_bench_transactions for Raft metadata quorum (#10222 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-02-26 13:54:21 -08:00
Jason Gustafson	74dfe80bb8	KAFKA-12365; Disable APIs not supported by KIP-500 broker/controller (#10194 ) This patch updates request `listeners` tags to be in line with what the KIP-500 broker/controller support today. We will re-enable these APIs as needed once we have added the support. I have also updated `ControllerApis` to use `ApiVersionManager` and simplified the envelope handling logic. Reviewers: Ron Dagostino <rdagostino@confluent.io>, Colin P. McCabe <cmccabe@apache.org>	2021-02-25 19:38:21 -08:00
Ron Dagostino	bd04f7557a	MINOR: fix syntax error in upgrade_test.py (#10210 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-02-25 12:14:38 -08:00
Matthias J. Sax	e2a0d0c90e	MINOR: bump release version to 3.0.0-SNAPSHOT (#10186 ) Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	2021-02-24 17:49:18 -08:00
Guozhang Wang	059c9b3fcf	MINOR: Fix the generation extraction util (#10204 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>	2021-02-24 12:23:24 -08:00
Ron Dagostino	9e799cb23c	MINOR: fix some ducktape test issues (#10181 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-02-22 15:09:25 -08:00
Ron Dagostino	0711d15582	MINOR: Test the new KIP-500 quorum mode in ducktape (#10105 ) Add the necessary test annotations to test the new KIP-500 quorum broker mode in many of our ducktape tests. This mode is tested in addition to the classic Apache ZooKeeper mode. This PR also adds a new sanity_checks/bounce_test.py system test that runs through a simple produce/bounce/produce series of events. Finally, this PR adds @cluster annotations to dozens of system tests that were missing them. The lack of this annotation was causing these tests to grab the entire cluster of nodes. Adding the @cluster annotation dramatically reduced the time needed to run these tests. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>	2021-02-22 13:57:17 -08:00
Justine Olshan	a524a751c1	MINOR: Added missing import (KafkaVersion) to kafka.py (#10154 ) Reviewers: Ron Dagostino <rdagostino@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	2021-02-19 12:10:12 +08:00
Ron Dagostino	a30f92bf59	MINOR: Add KIP-500 BrokerServer and ControllerServer (#10113 ) This PR adds the KIP-500 BrokerServer and ControllerServer classes and makes some related changes to get them working. Note that the ControllerServer does not instantiate a QuorumController object yet, since that will be added in PR #10070. * Add BrokerServer and ControllerServer * Change ApiVersions#computeMaxUsableProduceMagic so that it can handle endpoints which do not support PRODUCE (such as KIP-500 controller nodes) * KafkaAdminClientTest: fix some lingering references to decommissionBroker that should be references to unregisterBroker. * Make some changes to allow SocketServer to be used by ControllerServer as well as by the broker. * We now return a random active Broker ID as the Controller ID in MetadataResponse for the Raft-based case as per KIP-590. * Add the RaftControllerNodeProvider * Add EnvelopeUtils * Add MetaLogRaftShim * In ducktape, in config_property.py: use a KIP-500 compatible cluster ID. Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>	2021-02-17 21:35:13 -08:00
Ron Dagostino	faaef2c2df	MINOR: Support Raft-based metadata quorums in system tests (#10093 ) We need to be able to run system tests with Raft-based metadata quorums -- both co-located brokers and controllers as well as remote controllers -- in addition to the ZooKeeper-based mode we run today. This PR adds this capability to KafkaService in a backwards-compatible manner as follows. If no changes are made to existing system tests then they function as they always do -- they instantiate ZooKeeper, and Kafka will use ZooKeeper. On the other hand, if we want to use a Raft-based metadata quorum we can do so by introducing a metadata_quorum argument to the test method and using @matrix to set it to the quorums we want to use for the various runs of the test. We then also have to skip creating a ZooKeeperService when the quorum is Raft-based. This PR does not update any tests -- those will come later after all the KIP-500 code is merged. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2021-02-11 09:44:17 -08:00
Dániel Urbán	202ff6336f	KAFKA-5235: GetOffsetShell: Support for multiple topics and consumer configuration override (KIP-635) (#9430 ) This patch implements KIP-635 which mainly adds support for querying offsets of multiple topics/partitions. Reviewers: David Jacot <djacot@confluent.io>	2021-02-11 12:06:21 +01:00
John Roesler	1f240ce179	bump to 2.9 development version	2021-02-07 09:25:36 -06:00
Stanislav Vodetskyi	91d6c55da4	MINOR: Upgrade ducktape to version 0.8.1 (#9933 ) ducktape 0.8.1 was updated to include the following changes/fixes from 0.7.x branch: * Junit reporting support * fix for an issue where unicode characters in exception message would cause test runner to hang on py27. Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>	2021-01-22 20:23:55 -08:00
Justine Olshan	86b9fdef2b	KAFKA-10869: Gate topic IDs behind IBP 2.8 (KIP-516) (#9814 ) Topics processed by the controller and topics newly created will only be given topic IDs if the inter-broker protocol version on the controller is at least 2.8. This PR also adds a kafka config to specify whether the IBP is greater than or equal to 2.8. System tests have been modified to include topic ID checks for upgrade/downgrade tests. This PR also adds a new integration test file for requests/responses that are not gated by IBP (ex: metadata) Reviewers: dengziming <dengziming1993@gmail.com>, Lucas Bradstreet <lucas@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	2021-01-20 22:32:06 +00:00
John Roesler	be88f5a1aa	MINOR: Fix StreamsOptimizedTest (#9911 ) We have seen recent system test timeouts associated with this test. Analysis revealed an excessive amount of time spent searching for test conditions in the logs. This change addresses the issue by dropping some unnecessary checks and using a more efficient log search mechanism. Reviewers: Bill Bejeck <bbejeck@apache.org>, Guozhang Wang <guozhang@apache.org>	2021-01-19 14:57:34 -06:00
Mickael Maison	966e9dd6a2	MINOR: Updating files with release 2.6.1 (#9844 ) Reviewers: Bill Bejeck <bbejeck@gmail.com>, Matthias J. Sax <mjsax@apache.org>	2021-01-14 12:24:18 +00:00
Bill Bejeck	bf694b2943	MINOR: Add 2.7.0 release to broker and client compat tests (#9774 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@confluent.io>	2021-01-05 09:45:00 -05:00
Chia-Ping Tsai	ac7b5d3389	KAFKA-10893 Increase target_messages_per_sec of ReplicaScaleTest to reduce the run time (#9797 ) Reviewers: David Arthur <mumrah@gmail.com>	2021-01-05 00:23:34 +08:00
Bill Bejeck	b6891f6729	MINOR: Kafka Streams updates for 2.7.0 release (#9773 ) Reviewer: Matthias J. Sax <matthias@confluent.io>	2020-12-22 14:34:59 -08:00
Bill Bejeck	300909d9e6	MINOR: Updating files with latest release 2.7.0 (#9772 ) Changes to trunk for the 2.7.0 release. Updating dependencies.gradle, Dockerfile, and vagrant/bash.sh Reviewers: Matthias J. Sax <mjsax@apache.org>	2020-12-21 11:52:49 -05:00
Chia-Ping Tsai	6e15937feb	KAFKA-10289; Fix failed connect_distributed_test.py (ConnectDistributedTest.test_bounce) (#9673 ) In Python 3, `filter` returns an iterator rather than a `list`, so the result can be traversed only once. Hence, the following loop sees data only for the first task, sees "empty" for every later task, and validation fails. ```python src_messages = self.source.committed_messages() # return iterator sink_messages = self.sink.flushed_messages() # return iterator for task in range(num_tasks): # only first task can "see" the result. following tasks see empty result src_seqnos = [msg['seqno'] for msg in src_messages if msg['task'] == task] ``` Reference: https://portingguide.readthedocs.io/en/latest/iterators.html#new-behavior-of-map-and-filter. Reviewers: Jason Gustafson <jason@confluent.io>	2020-12-09 13:38:17 -08:00
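The single-pass behavior described in commit 6e15937feb can be reproduced outside the system tests. The following is a minimal sketch, not the actual fix: the `messages` data and the `seqnos_per_task` helper are hypothetical stand-ins for the Connect source and sink services, and materializing the iterator with `list` is just one way to make repeated traversal safe.

```python
# Hypothetical stand-in data; the real test reads messages from Connect services.
messages = [
    {"task": 0, "seqno": 1},
    {"task": 1, "seqno": 2},
    {"task": 0, "seqno": 3},
]


def seqnos_per_task(msgs, num_tasks):
    """Collect sequence numbers per task by traversing msgs once per task."""
    return [
        [m["seqno"] for m in msgs if m["task"] == task]
        for task in range(num_tasks)
    ]


# In Python 3, filter() returns a one-shot iterator: the first traversal
# consumes it, so every later task sees nothing.
lazy = filter(lambda m: "seqno" in m, messages)
exhausted_result = seqnos_per_task(lazy, 2)

# Materializing the iterator into a list allows repeated traversal.
materialized = list(filter(lambda m: "seqno" in m, messages))
fixed_result = seqnos_per_task(materialized, 2)

print(exhausted_result)  # [[1, 3], []]
print(fixed_result)      # [[1, 3], [2]]
```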
Chia-Ping Tsai	1cf9ce95ad	MINOR: add "flush=True" to all print in system tests (#9711 ) That makes the behavior of print equal to Python 2. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2020-12-09 11:19:06 -08:00
Chia-Ping Tsai	abb8ff61cc	MINOR: Align the UID inside/outside container (#9652 ) Reviewers: Jason Gustafson <jason@confluent.io>	2020-12-03 10:39:58 +08:00
Bruno Cadonna	60139d5b25	MINOR: fix reading SSH output in Streams system tests (#9665 ) SSH outputs in system tests originating from paramiko are bytes. However, the logger in the system tests does not accept bytes and instead throws an exception. That means the bytes returned as SSH output from paramiko need to be converted to a type that the logger (or other objects) can process. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2020-12-01 10:28:04 -08:00
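The conversion described in commit 60139d5b25 amounts to decoding bytes before handing them on. A minimal sketch follows, assuming UTF-8 output; the `to_text` helper is hypothetical and is not taken from the Kafka system tests.

```python
def to_text(line, encoding="utf-8"):
    """Return line as str, decoding it first if it arrived as bytes."""
    if isinstance(line, bytes):
        # errors="replace" keeps the helper from raising on undecodable bytes.
        return line.decode(encoding, errors="replace")
    return line


ssh_output = [b"12345\n", b"67890\n"]  # hypothetical bytes as read from an SSH channel
pids = [to_text(line).strip() for line in ssh_output]
print(pids)  # ['12345', '67890']
```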
Luke Chen	9412fc1151	MINOR: Update vagrant/tests readme (#9650 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2020-11-28 13:06:48 +08:00
Tom Bentley	91679f247a	KAFKA-10692: Add delegation.token.secret.key, deprecate ...master.key (#9623 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2020-11-19 15:26:25 +00:00
Walker Carlson	5899f5fc4a	KAFKA-9331: Add a streams specific uncaught exception handler (#9487 ) This PR introduces a streams specific uncaught exception handler that currently has the option to close the client or the application. If the new handler is set as well as the old handler (the Java thread handler), the old handler will be ignored and an error will be logged. The application shutdown is achieved through the rebalance protocol. Reviewers: Bruno Cadonna <cadonna@confluent.io>, Leah Thomas <lthomas@confluent.io>, John Roesler <john@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2020-11-17 22:55:09 -08:00
feyman2016	3e2d1fc8aa	Add system test coverage for group coordinator migration (#9588 ) This newly added system test is to verify that with the fix in #9270 , the member.id update caused by static member rejoin would be persisted correctly. Reviewers: Boyang Chen <boyang@confluent.io>	2020-11-12 19:36:27 -08:00
Gardner Vickers	f978d0551b	MINOR: Increase the amount of time available to the `test_verifiable_producer` (#9201 ) Increase the amount of time available to the `test_verifiable_producer` test to login and get the process name for the verifiable producer from 5 seconds to 10 seconds. We were seeing some test failures due to the assertion failing because the verifiable producer would complete before we could login, list the processes, and parse out the producer version. Previously, we were giving this operation 5 seconds to run, this PR bumps it up to 10 seconds. I verified locally that this does not flake, but even at 5 seconds I wasn't seeing any flakes. Ultimately we should find a better strategy than racing to query the producer process (as outlined in the existing comments). Reviewers: Jason Gustafson <jason@confluent.io>	2020-11-12 13:09:15 -08:00
David Mao	ee1aa07036	MINOR: Fix group_mode_transactions_test (#9538 ) KIP-431 (#9099) changed the format of console consumer output to `Partition:$PARTITION\t$VALUE` whereas previously the output format was `$VALUE\t$PARTITION`. This PR updates the message verifier to accommodate the updated console consumer output format.	2020-10-31 13:43:13 +01:00
Bruno Cadonna	a85b944011	MINOR: Fix verification in StreamsUpgradeTest.test_version_probing_upgrade (#9530 ) The system test StreamsUpgradeTest.test_version_probing_upgrade tries to verify the wrong version for version probing. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2020-10-29 16:10:30 -07:00
Manikumar Reddy	36493efa59	MINOR: fix error in quota_test.py system tests quota_test.py tests are failing with below error. ``` 23:24:42 [INFO:2020-10-24 17:54:42,366]: RunnerClient: kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=user.override_quota=False: FAIL: not enough arguments for format string 23:24:42 Traceback (most recent call last): 23:24:42 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/tests/runner_client.py", line 134, in run 23:24:42 data = self.run_test() 23:24:42 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/tests/runner_client.py", line 192, in run_test 23:24:42 return self.test_context.function(self.test) 23:24:42 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.6/site-packages/ducktape-0.8.0-py3.6.egg/ducktape/mark/_mark.py", line 429, in wrapper 23:24:42 return functools.partial(f, args, kwargs)(w_args, **w_kwargs) 23:24:42 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/quota_test.py", line 141, in test_quota 23:24:42 self.quota_config = QuotaConfig(quota_type, override_quota, self.kafka) 23:24:42 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/quota_test.py", line 60, in __init__ 23:24:42 self.configure_quota(kafka, self.producer_quota, self.consumer_quota, ['users', None]) 23:24:42 File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/quota_test.py", line 83, in configure_quota 23:24:42 (kafka.kafka_configs_cmd_with_optional_security_settings(node, force_use_zk_conection), producer_byte_rate, consumer_byte_rate) 23:24:42 TypeError: not enough arguments for format string 23:24:42 ``` Ran the tests locally. Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: David Jacot <djacot@confluent.io>, Ron Dagostino <rndgstn@gmail.com> Closes #9496 from omkreddy/quota-tests	2020-10-25 14:45:05 +05:30
Nikolay Izhikov	c8c1baf4e1	KAFKA-10592: Fix vagrant for a system tests with python3 Fix vagrant for a system tests with a python3. Author: Nikolay Izhikov <nizhikov@apache.org> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> Closes #9480 from nizhikov/KAFKA-10592	2020-10-25 01:09:16 +05:30
Ron Dagostino	c7f19aba37	MINOR: fix system tests sending ACLs through ZooKeeper (#9458 ) Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2020-10-20 13:13:50 +01:00
Ron Dagostino	1636481c5f	MINOR: fix error in quota_test.py system tests (#9443 )	2020-10-15 17:08:42 +01:00
Ron Dagostino	147a19036e	MINOR: ACLs for secured cluster system tests (#9378 ) This PR adds missing broker ACLs required to create topics and SCRAM credentials when ACLs are enabled for a system test. This PR also adds support for using PLAINTEXT as the inter broker security protocol when using SCRAM from the client in a system test with a secured cluster-- without this it would always be necessary to set both the inter-broker and client mechanisms to a SCRAM mechanism. Also contains some refactoring to make assumptions clearer. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2020-10-09 15:34:53 +01:00
bill	4d3036bb4e	Updating trunk versions after cutting branch for 2.7	2020-10-08 07:47:36 -04:00
Nikolay	4e65030e05	KAFKA-10402: Upgrade system tests to python3 (#9196 ) For now, Kafka system tests use python2 which is outdated and not supported. This PR upgrades python to the third version. Reviewers: Ivan Daschinskiy, Mickael Maison <mickael.maison@gmail.com>, Magnus Edenhill <magnus@edenhill.se>, Guozhang Wang <wangguoz@gmail.com>	2020-10-07 09:41:30 -07:00
Nikolay	bc7674fe1b	KAFKA-10505: Fix parsing of generation log string. (#9312 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>	2020-09-23 14:24:02 -07:00
Bruno Cadonna	a46c07ec8d	KAFKA-10292: Set min.insync.replicas to 1 of __consumer_offsets (#9286 ) The test StreamsBrokerBounceTest.test_all_brokers_bounce() fails on 2.5 because in the last stage of the test there is only one broker left and the offset commit cannot succeed because the min.insync.replicas of __consumer_offsets is set to 2 and acks is set to all. This causes a time out and extends the closing of the Kafka Streams client to beyond the duration passed to the close method of the client. This affects especially the 2.5 branch since there Kafka Streams commits offsets for each task, i.e., close() needs to wait for the timeout for each task. In 2.6 and trunk the offset commit is done per thread, so close() does only need to wait for one time out per stream thread. I opened this PR on trunk, since the test could also become flaky on trunk and we want to avoid diverging system tests across branches. A more complete solution would be to improve the test by defining a better success criteria. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2020-09-15 11:12:37 -07:00
Ron Dagostino	ebd64b5d55	KAFKA-10131: Remove use_zk_connection flag from ducktape (#9274 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2020-09-14 15:56:21 -07:00
Chia-Ping Tsai	ee68b999c4	KAFKA-10463: Install `git` explicitly in Dockerfile (#9257 ) `openjdk:8` includes `git` by default, but `openjdk:11` does not. Install `git` explicitly to make it easier to test with newer openjdk versions. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2020-09-14 15:22:00 -07:00
Ron Dagostino	e8524ccd8f	KAFKA-10259: KIP-554 Broker-side SCRAM Config API (#9032 ) Implement the KIP-554 API to create, describe, and alter SCRAM user configurations via the AdminClient. Add ducktape tests, and modify JUnit tests to test and use the new API where appropriate. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>	2020-09-04 13:05:01 -07:00
Justine Olshan	a027b9a934	MINOR: Fix typo in ducker-ak test example Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2020-08-22 00:30:22 +05:30
Sanjana Kaundinya	b7856df21b	MINOR: Include security configs for topic delete in system tests (#9142 ) Reviewers: Ron Dagostino <rdagostino@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	2020-08-19 13:37:27 +01:00
Andrew Egelhofer	f6c26eaa04	MINOR: Use new version of ducktape ducktape diff: https://github.com/confluentinc/ducktape/compare/v0.7.8...v0.7.9 - bcrypt (a dependency of ducktape) dropped Python2.7 support. ducktape-0.7.9 now pins bcrypt to a Python2.7-supported version. Author: Andrew Egelhofer <aegelhofer@confluent.io> Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com> Closes #9192 from andrewegel/trunk	2020-08-18 07:11:24 +05:30
Sanjana Kaundinya	f7a4fe7c14	MINOR: fix the way total consumed is calculated for verifiable consumer (#9143 ) Reviewers: Ron Dagostino <rdagostino@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	2020-08-16 11:15:41 +01:00
John Roesler	7159c6ddd0	MINOR: bump 2.5 versions to 2.5.1 (#9165 ) Reviewers: Bill Bejeck <bbejeck@apache.org>	2020-08-11 15:18:33 -05:00
Randall Hauch	1112fd4723	KAFKA-10341: Add 2.6.0 to system tests and streams upgrade tests (#9116 ) Author: Randall Hauch <rhauch@gmail.com> Reviewer: Matthias J. Sax <matthias@confluent.io>	2020-08-04 18:04:52 -05:00
Bruno Cadonna	ac3a51d013	MINOR: Remove staticmethod tag to be able to use logger of instance (#9086 ) A system test failed with the following error: global name 'self' is not defined The reason was that `self` was accessed to log a message in a static method. This commit makes the method an instance method. Reviewer: Matthias J. Sax <matthias@confluent.io>	2020-07-27 09:38:14 -07:00
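The failure mode behind commit ac3a51d013 is easy to reproduce in isolation. The sketch below uses a hypothetical `Verifier` class, not the actual Streams test code; on Python 3 the error surfaces as a `NameError` (Python 2 worded it as "global name 'self' is not defined").

```python
import logging


class Verifier:
    """Hypothetical class contrasting a static method with an instance method."""

    def __init__(self):
        self.logger = logging.getLogger("verifier")

    @staticmethod
    def broken_check(value):
        # A static method receives no instance, so `self` is an undefined name here.
        self.logger.info("checking %s", value)
        return value > 0

    def check(self, value):
        # As an instance method, the logger of the instance is reachable via self.
        self.logger.info("checking %s", value)
        return value > 0


try:
    Verifier.broken_check(1)
    raised = False
except NameError:
    raised = True

print(raised)               # True
print(Verifier().check(1))  # True
```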
Chia-Ping Tsai	0d5c967073	KAFKA-10300 fix flaky core/group_mode_transactions_test.py (#9059 ) The root cause is the same as in #9026, so this PR copies the approach of #9026 to fix core/group_mode_transactions_test.py. Reviewers: Jun Rao <junrao@gmail.com>	2020-07-23 15:59:57 -07:00
Jason Gustafson	67f5b5de77	KAFKA-10274; Consistent timeouts in transactions_test (#9026 ) KAFKA-10235 fixed a consistency issue with the transaction timeout and the progress timeout. Since the test case relies on transaction timeouts, we need to wait at least as long as the timeout in order to ensure progress. However, having a low transaction timeout makes the test prone to the issue identified in KAFKA-9802, in which the coordinator timed out the transaction while the producer was awaiting a Produce response. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Boyang Chen <boyang@confluent.io>, Jun Rao <junrao@gmail.com>	2020-07-22 12:06:47 -07:00
Manikumar Reddy	c38825ab97	KAFKA-9432:(follow-up) Set `configKeys` to null in `describeConfigs()` to make it backward compatible with older Kafka versions. - After #8312, older brokers return empty configs with the latest `adminClient.describeConfigs`. Old brokers receive empty configNames in the `AdminManager.describeConfigs()` method. Older brokers do not handle empty configKeys, so they filter out all the configs. - Update ClientCompatibilityTest to verify describe configs - Add test case to test describe configs with empty configuration Keys Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com> Closes #9046 from omkreddy/KAFKA-9432	2020-07-21 17:32:11 +05:30
Greg Harris	f4944ee460	KAFKA-10295: Wait for connector recovery in test_bounce (#9043 ) Signed-off-by: Greg Harris <gregh@confluent.io>	2020-07-20 08:50:05 -05:00
Greg Harris	5a2a7c6348	KAFKA-10286: Connect system tests should wait for workers to join group (#9040 ) Currently, the system tests `connect_distributed_test` and `connect_rest_test` only wait for the REST api to come up. The startup of the worker includes an asynchronous process for joining the worker group and syncing with other workers. There are some situations in which this sync takes an unusually long time, and the test continues without all workers up. This leads to flakey test failures, as worker joins are not given sufficient time to timeout and retry without waiting explicitly. This changes the `ConnectDistributedTest` to wait for the Joined group message to be printed to the logs before continuing with tests. I've activated this behavior by default, as it's a superset of the checks that were performed by default before. This log message is present in every version of DistributedHerder that I could find, in slightly different forms, but always with `Joined group` at the beginning of the log message. This change should be safe to backport to any branch. Signed-off-by: Greg Harris <gregh@confluent.io> Author: Greg Harris <gregh@confluent.io> Reviewer: Randall Hauch <rhauch@gmail.com>	2020-07-20 08:48:02 -05:00
Manikumar Reddy	b02fa53419	MINOR: Enable broker/client compatibility tests for 2.5.0 release - Add missing broker/client compatibility tests for 2.5.0 release Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com> Closes #9041 from omkreddy/compat	2020-07-20 18:20:48 +05:30
Jason Gustafson	6d2c7802da	MINOR: Fix flaky system test assertion after static member fencing (#9033 ) The test case `OffsetValidationTest.test_fencing_static_consumer` fails periodically due to this error: ``` Traceback (most recent call last): File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/tests/runner_client.py", line 134, in run data = self.run_test() File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/tests/runner_client.py", line 192, in run_test return self.test_context.function(self.test) File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/mark/_mark.py", line 429, in wrapper return functools.partial(f, args, kwargs)(w_args, **w_kwargs) File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/tests/kafkatest/tests/client/consumer_test.py", line 257, in test_fencing_static_consumer assert len(consumer.dead_nodes()) == num_conflict_consumers AssertionError ``` When a consumer stops, there is some latency between when the shutdown is observed by the service and when the node is added to the dead nodes. This patch fixes the problem by giving some time for the assertion to be satisfied. Reviewers: Boyang Chen <boyang@confluent.io>	2020-07-17 11:27:33 -07:00
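The approach in commit 6d2c7802da, giving the assertion time to become true instead of checking once, can be sketched with a small polling helper. Both `wait_for` and `FakeConsumerService` below are hypothetical illustrations written for this example; the real tests rely on the ducktape framework and the Kafka consumer service instead.

```python
import time


def wait_for(condition, timeout_sec=10.0, backoff_sec=0.1):
    """Poll condition() until it returns True or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(backoff_sec)


class FakeConsumerService:
    """Hypothetical service that reports dead nodes only after a few polls."""

    def __init__(self, ready_after_polls):
        self._polls = 0
        self._ready_after = ready_after_polls

    def dead_nodes(self):
        self._polls += 1
        if self._polls >= self._ready_after:
            return ["node-1", "node-2"]
        return []


consumer = FakeConsumerService(ready_after_polls=3)
num_conflict_consumers = 2

# A single immediate check would fail here; polling tolerates the latency.
satisfied = wait_for(
    lambda: len(consumer.dead_nodes()) == num_conflict_consumers,
    timeout_sec=2.0,
    backoff_sec=0.01,
)
print(satisfied)  # True
```

A bounded timeout keeps a genuinely broken shutdown from hanging the test, while the backoff avoids busy-waiting.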
vinoth chandar	796fae25c3	KAFKA-10174: Prefer --bootstrap-server for configs command in ducker tests (#8948 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2020-07-16 09:01:46 -07:00
Chia-Ping Tsai	598a0d16fa	KAFKA-10257 system test kafkatest.tests.core.security_rolling_upgrade_test fails (#9021 ) security_rolling_upgrade_test may change the security listener and then restart Kafka servers. has_sasl and has_ssl get out-of-date due to cached _security_config. This PR offers a simple fix that we always check the changes of port mapping and then update the sasl/ssl flag. Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>	2020-07-15 11:33:49 -07:00
Chia-Ping Tsai	e099b58df5	KAFKA-10235 Fix flaky transactions_test.py (#8981 ) Reduce the transaction timeout to clean up the unstable offsets more quickly. In hard_bounce mode, the transactional client is killed ungracefully. Hence, it produces unstable offsets, which obstruct TransactionalMessageCopier from receiving the position of the group. Reviewers: Jun Rao <junrao@gmail.com>	2020-07-09 09:33:07 -07:00
Chia-Ping Tsai	80cab851ee	KAFKA-10225 Increase default zk timeout for system tests (#8974 ) Increase ZK connection and session timeout in system tests to match the defaults. Reviewers: Jun Rao <junrao@gmail.com>	2020-07-08 13:19:40 -07:00
Chia-Ping Tsai	6953161125	KAFKA-10191 fix flaky StreamsOptimizedTest (#8913 ) Call KafkaStreams#cleanUp to reset local state before starting the application for the second run. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>	2020-07-07 12:48:36 -05:00
John Roesler	34f749db30	MINOR: prune the metadata upgrade test matrix (#8971 ) Most of the values in the metadata upgrade test matrix are just testing the upgrade/downgrade path between two previous releases. This is unnecessary. We run the tests for all supported branches, so what we should test is the up-/down-gradability of released versions with respect to the current branch. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2020-07-06 18:52:51 -05:00
Chia-Ping Tsai	72042f26af	KAFKA-10209: Fix connect_rest_test.py after the introduction of new connector configs (#8944 ) There are two new configs introduced by `371f14c3c1` and `1c4eb1a575` so we have to update the expected configs in the connect_rest_test.py system test too. Reviewer: Konstantine Karantasis <konstantine@confluent.io>	2020-07-03 10:38:42 -07:00
John Roesler	3b2ae7b95a	KAFKA-10173: Use SmokeTest for upgrade system tests (#8938 ) Replaces the previous upgrade test's trivial Streams app with the commonly used SmokeTest, exercising many more features. Also adjust the test matrix to test upgrading from each released version since 2.2 to the current branch. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2020-07-02 18:14:46 -05:00
Chia-Ping Tsai	6094af8974	KAFKA-10214: Fix zookeeper_tls_test.py system test After 3661f981fff2653aaf1d5ee0b6dde3410b5498db security_config is cached. Hence, the later changes to security flag can't impact the security_config used by later tests. issue: https://issues.apache.org/jira/browse/KAFKA-10214 Author: Chia-Ping Tsai <chia7712@gmail.com> Reviewers: Ron Dagostino <rdagostino@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com> Closes #8949 from chia7712/KAFKA-10214	2020-07-01 17:08:54 +05:30
Bruno Cadonna	f3a9ce4a69	MINOR: Do not swallow exception when collecting PIDs (#8914 ) During Streams' system tests the PIDs of the Streams clients are collected. The method that collects the PIDs swallows any exception that might be thrown by the ssh_capture() function. Swallowing exceptions can make the investigation of failures harder, because no information about what happened is recorded. Reviewers: John Roesler <vvcephei@apache.org>	2020-06-30 12:18:23 -05:00
Nikolay	3661f981ff	KAFKA-10180: Fix security_config caching in system tests (#8917 ) Reviewers: Jun Rao <junrao@gmail.com>	2020-06-27 09:27:49 -07:00
vinoth chandar	54dbd041bc	KAFKA-10138: Prefer --bootstrap-server for reassign_partitions command in ducktape tests (#8898 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2020-06-19 12:35:49 -07:00
Ego	d3d65dd5dd	MINOR: Upgrade ducktape to 0.7.8 (#8879 ) Newer version of ducktape that updates some dependencies and adds some features. You can see that diff here: https://github.com/confluentinc/ducktape/compare/v0.7.7...v0.7.8 Reviewer: Konstantine Karantasis <konstantine@confluent.io>	2020-06-17 21:53:22 -07:00
Nikolay	8b22b81596	KAFKA-9320: Enable TLSv1.3 by default (KIP-573) (#8695 ) 1. Enables `TLSv1.3` by default with Java 11 or newer. 2. Add unit tests that cover the various TLSv1.2 and TLSv1.3 combinations. 3. Extend `benchmark_test.py` and `replication_test.py` to run with 'TLSv1.2' or 'TLSv1.3'. Reviewers: Ismael Juma <ismael@juma.me.uk>	2020-06-02 15:34:43 -07:00
Randall Hauch	19e40788e7	Bump trunk to 2.7.0-SNAPSHOT (#8746 )	2020-06-01 21:23:09 -05:00
Jason Gustafson	d9fe30dab0	KAFKA-9802; Increase transaction timeout in system tests to reduce flakiness (#8736 ) We have been seeing increased flakiness in transaction system tests. I believe the cause might be due to KIP-537, which increased the default zk session timeout from 6s to 18s and the default replica lag timeout from 10s to 30s. In the system test, we use the default transaction timeout of 10s. However, since the system test involves hard failures, the Produce request could be blocking for as long as the max of these two in order to wait for an ISR shrink. Hence this patch increases the timeout to 30s. Note this patch also includes a minor logging fix in `Partition`. Previously we would see messages like the following: ``` [Broker id=3] Leader output-topic-0 starts at leader epoch 0 from offset 0 with high watermark 0 ISR 3,2,1 addingReplicas removingReplicas .Previous leader epoch was -1. ``` This patch fixes the log to print as the following: ``` [Broker id=3] Leader output-topic-0 starts at leader epoch 0 from offset 0 with high watermark 0 ISR [3,2,1] addingReplicas [] removingReplicas []. Previous leader epoch was -1. ``` Reviewers: Bob Barrett <bob.barrett@confluent.io>, Ismael Juma <github@juma.me.uk>	2020-05-27 20:54:09 -07:00
Nikolay	2951b6dd99	KAFKA-10050: kafka_log4j_appender.py fixed for JDK11 (#8731 ) kafka_log4j_appender.py was broken on JDK11 by `befd80b38`. `fix_opts_for_new_jvm` requires `node.version` to be set, we add the relevant code to the test. Reviewers: Ismael Juma <ismael@juma.me.uk>	2020-05-27 20:49:57 -07:00
John Roesler	2cff1fab3f	KAFKA-6145: KIP-441: Fix assignor config passthrough (#8716 ) Also fixes a system test by configuring the HATA to perform a one-shot balanced assignment Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>	2020-05-27 13:50:12 -05:00
Magnus Edenhill	4aa4786a81	MINOR: Deploy VerifiableClient in constructor to avoid test timeouts (#8651 ) Previous to this fix a plugged-in verifiable client, such as confluent-kafka-python, would be deployed on the node in the background worker thread as the client was started. Since this could be time consuming (e.g., 10+ seconds) and since the main test thread would continue to operate, it was common for the current test to time out waiting for e.g. the verifiable producer to produce messages while it was in fact still deploying. The fix here is to deploy the verifiable client on the node when the verifiable client is instantiated, which is thus a blocking operation on the main test thread, avoiding any test-based timeouts. Reviewers: Jason Gustafson <jason@confluent.io>	2020-05-21 09:59:32 -07:00
Boyang Chen	fad8db67bb	MINOR: add option to rebuild source for system tests (#6656 ) Reviewers: Jason Gustafson <jason@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2020-05-13 17:54:43 -07:00
A. Sophie Blee-Goldman	58f7a97314	KAFKA-9821: consolidate Streams rebalance triggering mechanisms (#8596 ) Persist followup rebalance in assignment and consolidate rebalance triggering mechanisms Reviewers: John Roesler <vvcephei@apache.org>	2020-05-12 15:57:18 -05:00
Bruno Cadonna	c19a3be198	KAFKA-6145: Set HighAvailabilityTaskAssignor as default in streams_upgrade_test.py (#8613 ) Generalize the verification in the upgrade test so that it does not rely on the task assignor's behavior. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>	2020-05-07 21:10:44 -05:00
Lucas Bradstreet	4e1d6a3d04	MINOR: add support for kafka 2.4 and 2.5 to downgrade test The downgrade test does not currently support 2.4 and 2.5. When you enable them, it fails as a result of consumer group static membership. This PR makes the downgrade test work with all of our released versions again. Author: Lucas Bradstreet <lucas@confluent.io> Reviewers: Boyang Chen, Gwen Shapira Closes #8518 from lbradstreet/downgrade-test-2.4-2.5	2020-04-28 18:01:24 -07:00
John Roesler	5bb3415c77	KAFKA-6145: KIP-441: Add TaskAssignor class config (#8541 ) * add a config to set the TaskAssignor * set the default assignor to HighAvailabilityTaskAssignor * fix broken tests (with some TODOs in the system tests) Implements: KIP-441 Reviewers: Bruno Cadonna <bruno@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>	2020-04-28 15:57:11 -05:00
John Roesler	88eae49a8d	MINOR: document how to escape json parameters to ducktape tests (#8546 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2020-04-27 18:22:00 -05:00
Bruno Cadonna	362d199dbe	HOTFIX: Fix broker bounce system tests (#8532 ) Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2020-04-24 08:49:47 -07:00
Lucas Bradstreet	b161358998	MINOR: Downgrade test should wait for ISR rejoin between rolls (#8495 ) I added a change to the upgrade test a while back that would make it wait for ISR rejoin before rolls. This prevents incompatible brokers charging through a bad roll and disguising a downgrade problem. We now also check for protocol errors in the broker logs. Reviewers: Boyang Chen <boyang@confluent.io>, Ismael Juma <ismael@juma.me.uk>	2020-04-23 00:15:43 -07:00
Boyang Chen	df41713d64	KAFKA-9779: Add Stream system test for 2.5 release (#8378 ) Reviewer: Matthias J. Sax <matthias@confluent.io>	2020-04-15 15:59:03 -07:00
Matthias J. Sax	17f9879261	KAFKA-9832: extend Kafka Streams EOS system test (#8440 ) Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2020-04-15 13:13:23 -07:00
Rajini Sivaram	8820055744	KAFKA-9797; Fix TestSecurityRollingUpgrade.test_enable_separate_interbroker_listener (#8403 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>	2020-04-15 13:04:11 +01:00
Ewen Cheslack-Postava	cadb3499ff	MINOR: Upgrade ducktape to 0.7.7 (#8487 ) This fixes a version pinning issue where a transitive dependency had a major version upgrade that a dependency did not account for, breaking the build. Reviewers: Andrew Egelhofer <aegelhofer@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2020-04-14 16:36:52 -07:00
Boyang Chen	ea47a885b1	MINOR: remove stream simple benchmark suite (#8353 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2020-04-14 09:49:03 -07:00
Matthias J. Sax	20e4a74c35	KAFKA-9832: Extend Streams system tests for EOS-beta (#8443 ) Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2020-04-10 11:55:01 -07:00
Boyang Chen	7f640f13b4	KAFKA-9776: Downgrade TxnCommit API v3 when broker doesn't support (#8375 ) Revert the decision for the sendOffsetsToTransaction(groupMetadata) API to fail with old versions of brokers, for the sake of making the application easier to adapt between versions. This PR silently downgrades the TxnOffsetCommit API when the build version is smaller than 3. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2020-04-02 21:48:37 -07:00
Ron Dagostino	f0ad03069a	MINOR: System test ZooKeeper upgrades (#8384 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2020-04-02 23:23:48 +05:30
Michal T	86a3ebe537	MINOR: Fix typo in version 2.4.1 of kafka folder in Dockerfile (#8393 )	2020-04-01 17:56:47 -07:00
Matthias J. Sax	6ad5407350	KAFKA-9719: Streams with EOS-beta should fail fast for older brokers (#8367 ) Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2020-03-30 15:21:27 -07:00
Bill Bejeck	c725c2338b	MINOR: Update dependencies.gradle, Dockerfile, version.py, and bash.sh for 2.4.1 upgrade (#8387 ) These files were missed in the 2.4.1 release Reviewers: Ismael Juma <ismael@confluent.io>	2020-03-30 12:55:35 -04:00
Nikolay	befd80b38d	KAFKA-9573: Fix JVM options to run early versions of Kafka on the latest JVMs (#8138 ) Startup scripts for the early versions of Kafka contain removed JVM options like `-XX:+PrintGCDateStamps` or `-XX:+UseParNewGC`. When system tests run on a JVM that doesn't support these options, we should set up environment variables with correct options. Reviewers: Guozhang Wang <guozhang@confluent.io>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>	2020-03-25 10:31:07 -07:00
A. Sophie Blee-Goldman	e1cbefef60	HOTFIX: fix log message in version probing system test (#8341 ) Reviewer: Matthias J. Sax <matthias@confluent.io>	2020-03-24 21:46:37 -07:00
Rajini Sivaram	6b419933a0	KAFKA-9662: Wait for consumer offset reset in throttle test to avoid losing early messages (#8227 )	2020-03-06 14:50:22 -05:00
A. Sophie Blee-Goldman	674360f5b3	KAFKA-6145: Encode task positions in SubscriptionInfo (#8121 ) * Replace Prev/Standby task lists with a representation of the current position of all tasks, where each task is encoded as the sum of the positions of all the changelogs in that task. * Only the protocol change is implemented, not actual positions, and the assignor is updated to translate the new protocol back to lists of Prev/Standby tasks so that the current assignment protocol still functions without modification. Implements: KIP-441 Reviewers: John Roesler <vvcephei@apache.org>, Bruno Cadonna <bruno@confluent.io>	2020-03-06 09:19:04 -06:00
A. Sophie Blee-Goldman	a1f2ece323	KAFKA-9525: add enforceRebalance method to Consumer API (#8087 ) As described in KIP-568. Waiting on acceptance of the KIP to write the tests, on the off chance something changes. But rest assured unit tests are coming ⚡️ Will also kick off existing Streams system tests which leverage this new API (eg version probing, sometimes broker bounce) Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2020-02-29 18:44:22 -08:00
Brian Bushree	72a5aa8b07	MINOR: add wait_for_assigned_partitions to console-consumer (#8192 ) what/why: the throttling_test was broken by PR #7785, since it depends on the consumer having partitions assigned before the producer starts. This PR provides the ability to wait for partitions to be assigned in the console consumer before considering it started. caveat: this does not support starting up the JmxTool inside the console-consumer for custom metrics while using this wait_until_partitions_assigned flag, since the code assumes one JmxTool running per node. I think a proper fix for this would be to make JmxTool its own standalone single-node service. alternatives: we could use the EndToEnd test suite, which uses the verifiable producer/consumer under the hood, but I found that more changes were necessary to get this working, unfortunately (specifically, this test suite doesn't seem to play nicely with the ProducerPerformanceService). Reviewers: Matthew Wong <mwong@confluent.io>, Bill Bejeck <bbejeck.com>	2020-02-29 19:43:51 -05:00
Matthew Wong	294b62963b	throttle consumer timeout increase (#8188 ) The test_throttled_reassignment test fails because the consumer that is used to validate reassignment does not start on time to consume all messages. This does not seem like an issue with the throttling of the reassignment, since increasing the timeout allowed the test to pass multiple consecutive runs locally. This test seemed to rely on the default JmxTool for the console consumer that was removed in this commit: `179d0d7` The console consumer would check to see if it had partitions assigned to it before beginning to consume. Although the test occasionally failed with the JmxTool, it began to fail much more after the removal. Error messages of failures followed the below format with varying numbers of missed messages. They are the first messages by the producer. 535 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 515 more. Total Acked: 192792, Total Consumed: 192259. We validated that the first 535 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer. In the scope of the test, this error suggests that the test is falling into the race condition described in produce_consume_validate.py, which has the timeout to prevent the consumer from missing initial messages. This can serve as a temporary fix until the logic of consumer startup is addressed further. Reviewers: Jason Gustafson <jason@confluent.io>, Bill Bejeck <bbejeck@gmail.com>	2020-02-27 17:46:55 -05:00
Ron Dagostino	9d53ad794d	KAFKA-9567: Docs, system tests for ZooKeeper 3.5.7 These changes depend on [KIP-515: Enable ZK client to use the new TLS supported authentication](https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Enable+ZK+client+to+use+the+new+TLS+supported+authentication), which was only added to 2.5.0. The upgrade to ZooKeeper 3.5.7 was merged to both 2.5.0 and 2.4.1 via https://issues.apache.org/jira/browse/KAFKA-9515, but this change must only be merged to 2.5.0 (it will break the system tests if merged to 2.4.1). Author: Ron Dagostino <rdagostino@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Andrew Choi <li_andchoi@microsoft.com> Closes #8132 from rondagostino/KAFKA-9567	2020-02-25 19:59:55 +05:30
Nikolay	f364281431	KAFKA-9319: Fix generation of CA certificate for system tests. (#8106 ) Newer versions of Java have added checks to ensure that trust anchors are CA certificates and contain proper extensions. This PR adds the Basic Constraints extension with the CA field set to true for system tests. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2020-02-17 09:49:35 +00:00
Boyang Chen	07db26c20f	KAFKA-9417: New Integration Test for KIP-447 (#8000 ) This change mainly has 2 components: 1. extend the existing transactions_test.py to also try out the new sendTxnOffsets(groupMetadata) API to make sure we are not introducing any regression or compatibility issue a. We shrink the time window to 10 seconds for the txn timeout scheduler on the broker so that we can trigger expiration sooner 2. create a completely new system test class called group_mode_transactions_test, which is more complicated than the existing system test, as we are taking rebalance into consideration and using multiple partitions instead of one. For further breakdown: a. The message count is done at the partition level instead of globally, as we need to visualize the per-partition order throughout the test. For this reason, we extend ConsoleConsumer to print out the data partition as well, to help the message copier interpret the per-partition data. b. The progress count includes the time for completing the pending txn offset expiration c. More visibility and feature improvements on TransactionMessageCopier to better work under either standalone or group mode. Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2020-02-12 12:34:12 -08:00
Ron Dagostino	342f13a838	KAFKA-8843: KIP-515: Zookeeper TLS support Signed-off-by: Ron Dagostino <rdagostino@confluent.io> Author: Ron Dagostino <rdagostino@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com> Closes #8003 from rondagostino/KAFKA-8843	2020-02-08 21:16:48 +05:30
Guozhang Wang	4090f9a2b0	KAFKA-9113: Clean up task management and state management (#7997 ) This PR is a collaboration between Guozhang Wang and John Roesler. It is a significant tech debt cleanup of task management and state management, broken down into the sub-tasks listed below: 1) Extract embedded clients (producer and consumer) into RecordCollector from StreamTask. guozhangwang#2 guozhangwang#5 2) Consolidate the standby updating and active restoring logic into ChangelogReader and extract it out of StreamThread. guozhangwang#3 guozhangwang#4 3) Introduce the Task state life cycle (created, restoring, running, suspended, closing), and refactor the task operations based on the current state. guozhangwang#6 guozhangwang#7 4) Consolidate AssignedTasks into TaskManager and simplify the logic of changelog management and task management (since they were already moved in steps 2) and 3)). guozhangwang#8 guozhangwang#9 5) Also simplified the StreamThread logic a bit, as the embedded clients / changelog restoration logic was moved in steps 1) and 2). guozhangwang#10 Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>	2020-02-04 21:06:39 -08:00
David Arthur	7e776b0462	Bump trunk to 2.6.0-SNAPSHOT (#8026 )	2020-02-03 13:04:56 -05:00
vinoth chandar	71c5729a41	KAFKA-6144: Add KeyQueryMetadata APIs to KafkaStreams (#7960 ) Deprecate existing metadata query APIs in favor of new ones that include standby hosts as well as partition information. Closes: #7960 Implements: KIP-535 Co-authored-by: Navinder Pal Singh Brar <navinder_brar@yahoo.com> Reviewed-by: John Roesler <vvcephei@apache.org>	2020-01-15 09:39:02 -06:00
Brian Bushree	422bc1f0fa	MINOR: Disable JmxTool in kafkatest console-consumer by default (#7785 ) Do not initialize `JmxTool` by default when running console consumer. In order to support this, we remove `has_partitions_assigned` and its only usage in an assertion inside `ProduceConsumeValidateTest`, which did not seem to contribute much to the validation. Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>	2020-01-09 16:53:36 -08:00
A. Sophie Blee-Goldman	3453e9e2ee	HOTFIX: fix system test race condition (#7836 ) In some system tests a Streams app is started and then prints a message to stdout, which the system test waits for to confirm the node has successfully been brought up. It then greps for certain log messages in a retriable loop. But waiting on the Streams app to start/print to stdout does not mean the log file has been created yet, so the grep may return an error. Although this occurs in a retriable loop, it is assumed that grep will not fail, and the result is piped to wc and then blindly converted to an int in the python function, which fails since the error message is a string (throws ValueError). We should catch the ValueError and return 0 so it can try again rather than crash immediately. Reviewers: Bill Bejeck <bbejeck@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>	2019-12-31 18:44:31 -08:00
Lucas Bradstreet	8fd7cd6a43	MINOR: upgrade system test should check for ISR rejoin on each roll (#7827 ) The upgrade system test correctly rolls by upgrading the broker and leaving the IBP, and then rolling again with the latest IBP version. Unfortunately, this is not sufficient to pick up many problems in our IBP gating as we charge through the rolls and after the second roll all of the brokers will rejoin the ISR and the test will be treated as a success. This test adds two new checks: 1. We wait for the ISR to stabilize for all partitions. This is best practice during rolls, and is enough to tell us if a broker hasn't rejoined after each roll. 2. We check the broker logs for some common protocol errors. This is a fail safe as it's possible for the test to be successful even if some protocols are incompatible and the ISR is rejoined. Reviewers: Nikhil Bhatia <nikhil@confluent.io>, Jason Gustafson <jason@confluent.io>	2019-12-30 11:02:30 -08:00
Bruno Cadonna	1d21cf166a	KAFKA-9305: Add version 2.4 to Streams system tests (#7841 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2019-12-20 14:21:12 -08:00
Manikumar Reddy	b50d925e07	MINOR: Add compatibility tests for 2.4.0 (#7838 ) Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2019-12-17 19:44:32 +05:30
John Roesler	717ce42a6d	KAFKA-9138: Add system test for relational joins (#7664 ) Add a system test to verify the new foreign-key join introduced in KIP-213 Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-12-11 09:48:23 -08:00
Randall Hauch	39d68de393	MINOR: Simplify the timeout logic to handle protocol in Connect distributed system tests (#7806 )	2019-12-10 12:30:49 -06:00
Randall Hauch	ccded348eb	MINOR: Bump system test version from 2.2.1 to 2.2.2 (#7765 ) Author: Randall Hauch <rhauch@gmail.com> Reviewer: Ismael Juma <ismael@confluent.io>	2019-12-06 15:44:56 -06:00
Randall Hauch	9d9139024a	MINOR: Increase the timeout in one of Connect's distributed system tests (#7789 ) Author: Randall Hauch <rhauch@gmail.com> Reviewers: Nigel Liang <nigel@nigelliang.com>, John Roesler <john@confluent.io>	2019-12-06 15:33:43 -06:00
David Arthur	91f948763d	Actually run the delete topic command in kafka.py (#7776 ) Reviewed-By: Jason Gustafson <jason@confluent.io>	2019-12-04 17:54:37 -05:00
Jason Gustafson	e057d61050	MINOR: Fix --enable-autocommit flag in verifiable consumer (#7743 ) The --enable-autocommit argument is a flag. It does not take a parameter. This was broken in #7724. Reviewers: Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>	2019-11-23 14:18:42 -08:00
David Arthur	b15e05d925	KAFKA-9123 Test a large number of replicas (#7621 ) Two tests using 50k replicas on 8 brokers: * Do a rolling restart with clean shutdown, delete topics * Run produce bench and consumer bench on a subset of topics Reviewed-By: David Jacot <djacot@confluent.io>, Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io>	2019-11-22 19:32:08 -05:00
Jason Gustafson	9d8ab3a1a2	KAFKA-8509; Add downgrade system test (#7724 ) This patch adds a basic downgrade system test. It verifies that producing and consuming continues to work before and after the downgrade. Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>	2019-11-22 10:09:13 -08:00
Bruno Cadonna	2604d39b9f	MINOR: Fix Streams EOS system tests by adding clean-up of state dir (#7693 ) Recently, system tests test_rebalance_[simple|complex] failed repeatedly with a verification error. The cause was most probably the missing clean-up of a state directory of one of the processors. A node is cleaned up when a service on that node is started and when a test is torn down. If the clean-up flag clean_node_enabled of an EOS Streams service is unset, the clean-up of the node is skipped. The clean-up flag of processor1 in the EOS tests should stay set before its first start, so that the node is cleaned before the service is started. Afterwards, for the multiple restarts of processor1, the clean-up flag should be unset to re-use the local state. After the multiple restarts are done, the clean-up flag of processor1 should again be set to trigger node clean-up during the test teardown. A dirty node can lead to test failures when tests from Streams EOS tests are scheduled on the same node, because the state store would not start empty since it reads the local state that was not cleaned up. Reviewers: Matthias J. Sax <mjsax@apache.org>, Andrew Choi <andchoi@linkedin.com>, Bill Bejeck <bbejeck@gmail.com>	2019-11-21 10:32:31 -05:00
David Arthur	cca0225390	Fix missing reference in kafka.py (#7715 ) Also fix a default value for a dictionary arg	2019-11-19 15:19:48 -05:00
David Arthur	d04699486d	KAFKA-8981 Add rate limiting to NetworkDegradeSpec (#7446 ) * Add rate limiting to tc * Feedback from PR * Add a sanity test for tc * Add iperf to vagrant scripts * Dynamically determine the network interface * Add some temp code for testing on AWS * Temp: use hostname instead of external IP * Temp: more AWS debugging * More AWS WIP * More AWS temp * Lower latency some * AWS wip * Trying this again now that ping should work * Add cluster decorator to tests * Fix broken import * Fix device name * Fix decorator arg * Remove errant import * Increase timeouts * Fix tbf command, relax assertion on latency test * Fix log line * Final bit of cleanup * Newline * Revert Trogdor retry count * PR feedback * More PR feedback * Feedback from PR * Remove unused argument	2019-11-18 20:36:00 -05:00
Brian Bushree	9fb22868fe	[MINOR] allow additional JVM args in KafkaService (#7297 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>, Vikas Singh <vikas@confluent.io>	2019-11-18 10:50:36 -08:00
Colin Patrick McCabe	7f49674439	KAFKA-8746: Kibosh must handle an empty JSON string from Trogdor (#7155 ) When Trogdor wants to clear all the faults injected to Kibosh, it sends the empty JSON object {}. However, Kibosh expects {"faults":[]} instead. Kibosh should handle the empty JSON object, since that's consistent with how Trogdor handles empty JSON fields in general (if they're empty, they can be omitted). We should also have a test for this. Reviewers: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>	2019-11-15 15:13:32 -08:00
John Roesler	cac85601a0	KAFKA-9169: fix standby checkpoint initialization (#7681 ) Instead of caching the checkpoint map during StandbyTask initialization, use the latest checkpoints (which would have been updated during suspend). Reviewers: Bill Bejeck <bill@confluent.io>	2019-11-13 22:03:44 -06:00
Colin P. Mccabe	67fd88050f	KAFKA-8984: Improve tagged fields documentation Author: Colin P. Mccabe <cmccabe@confluent.io> Reviewers: Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io> Closes #7477 from cmccabe/KAFKA-8984	2019-11-09 10:37:48 +05:30
Jason Gustafson	903d66e2f9	KAFKA-9079: Fix reset logic in transactional message copier The consumer's `committed` API does not return an entry in the response map for a requested partition if there is no committed offset. The transactional message copier, which is used in the transaction system test, did not account for this. If the first transaction attempted by the copier was randomly aborted, then we would not seek to the beginning as expected, which means we would fail to copy some of the records. This patch fixes the problem by iterating over the assignment rather than the result of `committed` when resetting offsets. It also enables additional logging in the transaction message copier service to make finding problems easier in the future. Author: Jason Gustafson <jason@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> Closes #7653 from hachikuji/fix-transaction-system-test	2019-11-06 15:59:51 +05:30
Bruno Cadonna	065411aa22	KAFKA-9077: Fix reading of metrics of Streams' SimpleBenchmark (#7610 ) With KIP-444 the metrics definitions are refactored. Thus, Streams' SimpleBenchmark needs to be updated to correctly access the refactored metrics. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>	2019-10-29 16:02:15 -04:00
Boyang Chen	465f810730	KAFKA-8972 (2.4 blocker): correctly release lost partitions during consumer.unsubscribe() (#7441 ) Inside onLeavePrepare we would look into the assignment and try to revoke the owned tasks and notify users via RebalanceListener#onPartitionsRevoked, and then clear the assignment. However, the subscription's assignment is already cleared in this.subscriptions.unsubscribe(); which means the user's rebalance listener would never be triggered. In other words, from the consumer client's pov nothing is owned after unsubscribe, but from the user caller's pov the partitions are not revoked yet. For callers like Kafka Streams which rely on the rebalance listener to maintain their internal state, this leads to inconsistent state management and failure cases. Before KIP-429 this issue was hidden away, since every time the consumer re-joined the group later, it would still revoke everything anyway regardless of the passed-in parameters of the rebalance listener; with KIP-429 this is easier to reproduce now. Our fixes are as follows: • Inside unsubscribe, first do onLeavePrepare / maybeLeaveGroup and then subscription.unsubscribe. This way we are guaranteed that the streams' tasks are all closed as revoked by then. • [Optimization] If the generation is reset due to a fatal error from join / hb response etc, then we know that all partitions are lost, and we should not trigger onPartitionsRevoked, but instead just onPartitionsLost inside onLeavePrepare. This is because we don't want to commit for lost tracks during rebalance which is doomed to fail as we don't have any generation info. Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2019-10-29 10:41:25 -07:00
Bill Bejeck	c015169aa6	MINOR: Streams upgrade system test cleanup (#7571 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>	2019-10-24 10:28:29 -04:00
Konstantine Karantasis	fc0963a236	KAFKA-9078: Fix Connect system test after adding MM2 connector classes MM2 added a few connector classes to Connect's classpath, so the assertion in the Connect REST system tests needs to be adjusted to account for these additions. This fix makes sure that the loaded Connect plugins are a superset of the connectors expected by the test. Testing: The change is straightforward. The fix was tested with local system test runs. Author: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> Closes #7578 from kkonstantine/minor-fix-connect-test-after-mm2-classes	2019-10-23 15:46:26 +05:30
Bill Bejeck	6afe05fe89	MINOR: system test clean up (#7552 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>	2019-10-21 10:51:15 -04:00
Bill Bejeck	b62f2a1123	KAFKA-8496: System test for KIP-429 upgrades and compatibility (#7529 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-10-16 22:29:33 -07:00
A. Sophie Blee-Goldman	11a401d51d	delete (#7504 ) This system test was marked @Ignore around a year and a half ago pending the version probing work, but never turned on again. These days, it is made redundant by the suite of system tests in streams_upgrade_test, which cover rolling upgrades (including version probing and metadata change). Reviewers: Boyang Chen <boyang@confluent.io>, Bill Bejeck <bbejeck@gmail.com>	2019-10-14 11:41:44 -04:00
A. Sophie Blee-Goldman	cc6525a746	KAFKA-8743: Flaky Test Repartition{WithMerge}OptimizingIntegrationTest (#7472 ) All four flavors of the repartition/optimization tests have been reported as flaky and failed in one place or another: * RepartitionOptimizingIntegrationTest.shouldSendCorrectRecords_OPTIMIZED * RepartitionOptimizingIntegrationTest.shouldSendCorrectRecords_NO_OPTIMIZATION * RepartitionWithMergeOptimizingIntegrationTest.shouldSendCorrectRecords_OPTIMIZED * RepartitionWithMergeOptimizingIntegrationTest.shouldSendCorrectRecords_NO_OPTIMIZATION They're pretty similar so it makes sense to knock them all out at once. This PR does three things: * Switch to in-memory stores wherever possible * Name all operators and update the Topology accordingly (not really a flaky test fix, but had to update the topology names anyway because of the IM stores so figured might as well) * Port to TopologyTestDriver -- this is the "real" fix, should make a big difference as these repartition tests required multiple roundtrips with the Kafka cluster (while using only the default timeout) Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-10-10 16:23:18 -07:00
A. Sophie Blee-Goldman	f9934e7e93	MINOR: remove unused imports in Streams system tests (#7468 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-10-08 14:57:07 -07:00
A. Sophie Blee-Goldman	d88f1048da	KAFKA-8179: Part 7, cooperative rebalancing in Streams (#7386 ) Key improvements with this PR: * tasks will remain available for IQ during a rebalance (but not during restore) * continue restoring and processing standby tasks during a rebalance * continue processing active tasks during rebalance until the RecordQueue is empty* * only revoked tasks must be suspended/closed * StreamsPartitionAssignor tries to return tasks to their previous consumers within a client * but do not try to commit, for now (pending KAFKA-7312) Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-10-07 09:27:09 -07:00
Manikumar Reddy	4c2bd567b1	MINOR: Bump version to 2.5.0-SNAPSHOT (#7455 )	2019-10-07 20:04:57 +05:30
A. Sophie Blee-Goldman	3b05dc685b	MINOR: just remove leader on trunk like we did on 2.3 (#7447 ) Small follow-up to trunk PR #7423 While debugging the 2.3 VP PR we realized we should remove the leader-tracking from the VP system test altogether. We'd already merged the corresponding trunk PR so I made a quick new PR for trunk (also fixes a missed version bump in one of the log messages) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-10-04 15:58:11 -07:00
Chris Egerton	791d0d61bf	KAFKA-8804: Secure internal Connect REST endpoints (#7310 ) Implemented KIP-507 to secure the internal Connect REST endpoints that are only for intra-cluster communication. A new V2 of the Connect subprotocol enables this feature, where the leader generates a new session key, shares it with the other workers via the configuration topic, and workers send and validate requests to these internal endpoints using the shared key. Currently the internal `POST /connectors/<connector>/tasks` endpoint is the only one that is secured. This change adds unit tests and makes some small alterations to system tests to target the new `sessioned` Connect subprotocol. A new integration test ensures that the endpoint is actually secured (i.e., requests with missing/invalid signatures are rejected with a 400 Bad Request status). Author: Chris Egerton <chrise@confluent.io> Reviewed: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>	2019-10-02 17:06:57 -05:00
A. Sophie Blee-Goldman	8da69936a7	KAFKA-8649: Send latest commonly supported version in assignment (#7423 ) Instead of sending the leader's version and having older members try to blindly upgrade. The only other real change here is that we will also set the VERSION_PROBING error code and return early from onAssignment when we are upgrading our used subscription version (not just downgrading it) since this implies the whole group has finished the rolling upgrade and all members should rejoin with the new subscription version. Also piggy-backing on a fix for a potentially dangerous edge case, where every thread of an instance is assigned the same set of active tasks. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-10-02 08:54:32 -07:00
Rajini Sivaram	0d31272b35	KAFKA-8848; Update system tests to use new AclAuthorizer (#7374 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2019-09-24 10:30:17 +01:00
Randall Hauch	ada35d5ff4	Add recent versions of Kafka to the matrix of ConnectDistributedTest (#7024 ) Reviewers: Arjun Satish <arjun@confluent.io>, Konstantine Karantasis <k.karantasis@gmail.com>	2019-09-18 10:21:21 -07:00
Vikas Singh	312e4db590	MINOR. implement --expose-ports option in ducker-ak (#7269 ) This change adds a command line option to the `ducker-ak up` command to enable exposing ports from docker containers. The exposed ports will be mapped to the ephemeral ports on the host. The option is called `expose-ports` and can take either a single value (like 5005) or a range (like 5005-5009). This port will then be exposed from each docker container that ducker-ak sets up. Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@users.noreply.github.com>	2019-09-09 07:57:29 -07:00
vinoth chandar	ffef0871c2	KAFKA-7149 : Reducing streams assignment data size (#7185 ) * Leader instance uses dictionary encoding on the wire to send topic partitions * Topic names (most expensive component) are mapped to an integer using the dictionary * Follower instances receive the dictionary, decode topic names back * Purely an on-the-wire optimization, no in-memory structures changed * Test case added for version 5 AssignmentInfo Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-09-05 13:50:55 -07:00
Colin Patrick McCabe	a225347ff2	KAFKA-8840: Fix bug where ClientCompatibilityFeaturesTest fails when running multiple iterations (#7260 ) Fix a bug where ClientCompatibilityFeaturesTest fails when running multiple iterations. Also, fix a typo in tests/docker/Dockerfile. Reviewers: Ismael Juma <ismael@juma.me.uk>	2019-08-30 16:07:59 -07:00
Vikas Singh	09ad6b84c5	MINOR. Fix 2.3.0 streams systest dockerfile typo (#7272 ) As part of commit `4d1ee26a13` streams version 2.3.0 test jar was added, but there was a simple typo in the path that specified the version. `ducker-ak up` was failing because of that. Fixed that. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-08-29 14:12:08 -07:00
Matthias J. Sax	4d1ee26a13	KAFKA-8594: Add version 2.3 to Streams system tests (#7131 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>	2019-08-21 10:26:57 -07:00
Brian Bushree	6c8f654d5f	MINOR: Upgrade ducktape to 0.7.6 Author: Brian Bushree <bbushree@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #7138 from brianbushree/update-ducktape	2019-08-19 17:12:48 -07:00
David Arthur	ff9e95cb09	MINOR: Add fetch from follower system test (#7166 ) This adds a basic system test that enables rack-aware brokers with the rack-aware replica selector for fetch from followers (KIP-392). The test asserts that the follower was read from at least once and that all the messages that were produced were successfully consumed. Reviewers: Jason Gustafson <jason@confluent.io>	2019-08-13 12:33:05 -07:00
Arjun Satish	794637232c	KAFKA-8774: Regex can be found anywhere in config value (#7197 ) Corrected the AbstractHerder to correctly identify task configs that contain variables for externalized secrets. The original method incorrectly used `matcher.matches()` instead of `matcher.find()`. The former method expects the entire string to match the regex, whereas the second one can find a pattern anywhere within the input string (which fits this use case more correctly). Added unit tests to cover various cases of a config with externalized secrets, and updated system tests to cover case where config value contains additional characters besides secret that requires regex pattern to be found anywhere in the string (as opposed to complete match). Author: Arjun Satish <arjun@confluent.io> Reviewer: Randall Hauch <rhauch@gmail.com>	2019-08-13 09:40:12 -05:00
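The difference between a whole-string match and a find-anywhere match, which the commit above describes for Java's `matcher.matches()` and `matcher.find()`, has a close analog in Python's `re.fullmatch` and `re.search`. The sketch below uses that analog; the pattern is a simplified stand-in for an externalized-secret variable and is an assumption, not the regex used by AbstractHerder.

```python
import re

# Simplified, hypothetical pattern for a variable such as ${file:/path:key}.
SECRET_VARIABLE = re.compile(r"\$\{[^}:]+:[^}]*\}")


def is_entirely_secret(value):
    """Analog of matcher.matches(): the whole value must be the variable."""
    return SECRET_VARIABLE.fullmatch(value) is not None


def contains_secret(value):
    """Analog of matcher.find(): the variable may appear anywhere."""
    return SECRET_VARIABLE.search(value) is not None
```

A value such as `jdbc://db?password=${file:/tmp/s.properties:password}&ssl=true` is rejected by the whole-string check but detected by the find-anywhere check, which is the case the fix targets.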
Matthias J. Sax	e9a35fe02e	MINOR: Bump system test version from 2.2.0 to 2.2.1 (#6873 ) Reviewers: Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>	2019-08-09 14:33:20 -07:00
Rajini Sivaram	de8ce78a90	MINOR: Tolerate limited data loss for upgrade tests with old message format (#7102 ) To avoid transient system test failures, tolerate a small amount of data loss due to truncation in upgrade system tests using older message format prior to KIP-101, where data loss was possible. Reviewers: Ismael Juma <ismael@juma.me.uk>	2019-07-31 16:19:36 +01:00
Brian Bushree	e5f7220b23	MINOR: kafkatest - adding whitelist for interbroker sasl configs (#7093 )	2019-07-22 09:38:28 +01:00
Ismael Juma	d67495d6a7	KAFKA-8634: Update ZooKeeper to 3.5.5 (#6802 ) ZooKeeper 3.5.5 is the first stable release in the 3.5.x series. The key new feature is TLS support, but there are a few more noteworthy features: * Dynamic reconfiguration * Local sessions * New node types: Container, TTL * Ability to remove watchers * Multi-threaded commit processor * Upgraded to Netty 4.1 See the release notes for more detail: https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html In addition to the version bump, we: * Add `commons-cli` dependency as it's required by `ZooKeeperMain`, but specified as `provided` in their pom. * Remove unnecessary `ZooKeeperMainWrapper`, the bug it worked around was fixed upstream a long time ago. * Ignore non zero exit in one system test invocation of `ZooKeeperMain`. `ZooKeeperMainWrapper` always returned `0` and `ZooKeeperService.query` relies on that for correct behavior. Reviewers: Jason Gustafson <jason@confluent.io>	2019-07-10 09:45:10 -07:00
Ismael Juma	57903be496	MINOR: Remove zkclient dependency (#7036 ) ZkUtils was removed so we don't need this anymore. Also: * Fix ZkSecurityMigrator and ReplicaManagerTest not to reference ZkClient classes. * Remove references to zkclient in various `log4j.properties` and `import-control.xml`. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>	2019-07-05 07:50:32 -07:00
David Arthur	23beeea34b	KAFKA-8443; Broker support for fetch from followers (#6832 ) Follow on to #6731, this PR adds broker-side support for [KIP-392](https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica) (fetch from followers). Changes: * All brokers will handle FetchRequest regardless of leadership * Leaders can compute a preferred replica to return to the client * New ReplicaSelector interface for determining the preferred replica * Incremental fetches will include partitions with no records if the preferred replica has been computed * Adds new JMX to expose the current preferred read replica of a partition in the consumer Two new conditions were added for completing a delayed fetch. They both relate to communicating the high watermark to followers without waiting for a timeout: * For regular fetches, if the high watermark changes within a single fetch request * For incremental fetch sessions, if the follower's high watermark is lower than the leader A new JMX attribute `preferred-read-replica` was added to the `kafka.consumer:type=consumer-fetch-manager-metrics,client-id=some-consumer,topic=my-topic,partition=0` object. This was added to support the new system test which verifies that the fetch from follower behavior works end-to-end. This attribute could also be useful in the future when debugging problems with the consumer. Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>	2019-07-04 08:18:51 -07:00
Brian Bushree	5287036b38	MINOR: system tests - avoid 'sasl.enabled.mechanisms' in listener overrides (#7018 ) Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2019-07-03 12:17:17 +01:00
Randall Hauch	6f91096c7d	MINOR: Fix version for ConnectDistributed system test, remove 0.9.0.1 compatibility test (#7023 ) Connect tests were using String version for KafkaService instead of the expected KafkaVersion object. This broke due to recent changes to KafkaVersion. It turns out that the tests with String version were running compatibility tests against `dev` brokers rather than the older broker versions they were expecting to run against. When version was fixed, tests using 0.9.0.1 brokers started failing since new clients are not compatible with 0.9.0.1 brokers. So this PR fixes version parameter and removes the two tests against 0.9.0.1 brokers. Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>	2019-07-02 19:10:41 +01:00
Colin Patrick McCabe	3d2d87abd1	MINOR: Add compatibility tests for 2.3.0 (#6995 ) Reviewers: Ismael Juma <ismael@juma.me.uk>	2019-06-28 09:25:08 -07:00
Brian Bushree	357aedeb1b	MINOR: Support listener config overrides in system tests (#6981 ) Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2019-06-27 18:10:43 +01:00
Stanislav Vodetskyi	594d043037	MINOR: Fix failing upgrade test by supporting both security.inter.broker.protocol and inter.broker.listener.name depending on kafka version (#7000 ) Reviewers: Brian Bushree <bbushree@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	2019-06-27 17:50:17 +01:00
Stanislav Vodetskyi	f51d7d3c93	KAFKA-8557: system tests - add support for (optional) interbroker listener with the same security protocol as client listeners (#6938 ) Reviewers: Brian Bushree <bbushree@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	2019-06-21 17:51:43 +01:00
David Arthur	d7a5e31ca2	KAFKA-8519 Add trogdor action to slow down a network (#6912 ) This adds a new Trogdor fault spec for inducing network latency on a network device for system testing. It operates very similarly to the existing network partition spec by executing the `tc` linux utility.	2019-06-21 11:30:05 -04:00
Boyang Chen	e981b82601	KAFKA-8500; Static member rejoin should always update member.id (#6899 ) This PR fixes a bug in static group membership. Previously we limit the `member.id` replacement in JoinGroup to only cases when the group is in Stable. This is error-prone and could potentially allow duplicate consumers reading from the same topic. For example, imagine a case where two unknown members join in the `PrepareRebalance` stage at the same time. The PR fixes the following things: 1. Replace `member.id` at any time we see a known static member rejoins group with unknown member.id 2. Immediately fence any ongoing join/sync group callback to early terminate the duplicate member. 3. Clearly handle Dead/Empty cases as exceptional. 4. Return old leader id upon static member leader rejoin to avoid trivial member assignment being triggered. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>	2019-06-12 08:41:58 -07:00
Jason Gustafson	2feb44ebc8	MINOR: Fix race condition on shutdown of verifiable producer We've seen `ReplicaVerificationToolTest.test_replica_lags` fail occasionally due to errors such as the following: ``` RemoteCommandError: ubuntuworker7: Command 'kill -15 2896' returned non-zero exit status 1. Remote error message: bash: line 0: kill: (2896) - No such process ``` The problem seems to be a shutdown race condition when using `max_messages` with the producer. The process may already be gone which will cause the signal to fail. Author: Jason Gustafson <jason@confluent.io> Reviewers: Gwen Shapira Closes #6906 from hachikuji/fix-failing-replicat-verification-test	2019-06-07 16:56:21 -07:00
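The shutdown race in the commit above arises because the signal is sent to a process that may already have exited. A local analog in Python, assuming a POSIX system, is to treat "no such process" as a normal outcome. `terminate_if_running` is a hypothetical helper; the actual system test sends the signal over SSH via ducktape, which is not shown here.

```python
import os
import signal


def terminate_if_running(pid):
    """Send SIGTERM to pid; return False if the process was already gone.

    Hypothetical local sketch: the process may exit on its own (for example
    after producing max_messages) before the signal is delivered.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        # The process finished first; nothing left to stop.
        return False
    return True
```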
Jason Gustafson	c7c310beff	MINOR: Lower producer throughput in flaky upgrade system test We see the upgrade test failing from time to time. I looked into it and found that the root cause is basically that the test throughput can be too high for the 0.9 producer to make progress. Eventually it reaches a point where it has a huge backlog of timed out requests in the accumulator which all have to be expired. We see a long run of messages like this in the output: ``` {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null} {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null} {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null} {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null} ``` This can continue for a long time (I have observed up to 1 min) and prevents the producer from successfully writing any new data. While it is busy expiring the batches, no data is getting delivered to the consumer, which causes it to eventually raise a timeout. 
``` kafka.consumer.ConsumerTimeoutException at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50) at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109) at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69) at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47) at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) ``` The fix here is to reduce the throughput, which seems reasonable since the purpose of the test is to verify the upgrade, which does not demand heavy load. Note that I investigated several failing instances of this test going back to 1.0 and saw a similar pattern, so there does not appear to be a regression. Author: Jason Gustafson <jason@confluent.io> Reviewers: Gwen Shapira Closes #6907 from hachikuji/lower-throughput-for-upgrade-test	2019-06-07 16:53:50 -07:00
Lucas Bradstreet	677713baf3	KAFKA-8499: ensure java is in PATH for ducker system tests (#6898 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2019-06-07 14:23:49 -07:00
Boyang Chen	cca05cace4	KAFKA-8331: stream static membership system test (#6877 ) As the title suggests, we start a Streams job with 3 instances and a one-minute session timeout and, once the group is stable, do a couple of rolling bounces of the entire cluster. Every rejoin triggered by a restart should cause no generation bump on the client side. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>	2019-06-07 16:52:12 -04:00
Stanislav Kozlovski	58aa04f91e	MINOR: Improve Trogdor external command worker docs (#6438 ) Reviewers: Colin McCabe <cmccabe@apache.org>, Xi Yang <xi@confluent.io>	2019-06-06 10:04:05 -07:00
Matthias J. Sax	ba3dc49437	KAFKA-8155: Add 2.2.0 release to system tests (#6597 ) Reviewers: Bill Bejeck <bill@confluent.io>, Boyang Chen <boyang@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confuent.io>	2019-06-03 21:09:58 -07:00
Konstantine Karantasis	55d07e717e	KAFKA-8473: Adjust Connect system tests for incremental cooperative rebalancing (#6872 ) Author: Konstantine Karantasis <konstantine@confluent.io> Reviewer: Randall Hauch <rhauch@gmail.com>	2019-06-03 16:50:03 -05:00
Matthias J. Sax	55bfea1378	KAFKA-8155: Add 2.1.1 release to system tests (#6596 ) Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2019-05-30 12:50:30 -07:00
Alex Diachenko	77a9a108ff	KAFKA-8418: Wait until REST resources are loaded when starting a Connect Worker. (#6840 ) Author: Alex Diachenko <sansanichfb@gmail.com> Reviewers: Arjun Satish <arjun@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>	2019-05-30 14:01:00 -05:00
Alex Diachenko	4838855ea7	MINOR: Fix red herring when ConnectDistributedTest.test_bounce fails. (#6838 ) Author: Alex Diachenko <sansanichfb@gmail.com> Reviewer: Randall Hauch <rhauch@gmail.com>	2019-05-29 17:33:24 -05:00
Bill Bejeck	f249956390	MINOR: Account for different versions in upgrade (#6835 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <bruno@confluent.io>	2019-05-29 17:02:37 -04:00
Matthias J. Sax	d286051a21	MINOR: fix Streams version-probing system test (#6764 ) Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Boyang Chen <boyang@confluent.io>	2019-05-24 08:08:52 -07:00
Ismael Juma	c823f32070	MINOR: Add 2.0, 2.1 and 2.2 to broker and client compat tests These are important to ensure we don't break compatibility. Author: Ismael Juma <ismael@juma.me.uk> Reviewers: Gwen Shapira Closes #6794 from ijuma/update-version-compat-tests	2019-05-23 13:34:00 -07:00
Konstantine Karantasis	c6d083d7fc	KAFKA-8417: Remove redundant network definition --net=host when starting testing docker containers (#6797 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2019-05-23 11:46:10 -07:00
Manikumar Reddy	5ca6a2ee94	MINOR: Use `jps` cmd to find out the pid of TransactionalMessageCopier Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com> Closes #6787 from omkreddy/transaction_test	2019-05-22 19:12:55 +05:30
Colin Patrick McCabe	87ff83a82e	MINOR: Bump version to 2.4.0-SNAPSHOT (#6774 ) Reviewers: Jason Gustafson <jason@confluent.io>	2019-05-20 12:47:21 -07:00
Jason Gustafson	b52170372b	MINOR: Increase security test timeouts for transient failures (#6760 ) Reviewers: Ismael Juma <ismael@juma.me.uk>	2019-05-18 16:52:18 -07:00
Boyang Chen	9fa331b811	KAFKA-8225 & KIP-345 part-2: fencing static member instances with conflicting group.instance.id (#6650 ) For static members join/rejoin, we encode the current timestamp in the new member.id. The format looks like group.instance.id-timestamp. During consumer/broker interaction logic (Join, Sync, Heartbeat, Commit), we check whether the group.instance.id is known to the group. If yes, we match the member.id stored in the static membership map against the request member.id. A mismatch indicates that a conflicting consumer has used the same group.instance.id, and it will receive a fatal exception to shut down. Right now the only missing part is the system test. Will work on it offline while getting the major logic changes reviewed. Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-05-18 07:28:36 -07:00
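The fencing check described in the commit above can be sketched as a lookup in a map from group.instance.id to the latest member.id. The class and method names below are hypothetical and the sketch ignores group state, generations, and error codes; it only illustrates the `group.instance.id-timestamp` format and the mismatch test.

```python
def new_member_id(group_instance_id, now_ms):
    """Build a member.id in the group.instance.id-timestamp format."""
    return "%s-%d" % (group_instance_id, now_ms)


class StaticMembers:
    """Hypothetical, simplified static membership map."""

    def __init__(self):
        self._member_ids = {}  # group.instance.id -> current member.id

    def join(self, group_instance_id, now_ms):
        # A join or rejoin always issues a fresh member.id for the instance.
        member_id = new_member_id(group_instance_id, now_ms)
        self._member_ids[group_instance_id] = member_id
        return member_id

    def is_fenced(self, group_instance_id, member_id):
        # Fence only when the instance id is known and the ids differ.
        current = self._member_ids.get(group_instance_id)
        return current is not None and current != member_id
```

In this sketch, a second join with the same group.instance.id replaces the stored member.id, so a later request carrying the older member.id is reported as fenced.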
Vahid Hashemian	16ece15fb3	MINOR: Include StickyAssignor in system tests (#5223 ) Reviewers: Jason Gustafson <jason@conflent.io>	2019-05-11 11:13:07 -07:00
Magesh Nandakumar	40d5c9fac9	KAFKA-8352 : Fix Connect System test failure 404 Not Found (#6713 ) Corrects the system tests to check for either a 404 or a 409 error and sleeping until the Connect REST API becomes available. This corrects a previous change to how REST extensions are initialized (#6651), which added the ability of Connect throwing a 404 if the resources are not yet started. The integration tests were already looking for 409. Author: Magesh Nandakumar <magesh.n.kumar@gmail.com> Reviewer: Randall Hauch <rhauch@gmail.com>	2019-05-10 18:20:39 -05:00
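The retry behaviour described in the commit above (keep sleeping while the REST API answers 404 or 409) can be sketched as a polling loop. `wait_until_ready` and its parameters are hypothetical; `get_status` stands in for whatever call returns the HTTP status code, and the timeout values are placeholders, not the ones used by the Connect system tests.

```python
import time


def wait_until_ready(get_status, timeout_sec=60, backoff_sec=1,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns something other than 404 or 409.

    Hypothetical sketch: 404 and 409 are both treated as "not started yet".
    """
    deadline = clock() + timeout_sec
    while True:
        status = get_status()
        if status not in (404, 409):
            return status
        if clock() >= deadline:
            raise TimeoutError("REST API still returned %d" % status)
        sleep(backoff_sec)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.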
Boyang Chen	0f995ba6be	KAFKA-7862 & KIP-345 part-one: Add static membership logic to JoinGroup protocol (#6177 ) This is the first diff for the implementation of JoinGroup logic for static membership. The goal of this diff contains: * Add group.instance.id to be unique identifier for consumer instances, provided by end user; Modify group coordinator to accept JoinGroupRequest with/without static membership, refactor the logic for readability and code reusability. * Add client side support for incorporating static membership changes, including new config for group.instance.id, apply stream thread client id by default, and new join group exception handling. * Increase max session timeout to 30 min for more user flexibility if they are inclined to tolerate partial unavailability than burdening rebalance. * Unit tests for each module changes, especially on the group coordinator logic. Crossing the possibilities like: 6.1 Dynamic/Static member 6.2 Known/Unknown member id 6.3 Group stable/unstable 6.4 Leader/Follower The rest of the 345 change will be broken down to 4 separate diffs: * Avoid kicking out members through rebalance.timeout, only do the kick out through session timeout. * Changes around LeaveGroup logic, including version bumping, broker logic, client logic, etc. * Admin client changes to add ability to batch remove static members * Deprecate group.initial.rebalance.delay Reviewers: Liquan Pei <liquanpei@gmail.com>, Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-04-26 11:44:38 -07:00
Boyang Chen	847957cbea	KAFKA-8291 : System test fix (#6637 ) As titled, this PR changed the default reset policy to latest accidentally for system tests, which in fact was earliest. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-04-25 14:16:34 -07:00
Ismael Juma	7d9e93ac6d	MINOR: Use https instead of http in links (#6477 ) Verified that the https links work. I didn't update the license header in this PR since that touches so many files. Will file a separate one for that. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2019-04-22 11:58:25 -07:00
David Arthur	409fabc561	KAFKA-7747; Check for truncation after leader changes [KIP-320] (#6371 ) After the client detects a leader change we need to check the offset of the current leader for truncation. These changes were part of KIP-320: https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation. Reviewers: Jason Gustafson <jason@confluent.io>	2019-04-21 16:24:18 -07:00
Guozhang Wang	4aa2cfe467	MINOR: Tighten up metadata upgrade test (#6531 ) Reviewers: Bill Bejeck <bbejeck@gmail.com>	2019-04-05 12:50:42 -07:00
Stanislav Kozlovski	0d55f0f3ec	KAFKA-8102: Add an interval-based Trogdor transaction generator (#6444 ) This patch adds a TimeIntervalTransactionsGenerator class which enables the Trogdor ProduceBench worker to commit transactions based on a configurable millisecond time interval. Also, we now handle 409 create task responses in the coordinator command-line client by printing a more informative message Reviewers: Colin P. McCabe <cmccabe@apache.org>	2019-03-25 09:58:11 -07:00
Brian Bushree	860e957999	MINOR: list-topics should not require topic param `kafka.list_topics(...)` should not require a topic parameter Author: Brian Bushree <bbushree@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #6367 from brianbushree/list-topics-no-topic	2019-03-22 11:50:00 -07:00
Stanislav Kozlovski	f20f3c1a97	MINOR: Update Trogdor ConnectionStressWorker status at the end of execution (#6445 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2019-03-15 09:53:21 -07:00
John Roesler	8e97540071	KAFKA-7944: Improve Suppress test coverage (#6382 ) * add a normal windowed suppress with short windows and a short grace period * improve the smoke test so that it actually verifies the intended conditions See https://issues.apache.org/jira/browse/KAFKA-7944 Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2019-03-12 09:53:29 -07:00
Rajini Sivaram	460b3a6259	KAFKA-8070: Increase consumer startup timeout in system tests (#6405 ) For consumers using SSL, this timeout includes the time to create and copy keystores and truststores and sometime it takes longer than 10s to complete the security setup before starting the consumer process. Reviewers: Ismael Juma <ismael@juma.me.uk>	2019-03-08 16:57:58 +00:00
Guozhang Wang	057bb35f24	HOTFIX: add ignore import to streams_upgrade_test	2019-03-01 11:20:41 -08:00
Guozhang Wang	dfae20ecee	MINOR: disable Streams system test for broker upgrade/downgrade (#6341 ) Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2019-02-28 00:20:44 -08:00
Bill Bejeck	071f62a356	MINOR: refactor topic check to make sure all topics exist by name vs doing a topic count (#6271 ) Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2019-02-20 17:02:30 -08:00
Bill Bejeck	be76560011	MINOR: Add all topics created check streams broker bounce test (trunk) (#6243 ) The StreamsBrokerBounceTest.test_broker_type_bounce experienced what looked like a transient failure. After looking over this test and failure, it seems like it is vulnerable to timing error that streams will start before the kafka service creates all topics. Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>	2019-02-20 12:45:22 -05:00
Matthias J. Sax	d2575f03a3	MINOR: Bump version to 2.3.0-SNAPSHOT (#6226 ) * MINOR: Bump version to 2.3.0-SNAPSHOT * Github comment	2019-02-11 14:46:49 -08:00
Colin Patrick McCabe	4be68c58da	KAFKA-7828: Add ExternalCommandWorker to Trogdor (#6219 ) Allow the Trogdor agent to execute external commands. The agent communicates with the external commands via stdin, stdout, and stderr. Based on a patch by Xi Yang <xi@confluent.io> Reviewers: David Arthur <mumrah@gmail.com>	2019-02-06 16:42:02 -08:00
Konstantine Karantasis	83c435f3ba	KAFKA-7834: Extend collected logs in system test services to include heap dumps * Enable heap dumps on OOM with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file.bin> in the major services in system tests * Collect the heap dump from the predefined location as part of the result logs for each service * Change Connect service to delete the whole root directory instead of individual expected files * Tested by running the full suite of system tests Author: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #6158 from kkonstantine/KAFKA-7834	2019-02-04 16:46:03 -08:00
Konstantine Karantasis	0dbb064963	MINOR: Upgrade ducktape to 0.7.5 (#6197 ) Reviewed-by: Colin P. McCabe <cmccabe@apache.org>	2019-01-25 11:14:19 -08:00
Colin Patrick McCabe	a79d6dcdb6	KAFKA-7793: Improve the Trogdor command line. (#6133 ) * Allow the Trogdor agent to be started in "exec mode", where it simply runs a single task and exits after it is complete. * For AgentClient and CoordinatorClient, allow the user to pass the path to a file containing JSON, instead of specifying the JSON object in the command-line text itself. This means that we can get rid of the bash scripts whose only function was to load task specs into a bash string and run a Trogdor command. * Print dates and times in a human-readable way, rather than as numbers of milliseconds. * When listing tasks or workers, output human-readable tables of information. * Allow the user to filter on task ID name, task ID pattern, or task state. * Support a --json flag to provide raw JSON output if desired. Reviewed-by: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>	2019-01-24 09:26:51 -08:00
Kan Li	f8e8d62f56	MINOR: ducker-ak: add down -f, avoid using a terminal in ducker test When using ./ducker-ak test on Jenkins, the script complains that there is no TTY. To fix this, we should skip passing -t to docker exec. We do not need a pseudo-TTY to run the tests. Similarly, we should skip passing -i, since we do not need to keep stdin open. The down command should have a force option, specified as -f or --force. Reviewed-by: Colin P. McCabe <cmccabe@apache.org>	2019-01-23 13:39:47 -08:00
Chris Egerton	743607af5a	KAFKA-5117: Stop resolving externalized configs in Connect REST API [KIP-297](https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations#KIP-297:ExternalizingSecretsforConnectConfigurations-PublicInterfaces) introduced the `ConfigProvider` mechanism, which was primarily intended for externalizing secrets provided in connector configurations. However, when querying the Connect REST API for the configuration of a connector or its tasks, those secrets are still exposed. The changes here prevent the Connect REST API from ever exposing resolved configurations in order to address that. rhauch has given a more thorough writeup of the thinking behind this in [KAFKA-5117](https://issues.apache.org/jira/browse/KAFKA-5117) Tested and verified manually. If these changes are approved unit tests can be added to prevent a regression. Author: Chris Egerton <chrise@confluent.io> Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io> Closes #6129 from C0urante/hide-provided-connect-configs	2019-01-23 11:00:23 -08:00
Xi Yang	cc33511e9a	MINOR: Support choosing different JVMs when running integration tests + Add a parameter to the ducktap-ak to control the OpenJDK base image. + Fix a few issues of using OpenJDK:11 as the base image. Author: Xi Yang <xi@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #6071 from yangxi/ducktape-jdk	2019-01-11 15:11:55 -08:00
Bill Bejeck	3746bf2c84	MINOR: Need to have same wait as verify timeout broker upgrade downgrade (#6127 ) When I originally refactored the streams_upgrade_test#upgrade_downgrade_brokers test I removed the wait call which would wait for up to 24 minutes for the SmokeTestDriver class to publish and verify all of its records. Since most of the tests run in two minutes or less I set the monitor_log timeout to three minutes. However, the SmokeTestDriver#verify method allows up to six minutes to consume all records before verifying, so the monitor_log timeout needs to be greater than 6 minutes. I've set the timeout to 8 minutes. Additionally, the steps needed to update the streams_upgrade_test should be documented as there are several components that need to get updated. So I've documented those steps here on the test as a giant comment. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-01-11 11:35:33 -08:00
Bill Bejeck	b1b792d9a8	MINOR: Add 2.1 version metadata upgrade (#6111 ) Updated the test_metadata_upgrade test. To enable using the 2.1 version I needed to add a config change to the StreamsUpgradeTestJobRunnerService to ensure that ducktape passes the proper args when starting the StreamsUpgradeTest. For testing, I ran the test_metadata_upgrade test and all versions now pass http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2019-01-09--001.1547049873--bbejeck--MINOR_add_2_1_version_metadata_upgrade--a450c68/report.html Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-01-09 15:19:00 -08:00
Bill Bejeck	515e680c71	MINOR: Put states in proper order, increase timeout for starting (#6105 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-01-08 13:48:38 -08:00
Jason Gustafson	f9a22f42a8	KAFKA-7773; Add end to end system test relying on verifiable consumer (#6070 ) This commit creates an EndToEndTest base class which relies on the verifiable consumer. This will ultimately replace ProduceConsumeValidateTest which depends on the console consumer. The advantage is that the verifiable consumer exposes more information to use for validation. It also allows for a nicer shutdown pattern. Rather than relying on the console consumer idle timeout, which requires a minimum wait time, we can halt consumption after we have reached the last acked offsets. This should be more reliable and faster. The downside is that the verifiable consumer only works with the new consumer, so we cannot yet convert the upgrade tests. This commit converts only the replication tests and a flaky security test to use EndToEndTest.	2019-01-08 14:14:51 +00:00
Bill Bejeck	404bdef08d	MINOR: Remove sleep calls and ignore annotation from streams upgrade test (#6046 ) The StreamsUpgradeTest::test_upgrade_downgrade_brokers used sleep calls in the test which led to flaky test performance and as a result, we placed an @ignore annotation on the test. This PR uses log events instead of the sleep calls hence we can now remove the @ignore setting. Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-01-06 23:03:54 -08:00
John Roesler	ef9204dc58	MINOR: improve resilience of Streams test producers (#6028 ) Some Streams system tests have failed during the setup phase due to the producer having retries disabled and getting some transient error from the broker. This patch adds a retries parameter to the VerifiableProducer (default unchanged), and sets retries to 10 for Streams tests. It also sets acks equal to the number of brokers for Streams tests. Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2019-01-04 13:44:15 -08:00
Bill Bejeck	639427a38f	MINOR: Increase throughput for VerifiableProducer in test (#6060 ) Previous PR #6043 reduced throughput for VerifiableProducer in base class, but the streams_standby_replica_test needs higher throughput for consumer to complete verification in 60 seconds Update system test and kicked off branch builder with 25 repeats https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2201/ Reviewers: Guozhang Wang <wangguoz@gmail.com>	2018-12-21 22:32:19 -08:00
Bill Bejeck	40ca7ddeed	MINOR: Stabilization fixes broker down test trunk (#6043 ) This PR addresses a few issues with this system test flakiness. This PR is a cherry-picked duplicate of #6041 but for the trunk branch, hence I won't repeat the inline comments here. 1. Need to grab the monitor before a given operation to observe logs for signal 2. Relied too much on a timely rebalance and only sent a handful of messages. I've updated the test and ran it here https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2143/ parameterized for 15 repeats all passed. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2018-12-19 17:53:09 -08:00
Bill Bejeck	da332f2241	MINOR: Start processor inside verify message (#6029 ) This PR fixes a flaky system test. I ran six runs of branch builder, and each run was parameterized to repeat the test 25 times for a total of 150 runs. All test runs passed. https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2122/ https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2123/ https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2124/ https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2128/ https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2129/ https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2130/ Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, John Roesler <vvcephei@users.noreply.github.com>	2018-12-13 20:30:04 -08:00
Stanislav Kozlovski	05dc36d548	MINOR: Fix failing ConsumeBenchTest:test_multiple_consumers_specified_group_partitions_should_raise (#6015 ) This is the error message we're after: "You may not specify an explicit partition assignment when using multiple consumers in the same group." We apparently changed it midway through #5810 and forgot to update the test. Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Ismael Juma <ismael@juma.me.uk>	2018-12-08 09:27:59 -08:00
Bill Bejeck	ab1fb3fdde	MINOR: Adding system test for named repartition topics (#5913 ) This is a system test for doing a rolling upgrade of a topology with a named repartition topic. 1. An initial Kafka Streams application is started on 3 nodes. The topology has one operation forcing a repartition and the repartition topic is explicitly named. 2. Each node is started and processing of data is validated 3. Then one node is stopped (full stop is verified) 4. A property is set signaling the node to add operations to the topology before the repartition node which forces a renumbering of all operators (except repartition node) 5. Restart the node and confirm processing records 6. Repeat the steps for the other 2 nodes completing the rolling upgrade I ran two runs of the system test with 25 repeats in each run for a total of 50 test runs. All test runs passed Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2018-12-03 12:37:31 -08:00
Attila Sasvari	e7ce0e7e0a	KAFKA-4544: Add system tests for delegation token based authentication This change adds some basic system tests for delegation token based authentication: - basic delegation token creation - producing with a delegation token - consuming with a delegation token - expiring a delegation token - producing with an expired delegation token New files: - delegation_tokens.py: a wrapper around kafka-delegation-tokens.sh - executed in container where a secure Broker is running (taking advantage of automatic cleanup) - delegation_tokens_test.py: basic test to validate the lifecycle of a delegation token Changes were made in the following files to extend their functionality: - config_property was updated to be able to configure Kafka brokers with delegation token related settings - jaas.conf template because a broker needs to support multiple login modules when delegation tokens are used - console-consumer and verifiable_producer to override KAFKA_OPTS (to specify custom jaas.conf) and the client properties (to authenticate with delegation token). Author: Attila Sasvari <asasvari@apache.org> Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Andras Katona <41361962+akatona84@users.noreply.github.com>, Manikumar Reddy <manikumar.reddy@gmail.com> Closes #5660 from asasvari/KAFKA-4544	2018-12-03 11:28:36 +05:30
Bill Bejeck	dfd545485a	MINOR: Add system test for optimization upgrades (#5912 ) This is a new system test testing for optimizing an existing topology. This test takes the following steps 1. Start a Kafka Streams application that uses a selectKey then performs 3 groupByKey() operations and 1 join creating four repartition topics 2. Verify all instances start and process data 3. Stop all instances and verify stopped 4. For each stopped instance update the config for TOPOLOGY_OPTIMIZATION to all then restart the instance and verify the instance has started successfully also verifying Kafka Streams reduced the number of repartition topics from 4 to 1 5. Verify that each instance is processing data from the aggregation, reduce, and join operation Stop all instances and verify the shut down is complete. 6. For testing I ran two passes of the system test with 25 repeats for a total of 50 test runs. All test runs passed Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2018-11-27 13:07:34 -08:00
Stanislav Kozlovski	9368743b8f	KAFKA-7597: Add transaction support to ProduceBenchWorker (#5885 ) KAFKA-7597: Add configurable transaction support to ProduceBenchWorker. In order to get support for serializing Optional<> types to JSON, add a new library: jackson-datatype-jdk8. Once Jackson 3 comes out, this library will not be needed. Reviewers: Colin McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>	2018-11-27 12:49:53 -08:00
John Roesler	14fbf16875	MINOR: increase system test kafka start timeout (#5934 ) The Kafka Streams system tests fail with some regularity due to a timeout starting the broker. The initial start is quite quick, but many of our tests involve stopping and restarting nodes with data already loaded, and also while processing is ongoing. Under these conditions, it seems to be normal for the broker to take about 25 seconds to start, which makes the 30 second timeout too close for comfort. I have seen many test failures in which the broker successfully started within a couple of seconds after the tests timed out and already initiated the failure/shut-down sequence. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2018-11-21 11:49:36 -08:00
Matthias J. Sax	190cbd9fe5	MINOR: fix failing Streams system tests (#5928 ) Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>	2018-11-19 18:44:45 -08:00
Cyrus Vafadari	f52de5f0ad	MINOR: Remove unused abstract function in test class (#5888 ) The function `setup_producer_and_consumer` is unused in the system test framework, which incorrectly suggests subclasses should implement it. It is not required or even referenced by the framework, so the requirement should be removed. Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Jason Gustafson <jason@confluent.io>	2018-11-13 09:43:29 -08:00
Stanislav Kozlovski	8259fda695	KAFKA-7514: Add threads to ConsumeBenchWorker (#5864 ) Add threads with separate consumers to ConsumeBenchWorker. Update the Trogdor test scripts and documentation with the new functionality. Reviewers: Colin McCabe <cmccabe@apache.org>	2018-11-13 08:38:42 -08:00
Magesh Nandakumar	1c5aec6e9d	MINOR: Modify Connect service's startup timeout to be passed via the init (#5882 ) Currently, the startup timeout is hardcoded to be 60 seconds in Connect's test service. Modifying it to be passable via init. This can safely be backported as well. Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>	2018-11-06 13:41:19 -08:00
Randall Hauch	f856082cb8	KAFKA-7559: Correct standalone system tests to use the correct external file (#5883 ) This fixes the Connect standalone system tests. See branch builder: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2021/ This should be backported to the 2.0 branch, since that's when the tests were first modified to use the external property file. Reviewers: Magesh Nandakumar <magesh.n.kumar@gmail.com>, Ismael Juma <ismael@juma.me.uk>	2018-11-05 22:39:12 -08:00
Gardner Vickers	ea518e4d67	KAFKA-7561: Increase stop_timeout_sec to make ConsoleConsumerTest pass (#5853 ) Looks like the increased delay happens when connecting to the docker container. Reviewers: Ismael Juma <ismael@juma.me.uk>	2018-11-03 10:06:17 -07:00
Stanislav Kozlovski	d28c534819	KAFKA-7515: Trogdor - Add Consumer Group Benchmark Specification (#5810 ) This ConsumeBenchWorker now supports using consumer groups. The groups may be either used to store offsets, or as subscriptions.	2018-10-29 10:51:07 -07:00
Randall Hauch	8b1d705404	MINOR: Fix undefined variable in Connect test Corrects an error in the system tests: ``` 07:55:45 [ERROR:2018-10-23 07:55:45,738]: Failed to import kafkatest.tests.connect.connect_test, which may indicate a broken test that cannot be loaded: NameError: name 'EXTERNAL_CONFIGS_FILE' is not defined ``` The constant is defined in the [services/connect.py](https://github.com/apache/kafka/blob/trunk/tests/kafkatest/services/connect.py#L43) file in the `ConnectServiceBase` class, but the problem is in the [tests/connect/connect_test.py](https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/connect/connect_test.py#L50) `ConnectStandaloneFileTest`, which does not extend the `ConnectServiceBase class`. Suggestions welcome to be able to reuse that variable without duplicating the literal (as in this PR). System test run with this PR: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2004/ If approved, this should be merged as far back as the `2.0` branch. Author: Randall Hauch <rhauch@gmail.com> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #5832 from rhauch/fix-connect-externals-tests	2018-10-24 13:16:34 -07:00
Guozhang Wang	2d77746a7b	MINOR: fixes on streams upgrade test (#5754 ) 1. In test_upgrade_downgrade_brokers, allow duplicates to happen. 2. In test_version_probing_upgrade, grep the generation numbers from brokers at the end, and check if they can ever be synchronized. Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>	2018-10-13 22:39:24 -07:00
Colin Hicks	b63cd6370f	MINOR: Adjust test params pursuant to KAFKA-4514. (#5777 ) PR #2267 introduced support for Zstandard compression. The relevant test expects values for `num_nodes` and `num_producers` based on the (now-incremented) count of compression types. Passed the affected, previously-failing test: `ducker-ak test tests/kafkatest/tests/client/compression_test.py` Reviewers: Jason Gustafson <jason@confluent.io>	2018-10-10 15:16:54 -07:00

910 Commits