Commit Graph

70 Commits

Author SHA1 Message Date
Jinhe Zhang aa0f4ea808 remove eos tests 2025-08-25 15:27:58 -04:00
Lucas Brutschy f26974b16d
KAFKA-19202: Enable KIP-1071 in streams_eos_test (#19700)
Enable next system test with KIP-1071.

Some of the validation inside the test did not make sense for KIP-1071.
This is because in KIP-1071, if a member leaves or joins the group, not
all members may enter a REBALANCING state. We use the wrapper introduced
in [KAFKA-19271](https://issues.apache.org/jira/browse/KAFKA-19271)
to print a log line whenever the member epoch is bumped, which is the
only way a member can "indirectly" observe that other members are
rebalancing.

Reviewers: Bill Bejeck <bill@confluent.io>
2025-05-17 21:20:47 +02:00
Lucas Brutschy 5c63b4569b
KAFKA-19202: Enable KIP-1071 in streams_smoke_test.py (#19560)
Enables KIP-1071 (`group.protocol=streams`) in the first streams system
test `streams_smoke_test.py`.

Tests using KIP-1071 can no longer use `KafkaTest`, since we need
to customize the broker configuration. The corresponding functionality
is added to `BaseStreamsTest`, which all streams tests will have to
extend from now on.

There are some leftovers from ZK in the tests that I copied from
`KafkaTest`. They need to be cleaned up, but this should be done in a
separate PR.
2025-04-29 13:35:19 +02:00
TengYao Chi b37b89c668
KAFKA-9366 Upgrade log4j to log4j2 (#17373)
This pull request replaces Log4j with Log4j2 across the entire project, including dependencies, configurations, and code. The notable changes are listed below:

1. Introduce Log4j2 Instead of Log4j
2. Change Configuration File Format from Properties to YAML
3. Add warnings to notify users who are still using Log4j properties, encouraging them to transition to Log4j2 configurations

Co-authored-by: Lee Dongjin <dongjin@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 01:14:31 +08:00
Bill Bejeck 36c131ef4a
KAFKA-17609:[1/4] Changes needed to convert system tests to use KRaft and remove ZK (#17275)
This is part one of a multi-PR effort to convert Kafka Streams system tests to KRaft. I decided to break down the changes into multiple PRs to reduce the review load.

Reviewers: Matthias Sax <mjsax@apache.org>
2024-11-05 11:23:33 -05:00
Matthias J. Sax 6fd973b4a5
KAFKA-16331: Remove EOSv1 from Kafka Streams system tests (#17108)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Bill Bejeck <bill@confluent.io>
2024-09-10 17:55:03 -07:00
Yi-Sheng Lien b8f3776f24
KAFKA-15155: Follow PEP 8 best practice in Python to check if a container is empty (#13974)
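As a minimal illustration of the PEP 8 recommendation this commit refers to (the function name below is hypothetical and is not taken from the Kafka test code):

```python
def first_or_none(items):
    """Return the first element of a sequence, or None if it is empty."""
    # PEP 8: for sequences, rely on the fact that empty sequences are false
    # instead of comparing len(items) against 0.
    if not items:
        return None
    return items[0]
```

The same idiom applies to strings, lists, tuples, and dicts.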
Reviewers: Divij Vaidya <diviv@amazon.com>
2023-07-11 11:01:50 +02:00
vamossagar12 c14f56b484
KAFKA-14586: Moving StreamResetter to tools (#13127)
Moves StreamResetter to tools project.

Reviewers: Federico Valeri <fedevaleri@gmail.com>, Christo Lolov <lolovc@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2023-03-28 14:43:22 +02:00
Lucas Brutschy 4560978ed7
KAFKA-14309: FK join upgrades not tested with DEV_VERSION (#12760)
The streams upgrade system test inserted FK join code for every version of
the StreamsUpgradeTest except for the latest. Also, the original code
never switched on the `test.run_fk_join` flag for the target version of
the upgrade.

The effect was that FK join upgrades were not tested at all, since
no FK join code was executed after the bounce in the system test.

We introduce `extra_properties` in the system tests, which can be used
to pass any property to the upgrade driver, which is supposed to be
reused by system tests for switching on and off flags (e.g. for the
state restoration code).
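
A rough sketch of how such a pass-through might look on the Python side. The function name, the base properties, and the rendering format are assumptions for illustration; only `extra_properties` and `test.run_fk_join` come from the commit message.

```python
def render_properties(base, extra_properties=None):
    """Merge base properties with optional extras and render .properties text.

    Hypothetical helper: keys in extra_properties win over keys in base.
    """
    merged = dict(base)
    merged.update(extra_properties or {})
    # Sort keys so the rendered file is deterministic across runs.
    return "\n".join("%s=%s" % (key, value) for key, value in sorted(merged.items()))


# Switch on the FK join flag for the target version of the upgrade.
text = render_properties({"state.dir": "/tmp/streams"}, {"test.run_fk_join": "true"})
```

Keeping the extras as a plain dict means a test can toggle a flag without the driver service needing a dedicated parameter for it.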

Reviewers: Alex Sorokoumov <asorokoumov@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-11-07 15:46:51 -08:00
Nikolay 51b079dca7
KAFKA-12878: Support --bootstrap-server in kafka-streams-application-reset tool (#12632)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-19 13:20:41 -04:00
Alex Sorokoumov 78dd40123c
MINOR: Add upgrade tests for FK joins (#12122)
Follow-up PR for KAFKA-13769.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-05-13 17:21:27 -07:00
Luke Chen f28a2ee918
MINOR: revert back to 60s session timeout for static membership test (#11881)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-04-21 11:51:31 -07:00
John Roesler 45ecaa19f8
MINOR: Set session timeout back to 10s for Streams system tests (#11236)
We increased the default session timeout to 45s in KIP-735:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout

Since then, we are observing sporadic system test failures
due to rebalances taking longer than the test timeout.
Rather than increase the test wait times, we can just override
the session timeout to a value more appropriate in the testing
domain.

Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2021-08-20 11:27:54 -05:00
Kamal Chandraprakash a103c95a31
KAFKA-12724: Add 2.8.0 to system tests and streams upgrade tests. (#10602)
Also adjusted the acceptable recovery lag to stabilize Streams tests.

Reviewers: Justine Olshan <jolshan@confluent.io>, Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>
2021-08-04 17:31:10 -05:00
Chia-Ping Tsai 29c55fdbbc
MINOR: set replication.factor to 1 to make StreamsBrokerCompatibilityService work with old broker (#10673)
Reviewers: Matthias J. Sax <mjsax@conflunet.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2021-05-14 13:51:31 +08:00
Nikolay 4e65030e05
KAFKA-10402: Upgrade system tests to python3 (#9196)
Kafka system tests currently use Python 2, which is outdated and no longer supported.
This PR upgrades them to Python 3.

Reviewers: Ivan Daschinskiy, Mickael Maison <mickael.maison@gmail.com>, Magnus Edenhill <magnus@edenhill.se>, Guozhang Wang <wangguoz@gmail.com>
2020-10-07 09:41:30 -07:00
Chia-Ping Tsai 6953161125
KAFKA-10191 fix flaky StreamsOptimizedTest (#8913)
Call KafkaStreams#cleanUp to reset local state before starting the application for the second run.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>
2020-07-07 12:48:36 -05:00
John Roesler 3b2ae7b95a
KAFKA-10173: Use SmokeTest for upgrade system tests (#8938)
Replaces the previous upgrade test's trivial Streams app
with the commonly used SmokeTest, exercising many more
features. Also adjusts the test matrix to test upgrading
from each released version since 2.2 to the current branch.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-07-02 18:14:46 -05:00
Bruno Cadonna f3a9ce4a69
MINOR: Do not swallow exception when collecting PIDs (#8914)
During Streams' system tests the PIDs of the Streams
clients are collected. The method that collects the PIDs
swallows any exception that might be thrown by the
ssh_capture() function. Swallowing exceptions
might make the investigation of failures harder,
because no information about what happened is recorded.
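
The idea can be sketched as follows. This is not the actual service code: `collect_pids` and its `run_command` argument are hypothetical stand-ins, with `run_command` playing the role of the `ssh_capture()` call.

```python
def collect_pids(run_command):
    """Parse one PID per output line, letting any failure reach the caller."""
    # Deliberately no try/except around run_command(): if the remote command
    # fails, the exception propagates and shows up in the test report.
    return [int(line) for line in run_command() if line.strip()]


# Stand-in for a remote command that prints two PIDs.
pids = collect_pids(lambda: ["4121\n", "4187\n"])
```

A caller that really wants to tolerate a missing process can still catch the exception itself, but then the decision is explicit and can be logged.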

Reviewers: John Roesler <vvcephei@apache.org>
2020-06-30 12:18:23 -05:00
John Roesler 2cff1fab3f
KAFKA-6145: KIP-441: Fix assignor config passthrough (#8716)
Also fixes a system test by configuring the HATA to perform a one-shot balanced assignment

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>
2020-05-27 13:50:12 -05:00
John Roesler 5bb3415c77
KAFKA-6145: KIP-441: Add TaskAssignor class config (#8541)
* add a config to set the TaskAssignor
* set the default assignor to HighAvailabilityTaskAssignor
* fix broken tests (with some TODOs in the system tests)

Implements: KIP-441
Reviewers: Bruno Cadonna <bruno@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
2020-04-28 15:57:11 -05:00
Matthias J. Sax 17f9879261
KAFKA-9832: extend Kafka Streams EOS system test (#8440)
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-04-15 13:13:23 -07:00
Matthias J. Sax 20e4a74c35
KAFKA-9832: Extend Streams system tests for EOS-beta (#8443)
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-04-10 11:55:01 -07:00
Matthias J. Sax 6ad5407350
KAFKA-9719: Streams with EOS-beta should fail fast for older brokers (#8367)
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-30 15:21:27 -07:00
Bruno Cadonna 1d21cf166a KAFKA-9305: Add version 2.4 to Streams system tests (#7841)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2019-12-20 14:21:12 -08:00
Boyang Chen 465f810730 KAFKA-8972 (2.4 blocker): correctly release lost partitions during consumer.unsubscribe() (#7441)
Inside onLeavePrepare we would look into the assignment and try to revoke the owned tasks and notify users via RebalanceListener#onPartitionsRevoked, and then clear the assignment.

However, the subscription's assignment is already cleared in this.subscriptions.unsubscribe(), which means the user's rebalance listener would never be triggered. In other words, from the consumer client's point of view nothing is owned after unsubscribe, but from the caller's point of view the partitions are not revoked yet. For callers like Kafka Streams which rely on the rebalance listener to maintain their internal state, this leads to inconsistent state management and failure cases.

Before KIP-429 this issue was hidden, since every time the consumer re-joined the group later it would still revoke everything anyway, regardless of the parameters passed to the rebalance listener; with KIP-429 it is easier to reproduce.

Our fixes are as follows:

• Inside unsubscribe, first do onLeavePrepare / maybeLeaveGroup and then subscription.unsubscribe. This way we are guaranteed that the streams' tasks are all closed as revoked by then.
• [Optimization] If the generation is reset due to a fatal error from a join / heartbeat response etc., then we know that all partitions are lost, and we should not trigger onPartitionsRevoked, but instead just onPartitionsLost inside onLeavePrepare. This is because we don't want to commit for lost partitions during a rebalance, which is doomed to fail as we don't have any generation info.
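
The ordering problem can be shown with a toy model. This is not Kafka client code: `ToyConsumer` and its callback are hypothetical and only mimic the "notify the listener first, clear the assignment second" order described in the first bullet.

```python
class ToyConsumer:
    """Tiny stand-in for a consumer that owns partitions and has a listener."""

    def __init__(self, assignment, on_revoked):
        self.assignment = set(assignment)
        self.on_revoked = on_revoked

    def unsubscribe(self):
        # Notify the listener while the assignment is still populated ...
        if self.assignment:
            self.on_revoked(sorted(self.assignment))
        # ... and only then clear it. With the two steps swapped, the
        # listener would see an empty assignment and never be called.
        self.assignment.clear()


revoked = []
consumer = ToyConsumer(["topic-0", "topic-1"], revoked.extend)
consumer.unsubscribe()
```

After the call, `revoked` holds both partitions and the consumer's assignment is empty, which is the consistent view a caller such as Kafka Streams relies on.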

Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2019-10-29 10:41:25 -07:00
Bill Bejeck c015169aa6
MINOR: Streams upgrade system test cleanup (#7571)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>
2019-10-24 10:28:29 -04:00
Bill Bejeck 6afe05fe89
MINOR: system test clean up (#7552)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Sophie Blee-Goldman <sophie@confluent.io>
2019-10-21 10:51:15 -04:00
Bill Bejeck b62f2a1123 KAFKA-8496: System test for KIP-429 upgrades and compatibility (#7529)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-10-16 22:29:33 -07:00
Matthias J. Sax 4d1ee26a13
KAFKA-8594: Add version 2.3 to Streams system tests (#7131)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>
2019-08-21 10:26:57 -07:00
Boyang Chen cca05cace4 KAFKA-8331: stream static membership system test (#6877)
As the title suggests, we start a Streams job with 3 instances and a one-minute session timeout and, once the group is stable, do a couple of rolling bounces of the entire cluster. A rejoin caused by a restart should not bump the generation on the client side.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
2019-06-07 16:52:12 -04:00
Matthias J. Sax ba3dc49437
KAFKA-8155: Add 2.2.0 release to system tests (#6597)
Reviewers: Bill Bejeck <bill@confluent.io>, Boyang Chen <boyang@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confuent.io>
2019-06-03 21:09:58 -07:00
John Roesler 8e97540071 KAFKA-7944: Improve Suppress test coverage (#6382)
* add a normal windowed suppress with short windows and a short grace
period
* improve the smoke test so that it actually verifies the intended
conditions

See https://issues.apache.org/jira/browse/KAFKA-7944

Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2019-03-12 09:53:29 -07:00
Bill Bejeck b1b792d9a8 MINOR: Add 2.1 version metadata upgrade (#6111)
Updated the test_metadata_upgrade test. To enable using the 2.1 version I needed to add a config change to the StreamsUpgradeTestJobRunnerService to ensure that ducktape passes the proper args when starting the StreamsUpgradeTest.

For testing, I ran the test_metadata_upgrade test and all versions now pass http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2019-01-09--001.1547049873--bbejeck--MINOR_add_2_1_version_metadata_upgrade--a450c68/report.html

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-01-09 15:19:00 -08:00
Bill Bejeck ab1fb3fdde MINOR: Adding system test for named repartition topics (#5913)
This is a system test for doing a rolling upgrade of a topology with a named repartition topic.

1. An initial Kafka Streams application is started on 3 nodes. The topology has one operation forcing a repartition and the repartition topic is explicitly named.
2. Each node is started and processing of data is validated
3. Then one node is stopped (full stop is verified)
4. A property is set signaling the node to add operations to the topology before the repartition node which forces a renumbering of all operators (except repartition node)
5. Restart the node and confirm processing records
6. Repeat the steps for the other 2 nodes completing the rolling upgrade

I ran two runs of the system test with 25 repeats in each run for a total of 50 test runs.
All test runs passed

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-12-03 12:37:31 -08:00
Bill Bejeck dfd545485a MINOR: Add system test for optimization upgrades (#5912)
This is a new system test for optimizing an existing topology. The test takes the following steps:

1. Start a Kafka Streams application that uses a selectKey then performs 3 groupByKey() operations and 1 join creating four repartition topics
2. Verify all instances start and process data
3. Stop all instances and verify stopped
4. For each stopped instance, update the config for TOPOLOGY_OPTIMIZATION to all, then restart the instance and verify that it has started successfully and that Kafka Streams reduced the number of repartition topics from 4 to 1
5. Verify that each instance is processing data from the aggregation, reduce, and join operations
6. Stop all instances and verify the shutdown is complete.

For testing, I ran two passes of the system test with 25 repeats each, for a total of 50 test runs.

All test runs passed

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2018-11-27 13:07:34 -08:00
Bill Bejeck b74e7e407c MINOR: Enable ignored upgrade system tests - trunk (#5605)
Removed ignore annotations from the upgrade tests. This PR includes the following changes for updating the upgrade tests:

* Uploaded new versions 0.10.2.2, 0.11.0.3, 1.0.2, 1.1.1, and 2.0.0 (in the associated scala versions) to kafka-packages
* Update versions in version.py, Dockerfile, base.sh
* Added new versions to StreamsUpgradeTest.test_upgrade_downgrade_brokers including version 2.0.0
* Added new versions to the StreamsUpgradeTest.test_simple_upgrade_downgrade test, excluding version 2.0.0
* Version 2.0.0 is excluded from the streams upgrade/downgrade test as StreamsConfig needs an update for the new version, requiring a KIP. Once the community votes the KIP in, a minor follow-up PR can be pushed to add the 2.0.0 version to the upgrade test.
* Fixed minor bug in kafka-run-class.sh for classpath in upgrade/downgrade tests across versions.
* Follow on PRs for 0.10.2x, 0.11.0x, 1.0.x, 1.1.x, and 2.0.x will be pushed soon with the same updates required for the specific version.

Reviewers: Eno Thereska <eno.thereska@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>
2018-09-13 13:46:47 -07:00
Matthias J. Sax d166485be1
KAFKA-6054: Add 'version probing' to Kafka Streams rebalance (#4636)
implements KIP-268

Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2018-05-30 22:39:42 -07:00
Bill Bejeck c6fd3d488e MINOR: update VerifiableProducer to send keys if configured and removed StreamsRepeatingKeyProducerService (#4841)
This PR does the following:

* Remove the StreamsRepeatingIntegerKeyProducerService and the associated Java class
* Add a parameter to VerifiableProducer.java to enable sending keys when specified
* Update the corresponding Python file verifiable_producer.py to support the new parameter.

Reviewers: Matthias J Sax <matthias@confluentio>, Guozhang Wang <wangguoz@gmail.com>
2018-04-27 22:12:57 -07:00
Bill Bejeck e7f019690a MINOR: Fixes for streams system tests (#4935)
This PR fixes some regressions introduced into streams system tests and sets the upgrade tests to ignore until PR #4636 is merged as it has the fixes for the upgrade tests.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2018-04-26 10:04:59 -07:00
Matthias J. Sax cae42215b7
KAFKA-6054: Update Kafka Streams metadata to version 3 (#4880)
- adds Streams upgrade tests for 1.1 release
- introduces metadata version 3

Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2018-04-18 09:38:27 +02:00
Guozhang Wang 0dc7f0e66f
KAFKA-6611, PART II: Improve Streams SimpleBenchmark (#4854)
SimpleBenchmark:

1.a Do not rely on manual num.records / bytes collection on atomic integers.
1.b Rely on config files for num.threads, bootstrap.servers, etc.
1.c Add parameters for key skewness and value size.
1.d Refactor the tests for loading phase, adding tumbling-windowed count.
1.e For consumer / consumeproduce, collect metrics on consumer instead.
1.f Force stop the test after 3 minutes; this is based on empirical numbers of 10M records.

Other tests: use config for kafka bootstrap servers.

streams_simple_benchmark.py: only use scale 1 for system test, remove yahoo from benchmark tests.

Note that the JMX-based metrics are more accurate than the manually collected metrics.

Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-04-15 10:15:31 -07:00
Matthias J. Sax 0c0d8363e5
KAFKA-6054: Fix upgrade path from Kafka Streams v0.10.0 (#4779)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Damian Guy <damian@confluent.io>
2018-04-06 17:00:52 -07:00
Guozhang Wang f2fbfaaccc
KAFKA-6611: PART I, Use JMXTool in SimpleBenchmark (#4650)
1. Use JmxMixin for SimpleBenchmark (will remove the self reporting in #4744), only when loading phase is false (i.e. we are in fact starting the streams app).

2. Report the full JMX metrics in log files, and return only the max values in the returned data: this is because we want to skip the warm-up and cool-down periods, which have lower rates, while the max represents the actual rate at full speed.

3. Incorporates two other improvements to JMXTool: #1241 and #2950
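
The reasoning in point 2 can be illustrated with a small sketch. The function name and the sample values are made up for illustration; they are not benchmark results and not part of JmxMixin.

```python
def peak_rate(samples):
    """Return the highest observed rate from a series of periodic samples."""
    if not samples:
        raise ValueError("no samples collected")
    # Warm-up and cool-down samples are lower than the steady-state rate,
    # so taking the max skips them without needing to know where they are.
    return max(samples)


# Made-up records/sec samples: ramp-up, steady state, ramp-down.
samples = [120.0, 940.0, 1000.0, 310.0]
rate = peak_rate(samples)
```

An average over the same samples would be pulled down by the first and last values, which is the effect the commit wants to avoid.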

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Rohan Desai <desai.p.rohan@gmail.com>
2018-03-22 16:46:56 -07:00
Guozhang Wang 0f364cd53a
MINOR: Pass a streams config to replace the single state dir (#4714)
This is a general change and is a prerequisite for running the streams benchmark test with different streams configs. For the streams benchmark itself I will have a separate PR for switching configs. Details:

1. Create a "streams.properties" file under PERSISTENT_ROOT before all the streams tests. For now it will only contain a single config of state.dir pointing to PERSISTENT_ROOT.

2. For all the system test related code, replace the main function parameter of state.dir with propsFilename, then inside the function load the props from the file and apply overrides if necessary.

3. Minor fixes.
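
The "load the props from the file and apply overrides" step in point 2 could look roughly like this. The sketch is in Python for brevity and is not the code from the PR; the function name, the example path, and the override key are assumptions.

```python
def load_properties(text, overrides=None):
    """Parse simple key=value properties text, then apply overrides on top."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    # Overrides win over whatever the file contained.
    props.update(overrides or {})
    return props


# A file with a single state.dir entry, as described in point 1.
file_text = "# generated before the test\nstate.dir=/mnt/streams\n"
props = load_properties(file_text, {"num.stream.threads": "2"})
```

Passing a file name instead of a single `state.dir` argument means new settings can be added later without changing every main function's parameter list.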

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
2018-03-19 14:17:00 -07:00
John Roesler 7006d0f58b MINOR: Streams system tests fixes/updates (#4689)
Some changes required to get the Streams system tests working via Docker

To test:

TC_PATHS="tests/kafkatest/tests/streams" bash tests/docker/run_tests.sh

That command will take about 3.5 hours, and should pass. Note there are a couple of ignored tests.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
2018-03-15 14:42:43 -07:00
Bill Bejeck 8a7d7e7955 MINOR: Add System test for standby task-rebalancing (#4554)
Author: Bill Bejeck <bill@confluent.io>

Reviewers: Damian Guy <damian@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-03-05 11:06:32 -08:00
Matthias J. Sax 0b3b6049f0
MINOR: Fix Streams EOS system tests (#4572)
Avoid losing log/stdout/stderr files on restart
Re-enables tests

Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
2018-02-16 13:13:13 -08:00
Bill Bejeck 67803384d9 MINOR: adding system tests for how streams functions with broker failures (#4513)
System test for two cases:

* Start a multi-node streams application with the broker down initially; once the broker starts, confirm that the rebalance completes and the streams application is still able to process records.

* With a multi-node streams app running, the broker goes down and some streams instance(s) are stopped; confirm that after the broker comes back, the remaining streams instance(s) still function.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-02-07 17:21:35 -08:00
Bill Bejeck f3b9afe622 MINOR: Broker down for significant amt of time system test
System test where a broker is offline for longer than the configured timeouts. In this case:
- Max poll interval set to 45 secs
- Retries set to 2
- Request timeout set to 15 seconds
- Max block ms set to 30 seconds

The broker was taken offline for 70 seconds, i.e. more than double request timeout * num retries (2 * 15 s * 2 = 60 s).
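
The arithmetic behind that choice can be checked with a few lines. The function name and the `factor` parameter are hypothetical; the numbers are the ones quoted above.

```python
def outage_exceeds_budget(outage_s, request_timeout_s, retries, factor=2):
    """Return True if the outage is longer than factor * request timeout * retries."""
    return outage_s > factor * request_timeout_s * retries


# Values from the commit message: 70 s outage, 15 s request timeout, 2 retries.
exceeded = outage_exceeds_budget(70, 15, 2)
```

With these values the threshold is 2 * 15 * 2 = 60 seconds, so a 70-second outage is past it and `exceeded` is `True`.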

[passing system test results](http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-12-11--001.1513034559--bbejeck--KSTREAMS_1179_broker_down_for_significant_amt_of_time--6ab4802/report.html)

Author: Bill Bejeck <bill@confluent.io>

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #4313 from bbejeck/KSTREAMS_1179_broker_down_for_significant_amt_of_time
2017-12-19 15:37:21 -08:00