Updated the System test `stream_broker_compatibility_test.py` to address system test failures as we have removed explicit broker version checking
- Ignore the `0.8.2.2` and `0.9.0.0` tests because the `NetworkClient` only logs `UnsupportedVersionException`s that occur and will continue to retry connecting. Once issue https://issues.apache.org/jira/browse/KAFKA-6297 is addressed, we may re-enable these tests.
- Updated existing tests expected error messages
- Updated Streams code in test for to make sure we fail fast for incompatible brokers
Author: Bill Bejeck <bill@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4286 from bbejeck/MINOR_fix_broker_compatibility_tests
`topics.regex` was added in KAFKA-3073. This change fixes the test that invokes `/validate` to ensure that all the configdefs are returned as expected.
Author: Mikkin <mikkin@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4279 from mikkin/KAFKA-6284
Fix an omission where Kibosh was not getting installed on Vagrant
instances running in AWS.
Fix an issue where the Dockerfile was unable to download old Apache
Kafka releases. See the discussion on KAFKA-6233.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4240 from cmccabe/KAFKA-6247
For ducktape: add Kibosh to the testing Dockerfile.
Create files_unreadable_fault_spec.py.
For trogdor: create FilesUnreadableFaultSpec.java.
Add a unit test of using the Kibosh service.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4195 from cmccabe/KAFKA-5811
Previously, Trogdor only handled "Faults." Now, Trogdor can handle
"Tasks" which may be either faults, or workloads to execute in the
background.
The Agent and Coordinator have been refactored from a
mutexes-and-condition-variables paradigm into a message passing
paradigm. No locks are necessary, because only one thread can access
the task state or worker state. This makes them a lot easier to reason
about.
The MockTime class can now handle mocking deferred message passing
(adding a message to an ExecutorService with a delay). I added a
MockTimeTest.
MiniTrogdorCluster now starts up Agent and Coordinator classes in
paralle in order to minimize junit test time.
RPC messages now inherit from a common Message.java class. This class
handles implementing serialization, equals, hashCode, etc.
Remove FaultSet, since it is no longer necessary.
Previously, if CoordinatorClient or AgentClient hit a networking
problem, they would throw an exception. They now retry several times
before giving up. Additionally, the REST RPCs to the Coordinator and
Agent have been changed to be idempotent. If a response is lost, and
the request is resent, no harm will be done.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#4073 from cmccabe/KAFKA-6060
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#4128 from mjsax/minor-cleanup
minor fix
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Alex Ayars <alex.ayars@confluent.io>
Closes#4084 from cmccabe/KAFKA-6070
With these changes, we are ensuring that the partitions being reassigned are from non-zero offsets. We also ensure that every message in the log has producerId and sequence number.
This means that it successfully reproduces https://issues.apache.org/jira/browse/KAFKA-6003.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4029 from apurvam/KAFKA-6016-add-idempotent-producer-to-reassign-partitions
- improve tests to get rid of calls to `sleep` in Python
- fixed some flaky test conditions
- improve debugging
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3542 from mjsax/failing-eos-system-tests
To check ordering, we augment the existing transactions test to read and write from topics with one partition. Since we are writing monotonically increasing numbers, the topics should always be sorted, making it very easy to check for out of order messages.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3969 from apurvam/KAFKA-5888-system-test-which-check-ordering
The string representation of TopicPartition was changed to be
{topic}-{partitition} consistently in the following commit:
f6f56a645b
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3890 from ijuma/fix-replica-verification-test
Added some tips for running a single test file, test class and/or test method on the documentation landing page about tests
Author: Paolo Patierno <ppatierno@live.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3577 from ppatierno/minor-tests-doc
Also throw an exception if a null keystore type is seen
in `SecurityStore`. This should never happen.
The default keystore type has changed in Java 9 (
http://openjdk.java.net/jeps/229), so we need to
be explicit to have consistent behaviour across
Java versions.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3808 from ijuma/set-jks-explicitly-in-system-tests
The previous timeout was 10 seconds, but system test failures have occurred when Zookeeper has started after about 11 seconds. Increasing the timeout to 30 seconds, since most of the time this extra time will not be required, and when it is it will prevent a failed system test.
In addition to merging to `trunk`, please backport to the `0.11.x` and `0.10.2.x` branches.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3774 from rhauch/MINOR-Increase-timeout-of-zookeeper-service-in-system-tests
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3715 from cmccabe/kafka_service_print_node_hostname_on_failure
We are occasionally hitting some timeouts due to processing not finishing. So rather than failing the build for these reasons it would be better to reduce the runtime.
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3725 from dguy/fix-system-test
When a Connect distributed worker starts up talking with broker versions 0.10.1.0 and later, it will use the AdminClient to look for the internal topics and attempt to create them if they are missing. Although the AdminClient was added in 0.11.0.0, the AdminClient uses APIs to create topics that existed in 0.10.1.0 and later. This feature works as expected when Connect uses a broker version 0.10.1.0 or later.
However, when a Connect distributed worker starts up using a broker older than 0.10.1.0, the AdminClient is not able to find the required APIs and thus will throw an UnsupportedVersionException. Unfortunately, this exception is not caught and instead causes the Connect worker to fail even when the topics already exist.
This change handles the UnsupportedVersionException by logging a debug message and doing nothing. The existing producer logic will get information about the topics, which will cause the broker to create them if they don’t exist and broker auto-creation of topics is enabled. This is the same behavior that existed prior to 0.11.0.0, and so this change restores that behavior for brokers older than 0.10.1.0.
This change also adds a system test that verifies Connect works with a variety of brokers and is able to run source and sink connectors. The test verifies that Connect can read from the internal topics when the connectors are restarted.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3641 from rhauch/kafka-5704
ewencp would be great to cherry-pick this back into 0.11.x if possible
Author: Xavier Léauté <xavier@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3645 from xvrl/system-test-cluster-id
Added handling of _DUCKTAPE_OPTIONS (mainly for enabling debugging)
Author: Paolo Patierno <ppatierno@live.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3578 from ppatierno/kafka-5643
Support a --custom-ducktape flag which allows developers to install
their own versions of ducktape into Docker images. This is helpful for
ducktape development.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Ismael Juma <ismael@juma.me.uk>
Closes#3539 from cmccabe/KAFKA-5602
The bump from 0.11.1.0-SNAPSHOT to 1.0.0-SNAPSHOT broke a couple
of system tests:
* TestVerifiableProducer.test_simple_run
* KafkaVersionTest.test_multi_version
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3587 from ijuma/fix-_kafka_jar_versions
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Jiangjie Qin <becket.qin@gmail.com>, Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Onur Karaman <okaraman@linkedin.com>
Closes#2929 from lindong28/KAFKA-4763
Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to
determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations
and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a
consumer that is started before data is available in the topic, it won't output anything to stdout and only logs
errors to a separate log file. This means we may have a long delay between starting the process and starting JMX
monitoring.
Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment
metrics via JMX), we should no longer need to wait for the first line of output.
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>
Closes#3447 from ewencp/dont-wait-first-line-console-consumer
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, Damian Guy <damian.guy@gmail.com>
Closes#3497 from mjsax/disable-flaky-system-tests
Update system tests to make use of the newly released 0.11 version.
Add on to https://github.com/apache/kafka/pull/3454
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy <damian.guy@gmail.com>
Closes#3479 from enothereska/minor-compatibility-tests
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>
Closes#3454 from ijuma/test-upgrades-from-0.11.0.x
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#3310 from mjsax/kafka-5362-add-eos-system-tests-for-streams-api
-Tests for rolling upgrades for a streams app (keeping broker config fixed)
-Tests for rolling upgrades of brokers (keeping streams app config fixed)
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>
Closes#3411 from enothereska/KAFKA-5487-upgrade-test-streams
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Joseph Rea <jrea@users.noreply.github.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2575 from mjsax/minor-update-system-test-readme
Increased the timeout from 30sec to 60sec. When running the system tests with packaged Kafka, Connect workers can take about 30seconds to start.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#3344 from rhauch/KAFKA-5450
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3340 from hachikuji/add-random-aborts-to-system-test
When the message copier hangs (like when there is a bug in the client), it ignores the sigterm and doesn't shut down. this leaves the cluster in an unclean state causing future tests to fail.
In this patch we always send SIGKILL when cleaning the node if the process isn't already dead. This is consistent with the other services.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3308 from apurvam/KAFKA-5437-force-kill-message-copier-on-cleanup
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#3201 from mjsax/kafka-5362-add-eos-system-tests-for-streams-api
We need this to debug most issues with the transactions system test.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3261 from apurvam/MINOR-set-log-level-for-producer-to-trace-for-transactions-test
The log_level parameter is used in system tests in kafka.py. However the log4j template accepted that parameter in only one place. This led to a large number of DEBUG lines printed even when the intention was to capture only INFO lines. Led to huge log files. Thanks to ijuma for noticing this.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3247 from enothereska/minor-log4j-template-fix
This currently fails in multiple ways. One of which is most likely KAFKA-5355, where the concurrent consumer reads duplicates.
During broker bounces, the concurrent consumer misses messages completely. This is another bug.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3217 from apurvam/KAFKA-5366-add-concurrent-reads-to-transactions-system-test
Also update message format tests now that we have a third message
format.
Finally, set group.initial.rebalance.delay.ms=100.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#2701 from ijuma/update-upgrade-tests-for-0.11
It avoids the need to handle protocol downgrades and it's safe (i.e. it will never cause
the auto creation of topics).
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3220 from ijuma/kafka-5374-admin-client-metadata
- add broker compatibility system tests
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Eno Thereska, Guozhang Wang
Closes#2974 from mjsax/kafka-4923-add-eos-to-streams-add-broker-check-and-system-test
## Fixes race condition in TestVerifiableProducer sanity test:
The test starts a producer, waits for at least 5 acks, and then
logs in to the worker to grep for the producer process to figure
out what version it is running.
The problem was that the producer was set up to produce 1000 messages
at a rate of 1000 msgs/s and then exit. This means it will have a
typical runtime slightly above 1 second.
Logging in to the vagrant instance might take longer than that thus
resulting in the process grep to fail, failing the test.
This commit doesn't really fix the issue - a proper fix would be to tell
the producer to stick around until explicitly killed - but it increases
the chances of the test passing, at the expense of a slightly longer
runtime.
## Improves error reporting when is_version() fails
Author: Magnus Edenhill <magnus@edenhill.se>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2765 from edenhill/trunk
The backing store for offsets, status, and configs now attempts to use the new AdminClient to look up the internal topics and create them if they don’t yet exist. If the necessary APIs are not available in the connected broker, the stores fall back to the old behavior of relying upon auto-created topics. Kafka Connect requires a minimum of Apache Kafka 0.10.0.1-cp1, and the AdminClient can work with all versions since 0.10.0.0.
All three of Connect’s internal topics are created as compacted topics, and new distributed worker configuration properties control the replication factor for all three topics and the number of partitions for the offsets and status topics; the config topic requires a single partition and does not allow it to be set via configuration. All of these new configuration properties have sensible defaults, meaning users can upgrade without having to change any of the existing configurations. In most situations, existing Connect deployments will have already created the storage topics prior to upgrading.
The replication factor defaults to 3, so anyone running Kafka clusters with fewer nodes than 3 will receive an error unless they explicitly set the replication factor for the three internal topics. This is actually desired behavior, since it signals the users that they should be aware they are not using sufficient replication for production use.
The integration tests use a cluster with a single broker, so they were changed to explicitly specify a replication factor of 1 and a single partition.
The `KafkaAdminClientTest` was refactored to extract a utility for setting up a `KafkaAdminClient` with a `MockClient` for unit tests.
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2984 from rhauch/kafka-4667
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2745 from hachikuji/add-replication-testcase-for-compression
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Eno Thereska <eno@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2897 from mjsax/KAFKA-4564-follow-up
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma, Eno Thereska, Matthias J. Sax, Guozhang Wang
Closes#2837 from mjsax/kafka-4564-fail-fast-test-stream-compatibility
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Magnus Edenhill, Eno Thereska, Damian Guy, Guozhang Wang
Closes#2836 from mjsax/minor-broker-comp-test
This PR replaces https://github.com/apache/kafka/pull/2743 (just raising from Confluent repo)
This PR describes the addition of Partition Level Leader Epochs to messages in Kafka as a mechanism for fixing some known issues in the replication protocol. Full details can be found here:
[KIP-101 Reference](https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)
*The key elements are*:
- Epochs are stamped on messages as they enter the leader.
- Epochs are tracked in both leader and follower in a new checkpoint file.
- A new API allows followers to retrieve the leader's latest offset for a particular epoch.
- The logic for truncating the log, when a replica becomes a follower, has been moved from Partition into the ReplicaFetcherThread
- When partitions are added to the ReplicaFetcherThread they are added in an initialising state. Initialising partitions request leader epochs and then truncate their logs appropriately.
This test provides a good overview of the workflow `EpochDrivenReplicationProtocolAcceptanceTest.shouldFollowLeaderEpochBasicWorkflow()`
The corrupted log use case is covered by the test
`EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards()`
Remaining work: There is a do list here: https://docs.google.com/document/d/1edmMo70MfHEZH9x38OQfTWsHr7UGTvg-NOxeFhOeRew/edit?usp=sharing
Author: Ben Stopford <benstopford@gmail.com>
Author: Jun Rao <junrao@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2808 from benstopford/kip-101-v2
Several fixes for handling broker failures:
- default replication value for internal topics is now 3 in test itself (not in streams code, that will require a KIP.
- streams producer waits for acks from all replicas in test itself (not in streams code, that will require a KIP.
- backoff time for streams client to try again after a failure to contact controller.
- fix bug related to state store locks (this helps in multi-threaded scenarios)
- fix related to catching exceptions property for network errors.
- system test for all the above
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Dan Norwood <norwood@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2719 from enothereska/KAFKA-4916-broker-bounce-test
This is from the KIP-98 proposal.
The main points of discussion surround the correctness logic, particularly the Log class where incoming entries are validated and duplicates are dropped, and also the producer error handling to ensure that the semantics are sound from the users point of view.
There is some subtlety in the idempotent producer semantics. This patch only guarantees idempotent production upto the point where an error has to be returned to the user. Once we hit a such a non-recoverable error, we can no longer guarantee message ordering nor idempotence without additional logic at the application level.
In particular, if an application wants guaranteed message order without duplicates, then it needs to do the following in the error callback:
1. Close the producer so that no queued batches are sent. This is important for guaranteeing ordering.
2. Read the tail of the log to inspect the last message committed. This is important for avoiding duplicates.
Author: Apurva Mehta <apurva@confluent.io>
Author: hachikuji <jason@confluent.io>
Author: Apurva Mehta <apurva.1618@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: fpj <fpj@apache.org>
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Closes#2735 from apurvam/exactly-once-idempotent-producer
See the JIRA for the full details. Essentially the test assertions depend on receiving reliable events from the consumer processes, but this is not generally possible in the presence of a hard failure (i.e. `kill -9`). Until we solve this problem, the hard failure scenarios will be turned off.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2771 from hachikuji/KAFKA-4689
- Improves streams efficiency by more than 200K requests/second (small 100 byte requests)
- Gets streams efficiency very close to pure consumer (see results in https://jenkins.confluent.io/job/system-test-kafka-branch-builder/746/console)
- Maintains same fairness across tasks
- Schedules all records in the queue in-between poll() calls, not just one per task.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Guozhang Wang
Closes#2643 from enothereska/minor-schedule-round-robin
The transient failures make it harder to spot real failures and we can live without what is being tested (adding security to ZK via a rolling upgrade) until KIP-101 lands.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
Closes#2742 from ijuma/disable-zk-upgrade-test
Improves the reliability of this test by decreasing the lower bound (this is to be expected as throttling takes the first fetch to stabilise and will "over-fetch" for this first request)
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2698 from benstopford/throttling-test-fix
ijuma ewencp cmccabe harshach Please review.
Here is a sample run:
https://travis-ci.org/raghavgautam/kafka/builds/191714520
In this run 214 tests were run and 144 tests passed.
I will open separate jiras for fixing failures.
Author: Raghav Kumar Gautam <raghav@apache.org>
Reviewers: Sriharsha Chintalapani <harsha@hortonworks.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2376 from raghavgautam/trunk
There were some minor differences in the basic consumer config and streams config that are now rectified. In addition, in AWS environments the socket size makes a big difference to performance and I've tuned it up accordingly. I've also increased the number of records now that perf is higher.
Author: Eno Thereska <eno@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2634 from enothereska/minor-standardize-params
Fix tests/docker/Dockerfile to put the old Kafka distributions in the
correct spot for tests. Also, run_tests.sh should exit with an error
code if image rebuilding fails, rather than silently falling back to an
older image.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2613 from cmccabe/dockerfix
There won't be a 0.10.3.0.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2628 from ijuma/bump-version-to-0.11.0.0-SNAPSHOT
…0.x and not 0.8
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2602 from cmccabe/KAFKA-4809
The phase_two security upgrade test verifies upgrading inter-broker and client protocols to the same value as well as different values. The second case currently changes inter-broker protocol without first enabling the protocol, disrupting produce/consume until the whole cluster is updated. This commit changes the test to be a non-disruptive upgrade test that enables protocols first (simulating phase one of upgrade).
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2589 from rajinisivaram/KAFKA-4779
The throttling system test sometimes fail because it takes longer than the current 10 second time out for partitions to get assigned to the consumer.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2567 from apurvam/increase-timeout-for-partitions-assigned
This PR fixes a blocker issue, where the streams client code cannot talk to the controller. It also enables a system test that was previously failing.
This PR is for trunk only. A separate PR with just the fix (but not the tests) will be created for 0.10.2.
Author: Eno Thereska <eno@confluent.io>
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Damian Guy, Ismael Juma, Matthias J. Sax, Guozhang Wang
Closes#2522 from enothereska/KAFKA-4716-metadata
This caused the bounce and smoke tests to fail on trunk.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2524 from enothereska/hotfix-tests
Author: Eno Thereska <eno.thereska@gmail.com>
Author: Eno Thereska <eno@confluent.io>
Author: Ubuntu <ubuntu@ip-172-31-22-146.us-west-2.compute.internal>
Reviewers: Matthias J. Sax, Guozhang Wang
Closes#2478 from enothereska/minor-benchmark-args
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Closes#2508 from ewencp/minor-streams-compatibility-trunk-dev-branch
With this change, the consumer will be considered initialized in the
ProduceConsumeValidate tests once its partitions have been assigned.
Author: Apurva Mehta <apurva.1618@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2347 from apurvam/KAFKA-4588-fix-race-between-producer-consumer-start
Kafka brokers have a config called "offsets.topic.replication.factor" that specify the replication factor for the "__consumer_offsets" topic. The problem is that this config isn't being enforced. If an attempt to create the internal topic is made when there are fewer brokers than "offsets.topic.replication.factor", the topic ends up getting created anyway with the current number of live brokers. The current behavior is pretty surprising when you have clients or tooling running as the cluster is getting setup. Even if your cluster ends up being huge, you'll find out much later that __consumer_offsets was setup with no replication.
The cluster not meeting the "offsets.topic.replication.factor" requirement on the internal topic is another way of saying the cluster isn't fully setup yet.
The right behavior should be for "offsets.topic.replication.factor" to be enforced. Topic creation of the internal topic should fail with GROUP_COORDINATOR_NOT_AVAILABLE until the "offsets.topic.replication.factor" requirement is met. This closely resembles the behavior of regular topic creation when the requested replication factor exceeds the current size of the cluster, as the request fails with error INVALID_REPLICATION_FACTOR.
Author: Onur Karaman <okaraman@linkedin.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2177 from onurkaraman/KAFKA-3959
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy, Eno Thereska, Guozhang Wang
Closes#2403 from mjsax/addStreamsClientCompatibilityTest
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2390 from cmccabe/KAFKA-4630
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2424 from cmccabe/KAFKA-4688
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2426 from hachikuji/improve-consumer-test-error-messages
Renames `HoistToStruct` SMT to `HoistField`.
Adds the following SMTs:
`ExtractField`
`MaskField`
`RegexRouter`
`ReplaceField`
`SetSchemaMetadata`
`ValueToKey`
Adds HTML doc generation and updates to `connect.html`.
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2374 from shikhar/more-smt
Switched console_consumer, verifiable_consumer and verifiable_producer to use new sasl.jaas_config property instead of static JAAS configuration file when used with SASL_PLAINTEXT.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2323 from rajinisivaram/KAFKA-4580
Runs sanity test and one replication test using SASL/SCRAM.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2355 from rajinisivaram/KAFKA-4590
This way, if the ${KAFKA_NUM_CONTAINERS} is changed in docker/run_tests.sh, the json is still valid
Author: Emanuele Cesena <emanuele.cesena@gmail.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2370 from 0x0ece/patch-1
Besides API and runtime changes, this PR also includes 2 data transformations (`InsertField`, `HoistToStruct`) and 1 routing transformation (`TimestampRouter`).
There is some gnarliness in `ConnectorConfig` / `ConfigDef` around creating, parsing and validating a dynamic `ConfigDef`.
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2299 from shikhar/smt-2017
Otherwise in this test the sink task goes through the pause/resume cycle with 0 assigned partitions, since the default metadata refresh interval is quite long
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2313 from shikhar/kafka-4575
In reality, we’ll only test older brokers after KAFKA-4462 is fully implemented.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva.1618@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#2263 from cmccabe/KAFKA-4508
At present, the test is fragile in the sense that the console consumer
has to start and be initialized before the verifiable producer begins
producing in the produce-consume-validate loop.
If this doesn't happen, the consumer will miss messages at the head of
the log and the test will fail.
At present, the consumer is considered inited once it has a PID. This is
a weak assumption. The plan is to poll appropriate metrics (like
partition assignment), and use those as a proxy for consumer
initialization. That work will be tracked in a separate ticket. For now,
we will disable the tests so that we can get the builds healthy again.
Author: Apurva Mehta <apurva.1618@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2278 from apurvam/KAFKA-4526-throttling-test-failures
Author: Ashish Singh <asingh@cloudera.com>
Reviewers: Jason Gustafson <jason@confluent.io>, Colin P. Mccabe <cmccabe@confluent.io>, Dana Powers <dana.powers@gmail.com>, Gwen Shapira <cshapi@gmail.com>, Grant Henke <granthenke@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1251 from SinghAsDev/KAFKA-3600
I ran it 3 times and it works again.
Author: Eno Thereska <eno.thereska@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
Closes#2257 from enothereska/minor-reenable-smoke-test
Updates to take advantage of soon-to-be-released ducktape features.
Author: Geoff Anderson <geoff@confluent.io>
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1834 from granders/systest-parallel-friendly
This reverts commit e035fc0395 for the
following reasons:
1. License files are missing causing local builds to fail during the
rat task (rat is not being run in Jenkins for some reason, filed
KAFKA-4459 for that)
2. It renames a number of system test files when there's a better
way to achieve the goal of running a subset of system tests to stay
under the Travis limit.
3. It adds the gradle wrapper binary even though this was removed
intentionally a while back.
A new PR will be submitted for KAFKA-4345 without the undesired
changes.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2187 from ijuma/kafka-4345-revert
As of now the ducktape tests that we have for kafka are not run for pull request. We can run these test using travis-ci. Here is a sample run:
https://travis-ci.org/raghavgautam/kafka/builds/170574293
Author: Raghav Kumar Gautam <raghav@apache.org>
Reviewers: Sriharsha Chintalapani <harsha@hortonworks.com>
Closes#2064 from raghavgautam/trunk
Added `timeout` and `timeUnit` to `KafkaStreams.close(..)`. Now do close on a thread and `join` that thread with the provided `timeout`.
Changed `state` in `KafkaStreams` to use an enum.
Added system test to ensure we don't deadlock on close when an uncaught exception handler that calls `System.exit(..)` is used and there is also a shutdown hook that calls `KafkaStreams.close(...)`
Author: Damian Guy <damian.guy@gmail.com>
Reviewers: Matthias J. Sax, Eno Thereska, Guozhang Wang
Closes#2097 from dguy/kafka-4366
Update system test method signatures and method calls to use the new consumer by default.
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2060 from vahidhashemian/KAFKA-4211
Since #1911 was merged it is hard to externally test a connector transitioning to FAILED state due to an initialization failure, which is what this test was attempting to verify.
The unit test added in #1778 already exercises exception-handling around Connector instantiation.
Author: Shikhar Bhushan <shikhar@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#2131 from shikhar/test_bad_connector_class
Author: Ben Stopford <benstopford@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2034 from benstopford/throttling-system-test-kafka-changes
In this patch, we test `kafka-reassign-partitions` when throttling is active.
This patch also fixes the following:
1. KafkaService.verify_reassign_partitions did not check whether
partition reassignment actually completed successfully (KAFKA-4204).
This patch works around those shortcomings so that we get the right
signal from this method.
2. ProduceConsumeValidateTest.annotate_missing_messages would call
`pop' on the list of missing messages, causing downstream methods to get
incomplete data. We fix that in this patch as well.
Author: Apurva Mehta <apurva.1618@gmail.com>
Reviewers: Geoff Anderson <geoff@confluent.io>, Ben Stopford <benstopford@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#1904 from apurvam/throttling-tests
Fix existing client-id quota test which currently don't configure quota overrides correctly. Add new tests for user and (user, client-id) quota overrides and default quotas.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#1860 from rajinisivaram/KAFKA-4055