Commit Graph

4838 Commits

Author SHA1 Message Date
John Roesler adbf31ab1d KAFKA-6473: Add MockProcessorContext to public test-utils (#4736)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
2018-03-27 14:03:24 -07:00
Manikumar Reddy O 395c7e0f09 MINOR: Fix ReassignPartitionsClusterTest.testHwAfterPartitionReassignment test (#4781)
Reviewers: Dong Lin <lindong28@gmail.com>, Jason Gustafson <jason@confluent.io>
2018-03-27 13:18:53 -07:00
huxi 9eb32eaad5 KAFKA-6446; KafkaProducer initTransactions() should timeout after max.block.ms (#4563)
Currently the `initTransactions()` API blocks indefinitely if the broker cannot be reached. This patch changes the behavior to raise a `TimeoutException` after waiting for `max.block.ms`. 

Reviewers: Apurva Mehta <apurva@confluent.io>, Jason Gustafson <jason@confluent.io>
2018-03-27 09:21:18 -07:00
Bill Bejeck f4a39f9b83 MINOR: Fix flaky standby task test (#4767)
The standby-task test failed due to standby task distribution not be exactly equal. I think this will be the case from time to time, so I've updated test to make sure the standby task assignment count is not zero.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-03-26 16:22:20 -07:00
Vahid Hashemian 3f611044cc KAFKA-6052; Fix WriteTxnMarkers request retry issue in InterBrokerSendThread (#4705)
This resolves the issue found when running the brokers on Windows which prevents the coordinator from sending WriteTxnMarkers requests to complete a transaction.
2018-03-24 14:04:08 -07:00
Rajini Sivaram f66aebff36
KAFKA-6710: Remove Thread.sleep from LogManager.deleteLogs (#4771)
`Thread.sleep` in `LogManager.deleteLogs` potentially blocks a scheduler thread for up to `log.segment.delete.delay.ms` with a default value of a minute. To avoid this, `deleteLogs` now deletes the logs for which `currentDefaultConfig.fileDeleteDelayMs` has elapsed after the delete was scheduled. Logs for which this interval has not yet elapsed are considered for deletion in the next iteration of `deleteLogs`, which is scheduled sooner if required.

Reviewers: Jun Rao <junrao@gmail.com>, Dong Lin <lindong28@gmail.com>, Ted Yu <yuzhihong@gmail.com>
2018-03-24 20:54:09 +00:00
cburroughs 514936af6f MINOR: Remove redundant initialization of `Stats.index` (#4751) 2018-03-24 12:39:10 -07:00
Attila Sasvari 549a5cec1e MINOR: Fix potential resource leak in FileOffsetBackingStore (#4739)
Reviewers: Sandor Murakozi <smurakozi@gmail.com>, Jason Gustafson <jason@confluent.io>
2018-03-24 12:20:11 -07:00
huxi dd78b9fa26 KAFKA-6637; Avoid divide by zero error with segment.ms set to zero (#4698)
Require a minimum value of 1 for `segment.ms` to avoid division by zero when computing random jitter.

Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Jason Gustafson <jason@confluent.io>
2018-03-24 12:14:53 -07:00
Lucas Wang 685fd03dda KAFKA-6612: Added logic to prevent increasing partition counts during topic deletion
This patch adds logic in handling the PartitionModifications event, so that if the partition count is increased when a topic deletion is still in progress, the controller will restore the data of the path /brokers/topics/"topic" to remove the added partitions.

Testing done:
Added a new test method to cover the bug

Author: Lucas Wang <luwang@linkedin.com>

Reviewers: Jiangjie (Becket) Qin <becket.qin@gmail.com>

Closes #4666 from gitlw/prevent_increasing_partition_count_during_topic_deletion
2018-03-23 18:18:16 -07:00
Rajini Sivaram 2307314432
MINOR: Fix encoder config to make DynamicBrokerReconfigurationTest stable (#4764)
DynamicBrokerReconfigurationTest currently assumes that passwords encoded with one secret will fail with an exception if decoded with another secret and configures an old.secret in setUp. This could potentially cause test failures if a password was incorrectly decoded with the wrong secret, since the test writes passwords encoded with the new secret directly to ZooKeeper. Since old.secret is only used in one test for verifying secret rotation, this config can be moved to that test to avoid transient failures.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2018-03-23 17:01:32 +00:00
Ismael Juma 395dcad7d9
MINOR: Java 9/10 fixes, gradle and minor deps update (#4725)
* Added dependencies so that Trogdor and Connect work with Java 9 and 10
* Updated Jacoco to 0.8.1 so that it works with Java 10
* Updated Gradle to 4.6
* A few minor version bumps (not related to Java9/10 fixes)

I tested manually that we can run ./gradlew test with Java 10
after these changes. There are test failures as EasyMock
and PowerMock will have to be updated to use a newer
ASM version. But compiling successfully and most tests
passing is progress. :)

I also tested manually that Trogdor can be started with Java 10.
It previously failed with a ClassNotFoundError.

Reviewers: Jason Gustafson <jason@confluent.io>
2018-03-22 22:01:51 -07:00
Jason Gustafson fcf8781602 KAFKA-6683; Ensure producer state not mutated prior to append (#4755)
We were unintentionally mutating the cached queue of batches prior to appending to the log. This could have several bad consequences if the append ultimately failed or was truncated. In the reporter's case, it caused the snapshot to be invalid after a segment roll. The snapshot contained producer state at offsets higher than the snapshot offset. If we ever had to load from that snapshot, the state was left inconsistent, which led to an error that ultimately crashed the replica fetcher.

The fix required some refactoring to avoid sharing the same underlying queue inside ProducerAppendInfo. I have added test cases which reproduce the invalid snapshot state. I have also made an effort to clean up logging since it was not easy to track this problem down.

One final note: I have removed the duplicate check inside ProducerStateManager since it was both redundant and incorrect. The redundancy was in the checking of the cached batches: we already check these in Log.analyzeAndValidateProducerState. The incorrectness was the handling of sequence number overflow: we were only handling one very specific case of overflow, but others would have resulted in an invalid assertion. Instead, we now throw OutOfOrderSequenceException.

Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com>
2018-03-22 21:42:49 -07:00
Guozhang Wang f2fbfaaccc
KAFKA-6611: PART I, Use JMXTool in SimpleBenchmark (#4650)
1. Use JmxMixin for SimpleBenchmark (will remove the self reporting in #4744), only when loading phase is false (i.e. we are in fact starting the streams app).

2. Reported the full jmx reported metrics in log files, and in the returned data only return the max values: this is because we want to skip the warming up and cooling down periods that will have lower rate numbers, while max represents the actual rate at full speed.

3. Incorporates two other improves to JMXTool: #1241 and #2950

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Rohan Desai <desai.p.rohan@gmail.com>
2018-03-22 16:46:56 -07:00
Bill Bejeck 286216b56e MINOR: Rolling bounce upgrade fixed broker system test (#4690)
Reviewers: Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-03-22 16:02:16 -07:00
Stuart Perks 9ee00c4b66 KAFKA-6659: Improve error message if state store is not found (#4732)
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-03-22 15:37:28 -07:00
Rajini Sivaram 57b1c28d60
MINOR: Fix AdminClient.describeConfigs() of listener configs (#4747)
Don't return config values from `describeConfigs` if the config type cannot be determined. Obtain config types correctly for listener configs for `describeConfigs` and password encryption.

Reviewers: Jason Gustafson <jason@confluent.io>
2018-03-22 20:05:45 +00:00
Matthias J. Sax f0a29a6935
MINOR: remove obsolete warning in StreamsResetter (#4749)
Reviewer: Guozhang Wang <guozhang@confluent.io>
2018-03-21 16:24:59 -07:00
Rajini Sivaram 2f90cb86c1 MINOR: Remove acceptor creation in network thread update code (#4742)
Fix dynamic addition of network threads to only create new Processor threads and not the Acceptor.
2018-03-20 22:40:16 -07:00
Colin Patrick McCabe f5287ccad2 MINOR: Fix flaky TestUtils functions (#4743)
TestUtils#produceMessages should always close the KafkaProducer, even
when there is an exception.  Otherwise, the test will leak threads when
there is an error.

TestUtils#createNewProducer should create a producer with a
requestTimeoutMs of 30 seconds by default, not around 10 seconds.
This should avoid tests that flake when the load on Jenkins climbs.

Fix two cases where a very short timeout of 2 seconds was getting set.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2018-03-20 22:36:41 -07:00
Mickael Maison 36d4b91573 MINOR: Updated SASL Authentication Sequence Docs (#4724)
Reworded the SASL Authentication sequence to update it to >= 1.0.0

Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>, Mickael Maison <mickael.maison@gmail.com>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2018-03-20 13:56:40 +00:00
Anna Povzner 5c24295d44 Trogdor's ProducerBench does not fail if topics exists (#4673)
Added configs to ProducerBenchSpec:
topicPrefix: name of topics will be of format topicPrefix + topic index. If not provided, default is "produceBenchTopic".
partitionsPerTopic: number of partitions per topic. If not provided, default is 1.
replicationFactor: replication factor per topic. If not provided, default is 3.

The behavior of producer bench is changed such that if some or all topics already exist (with topic names = topicPrefix + topic index), and they have the same number of partitions as requested, the worker uses those topics and does not fail. The producer bench fails if one or more existing topics has number of partitions that is different from expected number of partitions.

Added unit test for WorkerUtils -- for existing methods and new methods.

Fixed bug in MockAdminClient, where createTopics() would over-write existing topic's replication factor and number of partitions while correctly completing the appropriate futures exceptionally with TopicExistsException.

Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2018-03-20 13:51:45 +00:00
Manikumar Reddy O aefe35e493 KAFKA-6680: Fix issues related to Dynamic Broker configs (#4731)
- Fix kafkaConfig initialization if there are no dynamic configs defined in ZK.
- Update DynamicListenerConfig.validateReconfiguration() to check new Listeners must be subset of listener map

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2018-03-19 22:22:53 +00:00
Guozhang Wang 0f364cd53a
MINOR: Pass a streams config to replace the single state dir (#4714)
This is a general change and is re-requisite to allow streams benchmark test with different streams tests. For the streams benchmark itself I will have a separate PR for switching configs. Details:

1. Create a "streams.properties" file under PERSISTENT_ROOT before all the streams test. For now it will only contain a single config of state.dir pointing to PERSISTENT_ROOT.

2. For all the system test related code, replace the main function parameter of state.dir with propsFilename, then inside the function load the props from the file and apply overrides if necessary.

3. Minor fixes.

Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
2018-03-19 14:17:00 -07:00
Colin Patrick McCabe 27bb3ccace MINOR: KafkaFutureImpl#addWaiter should be protected (#4734)
KafkaFutureImpl#addWaiter should be protected, just like KafkaFuture#addWaiter.  As described in KIP-218, whenComplete is the public API, not addWaiter.
2018-03-19 13:30:13 -07:00
Ismael Juma 6fab286da2 MINOR: Fix some compiler warnings (#4726) 2018-03-19 15:03:58 +00:00
Colin Patrick McCabe 0560193706 MINOR: improve trogdor commandline (#4721)
Allow -c as a synonym for --agent.config and --coordinator.config.

Allow -n as a synonym for --node-name.

Add an example trogdor.conf file.
2018-03-19 11:55:29 +00:00
Jason Gustafson 7041e76bd6 MINOR: Some logging improvements for debugging delayed produce status (#4691)
A few small logging improvements which help debugging replication issues.
2018-03-19 11:08:12 +00:00
Ewen Cheslack-Postava f264bfa296 KAFKA-6676: Ensure Kafka chroot exists in system tests and use chroot on one test with security parameterizations (#4729)
Ensures Kafka chroot exists in ZK when starting KafkaService so commands that use ZK and are executed before the first Kafka broker starts do not fail due to the missing chroot.

Also uses chroot with one test that also has security parameterizations so Kafka's test suite exercises these combinations. Previously no tests were exercising chroots.

Changes were validated using sanity_checks which include the chroot-ed test as well as some non-chroot-ed tests.
2018-03-19 10:37:02 +00:00
asutosh936 02a8ef8595 KAFKA-6486: Implemented LinkedHashMap in TimeWindows (#4628)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-03-18 20:44:11 -07:00
Dong Lin 4391a4214d MINOR: Use log start offset as high watermark if current value is out of range (#4722)
Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
2018-03-18 11:21:43 -07:00
Matthias J. Sax 7fe06a8666
MINOR: fix flaky Streams EOS system test (#4728)
Reviewer: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
2018-03-17 18:02:20 -07:00
Dhruvil Shah ae31ee63dc KAFKA-6530: Use actual first offset of message set when rolling log segment (#4660)
Use the exact first offset of message set when rolling log segment. This is possible to do for message format V2 and beyond without any performance penalty, because we have the first offset stored in the header. This augments the fix made in KAFKA-4451 to avoid using the heuristic for V2 and beyond messages.

Added unit tests to simulate cases where segment needs to roll because of overflow in index offsets. Verified that the new segment created in these cases uses the first offset, instead of the heuristic in use previously.
2018-03-17 10:29:42 -07:00
Sandor Murakozi 2afac71566 MINOR: Remove unnecessary null checks (#4708)
Remove unnecessary null check in StringDeserializer, MockProducerInterceptor and KStreamImpl.

Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Jason Gustafson <jason@confluent.io>
2018-03-16 23:34:16 -07:00
Jason Gustafson 9782465d6f
KAFKA-6672; ConfigCommand should create config change parent path if needed (#4727)
Change `KafkaZkClient.createConfigChangeNotification` to ensure creation of the change directory. This fixes failing system tests which depend on setting SCRAM credentials prior to broker startup. Existing test case has been modified for new expected usage.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2018-03-16 22:29:22 -07:00
Colin Patrick McCabe a70e4f95d7 KAFKA-6658; Fix RoundTripWorkload and make k/v generation configurable (#4710)
Make PayloadGenerator an interface which can have multiple implementations: constant, uniform random, sequential.

Allow different payload generators to be used for keys and values.

This change fixes RoundTripWorkload.  Previously RoundTripWorkload was unable to get the sequence number of the keys that it produced.
2018-03-16 16:15:49 -07:00
Matthias J. Sax 394aa74261
KAFKA-6454: Allow timestamp manipulation in Processor API (#4519)
Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2018-03-16 16:02:11 -07:00
Chris Egerton b6ae51b108 Allow users to specify 'if-not-exists' when creating topics while testing (#4715)
Author: Chris Egerton <chrise@confluent.io>
2018-03-16 15:30:38 -07:00
huxi a9c16a8110 KAFKA-6663: Doc for `GlobalKTable` should be corrected. (#4723)
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Damian Guy <damian@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-03-16 13:24:14 -07:00
Colin Patrick McCabe 9e0e6e43a7 MINOR: Trogdor should not assume an agent co-located with the controller (#4712) 2018-03-16 17:57:38 +00:00
Randall Hauch e7ef719a5b KAFKA-6661: Ensure sink connectors don’t resume consumer when task is paused
Changed WorkerSinkTaskContext to only resume the consumer topic partitions when the connector/task is not in the paused state.

The context tracks the set of topic partitions that are explicitly paused/resumed by the connector, and when the WorkerSinkTask resumes the tasks it currently resumes all topic partitions *except* those that are still explicitly paused in the context. Therefore, the change above should result in the desired behavior.

Several debug statements were added to record when the context is called by the connector.

This can be backported to older releases, since this bug goes back to 0.10 or 0.9.

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #4716 from rhauch/kafka-6661
2018-03-15 15:52:53 -07:00
John Roesler 7006d0f58b MINOR: Streams system tests fixes/updates (#4689)
Some changes required to get the Streams system tests working via Docker

To test:

TC_PATHS="tests/kafkatest/tests/streams" bash tests/docker/run_tests.sh

That command will take about 3.5 hours, and should pass. Note there are a couple of ignored tests.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>
2018-03-15 14:42:43 -07:00
Kamal C a6fad27372 KAFKA-6106: Postpone normal processing of tasks within a thread until restoration of all tasks have completed. (#4651)
Author:  Kamal Chandraprakash <kamal.chandraprakash@gmail.com>

Reviewer: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
2018-03-15 13:02:28 -07:00
huxi d2aca95aeb MINOR: Fix incorrect expression for KTable in stream doc. (#4718)
In dsi-api.html, when reading from Kafka to a KTable, the doc says "In the case of a KStream.." where `Kstream` here is not correct. They should be `KTable`.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2018-03-15 11:34:33 -07:00
Jason Gustafson 372b4c6a77
KAFKA-6656; Config tool should return non-zero status code on failure (#4711)
Prior to this patch, we caught some exceptions when executing the command, which meant that it would return with status code zero. This patch fixes this and makes the expected exit behavior explicit. Test cases have been added to verify the change.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2018-03-15 10:33:29 -07:00
Colin Patrick McCabe 8c10e06007 MINOR: Avoid nulls when deserializing Trogodor JSON (#4688) 2018-03-15 11:44:27 +00:00
Rajini Sivaram 5cdb951091 KAFKA-6653; Complete delayed operations even when there is lock contention (#4704)
If there is lock contention while multiple threads check if a delayed operation may be completed (e.g. a produce request with acks=-1), the threads perform completion only if the lock is free, to avoid deadlocks. This leaves a timing window when an operation becomes ready to complete after another thread has acquired the lock and performed the check for completion, but not yet released the lock. The PR adds an additional flag to ensure that the operation is completed in this case.
2018-03-15 00:09:48 -07:00
Guozhang Wang 925acc9f47
KAFKA-4831: add documentation for KIP-265 (#4686)
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>
2018-03-14 17:33:54 -07:00
Edoardo Comar 35c08189cb MINOR: fixing streams test-util compilation errors in Eclipse (#4631)
Author: Edoardo Comar <ecomar@uk.ibm.com>

Reviewer: Matthias J. Sax <matthias@confluent.io>
2018-03-14 15:19:51 -07:00
Dong Lin d935699486 KAFKA-6640; Improve efficiency of KafkaAdminClient.describeTopics() (#4694)
Currently in KafkaAdminClient.describeTopics(), for each topic in the request, a complete map of cluster and errors will be constructed for every topic and partition. This unnecessarily increases the complexity of describeTopics() to O(n^2). This patch improves the complexity to O(n).

Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin Patrick McCabe <colin@cmccabe.xyz>, Jason Gustafson <jason@confluent.io>
2018-03-14 14:58:24 -07:00