4381 Commits

Author SHA1 Message Date
Colin P. Mccabe 4fac83ba1f KAFKA-6060; Add workload generation capabilities to Trogdor
Previously, Trogdor only handled "Faults."  Now, Trogdor can handle
"Tasks" which may be either faults, or workloads to execute in the
background.

The Agent and Coordinator have been refactored from a
mutexes-and-condition-variables paradigm into a message passing
paradigm.  No locks are necessary, because only one thread can access
the task state or worker state.  This makes them a lot easier to reason
about.
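
The message passing paradigm described above can be sketched as follows. This is a minimal illustration, not Trogdor's code: the class `SingleThreadedTaskState` and its methods are hypothetical names, and the assumption is that all state lives behind a single-threaded executor so callers only ever send it messages.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the Agent or Coordinator classes from Trogdor.
final class SingleThreadedTaskState {
    // Only the executor's single thread reads or writes this map, so no lock is needed.
    private final Map<String, String> taskStates = new HashMap<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Sends an "update state" message and waits until it has been processed. */
    void setState(String taskId, String state) {
        Runnable update = () -> taskStates.put(taskId, state);
        await(executor.submit(update));
    }

    /** Sends a "query state" message and returns the reply. */
    String getState(String taskId) {
        Callable<String> query = () -> taskStates.get(taskId);
        return await(executor.submit(query));
    }

    void shutdown() {
        executor.shutdown();
    }

    private static <T> T await(Future<T> reply) {
        try {
            return reply.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Callers on any thread can use `setState` and `getState` freely; the ordering of messages in the executor's queue replaces the mutex.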

The MockTime class can now handle mocking deferred message passing
(adding a message to an ExecutorService with a delay).  I added a
MockTimeTest.
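
One way to mock deferred message passing is a clock that runs scheduled actions only when test code advances it. The sketch below is an assumption about the general technique, not the actual MockTime implementation; `MockClockSketch`, `schedule`, and `sleep` are hypothetical names.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch; not the MockTime class from the Kafka code base.
final class MockClockSketch {
    private static final class Deferred {
        final long dueMs;
        final long seq;
        final Runnable action;

        Deferred(long dueMs, long seq, Runnable action) {
            this.dueMs = dueMs;
            this.seq = seq;
            this.action = action;
        }
    }

    private long nowMs = 0;
    private long nextSeq = 0;
    // Ordered by due time, then by insertion order for equal due times.
    private final PriorityQueue<Deferred> pending = new PriorityQueue<>(
        Comparator.comparingLong((Deferred d) -> d.dueMs)
                  .thenComparingLong(d -> d.seq));

    long milliseconds() {
        return nowMs;
    }

    /** Registers an action to run once the mock clock has advanced by delayMs. */
    void schedule(long delayMs, Runnable action) {
        pending.add(new Deferred(nowMs + Math.max(0, delayMs), nextSeq++, action));
    }

    /** Advances the mock clock, running every action that becomes due on the way. */
    void sleep(long ms) {
        long target = nowMs + ms;
        while (!pending.isEmpty() && pending.peek().dueMs <= target) {
            Deferred next = pending.poll();
            nowMs = next.dueMs;
            next.action.run();
        }
        nowMs = target;
    }
}
```

Because nothing runs until `sleep` is called, a test can assert on the exact order and timing of delayed messages without real waiting.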

MiniTrogdorCluster now starts up Agent and Coordinator classes in
parallel in order to minimize junit test time.

RPC messages now inherit from a common Message.java class.  This class
handles implementing serialization, equals, hashCode, etc.

Remove FaultSet, since it is no longer necessary.

Previously, if CoordinatorClient or AgentClient hit a networking
problem, they would throw an exception.  They now retry several times
before giving up.  Additionally, the REST RPCs to the Coordinator and
Agent have been changed to be idempotent.  If a response is lost, and
the request is resent, no harm will be done.
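
The combination of client retries and idempotent requests described above could look roughly like this. It is a sketch under stated assumptions: `FakeCoordinator` stands in for the remote side and simulates lost responses, and none of these names or signatures come from the CoordinatorClient or AgentClient code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the Trogdor REST clients.
final class IdempotentRetrySketch {
    /** Stand-in for the remote side; loses a configurable number of responses. */
    static final class FakeCoordinator {
        final Map<String, String> tasks = new HashMap<>();
        int responsesToLose;

        FakeCoordinator(int responsesToLose) {
            this.responsesToLose = responsesToLose;
        }

        void createTask(String id, String spec) throws IOException {
            // Idempotent: creating an existing task ID again changes nothing.
            tasks.putIfAbsent(id, spec);
            if (responsesToLose > 0) {
                responsesToLose--;
                // The request was applied, but the caller never hears about it.
                throw new IOException("response lost");
            }
        }
    }

    /** Resends the request up to maxTries times; returns the attempt that succeeded. */
    static int createTaskWithRetries(FakeCoordinator coordinator, String id, String spec, int maxTries) {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                coordinator.createTask(id, spec);
                return attempt;
            } catch (IOException e) {
                last = e;
            }
        }
        throw new UncheckedIOException("giving up after " + maxTries + " tries", last);
    }
}
```

Resending is safe here only because the server-side operation is idempotent; with a non-idempotent create, the retries after a lost response would produce duplicates.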

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #4073 from cmccabe/KAFKA-6060
2017-11-03 09:37:29 +00:00
Bill Bejeck e4208b1d5f MINOR: update producer client request timeout in system test
Author: Bill Bejeck <bill@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #4168 from bbejeck/MINOR_update_streams_produer_timeout_in_system_test
2017-11-02 17:53:15 -07:00
Guozhang Wang b6765e46c8 MINOR: Change version format in release notes python code
ijuma ewencp

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #4171 from guozhangwang/KMinor-update-releasepy
2017-11-02 17:50:13 -07:00
Ismael Juma 554e0b5298 MINOR: Remove clients/out directory
It was committed inadvertently.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #4172 from ijuma/remove-out-folder
2017-11-02 22:41:29 +00:00
Manikumar Reddy f88fdbd311 KAFKA-6072; Use ZookeeperClient in GroupCoordinator and TransactionCoordinator
Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Ted Yu <yuzhihong@gmail.com>, Jun Rao <junrao@gmail.com>

Closes #4126 from omkreddy/KAFKA-6072-ZK-IN-GRoupCoordinator
2017-10-31 18:06:51 -07:00
Manjula K 5178702715 MINOR: Adding Trivago logo to Streams landing page
guozhangwang Please review

Author: Manjula K <manjula@kafka-summit.org>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #4164 from manjuapu/ny-trivago-logos
2017-10-31 13:22:56 -07:00
Apurva Mehta 3c9e30a2f7 MINOR: Tighten up locking when aborting expired transactions
This is a followup to #4137

Author: Apurva Mehta <apurva@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>

Closes #4146 from apurvam/MINOR-followups-to-bump-epoch-on-expire-patch
2017-10-31 09:57:05 -07:00
Jason Gustafson 71fe23b445 MINOR: Fix inconsistency in StopReplica/LeaderAndIsr error counts
Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #4147 from hachikuji/fix-error-inconsistencies
2017-10-31 09:43:18 -07:00
Matthias J. Sax c7ab3efcbe MINOR: Code cleanup and JavaDoc improvements for clients and Streams
Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Bill Bejeck <bill@confluent.io>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #4128 from mjsax/minor-cleanup

minor fix
2017-10-30 13:13:19 -07:00
Richard Yu 7fe88e8bd9 KAFKA-5212; Consumer ListOffsets request can starve group heartbeats
Author: Richard Yu <richardyu@Richards-Air.attlocal.net>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #4110 from ConcurrencyPractitioner/trunk
2017-10-30 11:31:36 -07:00
Ismael Juma 8e4b3dca7b KAFKA-2903; FileRecords.read doesn't handle size > sizeInBytes when start is not zero
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #4158 from ijuma/kafka-2903-file-records-read-slice-size-greater
2017-10-30 11:06:11 -07:00
Tom Bentley 6118ecb590 KAFKA-6130; Ensure VerifiableConsumer halts when --max-messages is reached
Author: Tom Bentley <tbentley@redhat.com>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #4157 from tombentley/KAFKA-6130-verifiable-consumer-max-messages
2017-10-30 09:57:06 -07:00
Mickael Maison 9504af72ff KAFKA-6073; Use ZookeeperClient in KafkaApis
I kept zkUtils for the call to AdminUtils.createTopic(). AdminUtils can be done in another PR.

Is there a reason why we use TopicAndPartition instead of TopicPartition in KafkaControllerZkUtils?

Author: Mickael Maison <mickael.maison@gmail.com>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>

Closes #4111 from mimaison/KAFKA-6073
2017-10-30 09:46:11 -07:00
Ismael Juma f4e9c84c52 MINOR: Remove TLS renegotiation code
This has been disabled since the start and since
it's removed in TLS 1.3, there are no plans to
ever support it.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #4034 from ijuma/remove-tls-renegotiation-support
2017-10-27 16:40:39 +01:00
Ismael Juma 4e8ad90b94 MINOR: Ensure that the producer in testAlterReplicaLogDirs is always closed
Failure to close the producer could cause a transient failure, more details
below.

The request timeout was only 2 seconds, exceptions thrown were not
propagated and the producer would not be closed. If the exception
was thrown during `send`, we did not increment `numMessages`
allowing the test to pass.

I have increased the timeout to 10 seconds and made sure that
exceptions are propagated.

Example of the error:

```text
kafka.api.SaslSslAdminClientIntegrationTest > classMethod STARTED

kafka.api.SaslSslAdminClientIntegrationTest > classMethod FAILED
    java.lang.AssertionError: Found unexpected threads, allThreads=Set(metrics-meter-tick-thread-2, Signal Dispatcher, main, Reference Handler, scala-execution-context-global-164, kafka-producer-network-thread | producer-1, scala-execution-context-global-166, Test worker, scala-execution-context-global-1249, /0:0:0:0:0:0:0:1:58910 to /0:0:0:0:0:0:0:1:43025 workers Thread 2, Finalizer, /0:0:0:0:0:0:0:1:58910 to /0:0:0:0:0:0:0:1:43025 workers Thread 3, scala-execution-context-global-163, metrics-meter-tick-thread-1)
```
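
The fix described above, always closing the producer and letting exceptions propagate, follows a general pattern that can be sketched with try-with-resources. `FakeProducer` is a hypothetical stand-in, not the Kafka producer, and the test in the commit may be structured differently.

```java
import java.util.List;

// Hypothetical sketch; FakeProducer is a stand-in for a real producer.
final class AlwaysCloseSketch {
    static final class FakeProducer implements AutoCloseable {
        boolean closed = false;

        void send(String message) {
            if (message.isEmpty()) {
                throw new IllegalStateException("send failed");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Sends every message; the producer is closed even if a send throws. */
    static int sendAll(FakeProducer producer, List<String> messages) {
        int numMessages = 0;
        try (FakeProducer p = producer) {
            for (String message : messages) {
                p.send(message);
                // Counted only after a successful send; a failure propagates instead.
                numMessages++;
            }
        }
        return numMessages;
    }
}
```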

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #4144 from ijuma/ensure-producer-is-closed-test-alter-replica-log-dirs
2017-10-27 15:49:42 +01:00
Manikumar Reddy 603d4e5d9c MINOR: Mention "per listener" security overrides in listener.security.protocol.map config doc
Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Tom Bentley <tbentley@redhat.com>, Ismael Juma <ismael@juma.me.uk>

Closes #3951 from omkreddy/KIP-103-DOCS
2017-10-27 14:59:56 +01:00
Manikumar Reddy 9b53c31d08 MINOR: Update docs wrt topic deletion being enabled by default
Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #3835 from omkreddy/update-delete-topic-doc
2017-10-27 14:41:46 +01:00
Paolo Patierno 4ea5fb760f MINOR: Document "high watermark" magic value for delete records request
Author: Paolo Patierno <ppatierno@live.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #4119 from ppatierno/minor-delrecords-prot
2017-10-27 14:26:58 +01:00
Vahid Hashemian 6197481b8b MINOR: Fix indentation in KafkaApis.handleOffsetFetchRequest
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #4139 from vahidhashemian/minor/indentation_fix_1710
2017-10-27 14:10:56 +01:00
Apurva Mehta 501a5e2627 KAFKA-6119: Bump epoch when expiring transactions in the TransactionCoordinator
A description of the problem is in the JIRA. I have added an integration test which reproduces the original scenario, and also added unit test cases.

Author: Apurva Mehta <apurva@confluent.io>

Reviewers: Jason Gustafson <jason@confluent.io>, Ted Yu <yuzhihong@gmail.com>, Guozhang Wang <wangguoz@gmail.com>

Closes #4137 from apurvam/KAFKA-6119-bump-epoch-when-expiring-transactions
2017-10-26 23:26:33 -07:00
Rajini Sivaram 69e8463c06 KAFKA-6131; Use atomic putIfAbsent to create txn marker queues
Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #4140 from rajinisivaram/KAFKA-6131-txn-concurrentmap
2017-10-27 02:56:19 +01:00
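
The atomic `putIfAbsent` technique named in the KAFKA-6131 title above can be sketched as follows. This is an illustration of the general pattern on `ConcurrentMap`, with hypothetical names (`TxnMarkerQueuesSketch`, `queueFor`); it is not the code from the patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; not the transaction marker code from the patch.
final class TxnMarkerQueuesSketch {
    private final ConcurrentMap<Integer, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    /** Returns the queue for a key, creating it at most once even under races. */
    BlockingQueue<String> queueFor(int key) {
        BlockingQueue<String> existing = queues.get(key);
        if (existing != null) {
            return existing;
        }
        BlockingQueue<String> created = new LinkedBlockingQueue<>();
        // putIfAbsent is atomic: if another thread won the race, use its queue
        // and discard ours, so no enqueued item ends up in an orphaned queue.
        BlockingQueue<String> winner = queues.putIfAbsent(key, created);
        return winner != null ? winner : created;
    }
}
```

A separate check-then-put would let two threads each install their own queue, and items added to the overwritten one would be lost.
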
Vahid Hashemian ebe749b4ee MINOR: Improve a Windows quickstart instruction
The output of `wmic` can be very long, which can truncate the search keywords that the existing command looks for. If those keywords are truncated, no process is returned in the output. This change runs the query inside the `wmic` command itself instead of piping its output through `find`.

Author: Vahid Hashemian <vahidhashemian@us.ibm.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #4083 from vahidhashemian/minor/improve_quickstart_for_windows_wmic
2017-10-26 13:46:11 -07:00
Guozhang Wang ea6a67af70 KAFKA-6100: Down-grade RocksDB to 5.7.3
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>

Closes #4136 from guozhangwang/K6100-rocksdb-580-regression
2017-10-26 13:27:08 -07:00
Manikumar Reddy adb9d5ae76 MINOR: Add missing semicolon to example jaas configuration
Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #4101 from omkreddy/SCRAM-DOCS
2017-10-26 14:24:29 +01:00
Ismael Juma 070ec0fc58 MINOR: Revert EmbeddedZooKeeper rename
Even though this class is internal, it's widely
used by other projects and it's better to avoid
breaking them until we have a publicly supported
test library.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #4138 from ijuma/revert-embedded-zookeeper-rename
2017-10-26 14:23:00 +01:00
Ismael Juma ab6f848ba6 MINOR: Rename and change package of async ZooKeeper classes
- kafka.controller.ZookeeperClient -> kafka.zookeeper.ZooKeeperClient
- kafka.controller.ControllerZkUtils -> kafka.zk.KafkaZkClient
- kafka.controller.ZkData -> kafka.zk.ZkData
- Renamed various fields to match new names and for consistency
- A few clean-ups in ZkData
- Document intent

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Onur Karaman <okaraman@linkedin.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Jun Rao <junrao@gmail.com>

Closes #4112 from ijuma/rename-zookeeper-client-and-move-to-zookeper-package
2017-10-25 21:11:16 -07:00
Xavier Léauté f7f8e11213 MINOR: reset state in cleanup, fixes jmx mixin flakiness
ewencp ijuma

Author: Xavier Léauté <xl+github@xvrl.net>
Author: Ewen Cheslack-Postava <me@ewencp.org>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #4123 from xvrl/fix-jmx-flakiness

(cherry picked from commit 91eb178e95)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
2017-10-25 13:34:36 -07:00
Joel Hamill 5f779ca3f3 MINOR: Fix typo dev guide title
related to https://github.com/apache/kafka-site/pull/103

Author: Joel Hamill <git config --global user.email>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #4133 from joel-hamill/dev-guide-title
2017-10-25 10:11:09 -07:00
Jeff Widman c504b22841 MINOR: Fix typo in ConsumerCoordinator comment
Author: Jeff Widman <jeff@jeffwidman.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #4032 from jeffwidman/patch-3
2017-10-24 01:13:06 +01:00
Guozhang Wang b281015c15 HOTFIX: Poll with zero milliseconds during restoration phase
1. After the poll call, re-check if the state has been changed or not; if yes, initialize the tasks again.
2. Minor log4j improvements.

Author: Guozhang Wang <wangguoz@gmail.com>
Author: Damian Guy <damian.guy@gmail.com>
Author: Jason Gustafson <jason@confluent.io>
Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ted Yu <yuzhihong@gmail.com>

Closes #4096 from guozhangwang/KHotfix-restore-only
2017-10-23 16:12:54 -07:00
Guozhang Wang d3f24798f9 KAFKA-5140: Fix reset integration test
A couple of root causes of this flaky test are fixed:

1. The MockTime was incorrectly used across multiple test methods within the class, as a class rule. Instead we set it on each test case; also remove the scala MockTime dependency.

2. List topics may not contain the deleted topics while their ZK paths are yet to be deleted; so the delete-check-recreate pattern may fail to successfully recreate the topic at all. Change the checking to read from zk path directly instead.

Another minor fix is to remove the misleading wait condition error message as the accumData is always empty.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>

Closes #4095 from guozhangwang/KMinor-reset-integration-test
2017-10-23 12:35:31 -07:00
Colin P. Mccabe a3c4ab2427 KAFKA-6070; add ipaddress and enum34 dependencies to docker image
Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>, Alex Ayars <alex.ayars@confluent.io>

Closes #4084 from cmccabe/KAFKA-6070
2017-10-23 16:38:39 +01:00
tedyu 277fc927c0 KAFKA-6101; Reconnecting to broker does not exponentially backoff
Author: tedyu <yuzhihong@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Soenke Liebau <soenke.liebau@opencore.com>, Ismael Juma <ismael@juma.me.uk>

Closes #4118 from tedyu/trunk
2017-10-23 15:52:30 +01:00
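
For reference, the exponential reconnect backoff that KAFKA-6101 above refers to can be sketched as a pure function. The formula here (doubling per consecutive failure, capped at a maximum, no jitter) is an assumption chosen for illustration, and `backoffMs` is a hypothetical name; the client's actual calculation may differ.

```java
// Hypothetical sketch; not the ClusterConnectionStates implementation.
final class ReconnectBackoffSketch {
    /**
     * Returns the wait before the next reconnect attempt: baseMs after the
     * first failure, doubling with each further consecutive failure, and
     * never more than maxMs.
     */
    static long backoffMs(long baseMs, long maxMs, int consecutiveFailures) {
        long backoff = Math.min(baseMs, maxMs);
        for (int i = 1; i < consecutiveFailures && backoff < maxMs; i++) {
            // Compare against maxMs / 2 first so the doubling cannot overflow.
            backoff = backoff > maxMs / 2 ? maxMs : backoff * 2;
        }
        return backoff;
    }
}
```

With a base of 50 ms and a cap of 1000 ms, the waits grow 50, 100, 200, 400, 800 and then stay at 1000.
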
Ismael Juma 6fc5732259 MINOR: Configure owasp.dependencycheck gradle plugin
It seems to output a few false positives, but still
worth verifying.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #4117 from ijuma/dependency-check
2017-10-23 14:49:01 +01:00
Soenke Liebau 021f2e7e24 KAFKA-6104; Added unit tests for ClusterConnectionStates
Author: Soenke Liebau <soenke.liebau@opencore.com>

Reviewers: Ted Yu <yuzhihong@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #4113 from soenkeliebau/KAFKA-6104
2017-10-23 12:07:58 +01:00
Ismael Juma 580390b78c MINOR: Update Scala to 2.12.4
Mainly for Java 9 fixes and improved compilation times (5-10% reduction):

http://www.scala-lang.org/news/2.12.4

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #4102 from ijuma/update-scala-version
2017-10-23 11:39:39 +01:00
Matthias J. Sax c216adb4bb MINOR: add hint for setting an uncaught exception handler to JavaDocs
Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>

Closes #4104 from mjsax/minor-uncaught-exception-handler
2017-10-23 10:33:51 +01:00
Soenke Liebau 86cd558b33 MINOR: Changed visibility of methods in ClusterConnectionStates to private
The methods resetReconnectBackoff and updateReconnectBackoff in ClusterConnectionStates both take an instance of a private inner class as parameter and thus cannot be called from outside the class anyway.

Author: Soenke Liebau <soenke.liebau@opencore.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #4114 from soenkeliebau/MINOR_private
2017-10-23 00:54:44 +01:00
Ismael Juma b1cd6c5301 MINOR: Controller and async ZookeeperClient improvements
* Fix issue in `retryRequestsUntilConnected` where the same response
could appear multiple times (implies that we are lacking test coverage)
* Introduce type member in AsyncRequest for the AsyncResponse
type and refactor the code to eliminate most downcasts
* Remove a number of unnecessary collection copies in
`retryRequestsUntilConnected`
* Move ControllerContext to its own file
* Rename getACL/setACL to getAcl/setAcl to match Kafka naming
convention
* Replace tuple of 3 elements with case class in one place (we
should do this in other places too)
* Extract `send` and `shouldWatch` from
`ZooKeeperClient.handleRequests`
* Use pattern matching instead of if/else chains in a few places (we
should do it in more places)
* A couple of renames to avoid overloads and hence benefit from
better type inference
* Use Option and default arguments instead of passing null in
some places
* `Expired` is no longer a case class since it has no parameters,
but it has state
* Various minor clean-ups

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jun Rao <junrao@gmail.com>, Onur Karaman <okaraman@linkedin.com>

Closes #4088 from ijuma/async-zkclient-cleanups
2017-10-22 08:44:08 +01:00
Rajini Sivaram efefb452df KAFKA-6042: Avoid deadlock between two groups with delayed operations
Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #4103 from rajinisivaram/KAFKA-6042-group-deadlock

(cherry picked from commit 5ee157126d)
Signed-off-by: Guozhang Wang <wangguoz@gmail.com>
2017-10-21 20:18:20 -07:00
Rajini Sivaram 9be71f7bdc MINOR: Use ObjectName.quote instead of URL-encoding for JMX metric tags
Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #4099 from rajinisivaram/1.0

(cherry picked from commit 51bb83d0dc)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
2017-10-19 19:37:52 -07:00
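
`ObjectName.quote`, mentioned in the commit title above, is part of the standard `javax.management` API. The sketch below shows how quoting lets a tag value contain characters such as `,`, `=` and `:` that would otherwise break an MBean name. The helper `metricName` and the domain and tag names are hypothetical, not taken from Kafka's metrics reporter.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical sketch; not Kafka's JMX reporter code.
final class JmxTagQuotingSketch {
    /** Builds an MBean name with one tag whose value is quoted. */
    static ObjectName metricName(String domain, String type, String tagKey, String tagValue) {
        String name = domain + ":type=" + type + "," + tagKey + "=" + ObjectName.quote(tagValue);
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("invalid JMX name: " + name, e);
        }
    }
}
```

The quoted form is stored as the property value, so a reader of the MBean name recovers the original tag with `ObjectName.unquote`.
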
Manikumar Reddy fdbd4d62f3 KAFKA-6071; Use ZookeeperClient in LogManager
Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Jun Rao <junrao@gmail.com>

Closes #4089 from omkreddy/KAFKA-6071-ZK-LOGMANAGER
2017-10-19 18:48:54 -07:00
Konstantine Karantasis 5ec6765bdb KAFKA-6087: Scanning plugin.path needs to support relative symlinks.
Author: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #4092 from kkonstantine/KAFKA-6087-Scanning-plugin.path-needs-to-support-relative-symlinks
2017-10-19 14:24:57 -07:00
Ismael Juma 561dd3864d MINOR: Update docs with regards to max.in.flight and idempotent producer
The idempotent producer doesn't change that setting any more and the
accepted range has changed.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Apurva Mehta <apurva@confluent.io>, Jason Gustafson <jason@confluent.io>

Closes #4097 from ijuma/fix-javadoc-wrt-max-in-flight-for-idempotent
2017-10-19 13:11:51 -07:00
Magnus Edenhill 60c36b0984 MINOR: Fix var typo in verifiable_consumer assertion
Author: Magnus Edenhill <magnus@edenhill.se>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #4098 from edenhill/verfcons_var_fix
2017-10-19 08:42:12 -07:00
Tommy Becker 249e398bf8 KAFKA-6069: Properly tag KafkaStreams metrics with the client id.
Author: Tommy Becker <tobecker@tivo.com>

Reviewers: Bill Bejeck <bill@confluent.io>, Damian Guy <damian.guy@gmail.com>

Closes #4081 from twbecker/KAFKA-6069
2017-10-19 15:40:26 +01:00
Hugo Louro 7fdafda979 MINOR: Correct KafkaProducer Javadoc spelling of property 'max.in.flight.requests.per.connection'
Currently, in branches _trunk_, _0.11.0_, and _1.0_ the property **max.in.flight.requests.per.connection** is misspelled as _max.inflight.requests.per.connection_

harshach ijuma guozhangwang can you please review. Thank you.

Author: Hugo Louro <hmclouro@gmail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #4094 from hmcl/trunk_MINOR_Doc_InflightProp
2017-10-19 10:35:18 +01:00
Ismael Juma 6cb649b56b MINOR: Remove dead code
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #4087 from ijuma/remove-dead-code
2017-10-18 22:15:22 +01:00
Maytee Chinavanichkit 5c1a85caa0 KAFKA-6051; Close the ReplicaFetcherBlockingSend earlier on shutdown
Rearranged the testAddPartitionDuringDeleteTopic() test to keep the
likelihood of the race condition.

Author: Maytee Chinavanichkit <maytee.chinavanichkit@linecorp.com>

Reviewers: Jun Rao <junrao@gmail.com>

Closes #4056 from mayt/KAFKA-6051
2017-10-18 09:59:13 -07:00
Onur Karaman b71ee043f8 KAFKA-5642; Use async ZookeeperClient in Controller
Kafka today uses ZkClient, a wrapper client around the raw Zookeeper client. This library only exposes synchronous apis to the user. Synchronous apis mean we must wait an entire round trip before doing the next operation.

This becomes problematic with partition-heavy clusters, as we find the controller spending a significant amount of time just sending many sequential reads and writes to zookeeper at the per-partition granularity. This especially becomes an issue during:
- controller failover, where the newly elected controller effectively reads all zookeeper state.
- broker failures and controlled shutdown. The controller tries to elect a new leader for partitions previously led by the broker. The controller also removes the broker from isr on partitions for which the broker was a follower. These all incur partition-granular reads and writes to zookeeper.

As a first step in addressing these issues, we built a low-level wrapper client called ZookeeperClient in KAFKA-5501 that encourages pipelined, asynchronous apis.

This patch converts the controller to use the async ZookeeperClient to improve controller failover, broker failure handling, and controlled shutdown times.
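
The difference between sequential synchronous calls and pipelined asynchronous calls can be sketched with `CompletableFuture`. This is an illustration only: `readAsync` is a hypothetical stand-in for an asynchronous ZooKeeper read and does not reflect the ZookeeperClient API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; not the ZookeeperClient API.
final class PipelinedReadsSketch {
    /** Stand-in for an async read; a real client would complete this from a network callback. */
    static CompletableFuture<String> readAsync(String path) {
        return CompletableFuture.supplyAsync(() -> "data@" + path);
    }

    /** Issues every read before waiting on any of them, then collects results in order. */
    static List<String> readAll(List<String> paths) {
        List<CompletableFuture<String>> inFlight = new ArrayList<>();
        for (String path : paths) {
            inFlight.add(readAsync(path)); // send without waiting for the reply
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> reply : inFlight) {
            results.add(reply.join()); // the round trips overlap instead of adding up
        }
        return results;
    }
}
```

With a synchronous API, N reads cost roughly N round trips; with all requests in flight at once, the total approaches one round trip plus processing time.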

Some notable changes made in this patch:
- All ControllerEvents now defer access to zookeeper at processing time instead of enqueue time as was intended with the single-threaded event queue model patch from KAFKA-5028. This results in a fresh view of the zookeeper state by the time we process the event. This reverts the hacks from KAFKA-5502 and KAFKA-5879.
- We refactored PartitionStateMachine and ReplicaStateMachine to process multiple partitions and replicas in batch rather than one-at-a-time so that we can send a batch of requests over to ZookeeperClient to pipeline.
- We've decoupled ZookeeperClient handler registration from watcher registration. Previously, these two were coupled, which meant handler registrations actually sent out a request to the zookeeper ensemble to do the actual watcher registration. In KafkaController.onControllerFailover, we register partition modification handlers (thereby registering watchers) and additionally look up the partition assignments for every topic in the cluster. We can shave a bit of time off failover if we merge these two operations. We can do this by decoupling ZookeeperClient handler registration from watcher registration. This means ZookeeperClient's registration apis have been changed so that they are purely in-memory operations, and they only take effect when the client sends ExistsRequest, GetDataRequest, or GetChildrenRequest.
- We've simplified the logic for updating LeaderAndIsr such that if we get a BADVERSION error code, the controller will now just retry in the next round by reading the new state and trying the update again. This simplifies logic when updating the partition leader epoch, removing replicas from isr, and electing leaders for partitions.
- We've implemented KAFKA-5083: always leave the last surviving member of the ISR in ZK. This means that if people re-disabled unclean leader election, we can still try to elect the leader from the last in-sync replica.
- ZookeeperClient's handlers have been changed so that their methods default to no-ops for convenience.
- All znode paths and definitions for znode encoding and decoding have been consolidated as static methods in ZkData.scala.
- The partition leader election algorithms have been refactored as pure functions so that they can be easily unit tested.
- PartitionStateMachine and ReplicaStateMachine now have unit tests.
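
The BADVERSION handling described in the list above (re-read the state and try the update again in the next round) is a form of optimistic concurrency control. A minimal sketch, assuming an in-memory `FakeStore` in place of ZooKeeper and hypothetical method names throughout:

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch; FakeStore stands in for a versioned znode.
final class VersionedUpdateSketch {
    static final class Versioned {
        final String data;
        final int version;

        Versioned(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    static final class FakeStore {
        private String data;
        private int version = 0;

        FakeStore(String initial) {
            this.data = initial;
        }

        synchronized Versioned read() {
            return new Versioned(data, version);
        }

        /** Writes only if nobody has changed the data since expectedVersion was read. */
        synchronized boolean conditionalWrite(String newData, int expectedVersion) {
            if (expectedVersion != version) {
                return false; // the equivalent of a bad-version error
            }
            data = newData;
            version++;
            return true;
        }
    }

    /** Retries read-modify-write until it applies cleanly; returns the rounds used. */
    static int updateWithRetry(FakeStore store, UnaryOperator<String> update, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            Versioned current = store.read();
            if (store.conditionalWrite(update.apply(current.data), current.version)) {
                return round;
            }
        }
        throw new IllegalStateException("still conflicting after " + maxRounds + " rounds");
    }
}
```

Each round recomputes the update from freshly read state, so a conflicting write by another party is incorporated rather than overwritten.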

Author: Onur Karaman <okaraman@linkedin.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>

Closes #3765 from onurkaraman/KAFKA-5642
2017-10-18 09:14:59 -07:00