Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.
KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.
Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
Fix bug in Connect REST extension API caused by invalid constructor parameter validation, and update integration test to play nicely with Jenkins
Fix instantiation of TaskState objects by Connect framework.
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Magesh Nandakumar <mageshn@confluent.io>, Randall Hauch <rhauch@gmail.com>
Since generation is private in AbstractCoordinator, I need to modify the generation() to let it return the object directly.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>
Implementation of [KIP-382 "MirrorMaker 2.0"](https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0)
Author: Ryanne Dolan <ryannedolan@gmail.com>
Author: Arun Mathew <arunmathew88@gmail.com>
Author: In Park <inpark@cloudera.com>
Author: Andre Price <obsoleted@users.noreply.github.com>
Author: christian.hagel@rio.cloud <christian.hagel@rio.cloud>
Reviewers: Eno Thereska <eno.thereska@gmail.com>, William Hammond <william.t.hammond@gmail.com>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jakub Korzeniowski, Tim Carey-Smith, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Arun Mathew, Jeremy-l-ford, vpernin, Oleg Kasian <oleg.kasian@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Qihong Chen, Sriharsha Chintalapani <sriharsha@apache.org>, Jun Rao <junrao@gmail.com>, Randall Hauch <rhauch@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#6295 from ryannedolan/KIP-382
Trim whitespaces in topic names specified in sink connector configs before subscribing to the consumer. Topic names don't allow whitespace characters, so trimming only will eliminate potential problems and will not place additional limits on topics specified in sink connectors.
Author: Magesh Nandakumar <magesh.n.kumar@gmail.com>
Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>
Adds support for the Connect Cast transforms to cast from Connect logical types, such as DATE, TIME, TIMESTAMP, and DECIMAL. Casting to numeric types will produce the underlying numeric value represented in the desired type. For logical types represented by underlying Java Date class, this means the milliseconds since EPOCH. For Decimal, this means the underlying value. If the value does not fit in the desired target type, it may overflow.
Casting to String from Date, Time, and Timestamp types will produce their ISO 8601 representation. Casting to String from Decimal will result in the value represented as a string. e.g. 1234 -> "1234".
Author: Nigel Liang <nigel@nigelliang.com>
Reviewer: Randall Hauch <rhauch@gmail.com>
Implemented KIP-475 to add new metrics for each worker to expose the number of current tasks per connector and per status.
Author: Cyrus Vafadari <cyrus@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>, Boyang Chen <Boyang Chen <boyang@confluent.io>
Implemented KIP-507 to secure the internal Connect REST endpoints that are only for intra-cluster communication. A new V2 of the Connect subprotocol enables this feature, where the leader generates a new session key, shares it with the other workers via the configuration topic, and workers send and validate requests to these internal endpoints using the shared key.
Currently the internal `POST /connectors/<connector>/tasks` endpoint is the only one that is secured.
This change adds unit tests and makes some small alterations to system tests to target the new `sessioned` Connect subprotocol. A new integration test ensures that the endpoint is actually secured (i.e., requests with missing/invalid signatures are rejected with a 400 BAD RESPONSE status).
Author: Chris Egerton <chrise@confluent.io>
Reviewed: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
Implemented KIP-495 to expose a new `admin/loggers` endpoint for the Connect REST API that lists the current log levels and allows the caller to change log levels.
Author: Arjun Satish <arjun@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Implemented KIP-481 by adding support for deserializing Connect DECIMAL values encoded in JSON as numbers, in addition to raw byte array (base64) format used previously.
Author: Almog Gavra <almog@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
As noted in the KIP-467, the updated ProduceResponse is
```
Produce Response (Version: 8) => [responses] throttle_time_ms
responses => topic [partition_responses]
topic => STRING
partition_responses => partition error_code base_offset log_append_time log_start_offset
partition => INT32
error_code => INT16
base_offset => INT64
log_append_time => INT64
log_start_offset => INT64
error_records => [INT32] // new field, encodes the relative offset of the records that caused error
error_message => STRING // new field, encodes the error message that client can use to log itself
throttle_time_ms => INT32
with a new error code:
```
INVALID_RECORD(86, "Some record has failed the validation on broker and hence be rejected.", InvalidRecordException::new);
Reviewers: Jason Gustafson <jason@confluent.io>, Magnus Edenhill <magnus@edenhill.se>, Guozhang Wang <wangguoz@gmail.com>
Implemented KIP-440 to allow Connect converters to use record headers when serializing or deserializing keys and values. This change is backward compatible in that the new methods default to calling the older existing methods, so existing Converter implementations need not be changed. This changes the WorkerSinkTask and WorkerSourceTask to use the new converter methods, but Connect's existing Converter implementations and the use of converters for internal topics are intentionally not modified. Added unit tests.
Author: Yaroslav Tkachenko <sapiensy@gmail.com>
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>, Randall Hauch <rhauch@gmail.com>
Replace the `<table>` elements by `<ul>` so the full page width can be used for the configuration descriptions instead of only a very narrow column. I moved the other fields (Type, Default Value, etc) below each entry.
Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
Added unit test for recent fix of `KafkaConfigBackingStore`.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Corrected the `KafkaConfigBackingStore` logic to notify of only the changed tasks, rather than all tasks. This was not noticed before because Connect always stopped and restarted all tasks during a rebalanced, but since 2.3 the incremental rebalance logic exposed this bug.
Author: Luying Liu <lyliu@lyliu-mac.freewheelmedia.net>
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
I've spent quite a bit of time on trying to discover the root cause, with no luck so far. I have been able to reproduce it locally by running the following 100 times:
```
./gradlew connect:runtime:clean connect:runtime:test --tests org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest
```
The `testReconfigConnector` test failed 28% of the time and the others failed 0%. This issue and KAFKA-8661 suggest that `testDeleteConnector` and `testStartTwoConnectors` are also flaky, though I've not seen those tests fail locally.
Because this flakiness is causing issues for the rest of the project, I'm going to temporarily ignore several of the flaky ITs while I continue to investigate:
* `RebalanceSourceConnectorsIntegrationTest.testReconfigConnector`
* `RebalanceSourceConnectorsIntegrationTest.testDeleteConnector`
* `RebalanceSourceConnectorsIntegrationTest.testStartTwoConnectors`
**This should be backported to the `2.3` branch, which is when these integration tests were first added.**
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Ismael Juma
Closes#7237 from rhauch/kafka-8391-temporary
Changed Connect's `WorkerSourceTask` to capture non-retriable exceptions from the `producer.send(...)` (e.g., authentication or authorization errors) and to fail the connector task when such an error is encountered. Modified the existing unit tests to verify this functionality.
Note that most producer errors are retriable, and Connect will (by default) set up each producer with 1 max in-flight message and infinite retries. This change only affects non-retriable errors.
Added the ability for the connector handles and task handles, which are used by the monitorable source and sink connectors used to verify the functionality of the Connect framework, to record the number of times the connector and tasks have each been started, and to allow a test to obtain a `RestartLatch` that can be used to block until the connectors and/or tasks have been restarted a specified number of types.
Typically, a test will get the `ConnectorHandle` for a connector, and call the `ConnectorHandle.expectedRestarts(int)` method with the expected number of times that the connector and/or tasks will be restarted, and will hold onto the resulting `RestartLatch`. The test will then change the connector (or otherwise cause the connector to restart) one or more times as desired, and then call `RestartLatch.await(long, TimeUnit)` to block the test up to a specified duration for the connector and all tasks to be started the specified number of times.
This commit also increases several of the maximum wait times used in other integration tests. It doesn’t hurt to potentially wait longer, since most test runs will not need to wait the maximum amount of time anyway. However, in the rare cases that do need that extra time, waiting a bit more is fine if we can reduce the flakiness and minimize test failures that happened to time out too early.
Unit tests were added for the new `RestartLatch` and `StopAndStartCounter` utility classes. This PR only affects the tests and does not affect any runtime code or API.
**This should be merged on `trunk` and backported to the `2.3.x` branch.**
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Konstantine Karantasis, Arjun Satish
Closes#7019 from rhauch/kafka-8391
When calling readLogToEnd(), the KafkaBasedLog worker thread should catch TimeoutException and log a warning, which can occur if brokers are unavailable, otherwise the worker thread terminates.
Includes an enhancement to MockConsumer that allows simulating exceptions not just when polling but also when querying for offsets, which is necessary for testing the fix.
Author: Paul Whalen <pgwhalen@gmail.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Arjun Satish <arjun@confluent.io>, Ryanne Dolan <ryannedolan@gmail.com>
Corrected the AbstractHerder to correctly identify task configs that contain variables for externalized secrets. The original method incorrectly used `matcher.matches()` instead of `matcher.find()`. The former method expects the entire string to match the regex, whereas the second one can find a pattern anywhere within the input string (which fits this use case more correctly).
Added unit tests to cover various cases of a config with externalized secrets, and updated system tests to cover case where config value contains additional characters besides secret that requires regex pattern to be found anywhere in the string (as opposed to complete match).
Author: Arjun Satish <arjun@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
* Clean up one redundant and one misplaced metric
* Clarify the relationship among these metrics to avoid future confusion
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The transient failures are usually caused by a timeout in `awaitCommits`. This patch increases the timeout from 15s to 30s.
Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Matthias J. Sax <mjsax@apache.org>
We have seen some failures recently in `RestServerTest`. It's the usual problem with reliance on static ports.
```
Caused by: java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:8083
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:346)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:308)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:236)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.server.Server.doStart(Server.java:396)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.kafka.connect.runtime.rest.RestServer.initializeServer(RestServer.java:178)
... 56 more
Caused by: java.net.BindException: Address already in use
```
This patch makes the chosen port dynamic.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Correct the Flatten SMT to properly handle null key or value `Struct` instances.
Author: Michal Borowiecki <michal.borowiecki@openbet.com>
Reviewers: Arjun Satish <arjun@confluent.io>, Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
A bug in `WorkerConfigTransformer` prevents the connector configuration reload when the ConfigData TTL expires.
The issue boils down to the fact that `worker.herder().restartConnector` is receiving a null callback.
```
[2019-06-17 14:34:12,320] INFO Scheduling a restart of connector workshop-incremental in 60000 ms (org.apache.kafka.connect.runtime.WorkerConfigTransformer:88)
[2019-06-17 14:34:12,321] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:227)
java.lang.NullPointerException
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1187)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$19.onCompletion(DistributedHerder.java:1183)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:273)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:219)
```
This patch adds a callback which just logs the error.
Reviewers: Robert Yokota <rayokota@gmail.com>, Jason Gustafson <jason@confluent.io>
`EmbeddedConnectCluster` has the ability to mask system exits to avoid killing the jvm. It appears that the default was intended to be `true`, but is actually `false`. The `maskExitProcedures` method on `EmbeddedConnectCluster.Builder` documents the parameter as:
```
* @param mask if false, exit and halt procedures remain unchanged; true is the default.
```
Because this is not enabled by default as intended, we are seeing some build failures which exit abruptly:
```
17:29:11 Execution failed for task ':connect:runtime:integrationTest'.
17:29:11 > Process 'Gradle Test Executor 25' finished with non-zero exit value 1
```
The culprit often appears to be `ExampleConnectIntegrationTest`, which indeed does not override the default value of `maskExitProcedures`.
Reviewers: Ewen Cheslack-Postava <me@ewencp.org>
Mocking of WorkerCoordinator was not precise after adding an argument (reason) to AbstractCoordinator#maybeLeaveGroup in KAFKA-8569:
Unit test case for DistributedHerderTest is now precise with respect to the expected argument and succeeds
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Static members never leave the group, so potentially we could log a flooding number of warning messages in the hb thread. The solution is to only log as warning when we are on dynamic membership.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
An attempt to refactor current coordinator logic.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Konstantine Karantasis <konstantine@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
As title suggested, we boost 3 stream instances stream job with one minute session timeout, and once the group is stable, doing couple of rolling bounces for the entire cluster. Every rejoin based on restart should have no generation bump on the client side.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bbejeck@gmail.com>
When Connect forwards a REST request from one worker to another, the Authorization header was not forwarded. This commit changes the Connect framework to add include the authorization header when forwarding requests to other workers.
Author: Hai-Dang Dam <damquanghaidang@gmail.com>
Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
Restart task on reconfiguration under incremental cooperative rebalancing, and keep execution paths separate for config updates between eager and cooperative. Include the group generation in the log message when the worker receives its assignment.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Because of how config values are converted into strings in the `AbstractHerder.validateClientOverrides()` method after being validated by the client override policy, an exception is thrown if the value returned by the policy isn't already parsed as the type expected by the client `ConfigDef`. The fix here involves parsing client override properties before passing them to the override policy.
A unit test is added to ensure that several different types of configs are validated properly by the herder.
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Magesh Nandakumar <magesh.n.kumar@gmail.com>, Randall Hauch <rhauch@gmail.com>
Prior to this change, the next commit time advances
_each_ time a commit happens -- including when a commit happens
because it was requested by the `Task`. When a `Task` requests a
commit several times, the clock advances far into the future
which prevents expected periodic commits from happening.
This commit changes the behavior, we reset `nextCommit` relative
to the time of the commit.
Reviewers: Jason Gustafson <jason@confluent.io>
As title states. We plan to merge this to both trunk and 2.3 if it could fix the stream system tests globally.
Reference implementation: #6673
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>
Return a copy of the ConfigDef in Client Configs. Related to KIP-458.
Author: Magesh Nandakumar <magesh.n.kumar@gmail.com
Reviewer: Randall Hauch <rhauch@gmail.com>
* Remove deprecated class Slf4jRequestLog: use Slf4jRequestLogWriter, CustomRequestLog instread.
1. Remove '@SuppressWarnings("deprecation")' from RestServer#initializeResources, JsonRestServer#start.
2. Remove unused JsonRestServer#httpRequest.
* Fix deprecated class usage: SslContextFactory -> SslContextFactory.[Server, Client]
1. Split SSLUtils#createSslContextFactory into SSLUtils#create[Server, Client]SideSslContextFactory: each method instantiates SslContextFactory.[Server, Client], respectively.
2. SSLUtils#configureSslContextFactoryAuthentication is called from SSLUtils#createServerSideSslContextFactory only.
3. Update SSLUtilsTest following splittion; for client-side SSL Context Factory, SslContextFactory#get[Need, Want]ClientAuth is always false. (SSLUtilsTest#testCreateClientSideSslContextFactory)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Implementation to enable policy for Connector Client config overrides. This is
implemented per the KIP-458.
Reviewers: Randall Hauch <rhauch@gmail.com>
Added the incremental cooperative rebalancing in Connect to avoid global rebalances on all connectors and tasks with each new/changed/removed connector. This new protocol is backward compatible and will work with heterogeneous clusters that exist during a rolling upgrade, but once the clusters consist of new workers only some affected connectors and tasks will be rebalanced: connectors and tasks on existing nodes still in the cluster and not added/changed/removed will continue running while the affected connectors and tasks are rebalanced.
This commit attempted to minimize the changes to the existing V0 protocol logic, though that was not entirely possible.
This commit adds extensive unit and integration tests for both the old V0 protocol and the new v1 protocol. Soak testing has been performed multiple times to verify behavior while connectors and added, changed, and removed and while workers are added and removed from the cluster.
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>, Robert Yokota <rayokota@gmail.com>, David Arthur <mumrah@gmail.com>, Ryanne Dolan <ryannedolan@gmail.com>
Per KIP-465, kept existing behavior of `/connectors` resource in the Connect's REST API, but added the ability to specify `?expand` query parameter to get list of connectors with status details on each connector. Added unit tests, and verified passing existing system tests (which use the older list form).
See https://cwiki.apache.org/confluence/display/KAFKA/KIP-465%3A+Add+Consolidated+Connector+Endpoint+to+Connect+REST+API.
Author: Dan Norwood <norwood@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
See https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs
Added LoggingContext as a simple mechanism to set and unset Mapped Diagnostic Contexts (MDC) in the loggers to provide for each thread useful parameters that can be used within the logging configuration. MDC avoids having to modify lots of log statements, since the parameters are available to all log statements issued by the thread, no matter what class makes those calls.
The design intentionally minimizes the number of changes to any existing classes, and does not use Java 8 features so it can be easily backported if desired, although per this KIP it will be applied initially only in AK 2.3 and later and must be enabled via the Log4J configuration.
Reviewers: Jason Gustafson <jason@conflent.io>, Guozhang Wang <wangguoz@gmail.com>