This fix is already in 2.1+ branches, but did not get into older branches.
Should be cherry-picked all the way to 0.10.2.
Reviewers: Matthias J. Sax <mjsax@apache.org>
MINOR: Increase produce timeout for EmbeddedKafkaCluster to 120 seconds
Previous value was 500ms. This change gives more room to pass tests on systems with low resources running many parallel tests.
Reviewers: Randall Hauch <randall@confluent.io>
It is important to clean up any cached epochs that may exist if the log message format does not support it (due to a regression in KAFKA-7415). Otherwise, the broker might make use of them once it upgrades its message format. This can cause unnecessary truncation of data.
Reviewers: Jason Gustafson <jason@confluent.io>
The StreamsBrokerBounceTest.test_broker_type_bounce experienced what looked like a transient failure. After looking over this test and failure, it seems like it is vulnerable to timing error that streams will start before the kafka service creates all topics.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>
The test `org.apache.kafka.connect.runtime.rest.RestServerTest#testCORSEnabled` assumes Jersey client can send restricted HTTP headers(`Origin`).
Jersey client uses `sun.net.www.protocol.http.HttpURLConnection`.
`sun.net.www.protocol.http.HttpURLConnection` drops restricted headers(`Host`, `Keep-Alive`, `Origin`, etc) based on static property `allowRestrictedHeaders`.
This property is initialized in a static block by reading Java system property `sun.net.http.allowRestrictedHeaders`.
So, if classloader loads `HttpURLConnection` before we set `sun.net.http.allowRestrictedHeaders=true`, then all subsequent changes of this system property won't take any effect(which happens if `org.apache.kafka.connect.integration.ExampleConnectIntegrationTest` is executed before `RestServerTest`).
To prevent this, we have to either make sure we set `sun.net.http.allowRestrictedHeaders=true` as early as possible or do not rely on this system property at all.
This PR adds test dependency on `httpcomponents-client` which doesn't depend on `sun.net.http.allowRestrictedHeaders` system property. Thus none of existing tests should interfere with `RestServerTest`.
Author: Alex Diachenko <sansanichfb@gmail.com>
Reviewers: Gwen Shapira
Closes#6276 from avocader/KAFKA-7799_2.0
It used to preallocate an array of responses and then complete each response from the original collection sequentially. The problem was that the original collection could have been modified (another thread completing the response) while this was hapenning
When an older message format is in use, we should disable the leader epoch cache so that we resort to truncation by high watermark. Previously we updated the cache for all versions when a broker became leader for a partition. This can cause large and unnecessary truncations after leader changes because we relied on the presence of _any_ cached epoch in order to tell whether to use the improved truncation logic possible with the OffsetsForLeaderEpoch API.
Note this is a simplified fix than what was merged to trunk in #6232 since the branches have diverged significantly. Rather than removing the epoch cache file, we guard usage of the cache with the record version.
Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jun Rao <junrao@gmail.com>
Avoid mentioning unreleased versions in the docs.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
* Enable heap dumps on OOM with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file.bin> in the major services in system tests
* Collect the heap dump from the predefined location as part of the result logs for each service
* Change Connect service to delete the whole root directory instead of individual expected files
* Tested by running the full suite of system tests
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6158 from kkonstantine/KAFKA-7834
(cherry picked from commit 83c435f3ba)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
Explicitly seek KafkaBasedLog’s consumer to the beginning of the topic partitions, rather than potentially use committed offsets (which would be unexpected) if group.id is set or rely upon `auto.offset.reset=earliest` if the group.id is null.
This should not change existing behavior but should remove some potential issues introduced with KIP-287 if `group.id` is not set in the consumer configurations. Note that even if `group.id` is set, we still always want to consume from the beginning.
Reviewers: Jason Gustafson <jason@confluent.io>
Upgrade from 171 to 202. Unpack and install directly from a cached tgz rather than going via the installer deb from webupd8. The installer is still on 8u919 while we want 202.
Testing via kafka branch builder job
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2305/
Author: Jarek Rudzinski <jarek@confluent.io>
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Alex Diachenko <sansanichfb@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6165 from jarekr/trunk-jdk8-from-tgz
(cherry picked from commit ad3b6dd835)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
The problem is that the sequence number is an Int and should wrap around when it reaches the Int.MaxValue. The bug here is it doesn't wrap around and become negative and raises an error.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch fixes a few overflow issues with wrapping sequence numbers in the broker's producer state tracking.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch changes the behavior of KafkaProducer.waitOnMetadata to wait up to max.block.ms when the partition specified in the produce request is out of the range of partitions present in the metadata. This improves the user experience in the case when partitions are added to a topic and a client attempts to produce to one of the new partitions before the metadata has propagated to the brokers. Tested with unit tests.
Reviewers: Arjun Satish <arjun@confluent.io>, Jason Gustafson <jason@confluent.io>
[KIP-297](https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations#KIP-297:ExternalizingSecretsforConnectConfigurations-PublicInterfaces) introduced the `ConfigProvider` mechanism, which was primarily intended for externalizing secrets provided in connector configurations. However, when querying the Connect REST API for the configuration of a connector or its tasks, those secrets are still exposed. The changes here prevent the Connect REST API from ever exposing resolved configurations in order to address that. rhauch has given a more thorough writeup of the thinking behind this in [KAFKA-5117](https://issues.apache.org/jira/browse/KAFKA-5117)
Tested and verified manually. If these changes are approved unit tests can be added to prevent a regression.
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6129 from C0urante/hide-provided-connect-configs
(cherry picked from commit 743607af5a)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
Start the Rest server in the standalone mode similar to how it's done for distributed mode.
Author: Magesh Nandakumar <magesh.n.kumar@gmail.com>
Reviewers: Arjun Satish <arjun@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6148 from mageshn/KAFKA-7826
(cherry picked from commit dec68c9350)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
Expose a programmatic way to bring up a Kafka and Zk cluster through Java API to facilitate integration tests for framework level changes in Kafka Connect. The Kafka classes would be similar to KafkaEmbedded in streams. The new classes would reuse the kafka.server.KafkaServer classes from :core, and provide a simple interface to bring up brokers in integration tests.
Signed-off-by: Arjun Satish <arjunconfluent.io>
Author: Arjun Satish <arjun@confluent.io>
Author: Arjun Satish <wicknicks@users.noreply.github.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5516 from wicknicks/connect-integration-test
(cherry picked from commit 69d8d2ea11)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
As documented in https://issues.apache.org/jira/browse/KAFKA-7741,
the javax dependency we receive transitively from connect is incompatible
with SBT builds.
Streams doesn't use the portion of Connect that needs the dependency,
so we can fix the builds by simply excluding it.
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
The test always fails if testOptionsDoesNotIncludeWadlOutput is executed before testCORSEnabled. It seems the problem is the use of the system property. Perhaps there is some static caching somewhere.
Reviewers: Randall Hauch <rhauch@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
The null map returned from the current snapshot causes the null type in response. The connector class name can be taken from the config of request instead since we require the config should contain the connector class name.
Reviewers: Jason Gustafson <jason@confluent.io>
Check `running` in `Sender.maybeWaitForProducerId` to ensure that the producer can be closed while awaiting initialization of the producerId.
Reviewers: Jason Gustafson <jason@confluent.io>
Replace `channel` by `fileRecords` in potentially thrown KafkaException
descriptions when loading/writing `FileChannelRecordBatch`. This makes exception
messages more readable (channel only shows an object hashcode, fileRecords shows
the path of the file being read and start/end positions in the file).
Reviewers: Jason Gustafson <jason@confluent.io>
When using the Connect `JsonConverter`, it's impossible to produce tombstone messages, thus impacting the compaction of the topic. This patch allows the converter with and without schemas to output a NULL byte value in order to have a proper tombstone message. When it's regarding to get this data into a connect record, the approach is the same as when the payload looks like `"{ "schema": null, "payload": null }"`, this way the sink connectors can maintain their functionality and reduces the BCC.
Reviewers: Gunnar Morling <gunnar.morling@googlemail.com>, Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
Previous PR #6043 reduced throughput for VerifiableProducer in base class, but the streams_standby_replica_test needs higher throughput for consumer to complete verification in 60 seconds. Same update as #6060 and #6061
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This patch disables support for WADL output in the Connect REST API since it was never intended to be exposed.
Reviewers: Randall Hauch <rhauch@gmail.com>, Jason Gustafson <jason@confluent.io>
This PR addresses a few issues with this system test flakiness. I'll issue similar PRs for 2.1 and trunk as well.
1. Need to grab the monitor before a given operation to observe logs for signal
2. Relied too much on a timely rebalance and only sent a handful of messages.
I've updated the test and ran it here https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2141/ parameterized for 15 repeats all passed.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
…eturned by poll() if there are any records to return
The MockConsumer behaves unlike the real consumer in that it can return a non-empty ConsumerRecords from poll, that also has a count of 0. This change makes the MockConsumer only add partitions to the ConsumerRecords if there are records to return for those partitions.
A unit test in MockConsumerTest demonstrates the issue.
Author: Stig Rohde Døssing <stigdoessing@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#5901 from srdo/KAFKA-7616
There were several reported incidents where the log is rolled to a new segment with the same base offset as an active segment, causing KafkaException: Trying to roll a new log segment for topic partition X-N with start offset M while it already exists. In the cases we have seen, this happens to an empty log segment where there is long idle time before the next append and somehow we get to a state where offsetIndex.isFull() returns true due to _maxEntries == 0. This PR recovers from this state by deleting and recreating the segment and all of its associated index files.
Reviewers: Jason Gustafson <jason@confluent.io>
Should be ported back to 2.0
Author: Cyrus Vafadari <cyrus@confluent.io>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#6004 from cyrusv/cyrus-npe
(cherry picked from commit 9f954ac614)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
Includes Update to ConnectRecord string representation to give
visibility into schemas, useful in SMT tracing
Author: Cyrus Vafadari <cyrus@confluent.io>
Reviewers: Randall Hauch <rhauch@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5860 from cyrusv/cyrus-logging
(cherry picked from commit 4712a36416)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
This PR fixes an issue reported from a user. When we join a KStream with a GlobalKTable we should not reset the repartition flag as the stream may have previously changed its key, and the resulting stream could be used in an aggregation operation or join with another stream which may require a repartition for correct results.
I've added a test which fails without the fix.
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The restart logic for TTLs in `WorkerConfigTransformer` was broken when trying to make it toggle-able. Accessing the toggle through the `Herder` causes the same code to be called recursively. This fix just accesses the toggle by simply looking in the properties map that is passed to `WorkerConfigTransformer`.
Author: Robert Yokota <rayokota@gmail.com>
Reviewers: Magesh Nandakumar <magesh.n.kumar@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5914 from rayokota/KAFKA-7620
(cherry picked from commit a2e87feb8b)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
The Kafka Streams system tests fail with some regularity due to a timeout starting the broker.
The initial start is quite quick, but many of our tests involve stopping and restarting nodes with data already loaded, and also while processing is ongoing.
Under these conditions, it seems to be normal for the broker to take about 25 seconds to start, which makes the 30 second timeout pretty close for comfort.
I have seen many test failures in which the broker successfully started within a couple of seconds after the tests timed out and already initiated the failure/shut-down sequence.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
In TopologyTestDriver constructor set non-null topic; and in unit test intentionally turn on caching to verify this case.
Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>
ReplicaFetcherThread.shutdown attempts to close the fetcher's Selector while the thread is running. This in unsafe and can result in `Selector.close()` failing with an exception. The exception is caught and logged at debug level, but this can lead to socket leak if the shutdown is due to dynamic config update rather than broker shutdown. This PR changes the shutdown logic to close Selector after the replica fetcher thread is shutdown, with a wakeup() and flag to terminate blocking sends first.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Some connector configs may be sensitive, so we should avoid logging them.
Reviewers: Alex Diachenko, Dustin Cote <dustin@confluent.io>, Jason Gustafson <jason@confluent.io>