Commit Graph

7322 Commits

Author SHA1 Message Date
Nikolay dd9f70e3d1 KAFKA-9573: Fix VerifiableProducer and VerifiableConsumer to work with older Kafka versions (#8197)
These classes are used by `upgrade_test.py` with old Kafka versions so they can
only use functionality that exists in all Kafka versions. This change fixes the test
for Kafka versions older than 0.11.0.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-03-01 16:31:52 -08:00
Chia-Ping Tsai c3e453428c MINOR: Fix unrelated types comparison in MetadataRequestTest (#8195)
The type of `leaderId` was changed to `Optional` in a recent change.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-02-29 14:09:14 -08:00
David Arthur 1b98aee9dc MINOR: Modify release.py prompt to only accept y/n 2020-02-28 14:41:08 -05:00
Matthew Wong c9d01dbbf8 throttle consumer timeout increase (#8188)
The test_throttled_reassignment test fails because the consumer that is used to validate reassignment does not start on time to consume all messages. This does not seem like an issue with the throttling of the reassignment, since increasing the timeout allowed the test to pass multiple consecutive runs locally.

This test seemed to rely on the default JmxTool for the console consumer that was removed in this commit: 179d0d7
The console consumer would check to see if it had partitions assigned to it before beginning to consume. Although the test occasionally failed with the JmxTool, it began to fail much more after the removal.

Error messages of failures followed the below format with varying numbers of missed messages. They are the first messages by the producer.

535 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 515 more. Total Acked: 192792, Total Consumed: 192259. We validated that the first 535 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
In the scope of the test, this error suggests that the test is falling into the race condition described in produce_consume_validate.py, which has the timeout to prevent the consumer from missing initial messages.

This can serve as a temporary fix until the logic of consumer startup is addressed further.

Reviewers: Jason Gustafson <jason@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
2020-02-27 17:49:28 -05:00
Ismael Juma d764599a73 MINOR: Revert Jetty to 9.4.25 (#8183)
9.4.25 renamed closeOutput to completeOutput
(c5acf96506),
which is a method used by recent Jersey versions including the
latest (2.30.1). An example of the error:

> java.lang.NoSuchMethodError: org.eclipse.jetty.server.Response.closeOutput()V
> 	at org.glassfish.jersey.jetty.JettyHttpContainer$ResponseWriter.commit(JettyHttpContainer.java:326)

The request still completes and hence why no test fails. We should think about how
to improve the testing for this kind of problem, but I want to get the fix in before
2.5 RC0.

Credit to @rigelbm for finding this.

Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Andrew Choi <a24choi@edu.uwaterloo.ca>
2020-02-27 13:07:35 -08:00
Bruno Cadonna d32f72c90d Adapt docs about metrics of Streams according to KIP-444 (#8171)
Adapts the docs about metrics of Streams according to https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%253A+Augment+metrics+for+Kafka+Streams

Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-27 09:50:29 -08:00
Bruno Cadonna 54136758a5 MINOR: Remove tag from metric to measure process-rate on source nodes (#8175)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-02-27 09:03:52 -08:00
David Arthur b7dac331d4 Update year in NOTICE 2020-02-27 11:19:41 -05:00
Chris Egerton 9456d0955f KAFKA-9601: Stop logging raw connector config values (#8165)
Author: Chris Egerton <chrise@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
2020-02-26 15:52:55 -06:00
Viktor Somogyi e5e133fe2c
KAFKA-8835: KIP-352 docs update (#8127)
Clarifying URP definition and defining new metrics related to partition reassignment.
2020-02-26 12:24:56 -05:00
bill 0416121531 Revert "KAFKA-9533: ValueTransform forwards `null` values (#8108)"
This reverts commit a41d3d86c1.
2020-02-25 12:33:50 -05:00
Ron Dagostino edf8809561 KAFKA-9567: Docs, system tests for ZooKeeper 3.5.7
These changes depend on [KIP-515: Enable ZK client to use the new TLS supported authentication](https://cwiki.apache.org/confluence/display/KAFKA/KIP-515%3A+Enable+ZK+client+to+use+the+new+TLS+supported+authentication), which was only added to 2.5.0. The upgrade to ZooKeeper 3.5.7 was merged to both 2.5.0 and 2.4.1 via https://issues.apache.org/jira/browse/KAFKA-9515, but this change must only be merged to 2.5.0 (it will break the system tests if merged to 2.4.1).

Author: Ron Dagostino <rdagostino@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Andrew Choi <li_andchoi@microsoft.com>

Closes #8132 from rondagostino/KAFKA-9567

(cherry picked from commit 9d53ad794d)
Signed-off-by: Manikumar Reddy <manikumar@confluent.io>
2020-02-25 20:00:27 +05:30
Chia-Ping Tsai 4b39238673 KAFKA-9599 create unique sensor to record group rebalance (#8159)
The "offset deletion" and "group rebalance" should not be recorded by the same sensor since they are totally different.

The code is introduced by #7276.

Reviewers: David Jacot <djacot@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-24 11:33:04 -08:00
Matthias J. Sax 139714b1fb HOTFIX: fix NPE in Kafka Streams IQ (#8158)
Reviewers: Vito Jeng <vito@is-land.com.tw>, Guozhang Wang <guozhang@confluent.io>
2020-02-23 13:09:11 +01:00
Lucas Bradstreet c6cbdf2be8 KAFKA-9577; SaslClientAuthenticator incorrectly negotiates SASL_HANDSHAKE version (#8142)
The SaslClientAuthenticator incorrectly negotiates supported SaslHandshakeRequest version and  uses the maximum version supported by the broker whether or not the client supports it. 

This bug was exposed by a recent version bump in 0a2569e2b9.

This PR rolls back the recent SaslHandshake[Request,Response] bump, fixes the version negotiation, and adds a test to prevent anyone from accidentally bumping the version without a workaround such as a new ApiKey. The existing key will be difficult to support for clients < 2.5 due to the incorrect negotiation.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>, Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
2020-02-21 21:49:48 -08:00
Konstantine Karantasis 972119e34c MINOR: Document endpoints for connector topic tracking (KIP-558)
Update the site documentation to include the endpoints introduced with KIP-558 and a short paragraph on how this feature is used in Connect.

Author: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Toby Drake <tobydrake7@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #8148 from kkonstantine/kip-558-docs

(cherry picked from commit bbfecaef72)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
2020-02-21 12:25:55 -08:00
Boyang Chen e2f9f08d1c MINOR: Improve EOS example exception handling (#8052)
The current EOS example mixes fatal and non-fatal error handling. This patch fixes this problem and simplifies the example.

Reviewers: Jason Gustafson <jason@confluent.io>
2020-02-21 10:41:46 -08:00
Boyang Chen 121df465fa
KAFKA-9582: Do not abort transaction in unclean close (#8143)
In order to avoid hitting the fatal exception during unclean close, we should avoid calling the abortTransaction() call.

Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-21 10:27:57 -08:00
Lee Dongjin a721e71bc2 KAFKA-9586: Fix errored json filename in ops documentation
This PR is the counterpart of apache/kafka-site#253.

cc/ omkreddy

Author: Lee Dongjin <dongjin@apache.org>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #8149 from dongjinleekr/feature/KAFKA-9586

(cherry picked from commit b28aa4ece6)
Signed-off-by: Manikumar Reddy <manikumar@confluent.io>
2020-02-21 18:49:32 +05:30
Ron Dagostino f008dbf9b7 KAFKA-9575: Mention ZooKeeper 3.5.7 upgrade
*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*

*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*

Author: Ron Dagostino <rdagostino@confluent.io>

Reviewers: Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #8139 from rondagostino/KAFKA-9575

(cherry picked from commit d9b8b86bdd)
Signed-off-by: Manikumar Reddy <manikumar@confluent.io>
2020-02-21 18:45:55 +05:30
John Roesler c3da9ac86a
KAFKA-9562: part 1: ignore exceptions while flushing stores in close(dirty) (#8116)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-02-20 13:28:11 -08:00
Michael Viamari 7bf808ae0d KAFKA-9533: ValueTransform forwards `null` values (#8108)
Fixes a bug where KStream#transformValues would forward null values from the provided ValueTransform#transform operation.

A test was added for verification KStreamTransformValuesTest#shouldEmitNoRecordIfTransformReturnsNull. A parallel test for non-key ValueTransformer was not added, as they share the same code path.

Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
2020-02-20 11:06:52 -05:00
Stanislav Kozlovski f334b19722 MINOR: Add missing @Test annotation to MetadataTest#testMetadataMerge (#8141)
Reviewers: Brian Byrne <bbyrne@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-02-20 11:11:21 +00:00
Lee Dongjin 63a297fa1c MINOR: fix omitted 'next.' in interactive queries documentation (#7883)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-02-19 10:24:29 -08:00
Jason Gustafson 8eddfcbed7 KAFKA-9544; Fix flaky test `AdminClientTest.testDefaultApiTimeoutOverride` (#8101)
There is a race condition with the backoff sleep in the test case and setting the next allowed send time in the AdminClient. To fix it, we allow the test case to do the backoff sleep multiple times if needed.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2020-02-19 09:25:07 -08:00
Sanjana Kaundinya 0c7a33cec1 KAFKA-9558; Fix retry logic in KafkaAdminClient listOffsets (#8119)
This PR is to fix the retry logic for `getListOffsetsCalls`. Previously, if there were partitions with errors, it would only pass in the current call object to retry after a metadata refresh. However, if there's a leader change, the call object never gets updated with the correct leader node to query. This PR fixes this by making another call to `getListOffsetsCalls` with only the error topic partitions as the next calls to be made after the metadata refresh. In addition there is an additional test to test the scenario where a leader change occurs.

Reviewers: Jason Gustafson <jason@confluent.io>
2020-02-19 09:13:22 -08:00
Boyang Chen fd686497d0 MINOR: Reduce log level to Trace for fetch offset downgrade (#8093)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-02-18 16:01:10 -08:00
Matthias J. Sax 7f6ac80f83 KAFKA-8025: Fix flaky RocksDB test (#8126)
Reviewers: Bill Bejeck <bill@confluent.io>
2020-02-18 11:30:35 -08:00
Chia-Ping Tsai 027984eaee KAFKA-8245: Fix Flaky Test DeleteConsumerGroupsTest#testDeleteCmdAllGroups (#8032)
Change unit tests to make sure the consumer group is in Stable state (i.e. consumers have completed joining the group)
2020-02-17 12:43:09 -08:00
Ismael Juma cf68d4d02d KAFKA-9515: Upgrade ZooKeeper to 3.5.7 (#8125)
A couple of critical fixes:

ZOOKEEPER-3644: Data loss after upgrading standalone ZK server 3.4.14 to 3.5.6 with snapshot.trust.empty=true
ZOOKEEPER-3701: Split brain on log disk full (3.5)

Full release notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12346098

Reviewers: Bill Bejeck <bbejeck@gmail.com>
2020-02-17 07:59:25 -08:00
Rajini Sivaram e4971cd2d8 MINOR: Add upgrade note about TLSv1 and TLSv1.1 being disabled in 2.5.0 (#8128)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2020-02-17 11:38:12 +00:00
Nikolay 7a3f4a1fb1 KAFKA-9319: Fix generation of CA certificate for system tests. (#8106)
Newer versions of Java have added checks to ensure that trust anchors are CA certificates and contain proper extensions. This PR adds Basic Constraints extension with the CA field set to true for system tests.

Reviewers: ajini Sivaram <rajinisivaram@googlemail.com>
2020-02-17 09:50:41 +00:00
Boyang Chen 8535313293 KAFKA-9535; Update metadata before retrying partitions when fetching offsets (#8088)
Today if we attempt to list offsets with a fenced leader epoch, consumer will retry without updating the metadata until the timeout is reached. This affects synchronous APIs such as `offsetsForTimes`, `beginningOffsets`, and `endOffsets`. The fix in this patch is to trigger the metadata update call whenever we see a retriable error before additional attempts.

Reviewers: Jason Gustafson <jason@confluent.io>
2020-02-16 12:07:24 -08:00
Bob Barrett 5a073fca4c KAFKA-8805; Bump producer epoch on recoverable errors (#7389)
This change is the client-side part of KIP-360. It identifies cases where it is safe to abort a transaction, bump the producer epoch, and allow the application to continue without closing the producer. In these cases, when KafkaProducer.abortTransaction() is called, the producer sends an InitProducerId following the transaction abort, which causes the producer epoch to be bumped. The application can then start a new transaction and continue processing.

For recoverable errors in the idempotent producer, the epoch is bumped locally. In-flight requests for partitions with an error are rewritten to reflect the new epoch, and in-flights of all other partitions are allowed to complete using the old epoch. 

Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
2020-02-15 22:51:18 -08:00
Brian Byrne a53f8be3fa KAFKA-8904: Improve producer's topic metadata fetching. (#7781)
When the producer encouteres new topic(s), it now only fetches the metadata for the new topics. For cases where a producer interacts with a lot of topics, this reduces the cost for the topic being evicted from the cache, and during startup when populating the topic cache.

Additionally adds a new producer configuration variable 'metadata.max.idle.ms', which controls how long topic metadata may be idle (i.e. not produced to) before it's finally discarded from the metadata cache.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, dengziming <dengziming1993@gmail.com>
2020-02-14 23:15:54 +00:00
Konstantine Karantasis b403c669db KAFKA-9556; Fix two issues with KIP-558 and expand testing coverage (#8085)
Correct the Connect worker logic to properly disable the new topic status (KIP-558) feature when `topic.tracking.enable=false`, and fix automatic topic status reset after a connector is deleted.

Also adds new `ConnectorTopicsIntegrationTest` and expanded unit tests.

Reviewers: Randall Hauch <rhauch@gmail.com>
2020-02-14 14:35:26 -08:00
Jason Gustafson 61e298dfd0 HOTFIX: Fix breakage in `ConsumerPerformanceTest` (#8113)
Test cases in `ConsumerPerformanceTest` were failing and causing a system exit rather than throwing the expected exception following #8023. We didn't catch this because the tests were marked as skipped and not failed.

Reviewers: Guozhang Wang <guozhang@confluent.io>
2020-02-13 18:31:59 -08:00
Mitch 1424b60ebc KAFKA-8507; Unify connection name flag for command line tool [KIP-499] (#8023)
This change updates ConsoleProducer, ConsumerPerformance, VerifiableProducer, and VerifiableConsumer classes to add and prefer the --bootstrap-server flag for defining the connection point of the Kafka cluster. This change is part of KIP-499: https://cwiki.apache.org/confluence/display/KAFKA/KIP-499+-+Unify+connection+name+flag+for+command+line+tool.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>,  Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-02-13 13:48:58 -08:00
Konstantine Karantasis 5af8380c64 MINOR: Small Connect integration test fixes (#8100)
Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
2020-02-12 17:41:22 -06:00
Lev Zemlyanov 93bf82605b allow ReplaceField SMT to handle tombstone records (#7731)
Signed-off-by: Lev Zemlyanov <lev@confluent.io>
2020-02-12 16:36:54 -06:00
Lev Zemlyanov 60dafe942e KAFKA-9192: fix NPE when for converting optional json schema in structs (#7733)
Author: Lev Zemlyanov <lev@confluent.io>
Reviewers: Greg Harris <gregh@confluent.io>, Randall Hauch <rhauch@gmail.com>
2020-02-12 15:45:54 -06:00
Boyang Chen a017dcbd3e KAFKA-9417: New Integration Test for KIP-447 (#8000)
This change mainly have 2 components:

1. extend the existing transactions_test.py to also try out new sendTxnOffsets(groupMetadata) API to make sure we are not introducing any regression or compatibility issue
  a. We shrink the time window to 10 seconds for the txn timeout scheduler on broker so that we could trigger expiration earlier than later

2. create a completely new system test class called group_mode_transactions_test which is more complicated than the existing system test, as we are taking rebalance into consideration and using multiple partitions instead of one. For further breakdown:
  a. The message count was done on partition level, instead of global as we need to visualize 
the per partition order throughout the test. For this sake, we extend ConsoleConsumer to print out the data partition as well to help message copier interpret the per partition data.
  b. The progress count includes the time for completing the pending txn offset expiration
  c. More visibility and feature improvements on TransactionMessageCopier to better work under either standalone or group mode.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-12 12:45:19 -08:00
Boyang Chen c853929c3a HOTFIX: Fix spotsbug failure in Kafka examples (#8051)
Reviewers: Jason Gustafson <jason@confluent.io>
2020-02-12 12:45:07 -08:00
Boyang Chen a4b2a086f8 KAFKA-9447: Add new customized EOS model example (#8031)
With the improvement of 447, we are now offering developers a better experience on writing their customized EOS apps with group subscription, instead of manual assignments. With the demo, user should be able to get started more quickly on writing their own EOS app, and understand the processing logic much better.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-02-12 12:44:56 -08:00
David Jacot 8e6d24a50f KAFKA-9499; Improve deletion process by batching more aggressively (#8053)
This PR speeds up the deletion process by doing the following:
- Batch whenever possible to minimize the number of requests sent out to other brokers;
- Refactor `onPartitionDeletion` to remove the usage of `allLiveReplicas`.

Reviewers: Jason Gustafson <jason@confluent.io>
2020-02-12 11:54:07 -08:00
Jason Gustafson a73636070a MINOR: Fix unnecessary metadata fetch before group assignment (#8095)
The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to https://github.com/apache/kafka/pull/7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic.

Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, #6561 is probably still needed to improve the resilience of this test.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2020-02-12 11:54:02 -08:00
Sönke Liebau 93a4820488 KAFKA-9423: Refine layout of configuration options on website and make individual settings directly linkable (#7955)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2020-02-12 17:44:31 +00:00
John Roesler 1679839cde KAFKA-9500: Fix FK Join Topology (#8015)
Corrects a flaw leading to an exception while building topologies that include both:

* A foreign-key join with the result not explicitly materialized
* An operation after the join that requires source materialization

Also corrects a flaw in TopologyTestDriver leading to output records being enqueued in the wrong order under some (presumably rare) circumstances.

Cherry-pick of 1681c78f60 from trunk

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-11 22:39:24 -06:00
John Roesler 7b71cb92b5 KAKFA-9503: Fix TopologyTestDriver output order (#8065)
Migrates TopologyTestDriver processing to be closer to the same processing/ordering
semantics as KafkaStreams. This corrects the output order for recursive topologies
as reported in KAFKA-9503, and also works similarly in the case of task idling.

Cherry-pick of 998f1520f9 from trunk

Reviewers: Matthias J. Sax <matthias@confluent.io>
2020-02-11 21:48:40 -06:00
Bruno Cadonna 0d82d25c32 KAFKA-9355: Fix bug that removed RocksDB metrics after failure in EOS (#7996)
* Added init() method to RocksDBMetricsRecorder
* Added call to init() of RocksDBMetricsRecorder to init() of RocksDB store
* Added call to init() of RocksDBMetricsRecorder to openExisting() of segmented state stores
* Adapted unit tests
* Added integration test that reproduces the situation in which the bug occurred

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-02-11 17:32:21 -08:00