Commit Graph

7322 Commits

Author SHA1 Message Date
Bruno Cadonna 2808a05b01 KAFKA-9675: Fix bug that prevents RocksDB metrics to be updated (#8256)
Reviewers: John Roesler <vvcephei@apache.org>
2020-04-15 13:38:05 -07:00
Bill Bejeck f80912aed9
Kafka 9739: Fix for 2.5 branch (#8492)
This is a port of #8400 for the 2.5 branch

For some context, when building a streams application, the optimizer keeps track of the key-changing operations and any repartition nodes that are descendants of the key-changer. During the optimization phase (if enabled), any repartition nodes are logically collapsed into one. The optimizer updates the graph by inserting the single repartition node between the key-changing node and its first child node. This graph update process is done by searching for a node that has the key-changing node as one of its direct parents, and the search starts from the repartition node, going up in the parent hierarchy.

The one exception to this rule is if there is a merge node that is a descendant of the key-changing node, then during the optimization phase, the map tracking key-changers to repartition nodes is updated to have the merge node as the key. Then the optimization process updates the graph to place the single repartition node between the merge node and its first child node.

The error in KAFKA-9739 occurred because there was an assumption that the repartition nodes are children of the merge node. But in the topology from KAFKA-9739, the repartition node was a parent of the merge node. So when attempting to find the first child of the merge node, nothing was found (obviously) resulting in StreamException(Found a null keyChangingChild node for..)

This PR fixes this bug by first checking that all repartition nodes for optimization are children of the merge node.

Reviewers: John Roesler <john@confluent.io>
2020-04-15 15:41:28 -04:00
Rajini Sivaram 933370c39e KAFKA-9797; Fix TestSecurityRollingUpgrade.test_enable_separate_interbroker_listener (#8403)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2020-04-15 18:54:58 +01:00
Ewen Cheslack-Postava 7e3df8ff05 MINOR: Upgrade ducktape to 0.7.7 (#8487)
This fixes a version pinning issue where a transitive dependency had a
major version upgrade that a dependency did not account for, breaking
the build.

Reviewers: Andrew Egelhofer <aegelhofer@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-04-14 16:38:30 -07:00
Jason Gustafson 264e2af69b KAFKA-9842; Add test case for OffsetsForLeaderEpoch grouping in Fetcher (#8457)
This is a follow-up to #8077. The bug exposed a testing gap in how we group partitions. This patch adds a test case which reproduces the reported problem.

Reviewers: David Arthur <mumrah@gmail.com>
2020-04-13 17:26:27 -07:00
Dezhi “Andy” Fang 3ee844c91f KAFKA-9583; Use topic-partitions grouped by node to send OffsetsForLeaderEpoch requests (#8077)
In `validateOffsetsAsync` in t he consumer, we group the requests by leader node for efficiency. The list of topic-partitions are grouped from `partitionsToValidate` (all partitions) to `node` => `fetchPostitions` (partitions by node). However, when actually sending the request with `OffsetsForLeaderEpochClient`, we use `partitionsToValidate`, which is the list of all topic-partitions passed into `validateOffsetsAsync`. This results in extra partitions being included in the request sent to brokers that are potentially not the leader for those partitions.

This PR fixes the issue by using `fetchPositions`, which is the proper list of partitions that we should send in the request. Additionally, a small typo of API name in `OffsetsForLeaderEpochClient` is corrected (it originally referenced `LisfOffsets` as the API name).

Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-04-13 17:25:59 -07:00
Jason Gustafson bad93b775d KAFKA-9835; Protect `FileRecords.slice` from concurrent write (#8451)
A read from the end of the log interleaved with a concurrent write can result in reading data above the expected read limit. In particular, this would allow a read above the high watermark. The root of the problem is consecutive calls to `sizeInBytes` in `FileRecords.slice` which do not account for an increase in size due to a concurrent write. This patch fixes the problem by using a single call to `sizeInBytes` and caching the result.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-04-08 11:36:02 -07:00
Guozhang Wang 146811ab29
KAFKA-9801: Still trigger rebalance when static member joins in CompletingRebalance phase (#8405)
Fix the direct cause of the observed issue on the client side: when heartbeat getting errors and resetting generation, we only need to set it to UNJOINED when it was not already in REBALANCING; otherwise, the join-group handler would throw the retriable UnjoinedGroupException to force the consumer to re-send join group unnecessarily.

Fix the root cause of the issue on the broker side: we should still trigger rebalance when static member joins in CompletingRebalance phase; otherwise the member.ids would be changed when the assignment is received from the leader, hence causing the new member.id's assignment to be empty.

Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
2020-04-07 15:29:15 -07:00
Rajini Sivaram fd6bd32c7e KAFKA-9815; Ensure consumer always re-joins if JoinGroup fails (#8420)
On metadata change for assigned topics, we trigger rebalance, revoke partitions and send JoinGroup. If metadata reverts to the original value and JoinGroup fails, we don't resend JoinGroup because we don't set `rejoinNeeded`. This PR sets `rejoinNeeded=true` when rebalance is triggered due to metadata change to ensure that we retry on failure.

Reviewers: Boyang Chen <boyang@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-04-06 17:01:11 -07:00
Jason Gustafson 6e7ebfa9db KAFKA-9807; Protect LSO reads from concurrent high-watermark updates (#8418)
If the high-watermark is updated in the middle of a read with the `read_committed` isolation level, it is possible to return data above the LSO. In the worst case, this can lead to the read of an aborted transaction. The root cause is that the logic depends on reading the high-watermark twice. We fix the problem by reading it once and caching the value.

Reviewers: David Arthur <mumrah@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2020-04-03 13:57:17 -07:00
Konstantine Karantasis b4fe7cca13 KAFKA-9810: Document Connect Root REST API on / (#8408)
Document the supported endpoint at the top-level (root) REST API resource and the information that it returns when a request is made to a Connect worker.

Fixes an omission in documentation after KAFKA-2369 and KAFKA-6311 (KIP-238)

Reviewers: Toby Drake <tobydrake7@gmail.com>, Soenke Liebau <soenke.liebau@opencore.com>
2020-04-03 13:29:08 -07:00
Jason Gustafson d6016b6906 KAFKA-9750; Fix race condition with log dir reassign completion (#8412)
There is a race on receiving a LeaderAndIsr request for a replica with an active log dir reassignment. If the reassignment completes just before the LeaderAndIsr handler updates epoch information, it can lead to an illegal state error since no future log dir exists. This patch fixes the problem by ensuring that the future log dir exists when the fetcher is started. Removal cannot happen concurrently because it requires access the same partition state lock.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>

Co-authored-by: Chia-Ping Tsai <chia7712@gmail.com>
2020-04-03 11:56:46 -07:00
Andrew Olson 4b2268bd29 KAFKA-9233: Fix IllegalStateException in Fetcher retrieval of beginning or end offsets for duplicate TopicPartition values (#7755)
Co-authored-by: Andrew Olson <aolson1@cerner.com>

Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-04-01 19:06:19 +01:00
Guozhang Wang 54eafcd7bd KAFKA-9659: Add more log4j when updating static member mappings (#8269)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Boyang Chen <boyang@confluent.io>, Rohan <desai.p.rohan@gmail.com>
2020-04-01 10:39:22 -07:00
Guozhang Wang 2190ae4cf5 MINOR: more logs for empty assignment (#8397)
We find that brokers may send empty assignment for some members unexpectedly, and would need more logs investigating this issue.

Reviewers: John Roesler <vvcephei@apache.org>
2020-03-31 17:00:05 -07:00
Greg Harris f534da078e KAFKA-9706: Handle null in keys or values when Flatten transformation is used (#8279)
* Fixed DataException thrown when handling tombstone events with null value
* Passes through original record when finding a null key when it's configured for keys or a null value when it's configured for values. 
* Added unit tests for schema and schemaless data
2020-03-30 15:41:41 -07:00
Boyang Chen ae29cb8513 MINOR: Update docs for KIP-530 and KIP-562 (#8388)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-03-30 13:13:35 -07:00
Boyang Chen 622a3ac1e8 KAFKA-9760: Add KIP-447 protocol change to upgrade notes (#8350)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-30 13:13:16 -07:00
Svend Vanderveken bf7c23de4c MINOR: Fix error message in exception when records have schemas in Connect's Flatten transformation (#3982)
In case of an error while flattening a record with schema, the Flatten transformation was reporting an error about a record without schema, as follows: 

```
org.apache.kafka.connect.errors.DataException: Flatten transformation does not support ARRAY for record without schemas (for field ...)
```

The expected behaviour would be an error message specifying "with schemas". 

This looks like a simple copy/paste typo from the schemaless equivalent methods, in the same file 

Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Konstantine Karantasis <konstantine@confluent.io>
2020-03-28 21:30:41 -07:00
Scott 505f74803e MINOR: Fix code example reference to SchemaBuilder call in Connect's documentation (#3029)
Simple doc fix in a code snippet in connect.html

Co-authored-by: Scott Ferguson <smferguson@gmail.com>

Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Konstantine Karantasis <konstantine@confluent.io>
2020-03-28 18:36:32 -07:00
Chris Egerton f61ee01e84 KAFKA-9771: Port patch for inter-worker Connect SSL from Jetty 9.4.25 (#8369)
For reasons outlined in https://issues.apache.org/jira/browse/KAFKA-9771
we can't upgrade to a version of Jetty with the bug fixed, or downgrade to one prior to the introduction of the bug. Luckily, the actual fix is pretty straightforward and can be ported over to Connect for use until it's possible to upgrade to a version of Jetty with that bug fixed: https://github.com/eclipse/jetty.project/pull/4404/files#diff-58640db0f8f2cd84b7e653d1c1540913R2188-R2193

The changes here have been verified locally; a test with multiple certificates/multiple hostnames will be submitted in a follow up. 

Reviewers: Jeff Huang <47870461+jeffhuang26@users.noreply.github.com>, Konstantine Karantasis <konstantine@confluent.io>
2020-03-27 10:34:54 -07:00
Greg Harris d65ab9dede KAFKA-9707: Fix InsertField.Key should apply to keys of tombstone records (#8280)
* KAFKA-9707: Fix InsertField.Key not applying to tombstone events

* Fix typo that hardcoded .value() instead of abstract operatingValue
* Add test for Key transform that was previously not tested

Signed-off-by: Greg Harris <gregh@confluent.io>

* Add null value assertion to tombstone test

* Remove mis-named function and add test for passing-through a null-keyed record.

Signed-off-by: Greg Harris <gregh@confluent.io>

* Simplify unchanged record assertion

Signed-off-by: Greg Harris <gregh@confluent.io>

* Replace assertEquals with assertSame

Signed-off-by: Greg Harris <gregh@confluent.io>

* Fix checkstyleTest indent issue

Signed-off-by: Greg Harris <gregh@confluent.io>
2020-03-26 11:40:00 -07:00
Boyang Chen 704c5a6d35 KAFKA-9758: Doc changes for KIP-523 and KIP-527 (#8343)
Reviewer: Matthias J. Sax <matthias@confluent.io>
2020-03-25 09:31:41 -07:00
Anna Povzner 301abf040b KAFKA-9677: Fix consumer fetch with small consume bandwidth quotas (#8290)
When we changed quota communication with KIP-219, fetch requests get throttled by returning empty response with the delay in throttle_time_ms and Kafka consumer retries again after the delay. With default configs, the maximum fetch size could be as big as 50MB (or 10MB per partition). The default broker config (1-second window, 10 full windows of tracked bandwidth/thread utilization usage) means that < 5MB/s consumer quota (per broker) may block consumers from being able to fetch any data.

This PR ensures that consumers cannot get blocked by quota by capping fetchMaxBytes in KafkaApis.handleFetchRequest() to quota window * consume bandwidth quota. In the example of default configs (10-second quota window) and 1MB/s consumer bandwidth quota, fetchMaxBytes would be capped to 10MB.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2020-03-25 12:47:52 +00:00
Jason Gustafson 7b807b16c9 KAFKA-9752; New member timeout can leave group rebalance stuck (#8339)
Older versions of the JoinGroup rely on a new member timeout to keep the group from growing indefinitely in the case of client disconnects and retrying. The logic for resetting the heartbeat expiration task following completion of the rebalance failed to account for an implicit expectation that shouldKeepAlive would return false the first time it is invoked when a heartbeat expiration is scheduled. This patch fixes the issue by making heartbeat satisfaction logic explicit.

Reviewers:  Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-03-24 22:18:26 -07:00
jiameixie b8b30bc386 KAFKA-9700: Fix negative estimatedCompressionRatio (#8285)
There are cases where `currentEstimation` is less than
`COMPRESSION_RATIO_IMPROVING_STEP` causing
`estimatedCompressionRatio` to be negative. This, in turn,
may result in `MESSAGE_TOO_LARGE`.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-03-24 20:49:31 -07:00
Bob Barrett 5a1567631d KAFKA-9749; Transaction coordinator should treat KAFKA_STORAGE_ERROR as retriable (#8336)
When handling a WriteTxnResponse, the TransactionMarkerRequestCompletionHandler throws an IllegalStateException when the remote broker responds with a KAFKA_STORAGE_ERROR and does not retry the request. This leaves the transaction state stuck in PendingAbort or PendingCommit, with no way to change that state other than restarting the broker, because both EndTxnRequest and InitProducerIdRequest return CONCURRENT_TRANSACTIONS if the state is PendingAbort or PendingCommit. This patch changes the error handling behavior in TransactionMarkerRequestCompletionHandler to retry KAFKA_STORAGE_ERRORs. This matches the existing client behavior, and makes sense because KAFKA_STORAGE_ERROR causes the host broker to shut down, meaning that the partition being written to will move its leadership to a new, healthy broker.

Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
2020-03-24 14:29:01 -07:00
Hossein Torabi e2c58be775 KAFKA-9563: Fix Kafka Connect documentation around consumer and producer overrides (#8124)
Kafka Connect main doc required a fix to distinguish between worker level producer and consumer overrides and per-connector level producer and consumer overrides, after the latter were introduced with KIP-458.  

* [KAFKA-9563] Fix Kafka connect consumer and producer override documentation

Co-authored-by: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Konstantine Karantasis <konstantine@confluent.io>
2020-03-22 20:08:41 -07:00
Tom Bentley f9d0ec39cc KAFKA-9634: Add note about thread safety in the ConfigProvider interface (#8205)
In Kafka Connect, a ConfigProvider instance can be used concurrently (e.g. via a PUT request to the `/connector-plugins/{connectorType}/config/validate` REST endpoint), but there is no mention of concurrent usage in the Javadocs of the ConfigProvider interface. 

It's worth calling out that implementations need to be thread safe.

Reviewers: Konstantine Karantasis <konstantine@confluent.io>
2020-03-22 13:02:14 -07:00
nicolasguyomar 11da7b6208 MINOR: Update Connect error message to point to the correct config validation REST endpoint (#7991)
When incorrect connector configuration is detected, the returned exception message suggests to check the connector's configuration against the `{connectorType}/config/validate` endpoint. 

Changing the error message to refer to the exact REST endpoint which is `/connector-plugins/{connectorType}/config/validate` 

This aligns the exception message with the documentation at: https://kafka.apache.org/documentation/#connect_rest 

Reviewers: Konstantine Karantasis <konstantine@confluent.io>
2020-03-22 12:45:49 -07:00
Alaa Zbair 6b7940d3d8 KAFKA-8842: : Reading/Writing confused in Connect QuickStart Guide
In step 7 of the QuickStart guide, "Writing data from the console and writing it back to the console" should be "Reading data from the console and writing it back to the console".

Co-authored-by: Alaa Zbair <alaa.zbair@grenoble-inp.org>

Reviewer: Konstantine Karantasis <konstantine@confluent.io>
2020-03-21 22:02:43 -07:00
Matthias J. Sax 223bb49623 KAFKA-9741: Update ConsumerGroupMetadata before calling onPartitionsRevoked() (#8325)
If partitions are revoked, an application may want to commit the current offsets.

Using transactions, committing offsets would be done via the producer passing in the current ConsumerGroupMetadata. If the metadata is not updates before the callback, the call to commitTransaction(...) fails as and old generationId would be used.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-20 21:36:33 -07:00
Boyang Chen 3ea04ff5a1 KAFKA-9701 (fix): Only check protocol name when generation is valid (#8324)
This bug was incurred by #7994 with a too-strong consistency check. It is because a reset generation operation could be called in between the joinGroupRequest -> joinGroupResponse -> SyncGroupRequest -> SyncGroupResponse sequence of events, if user calls unsubscribe in the middle of consumer#poll().

Proper fix is to avoid the protocol name check when the generation is invalid.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-03-20 21:27:27 -07:00
Konstantine Karantasis 3b2a717b2c MINOR: Use Exit.exit instead of System.exit in MM2 (#8321)
Exit.exit needs to be used in code instead of System.exit.

Particularly in integration tests using System.exit is disrupting because it exits the jvm process and does not just fail the test correctly. Integration tests override procedures in Exit to protect against such cases.

Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Randall Hauch <rhauch@gmail.com>
2020-03-20 16:46:28 -07:00
Chia-Ping Tsai 12cfd0e176 KAFKA-9654; Update epoch in `ReplicaAlterLogDirsThread` after new LeaderAndIsr (#8223)
Currently when there is a leader change with a log dir reassignment in progress, we do not update the leader epoch in the partition state maintained by `ReplicaAlterLogDirsThread`. This can lead to a FENCED_LEADER_EPOCH error, which results in the partition being marked as failed, which is a permanent failure until the broker is restarted. This patch fixes the problem by updating the epoch in `ReplicaAlterLogDirsThread` after receiving a new LeaderAndIsr request from the controller.

Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-03-19 17:49:32 -07:00
Guozhang Wang 45e0d07c8a HOTFIX: do not rely on file modified time in StateDirectoryTest 2020-03-18 11:38:19 -07:00
Boyang Chen 89a5355779 HOTFIX: StateDirectoryTest should use Set instead of List (#8305)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-17 20:14:28 -07:00
Guozhang Wang 9d68b8e3db KAFKA-8803: Remove timestamp check in completeTransitionTo (#8278)
In prepareAddPartitions the txnStartTimestamp could be updated as updateTimestamp, which is assumed to be always larger then the original startTimestamp. However, due to ntp time shift the timer may go backwards and hence the newStartTimestamp be smaller than the original one. Then later in completeTransitionTo the time check would fail with an IllegalStateException, and the txn would not transit to Ongoing.

An indirect result of this, is that this txn would NEVER be expired anymore because only Ongoing ones would be checked for expiration.

We should do the same as in #3286 to remove this check.

Also added test coverage for both KAFKA-5415 and KAFKA-8803.

Reviewers: Jason Gustafson<jason@confluent.io>
2020-03-17 14:40:33 -07:00
Nigel Liang e0750e2f8a KAFKA-9712: Catch and handle exception thrown by reflections scanner (#8289)
This commit works around a bug in version v0.9.12 of the upstream `reflections` library by catching and handling the exception thrown.

The reflections issue is tracked by:
https://github.com/ronmamo/reflections/issues/273

New unit tests were introduced to test the behavior.

* KAFKA-9712: Catch and handle exception thrown by reflections scanner

* Update connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java

Co-Authored-By: Konstantine Karantasis <konstantine@confluent.io>

* Move result initialization back to right before it is used

* Use `java.io.File` in tests

* Fix checkstyle

Co-authored-by: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2020-03-16 14:45:06 -07:00
Guozhang Wang f2963c212e KAFKA-6647: Do note delete the lock file while holding the lock (#8267)
1. Inside StateDirectory#cleanRemovedTasks, skip deleting the lock file (and hence the parent directory) until releasing the lock. And after the lock is released only go ahead and delete the parent directory if manualUserCall == true. That is, this is triggered from KafkaStreams#cleanUp and users are responsible to make sure that Streams instance is not started and hence there are no other threads trying to grab that lock.

2. As a result, during scheduled cleanup the corresponding task.dir would not be empty but be left with only the lock file, so effectively we still achieve the goal of releasing disk spaces. For callers of listTaskDirectories like KIP-441 (cc @ableegoldman to take a look) I've introduced a new listNonEmptyTaskDirectories which excludes such dummy task.dirs with only the lock file left.

3. Also fixed KAFKA-8999 along the way to expose the exception while traversing the directory.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>
2020-03-14 14:08:44 -07:00
Matthias J. Sax 2f6fcb89d9 KAFKA-9533: Fix JavaDocs of KStream.transformValues (#8298)
Reviewers: Bill Bejeck <bill@confluent.io>
2020-03-14 11:35:33 -07:00
Matthias J. Sax 3c73ae271b MINOR: Update Streams IQ JavaDocs to not point to a deprecated method (#8271)
Reviewers: John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-11 15:26:17 -07:00
Anna Povzner 8f18f0dbc7 KAFKA-9658; Fix user quota removal (#8232)
Adding (add-config) default user, user, or <user, client-id> quota and then removing it via delete-config does not update quota bound in ClientQuotaManager.Metrics for existing users or <user,client-id>. This causes brokers to continue to throttle with the previously set quotas until brokers restart (or <user,client> stops sending traffic for sometime and sensor expires). This happens only when removing the user or user,client-id where there are no more quotas  to fall back to. Common example where the issue happens: Initial no quota state --> add default user quota --> remove default user quota. 

The cause of the issue was `DefaultQuotaCallback.quotaLimit` was returning `null` when no default user quota set, which caused `ClientQuotaManager.updateQuotaMetricConfigs` to skip updating the appropriate sensor, which left it unchanged with the previous quota. Since `null` is an acceptable return value for `ClientQuotaCallback.quotaLimit`, which is already treated as unlimited quota in other parts of the code, this PR ensures that `ClientQuotaManager.updateQuotaMetricConfigs` updates the quotas for which  `ClientQuotaCallback.quotaLimit` returns `null` to unlimited quota.

Reviewers: Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-03-10 12:32:01 -07:00
Andy Coates e6d0d6ea3d KAFKA-9668: Iterating over KafkaStreams.getAllMetadata() results in ConcurrentModificationException (#8233)
`KafkaStreams.getAllMetadata()` returns `StreamsMetadataState.getAllMetadata()`. All the latter methods is `synchronized` it returns a reference to internal mutable state.  Not only does this break encapsulation, but it means any thread iterating over the returned collection when the metadata gets rebuilt will encounter a `ConcurrentModificationException`.

This change:
 * switches from clearing and rebuild `allMetadata` when `onChange` is called to building a new list and swapping this in. This is thread safe and has the benefit that the returned list is not empty during a rebuild: you either get the old or the new list.
 * removes synchronisation from `getAllMetadata` and `getLocalMetadata`. These are returning member variables. Synchronisation adds nothing.
 * changes `getAllMetadata` to wrap its return value in an unmodifiable wrapper to avoid breaking encapsulation.
 * changes the getters in `StreamsMetadata` to wrap their return values in unmodifiable wrapper to avoid breaking encapsulation.

Co-authored-by: Andy Coates <big-andy-coates@users.noreply.github.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-06 13:09:20 -08:00
Rajini Sivaram 573a149acc KAFKA-9662: Wait for consumer offset reset in throttle test to avoid losing early messages (#8227) 2020-03-06 14:52:37 -05:00
David Arthur 3f7af52097 KAFKA-9661: Propagate includeSynonyms option to AdminClient in ConfigCommand (#8229) 2020-03-05 11:57:29 -05:00
Jason Gustafson 42b1d64b62 KAFKA-9530; Fix flaky test `testDescribeGroupWithShortInitializationTimeout` (#8154)
With a short timeout, a call in KafkaAdminClient may timeout and the client might disconnect. Currently this can be exposed to the user as either a TimeoutException or a DisconnectException. To be consistent, rather than exposing the underlying retriable error, we handle both cases with a TimeoutException.

Reviewers: Boyang Chen <boyang@confluent.io>, Ismael Juma <ismael@juma.me.uk>
2020-03-04 20:24:08 -05:00
Guozhang Wang a529892182 KAFKA-8995: delete all topics before recreating (#8208)
I think the root cause of KAFKA-8893, KAFKA-8894, KAFKA-8895 and KSTREAMS-3779 are the same: some intermediate topics are not deleted in the setup logic before recreating the user topics, which could cause the waitForDeletion (that check exact match of all existing topics) to fail, and also could cause more records to be returned because of the intermediate topics that are not deleted from the previous test case.

Also inspired by https://github.com/apache/kafka/pull/5418/files I used a longer timeout (120 secs) for deleting all topics.

Reviewers: John Roesler <vvcephei@apache.org>
2020-03-02 17:25:32 -08:00
David Arthur 9c01110a59 Fix NOTICE year 2020-03-02 12:50:09 -05:00
John Roesler 3f99f9dd91
MINOR: Port streams broker compatibility fix (#8203)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-02 09:27:17 -08:00