Commit Graph

7371 Commits

Author SHA1 Message Date
Tom Bentley 6cfed8ad00
KAFKA-9651: Fix ArithmeticException (÷ by 0) in DefaultStreamPartitioner (#8226)
In Streams `StreamsMetadataState.getMetadataWithKey`, we should use the inferred max topic partitions passed in directly from the caller than relying on cluster to contain its topic-partition information.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-23 10:07:54 -07:00
Hossein Torabi afe26e2992
KAFKA-9563: Fix Kafka Connect documentation around consumer and producer overrides (#8124)
Kafka Connect main doc required a fix to distinguish between worker level producer and consumer overrides and per-connector level producer and consumer overrides, after the latter were introduced with KIP-458.  

* [KAFKA-9563] Fix Kafka connect consumer and producer override documentation

Co-authored-by: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Konstantine Karantasis <konstantine@confluent.io>
2020-03-22 20:07:21 -07:00
Matthias J. Sax 1ad5f346cb
KAFKA-9451: Enable producer per thread for Streams EOS (#8318)
- KIP-447
- add new configs to enable producer per thread EOS
- updates TaskManager to use single shared producer for eos-beta

Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-22 13:50:07 -07:00
Tom Bentley 8aa1861816
KAFKA-9634: Add note about thread safety in the ConfigProvider interface (#8205)
In Kafka Connect, a ConfigProvider instance can be used concurrently (e.g. via a PUT request to the `/connector-plugins/{connectorType}/config/validate` REST endpoint), but there is no mention of concurrent usage in the Javadocs of the ConfigProvider interface. 

It's worth calling out that implementations need to be thread safe.

Reviewers: Konstantine Karantasis <konstantine@confluent.io>
2020-03-22 12:58:21 -07:00
nicolasguyomar f8173c2df5
MINOR: Update Connect error message to point to the correct config validation REST endpoint (#7991)
When incorrect connector configuration is detected, the returned exception message suggests to check the connector's configuration against the `{connectorType}/config/validate` endpoint. 

Changing the error message to refer to the exact REST endpoint which is `/connector-plugins/{connectorType}/config/validate` 

This aligns the exception message with the documentation at: https://kafka.apache.org/documentation/#connect_rest 

Reviewers: Konstantine Karantasis <konstantine@confluent.io>
2020-03-22 12:42:45 -07:00
Chia-Ping Tsai 4f6907947a
MINOR: fix linking errors in javadoc (#8198)
This improvement fixes several linking errors to classes and methods from within javadocs. 

Related to #8291

Reviewers: Konstantine Karantasis <konstantine@confluent.io>
2020-03-22 10:15:18 -07:00
Alaa Zbair 5ccd3cd46d
KAFKA-8842: : Reading/Writing confused in Connect QuickStart Guide
In step 7 of the QuickStart guide, "Writing data from the console and writing it back to the console" should be "Reading data from the console and writing it back to the console".

Co-authored-by: Alaa Zbair <alaa.zbair@grenoble-inp.org>

Reviewer: Konstantine Karantasis <konstantine@confluent.io>
2020-03-21 22:00:33 -07:00
A. Sophie Blee-Goldman 6cf27c9c77
KAFKA-6145: Pt 2.5 Compute overall task lag per client (#8252)
Once we have encoded the offset sums per task for each client, we can compute the overall lag during assign by fetching the end offsets for all changelog and subtracting.

If the listOffsets request fails, we simply return a "completely sticky" assignment, ie all active tasks are given to previous owners regardless of balance.

Builds (but does not yet use) the statefulTasksToRankedCandidates map with the ranking:
Rank -1: active running task
Rank 0: standby or restoring task whose overall lag is within acceptableRecoveryLag
Rank 1: tasks whose lag is unknown (eg during version probing)
Rank 1+: all other tasks are ranked according to their actual total lag

Implements: KIP-441
Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>
2020-03-21 13:40:34 -05:00
Lucas Bradstreet c16938fd96
MINOR: allow retries for unitTest and integrationTest runs (#8323)
We will currently retry if you run gradle test, but not unitTest or
integrationTest, which are used directly in Jenkins. This means that we
have not been achieving the expected retry behavior.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-03-21 09:09:29 -07:00
Matthias J. Sax 635b5fd47c
KAFKA-9741: Update ConsumerGroupMetadata before calling onPartitionsRevoked() (#8325)
If partitions are revoked, an application may want to commit the current offsets.

Using transactions, committing offsets would be done via the producer passing in the current ConsumerGroupMetadata. If the metadata is not updates before the callback, the call to commitTransaction(...) fails as and old generationId would be used.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-20 21:34:08 -07:00
Boyang Chen c75dc5e2e0
KAFKA-9701 (fix): Only check protocol name when generation is valid (#8324)
This bug was incurred by #7994 with a too-strong consistency check. It is because a reset generation operation could be called in between the joinGroupRequest -> joinGroupResponse -> SyncGroupRequest -> SyncGroupResponse sequence of events, if user calls unsubscribe in the middle of consumer#poll().

Proper fix is to avoid the protocol name check when the generation is invalid.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-03-20 21:26:57 -07:00
Konstantine Karantasis 406635bcc9
MINOR: Use Exit.exit instead of System.exit in MM2 (#8321)
Exit.exit needs to be used in code instead of System.exit.

Particularly in integration tests using System.exit is disrupting because it exits the jvm process and does not just fail the test correctly. Integration tests override procedures in Exit to protect against such cases.

Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Randall Hauch <rhauch@gmail.com>
2020-03-20 16:39:17 -07:00
Boyang Chen c249ea8e5d
KAFKA-9727: cleanup the state store for standby task dirty close and check null for changelogs (#8307)
This PR fixes three things:

* the state should be closed when standby task is restoring as well
* the EOS standby task should also wipe out state under dirty close
* the changelog reader should check for null as well

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-20 15:26:13 -07:00
Bruno Cadonna cbdd0d6cf1
KAFKA-6145: Add constrained balanced assignment algorithm (#8262)
Adds a currently unused component of the new Streams assignment algorithm.

Implements: KIP-441
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>
2020-03-20 13:51:24 -05:00
John Roesler 960b216290
KAFKA-9734: Fix IllegalState in Streams transit to standby (#8319)
Consolidate ChangelogReader state management inside of StreamThread to avoid having to reason about all execution paths in both StreamThread and TaskManager.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-20 10:17:51 -05:00
Agam Brahma 24d05aa601
KAFKA-9553; Improve measurement for loading groups and transactions (#8155)
This patch modifies the loading time metric to account for time spent waiting for the loading time task to be scheduled.

Reviewers: Jason Gustafson <jason@confluent.io>
2020-03-19 23:35:16 -07:00
Colin Patrick McCabe 56051e7639
KAFKA-8820: kafka-reassign-partitions.sh should support the KIP-455 API (#8244)
Rewrite ReassignPartitionsCommand to use the KIP-455 API when possible, rather
than direct communication with ZooKeeper.  Direct ZK access is still supported,
but deprecated, as described in KIP-455.

As specified in KIP-455, the tool has several new flags.  --cancel stops
an assignment which is in progress.  --preserve-throttle causes the
--verify and --cancel commands to leave the throttles alone.
--additional allows users to execute another partition assignment even
if there is already one in progress.  Finally, --show displays all of
the current partition reassignments.

Reorganize the reassignment code and tests somewhat to rely more on unit
testing using the MockAdminClient and less on integration testing.  Each
integration test where we bring up a cluster seems to take about 5 seconds, so
it's good when we can get similar coverage from unit tests.  To enable this,
MockAdminClient now supports incrementalAlterConfigs, alterReplicaLogDirs,
describeReplicaLogDirs, and some other APIs.  MockAdminClient is also now
thread-safe, to match the real AdminClient implementation.

In DeleteTopicTest, use the KIP-455 API rather than invoking the reassignment
command.
2020-03-19 20:44:34 -07:00
Chia-Ping Tsai c27f629e95
KAFKA-9654; Update epoch in `ReplicaAlterLogDirsThread` after new LeaderAndIsr (#8223)
Currently when there is a leader change with a log dir reassignment in progress, we do not update the leader epoch in the partition state maintained by `ReplicaAlterLogDirsThread`. This can lead to a FENCED_LEADER_EPOCH error, which results in the partition being marked as failed, which is a permanent failure until the broker is restarted. This patch fixes the problem by updating the epoch in `ReplicaAlterLogDirsThread` after receiving a new LeaderAndIsr request from the controller.

Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-03-19 16:49:35 -07:00
jiameixie 11e6aedff6
MINOR: Bump RocksDB version from 5.18.3 to 5.18.4 (#8284)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-03-19 15:50:23 -07:00
Matthias J. Sax 21cfd0b453
MINOR: Fix generic types in StreamsBuilder and Topology (#8273)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <guozhang@confluent.io>, John Roesler <john@confluent.io>
2020-03-19 14:29:15 -07:00
Matthias J. Sax 89cd2f2a0b
KAFKA-9441: Unify committing within TaskManager (#8218)
- part of KIP-447
 - commit all tasks at once using non-eos (and eos-beta in follow up work)
 - unified commit logic into TaskManager
 - split existing methods of Task interface in pre/post parts

Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-19 11:31:51 -07:00
A. Sophie Blee-Goldman 9ee8277cdd
KAFKA-6145: Add new assignment configs
Add 4 new assignor configs in preparation for the new assignment algorithm:
1. acceptable.recovery.lag
2. balance.factor
3. max.warmup.replicas
4. probing.rebalance.interval.ms

Implements: KIP-441
Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>
2020-03-19 13:19:14 -05:00
Bruno Cadonna 0174c95f4f
MINOR: Fix javadoc warning in StreamsMetric (#8314)
Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@apache.org>
2020-03-19 13:12:24 -04:00
Tom Bentley 4f1e8331ff
KAFKA-9435: DescribeLogDirs automated protocol (#7972)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2020-03-19 15:26:18 +00:00
Lucas Bradstreet 5c0cf02947
MINOR: return unmodifiableMap for PartitionStates.partitionStateMap. (#7637)
Makes the map returned by partitionStateMap unmodifiable to prevent mutation.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2020-03-19 08:19:51 -07:00
Boyang Chen c7164a3866
KAFKA-8618: Replace Txn marker with automated protocol (#7039)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2020-03-19 14:26:41 +00:00
A. Sophie Blee-Goldman 85c96f5230
KAFKA-9568: enforce rebalance if client endpoint has changed (#8299)
Since the assignment info includes a map with all member's host info, we can just check the received map to make sure our endpoint is contained. If not, we need to force the group to rebalance and get our updated endpoint info.

Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-03-18 18:37:38 -07:00
Guozhang Wang b1999ba22d
KAFKA-5604: Remove the redundant TODO marker on the Streams side (#8313)
The issue itself has been fixed a while ago on the producer side, so we can just remove this TODO marker now (we've removed the isZombie flag already anyways).

Reviewers: John Roesler <vvcephei@apache.org>
2020-03-18 14:22:53 -07:00
Boyang Chen b586283c53
KAFKA-9656; Return COORDINATOR_NOT_AVAILABLE for older producer clients (#8253)
The `TxnOffsetCommit` API suffers from a bug affecting older client versions which treat `COORDINATOR_LOAD_IN_PROGRESS` errors as fatal. This PR changes the handling on the broker to instead return `COORDINATOR_NOT_AVAILABLE` in this case so that clients won't crash upon doing txn commit. 

Reviewers: Jason Gustafson <jason@confluent.io>
2020-03-18 12:25:36 -07:00
Sanjana Kaundinya 34a7ba56a1
KAFKA-9047; AdminClient group operations should respect retries and backoff (#8161)
Previously, `AdminClient` group operations did not respect a `Call`'s number of configured tries and retry backoff. This could lead to tight retry loops that put a lot of pressure on the broker. This PR introduces fixes that ensures for all group operations the `AdminClient` respects the number of tries and the backoff a given `Call` has.

Reviewers: Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io>
2020-03-18 12:19:40 -07:00
Guozhang Wang c0cff61e8c HOTFIX: do not depend on file modified time in StateDirectoryTest 2020-03-18 11:31:23 -07:00
Ismael Juma 93f082e093
MINOR: Update Scala to 2.12.11 (#8308)
Highlights:
* Performance improvements in the ollections
library: algorithmic improvements and
changes to avoid unnecessary allocations.
* Performance improvements in the compiler.
* ASM was upgraded to 7.3.1, allowing the
optimizer to run on JDK 13+.

Full release notes: https://github.com/scala/scala/releases/tag/v2.12.11

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2020-03-18 09:33:14 -07:00
belugabehr da7d134640
KAFKA-9404: Use ArrayList instead of LinkedList in Sensor (#7936)
The former is generally a better option than the latter.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>
2020-03-18 06:38:36 -07:00
Sanjana Kaundinya 5fc3cd61fc
KAFKA-9625: Fix altering and describing dynamic broker configurations (#8260)
* Broker throttles were incorrectly marked as sensitive configurations.  Fix this, so that their values can be returned via DescribeConfigs as expected.

* Previously, changes to broker configs that consisted only of deletions were ignored by the brokers because of faulty delta calculation logic that didn't consider deletions as changes, only alterations as changes.  Fix this and add a regression test.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2020-03-17 23:02:33 -07:00
A. Sophie Blee-Goldman d38e97e319
MINOR: clean up required setup for StreamsPartitionAssignorTest (#8306)
No logical or behavioral changes, just a bit of cleanup in this class before we have to write and fix a lot of these tests for KIP-441:

* Moved creation of streamsMetadata mock to setUp (in exactly one test it will be overwritten with a strict mock)
* Tried to clean up the use of helper methods for configuring the assignor.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-17 22:14:00 -07:00
Chia-Ping Tsai f08c9c7de6
HOTFIX: fix flaky StateDirectoryTest.shouldReturnEmptyArrayIfListFilesReturnsNull (#8310)
StateDirectoryTest.shouldReturnEmptyArrayIfListFilesReturnsNull always moves the stage dir to /tmp/state-renamed so it always fails if there is already a folder (for example, the stuff leaved by previous test).

Reviewers: Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-03-17 22:13:13 -07:00
Guozhang Wang 6a88d32b9f
KAFKA-8803: Remove timestamp check in completeTransitionTo (#8278)
In prepareAddPartitions the txnStartTimestamp could be updated as updateTimestamp, which is assumed to be always larger then the original startTimestamp. However, due to ntp time shift the timer may go backwards and hence the newStartTimestamp be smaller than the original one. Then later in completeTransitionTo the time check would fail with an IllegalStateException, and the txn would not transit to Ongoing.

An indirect result of this, is that this txn would NEVER be expired anymore because only Ongoing ones would be checked for expiration.

We should do the same as in #3286 to remove this check.

Also added test coverage for both KAFKA-5415 and KAFKA-8803.

Reviewers: Jason Gustafson<jason@confluent.io>
2020-03-17 14:40:02 -07:00
Lucas Bradstreet 97156256c7
MINOR: double -Xss setting from 2m to 4m in build.gradle (#8264)
I have seen an increased incidence in StackOverflowError(s) when compiling scala. This
change doubles the max stack size to 4m.

```
> Task :core:compileScala FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':core:compileScala'.
> java.lang.StackOverflowError (no error message)
```

Reviewers: Andrew Choi <a24choi@edu.uwaterloo.ca>, Ismael Juma <ismael@juma.me.uk>
2020-03-17 09:12:40 -07:00
A. Sophie Blee-Goldman 673018504f
MINOR: cleanup and add tests to StateDirectoryTest (#8304)
Adds tests for edge conditions of listAllTaskDirectories
Also includes some minor cleanup of the StateDirectoryTest class

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-16 21:59:02 -07:00
Boyang Chen 1e6d944813
HOTFIX: StateDirectoryTest should use Set instead of List (#8305)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-16 21:37:46 -07:00
Matthias J. Sax dffc7f8c30
MINOR: Fix build and JavaDoc warnings (#8291)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, @SoontaekLim, Bill Bejeck <bill@confluent.io>
2020-03-16 18:23:21 -07:00
Brian Byrne fc79853c4d
MINOR: Fix kafka.server.RequestQuotaTest missing new ApiKeys. (#8302)
The test was broken by commit 227a7322b7.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2020-03-16 16:21:45 -07:00
Nigel Liang 569cf994b0
KAFKA-9712: Catch and handle exception thrown by reflections scanner (#8289)
This commit works around a bug in version v0.9.12 of the upstream `reflections` library by catching and handling the exception thrown.

The reflections issue is tracked by:
https://github.com/ronmamo/reflections/issues/273

New unit tests were introduced to test the behavior.

* KAFKA-9712: Catch and handle exception thrown by reflections scanner

* Update connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java

Co-Authored-By: Konstantine Karantasis <konstantine@confluent.io>

* Move result initialization back to right before it is used

* Use `java.io.File` in tests

* Fix checkstyle

Co-authored-by: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2020-03-16 14:43:10 -07:00
Manikumar Reddy a0e1407820
KAFKA-9670; Reduce allocations in Metadata Response preparation (#8236)
This PR removes  intermediate  conversions between `MetadataResponse.TopicMetadata` => `MetadataResponseTopic` and `MetadataResponse.PartitionMetadata` => `MetadataResponsePartition` objects.

There is 15-20% reduction in object allocations and 5-10% improvement in metadata request performance.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson<jason@confluent.io>
2020-03-16 09:30:48 -07:00
Chia-Ping Tsai 31659c3ee1
MINOR: fix Scala 2.13 build error introduced in #8083 (#8301)
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Brian Byrne <bbyrne@confluent.io>
2020-03-16 08:56:47 -07:00
A. Sophie Blee-Goldman 045c6c3c48
MINOR: enforce non-negative invariant for checkpointed offsets (#8297)
While discussing KIP-441 we realize we don't strictly enforce that all checkpointed offset sums are positive (or 0, though there's not much point to checkingpoint a 0 offset is there)?

Rather than awkwardly try handle this within every user/reader of the checkpoint file, we should just make a guarantee that all returned checkpointed offsets are positive.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-03-15 09:30:40 -07:00
Dominic Evans ddd3dfbfae
MINOR: comment apikey types in generated switch (#8201)
As a developer, it would be convenient if the generated
{request,response}HeaderVersion case statements in ApiMessageType.java
included a comment to remind me which type each of them is so I don't
need to manually cross-reference the newer/rarer ones.

Also include commented lines for the two special cases around
ApiVersionsResponse and ControllerShutdownRequest which are hardcoded in
the ApiMessageTypeGenerator.java and not covered by the message format
json files.

Before:
```java
    public short requestHeaderVersion(short _version) {
        switch (apiKey) {
            case 0:
                return (short) 1;
            case 1:
                return (short) 1;
            case 2:
                return (short) 1;
            case 3:
                if (_version >= 9) {
                    return (short) 2;
                } else {
                    return (short) 1;
                }
            // ...etc
```

After:
```java
    public short requestHeaderVersion(short _version) {
        switch (apiKey) {
            case 0: // Produce
                return (short) 1;
            case 1: // Fetch
                return (short) 1;
            case 2: // ListOffset
                return (short) 1;
            case 3: // Metadata
                if (_version >= 9) {
                    return (short) 2;
                } else {
                    return (short) 1;
                }
            // ...etc
```

Signed-off-by: Dominic Evans <dominic.evans@uk.ibm.com>

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2020-03-15 10:46:09 +00:00
Mickael Maison 2e6b15813c
MINOR: Fix typo in CreateTopicsResponse.json (#8300)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Boyang Chen <boyang@confluent.io>
2020-03-15 10:24:37 +00:00
Brian Byrne 227a7322b7
KIP-546: Implement describeClientQuotas and alterClientQuotas. (#8083)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2020-03-14 23:03:13 -07:00
Guozhang Wang 605d55dc17
KAFKA-6647: Do note delete the lock file while holding the lock (#8267)
1. Inside StateDirectory#cleanRemovedTasks, skip deleting the lock file (and hence the parent directory) until releasing the lock. And after the lock is released only go ahead and delete the parent directory if manualUserCall == true. That is, this is triggered from KafkaStreams#cleanUp and users are responsible to make sure that Streams instance is not started and hence there are no other threads trying to grab that lock.

2. As a result, during scheduled cleanup the corresponding task.dir would not be empty but be left with only the lock file, so effectively we still achieve the goal of releasing disk spaces. For callers of listTaskDirectories like KIP-441 (cc @ableegoldman to take a look) I've introduced a new listNonEmptyTaskDirectories which excludes such dummy task.dirs with only the lock file left.

3. Also fixed KAFKA-8999 along the way to expose the exception while traversing the directory.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>
2020-03-14 13:49:08 -07:00