10332 Commits

Author SHA1 Message Date
Sanjana Kaundinya beac86f049
KAFKA-13043: Implement Admin APIs for offsetFetch batching (#10964)
This implements the AdminAPI portion of KIP-709: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=173084258. The request/response protocol changes were implemented in 3.0.0. A new batched API has been introduced to list consumer offsets for different groups. For brokers older than 3.0.0, separate requests are sent for each group.
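
The fallback described above can be sketched as follows. This is a minimal illustration, not the Admin client's implementation: `OffsetFetchPlanner`, `planRequests`, and the `brokerSupportsBatching` flag are hypothetical names, and the real client decides based on the negotiated request version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Kafka Admin API.
final class OffsetFetchPlanner {

    // Returns one entry per request; each entry lists the group IDs that request carries.
    static List<List<String>> planRequests(List<String> groupIds, boolean brokerSupportsBatching) {
        List<List<String>> requests = new ArrayList<>();
        if (groupIds.isEmpty()) {
            return requests;
        }
        if (brokerSupportsBatching) {
            // Newer brokers: a single request carries every group.
            requests.add(new ArrayList<>(groupIds));
        } else {
            // Older brokers: fall back to one request per group.
            for (String groupId : groupIds) {
                requests.add(List.of(groupId));
            }
        }
        return requests;
    }
}
```

With three groups, the sketch plans one request when batching is supported and three requests otherwise.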

Co-authored-by: Rajini Sivaram <rajinisivaram@googlemail.com>
Co-authored-by: David Jacot <djacot@confluent.io>

Reviewers: David Jacot <djacot@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2022-07-14 13:47:34 +01:00
Christo Lolov 94d4fdeb28
KAFKA-14008: Add docs for Streams throughput metrics introduced in KIP-846 (#12377)
Reviewers: Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2022-07-13 17:47:34 -07:00
Alyssa Huang 8e9869a777
MINOR: Run MessageFormatChangeTest in ZK mode only (#12395)
KRaft mode will not support writing messages with an older message format (2.8) since the min supported IBP is 3.0 for KRaft. Testing support for reading older message formats will be covered by https://issues.apache.org/jira/browse/KAFKA-14056.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-13 08:46:04 +02:00
Hao Li b5d4fa7645
KAFKA-13785: [10/N][emit final] more unit tests for session store and disable cache for emit final sliding window (#12370)
1. Added more unit tests for RocksDBTimeOrderedSessionStore and RocksDBTimeOrderedSessionSegmentedBytesStore
2. Disable cache for sliding window if emit strategy is ON_WINDOW_CLOSE

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-07-12 10:57:11 -07:00
dengziming 98726c2bac
KAFKA-13968: Fix 3 major bugs of KRaft snapshot generating (#12265)
There are 3 bugs when a broker generates a snapshot.

1. Broker should not generate snapshots until it starts publishing.
    Before a broker starts publishing, BrokerMetadataListener._publisher=None, so _publisher.foreach(publish) does nothing and featuresDelta.metadataVersionChange().isPresent stays true. We therefore generate a snapshot on every commit, because we believe the metadata version has changed. The logs below show this; note that offset 1 is a LeaderChangeMessage, so there is no snapshot for it:

[2022-06-08 13:07:43,010] INFO [BrokerMetadataSnapshotter id=0] Creating a new snapshot at offset 0... (kafka.server.metadata.BrokerMetadataSnapshotter:66)
[2022-06-08 13:07:43,222] INFO [BrokerMetadataSnapshotter id=0] Creating a new snapshot at offset 2... (kafka.server.metadata.BrokerMetadataSnapshotter:66)
[2022-06-08 13:07:43,727] INFO [BrokerMetadataSnapshotter id=0] Creating a new snapshot at offset 3... (kafka.server.metadata.BrokerMetadataSnapshotter:66)
[2022-06-08 13:07:44,228] INFO [BrokerMetadataSnapshotter id=0] Creating a new snapshot at offset 4... (kafka.server.metadata.BrokerMetadataSnapshotter:66)

2. We should compute metadataVersionChanged before _publisher.foreach(publish)
    After _publisher.foreach(publish), BrokerMetadataListener._delta is always empty, so metadataVersionChanged is always false. This means we never trigger snapshot generation even if the metadata version has changed.

3. We should try to generate a snapshot when starting publishing
    When we start publishing, there may be a pending metadata version change, so we should try to generate a snapshot before the first publish.
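
The ordering behind the three fixes can be sketched like this. It is a simplified model under stated assumptions, not the Scala code in BrokerMetadataListener: the class and method names are hypothetical, a single boolean stands in for the metadata version change carried by the delta, and size-based snapshot triggers are ignored.

```java
// Hypothetical model of the listener; it only tracks the ordering of the checks.
final class MetadataListenerSketch {
    private Runnable publisher;            // null until publishing starts
    private boolean versionChangePending;  // stands in for the version change held in the delta
    private int snapshotsTaken;

    // Records that the pending delta contains a metadata version change.
    void replayVersionChange() {
        versionChangePending = true;
    }

    void onCommit() {
        if (publisher == null) {
            return; // fix 1: no snapshot before publishing has started
        }
        publishAndMaybeSnapshot();
    }

    void startPublishing(Runnable newPublisher) {
        publisher = newPublisher;
        publishAndMaybeSnapshot(); // fix 3: a change may already be pending at this point
    }

    private void publishAndMaybeSnapshot() {
        boolean versionChanged = versionChangePending; // fix 2: read the flag before publishing
        publisher.run();
        versionChangePending = false;                  // publishing resets the delta
        if (versionChanged) {
            snapshotsTaken++;
        }
    }

    int snapshotsTaken() {
        return snapshotsTaken;
    }
}
```

In this model a version change replayed before publishing produces no snapshot on commit, and exactly one once publishing starts.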

Reviewers: Jason Gustafson <jason@confluent.io>, Divij Vaidya <diviv@amazon.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
2022-07-12 08:02:43 -07:00
RivenSun 7ec759d67c
MINOR: Mention switch to reload4j in Notable changes in 3.1.1 (#12313)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Kvicii
2022-07-12 16:51:18 +02:00
Kirk True d3130f2e91
KAFKA-14062: OAuth client token refresh fails with SASL extensions (#12398)
- Different objects should be considered unique even with same content to support logout
- Added comments for SaslExtension re: removal of equals and hashCode
- Also swapped out the use of mocks in exchange for *real* SaslExtensions so that we exercise the use of default equals() and hashCode() methods.
- Updates to implement equals and hashCode and add tests in SaslExtensionsTest to confirm
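
Why identity-based equality matters for logout can be illustrated with a small sketch. `ExtensionsCredential` and `IdentityEqualityDemo` are hypothetical stand-ins, not the Kafka classes; the set plays the role of a credential set from which one login's entry is removed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical credential type that deliberately keeps Object's identity-based
// equals() and hashCode().
final class ExtensionsCredential {
    private final Map<String, String> values;

    ExtensionsCredential(Map<String, String> values) {
        this.values = new HashMap<>(values);
    }

    Map<String, String> values() {
        return values;
    }
}

final class IdentityEqualityDemo {

    // Adds two credentials with identical content, removes the first,
    // and reports how many remain in the set.
    static int remainingAfterLogout() {
        Set<Object> credentials = new HashSet<>();
        ExtensionsCredential first = new ExtensionsCredential(Map.of("traceId", "123"));
        ExtensionsCredential second = new ExtensionsCredential(Map.of("traceId", "123"));
        credentials.add(first);
        credentials.add(second);
        credentials.remove(first); // the "logout" of the first login
        return credentials.size();
    }
}
```

With identity equality one credential remains after the removal. With content-based equality the set would have collapsed both entries into one, and removing it would have left none.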

Co-authored-by: Purshotam Chauhan <pchauhan@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-07-12 14:28:19 +05:30
Eugene Tolbakov a3f06d8814
KAFKA-14013: Limit the length of the `reason` field sent on the wire (#12388)
KIP-800 added the `reason` field to the JoinGroupRequest and the LeaveGroupRequest as a means to provide more information to the group coordinator. In https://issues.apache.org/jira/browse/KAFKA-13998, we discovered that the size of the field is limited to 32767 chars by our serialisation mechanism. At the moment, the field, whether provided directly by the user or constructed internally, is set as-is regardless of its length.

This patch sends only the first 255 chars of the user-provided or internally generated reason on the wire. Given the purpose of this field, that seems acceptable and should still provide enough information for operators to understand the cause of a rebalance.
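
A minimal sketch of the truncation, assuming the limit is applied to Java `String` characters; `ReasonTruncation` and `truncate` are illustrative names, not the helper used in the patch.

```java
// Hypothetical helper showing the 255-char cap on the reason field.
final class ReasonTruncation {
    static final int MAX_REASON_LENGTH = 255;

    // Returns the reason unchanged when it is short enough, otherwise its first 255 chars.
    static String truncate(String reason) {
        if (reason == null || reason.length() <= MAX_REASON_LENGTH) {
            return reason;
        }
        return reason.substring(0, MAX_REASON_LENGTH);
    }
}
```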

Reviewers: David Jacot <djacot@confluent.io>
2022-07-12 09:31:16 +02:00
SC 23c92ce793
MINOR: Use String#format for niceMemoryUnits result (#12389)
Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>
2022-07-11 10:36:56 +08:00
Jason Gustafson 0bc8da7aec
KAFKA-14055; Txn markers should not be removed by matching records in the offset map (#12390)
When cleaning a topic with transactional data, if the keys used in the user data happen to conflict with the keys in the transaction markers, it is possible for the markers to get removed before the corresponding data from the transaction is removed. This results in a hanging transaction or the loss of the transaction's atomicity since it would effectively get bundled into the next transaction in the log. Currently control records are excluded when building the offset map, but not when doing the cleaning. This patch fixes the problem by checking for control batches in the `shouldRetainRecord` callback.
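
The retention check can be sketched as below. This is a simplified model, not the log cleaner's code: `CleanerRetentionSketch`, `Rec`, and the string-keyed offset map are hypothetical, and the real cleaner removes transaction markers through separate logic that the sketch does not cover.

```java
import java.util.Map;

// Hypothetical model of the per-record retention decision during cleaning.
final class CleanerRetentionSketch {

    static final class Rec {
        final String key;
        final long offset;
        final boolean inControlBatch;

        Rec(String key, long offset, boolean inControlBatch) {
            this.key = key;
            this.offset = offset;
            this.inControlBatch = inControlBatch;
        }
    }

    // offsetMap holds the latest offset seen for each user key.
    static boolean shouldRetain(Rec record, Map<String, Long> offsetMap) {
        if (record.inControlBatch) {
            // A marker is never dropped because its key happens to match a user key.
            return true;
        }
        Long latestOffset = offsetMap.get(record.key);
        return latestOffset == null || record.offset >= latestOffset;
    }
}
```

A marker whose key collides with a user key at a later offset is kept, while a data record with the same key and offset is cleaned.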

Reviewers: Jun Rao <junrao@gmail.com>
2022-07-10 10:16:39 -07:00
Divij Vaidya fc6e91e199
KAFKA-13474: Allow reconfiguration of SSL certs for broker to controller connection (#12381)
What:
When a certificate is rotated on a broker via dynamic configuration and the previous certificate expires, the broker to controller connection starts failing with SSL Handshake failed.

Why:
A similar fix was earlier performed in #6721 but when BrokerToControllerChannelManager was introduced in v2.7, we didn't enable dynamic reconfiguration for its channel.

Summary of testing strategy (including rationale)
Add a test which fails prior to the fix done in the PR and succeeds afterwards. The bug wasn't caught earlier because there was no test coverage to validate the scenario.

Reviewers: Luke Chen <showuon@gmail.com>
2022-07-09 18:06:02 +08:00
Tomonari Yamashita e85500bbbe
KAFKA-13996: log.cleaner.io.max.bytes.per.second can be changed dynamically (#12296)
log.cleaner.io.max.bytes.per.second cannot be changed dynamically using bin/kafka-configs.sh. Call updateDesiredRatePerSec() of Throttler with new log.cleaner.io.max.bytes.per.second value in reconfigure() of Log Cleaner to fix the issue.
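
The shape of the fix can be sketched as follows. `ThrottlerSketch` and `LogCleanerSketch` are hypothetical Java stand-ins for the Scala classes; only the method name `updateDesiredRatePerSec` is taken from the description above.

```java
// Hypothetical throttler whose target rate can be changed while it is in use.
final class ThrottlerSketch {
    private volatile double desiredRatePerSec;

    ThrottlerSketch(double desiredRatePerSec) {
        this.desiredRatePerSec = desiredRatePerSec;
    }

    void updateDesiredRatePerSec(double newRatePerSec) {
        desiredRatePerSec = newRatePerSec;
    }

    double desiredRatePerSec() {
        return desiredRatePerSec;
    }
}

// Hypothetical cleaner that forwards a dynamic config change to its throttler.
final class LogCleanerSketch {
    private final ThrottlerSketch throttler;

    LogCleanerSketch(double maxIoBytesPerSecond) {
        this.throttler = new ThrottlerSketch(maxIoBytesPerSecond);
    }

    // Called when log.cleaner.io.max.bytes.per.second changes at runtime.
    void reconfigure(double newMaxIoBytesPerSecond) {
        throttler.updateDesiredRatePerSec(newMaxIoBytesPerSecond);
    }

    double currentRate() {
        return throttler.desiredRatePerSec();
    }
}
```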

Reviewers: Tom Bentley <tbentley@redhat.com>, Luke Chen <showuon@gmail.com>
2022-07-08 20:41:47 +08:00
Aman Singh dc6f555492
KAFKA-13983: Fail the creation with "/" in resource name in zk ACL (#12359)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-07-08 15:47:48 +05:30
Marco Aurelio Lotz 63a6130af3
KAFKA-12943: update aggregating documentation (#12091)
Reviewers: Luke Chen <showuon@gmail.com>, Andrew Eugene Choi <andrew.choi@uwaterloo.ca>, Matthias J. Sax <matthias@confluent.io>
2022-07-07 14:00:05 -07:00
vamossagar12 5a1bac2608
KAFKA-13846: Follow up PR to address review comments (#12297)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-07 11:43:38 -07:00
Matthias J. Sax 38b08dfd33
MINOR: revert KIP-770 (#12383)
KIP-770 introduced a performance regression and needs some re-design.

Needed to resolve some conflict while reverting.

This reverts commits 1317f3f77a and 0924fd3f9f.

Reviewers: Sagar Rao <sagarmeansocean@gmail.com>, Guozhang Wang <guozhang@confluent.io>
2022-07-07 11:19:37 -07:00
Lucas Bradstreet a521bbd755
MINOR: kafka system tests should support larger EBS volumes for newer instances (#12382)
When running with 4th generation instances supporting EBS only, we need
to use a larger volume or else we run out of disk space during a system
test run.

This change also parameterizes the instance type as an env variable for
easier testing.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-07 09:14:05 +02:00
Guozhang Wang ca8135b242
HOTFIX: KIP-851, rename requireStable in ListConsumerGroupOffsetsOptions
2022-07-06 22:00:31 -07:00
Guozhang Wang 915c781243
KAFKA-10199: Remove main consumer from store changelog reader (#12337)
When the store changelog reader is called by a thread other than the stream thread, it can no longer use the main consumer to get committed offsets, since the consumer is not thread-safe. Instead, we remove the main consumer and leverage the existing admin client to get committed offsets.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-07-06 17:23:18 -07:00
YU 6495a0768c
KAFKA-14032; Dequeue time for forwarded requests is unset (#12360)
When building a forwarded request, we need to override the dequeue time of the underlying request to match the same value as the envelope. Otherwise, the field is left unset, which causes inaccurate reporting.
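
A minimal sketch of the override, assuming an unset dequeue time is represented by a sentinel; `RequestSketch`, `buildForwarded`, and the `-1` value are illustrative, not the request channel's actual code.

```java
// Hypothetical request holder with a dequeue timestamp.
final class RequestSketch {
    static final long UNSET = -1L;

    long requestDequeueTimeNanos = UNSET;

    // Builds the inner request carried by an envelope and copies the envelope's
    // dequeue time, so later metrics see the same value for both.
    static RequestSketch buildForwarded(RequestSketch envelope) {
        RequestSketch forwarded = new RequestSketch();
        forwarded.requestDequeueTimeNanos = envelope.requestDequeueTimeNanos;
        return forwarded;
    }
}
```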

Reviewers: Jason Gustafson <jason@confluent.io>
2022-07-06 13:21:28 -07:00
Bruno Cadonna 00f395bb88
KAFKA-10199: Remove call to Task#completeRestoration from state updater (#12379)
The call to Task#completeRestoration calls methods on the main consumer.
The state updater thread should not access the main consumer since the
main consumer is not thread-safe. Additionally, Task#completeRestoration
changed the state of active tasks, but we decided to keep task life cycle
management outside of the state updater.

Task#completeRestoration should be called by the stream thread on
restored active tasks returned by the state updater.

Reviewer: Guozhang Wang <guozhang@apache.org>
2022-07-06 12:36:15 +02:00
Divij Vaidya 5e4c8f704c
KAFKA-13943; Make `LocalLogManager` implementation consistent with the `RaftClient` contract (#12224)
Fixes two issues in the implementation of `LocalLogManager`:

- As per the interface contract for `RaftClient.scheduleAtomicAppend()`, it should throw a `NotLeaderException` when the provided leader epoch does not match the current epoch. However, the current `LocalLogManager` implementation of the API returns LONG_MAX instead of throwing an exception. This change fixes the behaviour and makes it consistent with the interface contract.
- As per the interface contract for `RaftClient.resign(epoch)`, if the parameter epoch does not match the current epoch, the call is ignored. But in the current `LocalLogManager` implementation the leader epoch might change while the thread is waiting to acquire a lock on `shared.tryAppend()` (note that tryAppend() is a synchronized method). In such a case, if a `NotLeaderException` is thrown (as per the code change above), then the resign should be ignored.
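
The two contract points can be sketched with a toy log. Everything here is hypothetical, including the nested exception class that stands in for Kafka's `NotLeaderException`; the sketch only shows the epoch checks, not the locking behaviour discussed above.

```java
// Hypothetical single-node log that enforces the epoch checks.
final class LocalLogSketch {

    // Stand-in for the real exception type.
    static final class NotLeaderException extends RuntimeException {
        NotLeaderException(String message) {
            super(message);
        }
    }

    private final int currentEpoch;
    private boolean leader = true;
    private long nextOffset = 0L;

    LocalLogSketch(int currentEpoch) {
        this.currentEpoch = currentEpoch;
    }

    // Appends recordCount records and returns the offset of the last one,
    // or throws when the caller's epoch is stale or leadership was given up.
    synchronized long scheduleAtomicAppend(int epoch, int recordCount) {
        if (!leader || epoch != currentEpoch) {
            throw new NotLeaderException("Append with epoch " + epoch + " rejected at epoch " + currentEpoch);
        }
        nextOffset += recordCount;
        return nextOffset - 1;
    }

    // A resign for a stale epoch is ignored rather than treated as an error.
    synchronized void resign(int epoch) {
        if (epoch != currentEpoch) {
            return;
        }
        leader = false;
    }

    synchronized boolean isLeader() {
        return leader;
    }
}
```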

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Tom Bentley <tbentley@redhat.com>, Jason Gustafson <jason@confluent.io>
2022-07-05 20:08:28 -07:00
Chris Egerton 3ae1afa438
KAFKA-10000: Integration tests (#11782)
Implements embedded end-to-end integration tests for KIP-618, and brings together previously-decoupled logic from upstream PRs.

Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>
2022-07-06 10:35:05 +08:00
dengziming 448441a35d
KAFKA-13228; Ensure ApiVersionRequest is properly handled in KRaft co-resident mode (#11784)
When brokers are co-resident with controllers in KRaft mode, we incorrectly determine the supported API versions on the controller using `NodeApiVersions.create()`. The patch fixes the problem by using the versions from the sent `ApiVersions` request even when connecting to the local node.

The patch also improves integration tests by adding support for co-resident mode.

Reviewers: Justine Olshan <jolshan@confluent.io>, Jason Gustafson <jason@confluent.io>
2022-07-05 15:19:00 -07:00
Viktor Somogyi-Vass 277c4c2e97
KAFKA-6945: Add docs about KIP-373 (#12346)
Reviewers: Manikumar Reddy
2022-07-05 17:29:31 +05:30
Chris Egerton ec22af94a6
KAFKA-13613: Remove hard dependency on HmacSHA256 algorithm for Connect (#11894)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Tom Bentley <tbentley@redhat.com>
2022-07-05 12:34:23 +02:00
Thomas Cooper aa735062eb
Upgrade Netty and Jackson versions for CVE fixes [KAFKA-14044] (#12376)
Reviewers: Luke Chen <showuon@gmail.com>
2022-07-05 14:16:18 +08:00
Matthew de Detrich 4e6326f889
KAFKA-13957: Fix flaky shouldQuerySpecificActivePartitionStores test (#12289)
Currently the test fails because a predicate is missing from the retrievableException, i.e. the current predicates

containsString("Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING"),
containsString("The state store, source-table, may have migrated to another instance"),
containsString("Cannot get state store source-table because the stream thread is STARTING, not RUNNING")

weren't complete. Another one needed to be added, namely "The specified partition 1 for store source-table does not exist.". This is because it's possible for

assertThat(getStore(kafkaStreams2, storeQueryParam2).get(key), is(nullValue()));

or

assertThat(getStore(kafkaStreams1, storeQueryParam2).get(key), is(nullValue()));

(depending on which branch) to be thrown, i.e. see

org.apache.kafka.streams.errors.InvalidStateStorePartitionException: The specified partition 1 for store source-table does not exist.

	at org.apache.kafka.streams.state.internals.WrappingStoreProvider.stores(WrappingStoreProvider.java:63)
	at org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get(CompositeReadOnlyKeyValueStore.java:53)
	at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.lambda$shouldQuerySpecificActivePartitionStores$5(StoreQueryIntegrationTest.java:223)
	at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.retryUntil(StoreQueryIntegrationTest.java:579)
	at org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificActivePartitionStores(StoreQueryIntegrationTest.java:186)

This happens when the stream hasn't been initialized yet. I have run the test around 12k times using IntelliJ's JUnit testing framework without any flaky failures. The PR also does some minor refactoring, moving the list of predicates into their own functions.

Co-authored-by: Bruno Cadonna <cadonna@apache.org>

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-07-04 20:26:53 +02:00
Guozhang Wang ae570f5953
HOTFIX: Correct ordering of input buffer and enforced processing sensors (#12363)
1. As titled, fix the right constructor param ordering.
2. Also added a few more loglines.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Sagar Rao <sagarmeansocean@gmail.com>, Hao Li <1127478+lihaosky@users.noreply.github.com>
2022-07-03 10:02:59 -07:00
Bruno Cadonna a82a8e02ce
MINOR: Fix static mock usage in TaskMetricsTest (#12373)
Before this PR the calls to the static methods on StreamsMetricsImpl were just calls and not a verification on the mock. This miss happened during the switch from EasyMock to Mockito.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-02 18:48:07 -07:00
RivenSun bad475166f
MINOR: Add indent space after hyperlink in `docs/upgrade.html` (#12353)
Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, Divij Vaidya <divijvaidya13@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-07-01 10:30:32 -07:00
Nikolay c12348ac98
MINOR: Remove unused code from BrokerEndPoint (#12368)
Removes unused methods from `BrokerEndPoint`:

* `createBrokerEndPoint(Int, String)`
* `readFrom(buffer: ByteBuffer)`
* `connectionString(): String`
* `writeTo(buffer: ByteBuffer)`
* `sizeInBytes: Int`

Reviewers: dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-07-01 10:28:37 -07:00
Jason Gustafson 6bf5bfc298
KAFKA-14036; Set local time in `ControllerApis` when `handle` returns (#12372)
In `ControllerApis`, we are missing the logic to set the local processing end time after `handle` returns. As a consequence of this, the remote time ends up reported as the local time in the request level metrics. The patch adds the same logic we have in `KafkaApis` to set `apiLocalCompleteTimeNanos`.
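
The pattern can be sketched as a try/finally around the handler. This is an illustration under assumptions: `RequestTimingSketch`, `ControllerApisSketch`, and the injected clock are hypothetical, and only the field name `apiLocalCompleteTimeNanos` comes from the description above.

```java
import java.util.function.LongSupplier;

// Hypothetical holder for the timestamp used by request-level metrics.
final class RequestTimingSketch {
    long apiLocalCompleteTimeNanos = -1L; // -1 means "not set yet"
}

final class ControllerApisSketch {

    // Runs the handler and always records when local processing finished,
    // unless the handler already set the timestamp itself.
    static void handle(RequestTimingSketch request, Runnable handler, LongSupplier clock) {
        try {
            handler.run();
        } finally {
            if (request.apiLocalCompleteTimeNanos < 0) {
                request.apiLocalCompleteTimeNanos = clock.getAsLong();
            }
        }
    }
}
```

Because the assignment sits in `finally`, the local end time is recorded even when the handler throws, so the remaining time is not misattributed.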

Reviewers: José Armando García Sancio <jsancio@gmail.com>
2022-06-30 21:07:21 -07:00
Niket c19398ee66
KAFKA-14035; Fix NPE in `SnapshottableHashTable::mergeFrom()` (#12371)
The NPE causes the kraft controller to be in an inconsistent state. 

Reviewers: Jason Gustafson <jason@confluent.io>
2022-06-30 21:03:54 -07:00
Prashanth Joseph Babu 1daa149730
MINOR: record lag max metric documentation enhancement (#12367)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-06-30 09:30:14 -07:00
Guozhang Wang 3faa6cf6d0
MINOR: Use mock time in DefaultStateUpdaterTest (#12344)
For most tests we would need an auto-ticking mock timer to work with draining-with-timeout functions.
For tests that check that a checkpoint is never written, we need a timer without auto-ticking so that we control exactly how much time has elapsed.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-06-29 12:33:00 -07:00
Guozhang Wang ababc4261b
[9/N][Emit final] Emit final for session window aggregations (#12204)
* Add a new API for session windows to range query session window by end time (KIP related).
* Augment session window aggregator with emit strategy.
* Minor: consolidated some dup classes.
* Test: unit test on session window aggregator.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-06-29 09:22:37 -07:00
Luke Chen 08e509459b
KAFKA-14010: AlterPartition request won't retry when receiving retriable error (#12329)
When submitting the AlterIsr request, we register a future listener to handle the response. On a retriable error, we expect the AlterIsr request to be retried, so we re-submit it.

However, before the future listener was called, we didn't clear the `unsentIsrUpdates`, so we failed to "enqueue" the retry because we thought there was still an in-flight request. We used "try/finally" to make sure the unsentIsrUpdates got cleared, but that happened "after" we retried the request.
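
The ordering problem can be sketched with a toy sender. This is a simplified model with hypothetical names, not the manager's actual code: a boolean marks the in-flight request, and the point is only that this state must be cleared before the retry is attempted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sender that allows at most one request in flight.
final class IsrUpdateSenderSketch {
    private final Deque<String> unsent = new ArrayDeque<>();
    private boolean inflight;
    private int requestsSent;

    void submit(String update) {
        unsent.add(update);
        maybeSend();
    }

    // Response handler: clear the in-flight state first, then retry or move on.
    void onResponse(boolean retriableError) {
        inflight = false;      // cleared before the retry, so the retry is not rejected
        if (!retriableError) {
            unsent.poll();     // the update was applied, drop it
        }
        maybeSend();
    }

    private void maybeSend() {
        if (!inflight && !unsent.isEmpty()) {
            inflight = true;
            requestsSent++;
        }
    }

    int requestsSent() {
        return requestsSent;
    }
}
```

If the flag were cleared only after `maybeSend()`, the retry would see a request still in flight and never be sent.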

Reviewers: David Jacot <djacot@confluent.io>, dengziming <dengziming1993@gmail.com>
2022-06-29 20:53:36 +08:00
CHUN-HAO TANG 6ac7f4ea8f
KAFKA-13821: Update Kafka Streams WordCount demo to new Processor API (#12139)
https://issues.apache.org/jira/browse/KAFKA-13821

Reviewers: Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>, Bill Bejeck <bbejeck@apache.org>
2022-06-28 21:39:32 -04:00
Tom Kaszuba 025e47b833
KAFKA-13963: Clarified TopologyDescription JavaDoc for Processors API forward() calls (#12293)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-06-27 18:50:34 -07:00
Jason Gustafson d654bc1b15
MINOR: Support KRaft in GroupAuthorizerIntegrationTest (#12336)
Support KRaft in `GroupAuthorizerIntegrationTest`. 

Reviewers: David Arthur <mumrah@gmail.com>
2022-06-27 16:01:15 -07:00
Alyssa Huang a3c7017ff7
MINOR: Support KRaft in ReplicaFetchTest (#12345)
Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
2022-06-25 10:25:55 -07:00
Bruno Cadonna 1ceaf30039
KAFKA-10199: Expose tasks in state updater (#12312)
This PR exposes the tasks managed by the state updater. The state updater manages all tasks that were added to the state updater and that have not yet been removed from it by draining one of the output queues.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-06-24 09:33:24 -07:00
Bruno Cadonna 08e27914cc
HOTFIX: Fix NPE in StreamTask#shouldCheckpointState (#12341)
The mocks were not set up correctly in StreamTask#shouldCheckpointState
which caused a null pointer exception during test execution.
2022-06-24 12:19:22 +02:00
David Jacot cf9c3a2eca
MINOR: Fix group coordinator is unavailable log (#12335)
Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, Luke Chen <showuon@gmail.com>
2022-06-24 11:48:09 +02:00
Guozhang Wang 925c628173
KAFKA-10199: Commit the restoration progress within StateUpdater (#12279)
During restoration, we should always commit (that is, write the checkpoint file) regardless of EOS or ALOS: if there is a failure, we would just over-restore upon recovery, so no EOS violation occurs.

Also when we complete restore or remove task, we should enforce a checkpoint as well; for failing cases though, we should not write a new one.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-06-23 10:46:14 -07:00
David Arthur c6c9da02a8
KAFKA-13966 Prepend bootstrap metadata to controller queue (#12269)
Also fixes flaky QuorumControllerTest#testInvalidBootstrapMetadata

Reviewers: Jason Gustafson <jason@confluent.io>
2022-06-23 11:29:21 -04:00
Chris Egerton d00b7875e0
KAFKA-13987: Isolate REST request timeout changes in Connect integration tests (#12291)
This causes the artificial reductions in the Connect REST request timeout to be more isolated. Specifically, they now only take place in the tests that need them (instead of any tests that happen to be running after the reduction has taken place and before it has been reset), and they are only performed for the requests that are expected to time out, before being immediately reset. This should help reduce spurious test failures (especially in slow environments like Jenkins) for all Connect integration tests that interact with the REST API, not just the BlockingConnectorTest test suite.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-06-23 16:56:53 +02:00
Bruno Cadonna 8026a0edd8
MINOR: Fix static mock usage in NamedCacheMetricsTest (#12322)
Before this PR the call to `StreamsMetricsImpl.addAvgAndMinAndMaxToSensor()`
was just a call and not a verification on the mock. This miss happened
during the switch from EasyMock to Mockito.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2022-06-23 09:45:46 +02:00
Bruno Cadonna 269277f73b
MINOR: Fix static mock usage in ProcessorNodeMetricsTest (#12323)
Before this PR the calls to StreamsMetricsImpl.addInvocationRateAndCountToSensor()
were just calls and not a verification on the mock. This miss happened
during the switch from EasyMock to Mockito.

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2022-06-23 09:45:22 +02:00