Commit Graph

10209 Commits

Author SHA1 Message Date
Ismael Juma 348474e2ae
MINOR: Upgrade to Gradle 7.5 (#12413)
Highlights:
* The default Scala Zinc version was updated from 1.3.5 to 1.6.1
* Multiple Checkstyle tasks may now run in parallel within a project
* Support for Java 18
* Much more responsive continuous builds on Windows and macOS
* Improved diagnostics for dependency resolution

Some of our tests require java.util and java.lang modules to be open,
so do it explicitly given the following Gradle bug fix:

> When running on Java 9+, Gradle no longer opens the java.base/java.util
> and java.base/java.lang JDK modules for all Test tasks. In some cases,
> this would cause code to pass during testing but fail at runtime.

Release notes: https://docs.gradle.org/7.5/release-notes.html

Reviewers:  Manikumar Reddy <manikumar.reddy@gmail.com>, Luke Chen <showuon@gmail.com>
2022-07-26 05:58:50 -07:00
Bruno Cadonna f191e4781e
MINOR: Use builder for mock task in DefaultStateUpdaterTest (#12436)
Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-07-26 10:12:20 +02:00
Jason Gustafson a450fb70c1
KAFKA-14078; Do leader/epoch validation in Fetch before checking for valid replica (#12411)
After the fix for https://github.com/apache/kafka/pull/12150, if a follower receives a request from another replica, it will return UNKNOWN_LEADER_EPOCH even if the leader epoch matches. We need to do epoch leader/epoch validation first before we check whether we have a valid replica.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-25 13:24:40 -07:00
Christo Lolov 6b76c01cf8
KAFKA-13158: Migrate ConnectClusterStateImpl to Mockito (#12423)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chris Egerton <fearthecellos@gmail.com>
2022-07-25 19:47:08 +02:00
Chris Egerton 71d225d7c2
KAFKA-14093: Use single-worker Connect cluster when testing fenced leader recovery (#12433)
Reviewers: Mickael Maison <mickael.maison@gmail.com>

, Tom Bentley <tbentley@redhat.com>
2022-07-25 15:30:38 +02:00
Luke Chen 679e9e0cee
KAFKA-13919: expose log recovery metrics (#12347)
Implementation for KIP-831.
1. add remainingLogsToRecover metric for the number of remaining logs for each log.dir to be recovered
2.  add remainingSegmentsToRecover metric for the number of remaining segments for the current log assigned to the recovery thread.
3. remove these metrics after log loaded completely
4. add tests 

Reviewers: Jun Rao <jun@confluent.io>, Tom Bentley <tbentley@redhat.com>
2022-07-22 11:00:15 +08:00
Ron Dagostino 06462c7be1
KAFKA-14051: Create metrics reporters in KRaft remote controllers (#12396)
KRaft remote controllers do not yet support dynamic reconfiguration (https://issues.apache.org/jira/browse/KAFKA-14057). Until we implement that, in the meantime we see that the instantiation of the configured metric reporters is actually performed as part of the wiring for dynamic reconfiguration. Since that wiring does not exist yet for KRaft remote controllers, this patch refactors out the instantiation of the metric reporters from the reconfiguration of them and adjusts the controller startup sequence to explicitly instantiate the reporters if the controller is a remote one.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-07-21 16:59:05 -07:00
Bruno Cadonna 5a52601691
KAFKA-10199: Add tasks to state updater when they are created (#12427)
This PR introduces an internal config to enable the state updater. If the state updater is enabled newly created tasks are added to the state updater. Additionally, this PR introduces a builder for mocks for tasks.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-21 12:37:17 -07:00
Hao Li 5e4ae06d12
MINOR: fix flaky test test_standby_tasks_rebalance (#12428)
* Description
In this test, when third proc join, sometimes there are other rebalance scenarios such as followup joingroup request happens before syncgroup response was received by one of the proc and the previously assigned tasks for that proc is then lost during new joingroup request. This can result in standby tasks assigned as 3, 1, 2. This PR relax the expected assignment of 2, 2, 2 to a range of [1-3].

* Some backgroud from Guozhang:
I talked to @hao Li offline and also inspected the code a bit, and tl;dr is that I think the code logic is correct (i.e. we do not really have a bug), but we need to relax the test verification a little bit. The general idea behind the subscription info is that:

When a client joins the group, its subscription will try to encode all its current assigned active and standby tasks, which would be used as prev active and standby tasks by the assignor in order to achieve some stickiness.

When a client drops all its active/standby tasks due to errors, it does not actually report all empty from its subscription, instead it tries to check its local state directory (you can see that from TaskManager#getTaskOffsetSums which populates the taskOffsetSum. For active task, its offset would be “-2” a.k.a. LATEST_OFFSET, for standby task, its offset is an actual numerical number.

So in this case, the proc2 which drops all its active and standby tasks, would still report all tasks that have some local state still, and since it was previously owning all six tasks (three as active, and three as standby), it would report all six as standbys, and when that happens the resulted assignment as @hao Li verified, is indeed the un-even one.

So I think the actual “issue“ happens here, is when proc2 is a bit late sending the sync-group request, when the previous rebalance has already completed, and a follow-up rebalance has already triggered, in that case, the resulted un-even assignment is indeed expected. Such a scenario, though not common, is still legitimate since in practice all kinds of timing skewness across instances can happen. So I think we should just relax our verification here, i.e. just making sure that each instance has at least one standby replica at the end, not exactly evenly as “2, 2, 2”.

Reviewers: Suhas Satish <ssatish@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-07-21 12:12:29 -07:00
Christo Lolov 569a358a3f
KAFKA-14001: Migrate streams module to JUnit 5 - Part 1 (#12285)
This pull request addresses https://issues.apache.org/jira/browse/KAFKA-14001. It is the first of a series of pull requests which address the move of Kafka Streams tests from JUnit 4 to JUnit 5.

Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2022-07-21 17:27:53 +02:00
James Hughes ff7cbf264c
KAFKA-14076: Fix issues with KafkaStreams.CloseOptions (#12408)
- used static memberId was incorrect
- need to remove all threads/members from the group
- need to use admit client correctly

Add test to verify fixes.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-07-21 07:35:29 -07:00
Guozhang Wang c9b6e19b3b
KAFKA-10199: Cleanup TaskManager and Task interfaces (#12397)
In order to integrate with the state updater, we would need to refactor the TaskManager and Task interfaces. This PR achieved the following purposes:

    Separate active and standby tasks in the Tasks placeholder, plus adding pendingActiveTasks and pendingStandbyTasks into Tasks. The exposed active/standby tasks from the Tasks set would only be mutated by a single thread, and the pending tasks hold for those tasks that are assigned but cannot be actively managed yet. For now they include two scenarios: a) tasks from unknown sub-topologies and hence cannot be initialized, b) tasks that are pending for being recycled from active to standby and vice versa. Note case b) would be added in a follow-up PR.

    Extract any logic that mutates a task out of the Tasks / TaskCreators. Tasks should only be a place for maintaining the set of tasks, but not for manipulations of a task; and TaskCreators should only be used for creating the tasks, but not for anything else. These logic are all migrated into TaskManger.

    While doing 2) I noticed we have a couple of minor issues in the code where we duplicate the closing logics, so I also cleaned them up in the following way:
    a) When closing a task, we first trigger the corresponding closeClean/Dirty function; then we remove the task from Tasks bookkeeping, and for active task we also remove its task producer if EOS-V1 is used.
    b) For closing dirty, we swallow the exception from close call and the remove task producer call; for closing clean, we store the thrown exception from either close call or the remove task producer, and then rethrow at the end of the caller. The difference though is that, for the exception from close call we need to retry close it dirty; for the exception from the remove task producer we do not need to re-close it dirty.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-07-21 15:11:40 +02:00
Mickael Maison df899a2d08
MINOR: Fix broken link to Streams tutorial (#12426)
Also fix Transforming Data Pt. 2 video title

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-07-21 15:09:31 +02:00
Artem Livshits badfbacdd0
KAFKA-14020: Performance regression in Producer (#12365)
As part of KAFKA-10888 work, there were a couple regressions introduced:

A call to time.milliseconds() got moved under the queue lock, moving it back outside the lock. The call may be expensive and cause lock contention. Now the call is moved back outside of the lock.

The reference to ProducerRecord was held in the batch completion callback, so it was kept alive as long as the batch was alive, which may increase the amount of memory in certain scenario and cause excessive GC work. Now the reference is reset early, so the ProducerRecord lifetime isn't bound to the batch lifetime.

Tested via manually crafted benchmark, lock profile shows ~15% lock contention on the ArrayQueue lock without the fix and ~5% lock contention with the fix (which is also consistent with pre-KAFKA-10888 profile).

Alloc profile shows ~10% spent in ProducerBatch.completeFutureAndFireCallbacks without the fix vs. ~0.25% with the fix (which is also consistent with pre-KAFKA-10888 profile).

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
2022-07-20 08:19:31 -07:00
Federico Valeri 74a18bafae
Minor: replace .kafka with .log in implementation documentation (#12401)
replace .kafka with .log in implementation documentation

Reviewers: Luke Chen <showuon@gmail.com>, Liam Clarke-Hutchinson <liam@steelsky.co.nz>
2022-07-20 18:11:51 +08:00
Elkhan Eminov ed77bebcaf
KAFKA-13702: Connect RestClient overrides response status code on request failure (#12320)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chris Egerton <fearthecellos@gmail.com>
2022-07-20 11:29:00 +02:00
Shawn eee40200df
KAFKA-14024: Consumer keeps Commit offset in onJoinPrepare in Cooperative rebalance (#12349)
In KAFKA-13310, we tried to fix a issue that consumer#poll(duration) will be returned after the provided duration. It's because if rebalance needed, we'll try to commit current offset first before rebalance synchronously. And if the offset committing takes too long, the consumer#poll will spend more time than provided duration. To fix that, we change commit sync with commit async before rebalance (i.e. onPrepareJoin).

However, in this ticket, we found the async commit will keep sending a new commit request during each Consumer#poll, because the offset commit never completes in time. The impact is that the existing consumer will be kicked out of the group after rebalance timeout without joining the group. That is, suppose we have consumer A in group G, and now consumer B joined the group, after the rebalance, only consumer B in the group.

Besides, there's also another bug found during fixing this bug. Before KAFKA-13310, we commitOffset sync with rebalanceTimeout, which will retry when retriable error until timeout. After KAFKA-13310, we thought we have retry, but we'll retry after partitions revoking. That is, even though the retried offset commit successfully, it still causes some partitions offsets un-committed, and after rebalance, other consumers will consume overlapping records.

Reviewers: RivenSun <riven.sun@zoom.us>, Luke Chen <showuon@gmail.com>
2022-07-20 10:03:43 +08:00
Walker Carlson b62d8b975c
KAFKA-12699: Override the default handler for stream threads if the stream's handler is used (#12324)
Override the default handler for stream threads if the stream's handler is used. We do no want the java default handler triggering when a thread is replaced.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-07-19 13:35:26 -07:00
Guozhang Wang 693e283802
KAFKA-10199: Add RESUME in state updater (#12387)
* Need to check enforceRestoreActive / transitToUpdateStandby when resuming a paused task.
* Do not expose another getResumedTasks since I think its caller only need the getPausedTasks.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-07-19 09:44:10 -07:00
Walker Carlson 188b2bf280
Revert "KAFKA-12887 Skip some RuntimeExceptions from exception handler (#11228)" (#12421)
This reverts commit 4835c64f

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-07-19 09:17:46 -07:00
Guozhang Wang 309e0f986e
KAFKA-10199: Add PAUSE in state updater (#12386)
* Add pause action to task-updater.
* When removing a task, also check in the paused tasks in addition to removed tasks.
* Also I realized we do not check if tasks with the same id are added, so I add that check in this PR as well.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-07-18 16:42:48 -07:00
Christopher L. Shannon 8142822633
KAFKA-14079 - Ack failed records in WorkerSourceTask when error tolerance is ALL (#12415)
Make sure to ack all records where produce failed, when a connector's `errors.tolerance` config property is set to `all`. Acking is essential so that the task will continue to commit future record offsets properly and remove the records from internal tracking, preventing a memory leak.

(cherry picked and slightly modified from commit 63e06aafd0)

Reviewers: Chris Egerton <fearthecellos@gmail.com>, Randall Hauch <rhauch@gmail.com>
2022-07-18 17:07:20 -05:00
Alex Sorokoumov 4eef28018a
KAFKA-13769 Fix version check in SubscriptionJoinForeignProcessorSupplier (#12420)
This commit changes the version check from != to > as the process method
works correctly on both version 1 and 2. != incorrectly throws on v1
records.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-07-18 14:10:02 -07:00
Levani Kokhreidze edad31811c
MINOR: Fix QueryResult Javadocs (#12404)
Fixes the QueryResult javadocs.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-07-18 13:39:34 +02:00
Okada Haruki ab9aaea3ce
KAFKA-13572 Fix negative preferred replica imbalanced count metric (#12405)
Currently, preferredReplicaImbalanceCount calculation has a race that becomes negative when topic deletion is initiated simultaneously. This PR addresses the problem by fixing cleanPreferredReplicaImbalanceMetric to be called only once per topic-deletion procedure

Reviewers: Luke Chen <showuon@gmail.com>
2022-07-18 14:19:01 +08:00
David Arthur c020c94e04
KAFKA-14039 Fix AlterConfigPolicy usage in KRaft (#12374)
Only pass configs from the request to the AlterConfigPolicy. This changes the KRaft usage of the AlterConfigPolicy to match the usage in ZK mode.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-07-15 15:48:35 -04:00
Rajini Sivaram ddbc030036
MINOR: Fix options for old-style Admin.listConsumerGroupOffsets (#12406)
Reviewers: David Jacot <djacot@confluent.io>
2022-07-15 09:21:35 +01:00
Sanjana Kaundinya beac86f049
KAFKA-13043: Implement Admin APIs for offsetFetch batching (#10964)
This implements the AdminAPI portion of KIP-709: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=173084258. The request/response protocol changes were implemented in 3.0.0. A new batched API has been introduced to list consumer offsets for different groups. For brokers older than 3.0.0, separate requests are sent for each group.

Co-authored-by: Rajini Sivaram <rajinisivaram@googlemail.com>
Co-authored-by: David Jacot <djacot@confluent.io>

Reviewers: David Jacot <djacot@confluent.io>,  Rajini Sivaram <rajinisivaram@googlemail.com>
2022-07-14 13:47:34 +01:00
Christo Lolov 94d4fdeb28
KAFKA-14008: Add docs for Streams throughput metrics introduced in KIP-846 (#12377)
Reviewers: Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2022-07-13 17:47:34 -07:00
Alyssa Huang 8e9869a777
MINOR: Run MessageFormatChangeTest in ZK mode only (#12395)
KRaft mode will not support writing messages with an older message format (2.8) since the min supported IBP is 3.0 for KRaft. Testing support for reading older message formats will be covered by https://issues.apache.org/jira/browse/KAFKA-14056.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-13 08:46:04 +02:00
Hao Li b5d4fa7645
KAFKA-13785: [10/N][emit final] more unit test for session store and disable cache for emit final sliding window (#12370)
1. Added more unit test for RocksDBTimeOrderedSessionStore and RocksDBTimeOrderedSessionSegmentedBytesStore
2. Disable cache for sliding window if emit strategy is ON_WINDOW_CLOSE

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2022-07-12 10:57:11 -07:00
dengziming 98726c2bac
KAFKA-13968: Fix 3 major bugs of KRaft snapshot generating (#12265)
There are 3 bugs when a broker generates a snapshot.

1. Broker should not generate snapshots until it starts publishing.
    Before a broker starts publishing, BrokerMetadataListener._publisher=None, so _publisher.foreach(publish) will do nothing, so featuresDelta.metadataVersionChange().isPresent is always true, so we will generating a snapshot on every commit since we believe metadata version has changed, here are the logs, note offset 1 is a LeaderChangeMessage so there is no snapshot:

[2022-06-08 13:07:43,010] INFO [BrokerMetadataSnapshotter id=0] Creating a new snapshot at offset 0... (kafka.server.metadata.BrokerMetadataSnapshotter:66)
[2022-06-08 13:07:43,222] INFO [BrokerMetadataSnapshotter id=0] Creating a new snapshot at offset 2... (kafka.server.metadata.BrokerMetadataSnapshotter:66)
[2022-06-08 13:07:43,727] INFO [BrokerMetadataSnapshotter id=0] Creating a new snapshot at offset 3... (kafka.server.metadata.BrokerMetadataSnapshotter:66)
[2022-06-08 13:07:44,228] INFO [BrokerMetadataSnapshotter id=0] Creating a new snapshot at offset 4... (kafka.server.metadata.BrokerMetadataSnapshotter:66)

2. We should compute metadataVersionChanged before _publisher.foreach(publish)
    After _publisher.foreach(publish) the BrokerMetadataListener_delta is always Empty, so metadataVersionChanged is always false, this means we will never trigger snapshot generating even metadata version has changed.

3. We should try to generate a snapshot when starting publishing
    When we started publishing, there may be a metadata version change, so we should try to generate a snapshot before first publishing.

Reviewers: Jason Gustafson <jason@confluent.io>, Divij Vaidya <diviv@amazon.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
2022-07-12 08:02:43 -07:00
RivenSun 7ec759d67c
MINOR: Mention switch to reload4j in Notable changes in 3.1.1 (#12313)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Kvicii
2022-07-12 16:51:18 +02:00
Kirk True d3130f2e91
KAFKA-14062: OAuth client token refresh fails with SASL extensions (#12398)
- Different objects should be considered unique even with same content to support logout
- Added comments for SaslExtension re: removal of equals and hashCode
- Also swapped out the use of mocks in exchange for *real* SaslExtensions so that we exercise the use of default equals() and hashCode() methods.
- Updates to implement equals and hashCode and add tests in SaslExtensionsTest to confirm

Co-authored-by: Purshotam Chauhan <pchauhan@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-07-12 14:28:19 +05:30
Eugene Tolbakov a3f06d8814
KAFKA-14013: Limit the length of the `reason` field sent on the wire (#12388)
KIP-800 added the `reason` field to the JoinGroupRequest and the LeaveGroupRequest as I mean to provide more information to the group coordinator. In https://issues.apache.org/jira/browse/KAFKA-13998, we discovered that the size of the field is limited to 32767 chars by our serialisation mechanism. At the moment, the field either provided directly by the user or constructed internally is directly set regardless of its length.

This patch sends only the first 255 chars of the used provided or internally generated reason on the wire. Given the purpose of this field, that seems acceptable and that should still provide enough information to operators to understand the cause of a rebalance.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-12 09:31:16 +02:00
SC 23c92ce793
MINOR: Use String#format for niceMemoryUnits result (#12389)
Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>
2022-07-11 10:36:56 +08:00
Jason Gustafson 0bc8da7aec
KAFKA-14055; Txn markers should not be removed by matching records in the offset map (#12390)
When cleaning a topic with transactional data, if the keys used in the user data happen to conflict with the keys in the transaction markers, it is possible for the markers to get removed before the corresponding data from the transaction is removed. This results in a hanging transaction or the loss of the transaction's atomicity since it would effectively get bundled into the next transaction in the log. Currently control records are excluded when building the offset map, but not when doing the cleaning. This patch fixes the problem by checking for control batches in the `shouldRetainRecord` callback.

Reviewers: Jun Rao <junrao@gmail.com>
2022-07-10 10:16:39 -07:00
Divij Vaidya fc6e91e199
KAFKA-13474: Allow reconfiguration of SSL certs for broker to controller connection (#12381)
What:
When a certificate is rotated on a broker via dynamic configuration and the previous certificate expires, the broker to controller connection starts failing with SSL Handshake failed.

Why:
A similar fix was earlier performed in #6721 but when BrokerToControllerChannelManager was introduced in v2.7, we didn't enable dynamic reconfiguration for it's channel.

Summary of testing strategy (including rationale)
Add a test which fails prior to the fix done in the PR and succeeds afterwards. The bug wasn't caught earlier because there was no test coverage to validate the scenario.

Reviewers: Luke Chen <showuon@gmail.com>
2022-07-09 18:06:02 +08:00
Tomonari Yamashita e85500bbbe
KAFKA-13996: log.cleaner.io.max.bytes.per.second can be changed dynamically (#12296)
log.cleaner.io.max.bytes.per.second cannot be changed dynamically using bin/kafka-configs.sh. Call updateDesiredRatePerSec() of Throttler with new log.cleaner.io.max.bytes.per.second value in reconfigure() of Log Cleaner to fix the issue.

Reviewers: Tom Bentley <tbentley@redhat.com>, Luke Chen <showuon@gmail.com>
2022-07-08 20:41:47 +08:00
Aman Singh dc6f555492
KAFKA-13983: Fail the creation with "/" in resource name in zk ACL (#12359)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-07-08 15:47:48 +05:30
Marco Aurelio Lotz 63a6130af3
KAFKA-12943: update aggregating documentation (#12091)
Reviewers: Luke Chen <showuon@gmail.com>, Andrew Eugene Choi <andrew.choi@uwaterloo.ca>, Matthias J. Sax <matthias@confluent.io>
2022-07-07 14:00:05 -07:00
vamossagar12 5a1bac2608
KAFKA-13846: Follow up PR to address review comments (#12297)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-07-07 11:43:38 -07:00
Matthias J. Sax 38b08dfd33
MINOR: revert KIP-770 (#12383)
KIP-770 introduced a performance regression and needs some re-design.

Needed to resolve some conflict while reverting.

This reverts commits 1317f3f77a and 0924fd3f9f.

Reviewers:  Sagar Rao <sagarmeansocean@gmail.com>, Guozhang Wang <guozhang@confluent.io>
2022-07-07 11:19:37 -07:00
Lucas Bradstreet a521bbd755
MINOR: kafka system tests should support larger EBS volumes for newer instances (#12382)
When running with 4th generation instances supporting EBS only, we need
to use a larger volume or else we run out of  disk space during a system
test run.

This change also parameterizes the instance type as an env variable for
easier testing.

Reviewers: David Jacot <djacot@confluent.io>
2022-07-07 09:14:05 +02:00
Guozhang Wang ca8135b242 HOTFIX: KIP-851, rename requireStable in ListConsumerGroupOffsetsOptions 2022-07-06 22:00:31 -07:00
Guozhang Wang 915c781243
KAFKA-10199: Remove main consumer from store changelog reader (#12337)
When store changelog reader is called by a different thread than the stream thread, it can no longer use the main consumer to get committed offsets since consumer is not thread-safe. Instead, we would remove main consumer and leverage on the existing admin client to get committed offsets.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2022-07-06 17:23:18 -07:00
YU 6495a0768c
KAFKA-14032; Dequeue time for forwarded requests is unset (#12360)
When building a forwarded request, we need to override the dequeue time of the underlying request to match the same value as the envelope. Otherwise, the field is left unset, which causes inaccurate reporting.

Reviewers; Jason Gustafson <jason@confluent.io>
2022-07-06 13:21:28 -07:00
Bruno Cadonna 00f395bb88
KAFKA-10199: Remove call to Task#completeRestoration from state updater (#12379)
The call to Task#completeRestoration calls methods on the main consumer.
The state updater thread should not access the main consumer since the
main consumer is not thread-safe. Additionally, Task#completeRestoration
changed the state of active tasks, but we decided to keep task life cycle
management outside of the state updater.

Task#completeRestoration should be called by the stream thread on
restored active tasks returned by the state udpater.

Reviewer: Guozhang Wang <guozhang@apache.org>
2022-07-06 12:36:15 +02:00
Divij Vaidya 5e4c8f704c
KAFKA-13943; Make `LocalLogManager` implementation consistent with the `RaftClient` contract (#12224)
Fixes two issues in the implementation of `LocalLogManager`:

- As per the interface contract for `RaftClient.scheduleAtomicAppend()`, it should throw a `NotLeaderException` exception when the provided current leader epoch does not match the current epoch. However, the current `LocalLogManager`'s implementation of the API returns a LONG_MAX instead of throwing an exception. This change fixes the behaviour and makes it consistent with the interface contract.
-  As per the interface contract for `RaftClient.resign(epoch)`if the parameter epoch does not match the current epoch, this call will be ignored. But in the current `LocalLogManager` implementation the leader epoch might change when the thread is waiting to acquire a lock on `shared.tryAppend()` (note that tryAppend() is a synchronized method). In such a case, if a NotALeaderException is thrown (as per code change in above), then resign should be ignored.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Tom Bentley <tbentley@redhat.com>, Jason Gustafson <jason@confluent.io>
2022-07-05 20:08:28 -07:00
Chris Egerton 3ae1afa438
KAFKA-10000: Integration tests (#11782)
Implements embedded end-to-end integration tests for KIP-618, and brings together previously-decoupled logic from upstream PRs.

Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>
2022-07-06 10:35:05 +08:00