When we hit an exception when processing tasks we should save the work we have done so far.
This will only be relevant with ALOS and EOS-v1, not EOS-v2. It will actually reduce the number of duplicated record in ALOS because we will not be successfully processing tasks successfully more than once in many cases.
This is currently enabled only for named topologies.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>
* MINOR: Provide valid examples in README page.
- `testMetadataUpdateWaitTime` method is removed from MetadataTest class.
- Removed the travis CI documentation.
Reviewers: Luke Chen <showuon@gmail.com>
Tests are swallowing exceptions for supported operating systems, which could hide regressions.
Co-authored-by: Rob Leland <rleland@apache.org>
Reviewer: Bruno Cadonna <cadonna@apache.org>
Basic refactoring with no logical changes to lay the groundwork & facilitate reviews for error handling work.
This PR just moves all methods that go beyond the management of tasks into a new TaskExecutor class, such as processing, committing, and punctuating. This breaks up the ever-growing TaskManager class so it can focus on the tracking and updating of the tasks themselves, while the TaskExecutor can focus on the actual processing. In addition to cleaning up this code this should make it easier to test this part of the code.
Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
Lower the log level of a message in `WorkerSourceTask` which indicates that no messages have been produced by the task since it is spammy and causing users confusion.
Reviewers: Jason Gustafson <jason@confluent.io>
For debugging it is useful to see the actual state directory when
an exception regarding the state directory is thrown.
Reviewer: Bill Bejeck <bbejeck@apache.org>
The test case `MetricTest.testRemoveInactiveMetrics` attempts to test removal of inactive sensors, but one of the assertions is checking the wrong sensor name ("test.s1" instead of "test.s2"). The patch fixes the assertion to use the right sensor name.
Reviewers: Jason Gustafson <jason@confluent.io>
Co-authored-by: zhonghou3 <zhonghou3@jd.com>
There are a few integration tests for the forwarding logic which were added prior to kraft being ready for integration testing. Now that we have enabled kraft in integration tests, these tests are redundant and can be removed.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
In `KafkaServer, `ZkConfigRepository` is just a wrapper of `zkClient`, so we don't need to create a new one.
Reviewers: Jason Gustafson <jason@confluent.io>
Previously we were only verifying the new query could be added after we had already inserted it into the TopologyMetadata, so we need to move the validation upfront.
Also adds a test case for this and improves handling of NPE in case of future or undiscovered bugs.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This patch enables `ApiVersionsTest` to test both kraft brokers and controllers. It fixes a minor bug in which the `Envelope` request to be exposed from `ApiVersions` requests to the kraft broker.
Reviewers: Jason Gustafson <jason@confluent.io>
* Mention `acks=1` to `acks=all` change in 3.0.0 upgrade docs
* Have a separate section for 3.0.1 and 3.1.1 as some may skip the
3.0.0/3.1.0 section when upgrading to a bug fix.
* Move the 3.0.0 note to the top since it's more impactful than the
other changes.
Reviewers: Jason Gustafson <jason@confluent.io>
MetadataShell should be able to display contents from `ProducerIdsRecord`.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
The current naming of the fields in `ProducerIdsRecord` is a little confusing in regard to whether the block range was inclusive or exclusive. This patch tries to improve naming to make this clearer. In the record class, instead of `ProducerIdsEnd`, we use `NextProducerId`. We have also updated related classes such as `ProducerIdsBlock.java` with similar changes.
Reviewers: dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>
In #11649, we fixed one permission inconsistency between kraft and zk authorization for the `CreatePartitions` request. Previously kraft was requiring `CREATE` permission on the `Topic` resource when it should have required `ALTER`. A second inconsistency is that kraft was also allowing `CREATE` on the `Cluster` resource, which is not supported in zk clusters and was not documented in KIP-195: https://cwiki.apache.org/confluence/display/KAFKA/KIP-195%3A+AdminClient.createPartitions. This patch fixes this inconsistency and adds additional test coverage for both cases.
Reviewers: José Armando García Sancio <jsancio@gmail.com>
This patch adds null value check to the connector config validation, and extends unit tests to cover this functionality.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chris Egerton <chrise@confluent.io>, Boyang Chen <bchen11@outlook.com>, Andras Katona <akatona@cloudera.com>
Fixes a misspelled variable name in `KafkaConsumer`: `cachedSubscriptionHashAllFetchPositions` -> `cachedSubscriptionHasAllFetchPositions`.
Reviewers: Kvicii <Karonazaba@gmail.com>, Jason Gustafson <jason@confluent.io>
This PR follows #11629 to enable `CreateTopicsRequestWithForwardingTest` and `CreateTopicsRequestWithPolicyTest` in KRaft mode.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch adds missing `equals` and `hashCode` implements for `RawMetaProperties`. This is relied on by the storage tool for detecting when two log directories have different `meta.properties` files.
Reproduce current issue:
```shell
$ sed -i 's|log.dirs=/tmp/kraft-combined-logs|+log.dirs=/tmp/kraft-combined-logs,/tmp/kraft-combined-logs2' ./config/kraft/server.properties
$ ./bin/kafka-storage.sh format -t R19xNyxMQvqQRGlkGDi2cg -c ./config/kraft/server.properties
Formatting /tmp/kraft-combined-logs
Formatting /tmp/kraft-combined-logs2
$ ./bin/kafka-storage.sh info -c ./config/kraft/server.properties
Found log directories:
/tmp/kraft-combined-logs
/tmp/kraft-combined-logs2
Found metadata: {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}
Found problem:
Metadata for /tmp/kraft-combined-logs2/meta.properties was {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}, but other directories featured {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}
```
It's reporting that same metadata are not the same...
With this fix:
```shell
$ ./bin/kafka-storage.sh info -c ./config/kraft/server.properties
Found log directories:
/tmp/kraft-combined-logs
/tmp/kraft-combined-logs2
Found metadata: {cluster.id=R19xNyxMQvqQRGlkGDi2cg, node.id=1, version=1}
```
Reviewers: Igor Soarez <soarez@apple.com>, Jason Gustafson <jason@confluent.io>
This patch ensures that the committed offsets are not expired while the group is rebalancing. The issue is that we can't rely on the subscribed topics if the group is not stable.
Reviewers: David Jacot <djacot@confluent.io>
Currently, when using KRaft mode, users still have to have an Apache ZooKeeper instance if they want to use AclAuthorizer. We should have a built-in Authorizer for KRaft mode that does not depend on ZooKeeper. This PR introduces such an authorizer, called StandardAuthorizer. See KIP-801 for a full description of the new Authorizer design.
Authorizer.java: add aclCount API as described in KIP-801. StandardAuthorizer is currently the only authorizer that implements it, but eventually we may implement it for AclAuthorizer and others as well.
ControllerApis.scala: fix a bug where createPartitions was authorized using CREATE on the topic resource rather than ALTER on the topic resource as it should have been.
QuorumTestHarness: rename the controller endpoint to CONTROLLER for consistency (the brokers already called it that). This is relevant in AuthorizerIntegrationTest where we are examining endpoint names. Also add the controllerServers call.
TestUtils.scala: adapt the ACL functions to be usable from KRaft, by ensuring that they use the Authorizer from the current active controller.
BrokerMetadataPublisher.scala: add broker-side ACL application logic.
Controller.java: add ACL APIs. Also add a findAllTopicIds API in order to make junit tests that use KafkaServerTestHarness#getTopicNames and KafkaServerTestHarness#getTopicIds work smoothly.
AuthorizerIntegrationTest.scala: convert over testAuthorizationWithTopicExisting (more to come soon)
QuorumController.java: add logic for replaying ACL-based records. This means storing them in the new AclControlManager object, and integrating them into controller snapshots. It also means applying the changes in the Authorizer, if one is configured. In renounce, when reverting to a snapshot, also set newBytesSinceLastSnapshot to 0.
Reviewers: YeonCheol Jang <YeonCheolGit@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Title: KafkaConsumer cannot jump out of the poll method, and cpu and traffic on the broker side increase sharply
description: The local test has been passed, the problem described by jira can be solved
JIRA link : https://issues.apache.org/jira/browse/KAFKA-13310
Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
`ext` contains definitions that should be accessible in `allprojects`
(even though we don't use any right now).
Reviewers: Jason Gustafson <jason@confluent.io>
Introduce `maxScalacThreads` and set the default to the lowest of `8`
and the number of processors available to the JVM. The number `8` was
picked empirically, the sweet spot is between 6 and 10.
On my desktop, `./gradlew clean core:compileScala core:compileTestScala`
improved from around 60s to 51s ( with this change.
While at it, we improve the build output to include more useful
information at the start: build id, max parallel forks, max scala
threads and max test retries.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Sean Li
This PR is for KAFKA-7572, which fixes the issue that producers will throw confusing exceptions when a custom Partitioner returns a negative partition. Since the PR #5858 is not followed by anyone currently, I reopen this one to continue the work.
Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
We only use it in the legacy record formats (V0 and V1) and the CRC32
implementation in the standard library has received various performance
improvements over the years
(https://bugs.openjdk.java.net/browse/JDK-8245512 is a recent example).
Also worth noting that record formats V0 and V1 have been deprecated
since Apache Kafka 3.0.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Kvicii <Karonazaba@gmail.com>
After KAFKA-10793, we clear the findCoordinatorFuture in 2 places:
1. heartbeat thread
2. AbstractCoordinator#ensureCoordinatorReady
But in non consumer group mode with group id provided (for offset commitment. So that there will be consumerCoordinator created), there will be no (1)heartbeat thread , and it only call (2)AbstractCoordinator#ensureCoordinatorReady when 1st time consumer wants to fetch committed offset position. That is, after 2nd lookupCoordinator call, we have no chance to clear the findCoordinatorFuture , and causes the offset commit never succeeded.
To avoid the race condition as KAFKA-10793 mentioned, it's not safe to clear the findCoordinatorFuture in the future listener. So, I think we can fix this issue by calling AbstractCoordinator#ensureCoordinatorReady when coordinator unknown in non consumer group case, under each ConsumerCoordinator#poll.
Reviewers: Guozhang Wang <wangguoz@gmail.com>