The config has been deprecated since Kafka 2.6 (released ~1 year before
3.0), but it was the default before it got deprecated. As such, it's
reasonably unlikely that people would have set it explicitly.
Given the confusing `default` name even though it's _not_ the default, I
think we should remove it in 3.0.
Also remove `ClientDnsLookup.DEFAULT` (not public API), which unlocks
a number of code simplications.
Reviewers: David Jacot <djacot@confluent.io>
Deprecated since 1.0.0 for misleading name. Not a public API technically,
but we were conservative originally.
Reviewers: David Jacot <djacot@confluent.io>
This PR is a precursor to the recovery logic refactor work (KAFKA-12553).
I've renamed the file: core/src/test/scala/unit/kafka/log/LogUtils.scala to core/src/test/scala/unit/kafka/log/LogTestUtils.scala. Also I've renamed the underlying lass from LogUtils to LogTestUtils. This is going to help avoid a naming conflict with a new file called LogUtils.scala that I plan to introduce in core/src/main/scala/kafka/log/ as part of the recovery logic refactor. The new file will also contain a bunch of static functions.
Tests:
Relying on existing tests to catch regressions (if any) since this is a simple change.
Reviewers: Satish Duggana <satishd@apache.org>, Dhruvil Shah <dhruvil@confluent.io>, Jun Rao <junrao@gmail.com>
This PR is a precursor to the recovery logic refactor work (KAFKA-12553).
I have made a change to eliminate Log.isLogDirOffline attribute. This boolean also comes in the way of refactoring the recovery logic. This attribute was added in #9676. But it is redundant and can be eliminated in favor of looking up LogDirFailureChannel to check if the logDir is offline. The performance/latency implication of such a ConcurrentHashMap lookup inside LogDirFailureChannel should be very low given that ConcurrentHashMap reads are usually lock free.
Tests:
Relying on existing unit/integration tests.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Jun Rao <junrao@gmail.com>
The `kafka-preferred-replica-election` command was deprecated in 2.4. This path removes it for 3.0. `kafka-leader-election` can be used instead.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
Replaced File with Path in LogSegment Data.
This is a followup of #10173
Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
ConfigEntry's public constructor was deprecated in 1.1.0. This patch removes it in AK 3.0.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>
`Admin.electLeaders` is the replacement since the deprecation in Apache Kafka 2.4.0.
The methods were originally introduced in Apache Kafka 2.2.0, so they were only
non deprecated for two releases.
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
More specifically, remove deprecated:
- Constants in SslConfigs
- Constants in SaslConfigs
- AclBinding constructor
- AclBindingFilter constructor
- PrincipalBuilder and DefaultPrincipalBuilder classes
- ResourceFilter
Also simplify tests and code that no longer have to handle the removed `PrincipalBuilder`.
These removals seem non controversial. There is a straightforward alternative. The
deprecations happened in 1.0.0 and 2.0.0.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Minor followup to #10407 -- we need to extract the rebalanceInProgress check down into the commitAndFillInConsumedOffsetsAndMetadataPerTaskMap method which is invoked during handleCorrupted, otherwise we may attempt to commit during a a rebalance which will fail
Reviewers: Matthias J. Sax <mjsax@confluent.io>
This PR is a precursor to the recovery logic refactor work (KAFKA-12553).
Problems:
For refactoring the recovery logic (KAFKA-12553), we would like to move the logic to initialize LeaderEpochFileCache out of the Log class and into a separate static function. In the future, once we successfully initialize LeaderEpochFileCache outside Log, we will be able pass it as a dependency into both the Log recovery module and Log class constructor. However, currently the LeaderEpochFileCache constructor takes a dependency on logEndOffset (via a callback), which poses the following problems:
Blocks the instantiation of LeaderEpochFileCache outside Log class. Because, outside Log the logEndOffset is unavailable to be passed into LeaderEpochFileCache constructor. As a result, this situation blocks the recovery logic (KAFKA-12553) refactor work.
It turns out the logEndOffset dependency is used only in 1 of the LeaderEpochFileCache methods: LeaderEpochFileCache.endOffsetFor, and just for 1 particular case. Therefore, it is overkill to pass it in the constructor as a dependency. Also a callback is generally not a neat way to access dependencies and it poses code readability problems too.
Solution:
This PR modifies the code such that we only pass the logEndOffset as a parameter into LeaderEpochFileCache.endOffsetFor whenever the method is called, thus eliminating the constructor dependency. This will also unblock the recovery logic refactor work (KAFKA-12553).
Tests:
I have modified the existing tests to suit the above refactor.
Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Jun Rao <junrao@gmail.com>
The filesystem locks don't protect access between StreamThreads, only across different instances of the same Streams application. Running multiple processes in the same physical state directory is not supported, and as of PR #9978 it's explicitly guarded against), so there's no reason to continue locking the task directories with anything heavier than an in-memory map.
Reviewers: Rohan Desai <rodesai@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
This patch fixes a race condition between the background thread calling `ready` and the call to `MockTime.sleep` in the test. If the call to `sleep` happens first, then the test hangs. I fixed it by giving `MockClient` a way to listen to `ready` calls. This combined with a latch fixes the race.
This patch also fixes a similar race condition in `testClientSideTimeoutAfterFailureToReceiveResponse`. After the disconnect, there is a race between the background thread sending the retry request and the foreground sleeping for the needed backoff delay.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, David Arthur <mumrah@gmail.com>
This PR is a precursor to the recovery logic refactor work (KAFKA-12553).
In this PR, I've extracted the behavior surrounding segments map access within kafka.log.Log class into a new class: kafka.log.LogSegments. This class encapsulates a thread-safe navigable map of kafka.log.LogSegment instances and provides the required read and write behavior on the map. The Log class now encapsulates an instance of the LogSegments class.
Couple advantages of this PR:
Makes the Log class a bit more modular as it moves out certain private behavior thats otherwise within the Log class.
This is a precursor to refactoring the recovery logic (KAFKA-12553). In the future, the logic for recovery and loading of segments from disk (during Log) init will reside outside the Log class. Such logic would need to instantiate and access an instance of the newly added LogSegments class.
Tests:
Added a new test suite: kafka.log.LogSegmentsTest covering the APIs of the newly introduced class.
Reviewers: Satish Duggana <satishd@apache.org>, Dhruvil Shah <dhruvil@confluent.io>, Jun Rao <junrao@gmail.com>
Fix a hanging test in KafkaAdminClientTest by forcing the admin client to shut down
whether or not there are pending requests once the test harness enters shutdown.
Reviewers: Ismael Juma <ijuma@apache.org>, Guozhang Wang <guozhang@apache.org>
`Self-managed` is also used in the context of Cloud vs on-prem and it can
be confusing.
`KRaft` is a cute combination of `Kafka Raft` and it's pronounced like `craft`
(as in `craftsmanship`).
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jose Sancio <jsancio@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Ron Dagostino <rdagostino@confluent.io>
Remove the default close implementation for RocksDBConfigSetter to avoid accidental memory leaks via C++ backed objects which are constructed but not closed by the user
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Need to handle TaskCorruptedException and TimeoutException that can be thrown from offset commit during handleRevocation or handleCorruption
Reviewers: Matthias J. Sax <mjsax@confluent.org>, Guozhang Wang <guozhang@confluent.io>
When in EOS the run loop terminates on that thread before the shutdown can be called. This is a problem for EOS single thread applications using the application shutdown feature.
I changed it so in all cases with a single thread, the dying thread will spin up a new thread to communicate the shutdown and terminate the dying thread. Also @ableegoldman refactored the catch blocks in runloop.
Co-authored-by: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Java 11 generates html5 pages with search support, which provides a better user experience.
Fixed `get_jdk` bugs found during testing and updated `release.py` blurb to indicate that
both JDK 8 and JDK 11 are required to perform a release.
Tested by running `python2 release.py stage-docs`, which triggers the `aggregateJavadoc`
path without some of the undesired (for testing) release steps.
Reviewers: John Roesler <vvcephei@apache.org>
ProcessorContext#forward was changed via KIP-251 in 2.0.0 release. This PR removes the old and deprecated overloads.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
A major issue has been raised that this implementation of
emit-on-change is vulnerable to a number of data-loss bugs
in the presence of recovery with dirty state under at-least-once
semantics. This should be fixed in the future when we implement
a way to avoid or clean up the dirty state under at-least-once,
at which point it will be safe to re-introduce KIP-557 and
complete it.
Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
As @chia7712 found, we currently apply the `java` plugin indirectly via `rat.gradle`
if the `.git` folder exists (7071ded2a6/gradle/rat.gradle (L101)).
This led to inconsistent behavior if the `.git` directory was not present. With
this change, the `java-library` plugin (which has slightly more features than
the `java` plugin) is always applied.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, John Roesler <vvcephei@apache.org>
This patch adds the new `Admin` API to describe producer state as described by KIP-664: https://cwiki.apache.org/confluence/display/KAFKA/KIP-664%3A+Provide+tooling+to+detect+and+abort+hanging+transactions.
The three new APIs added by KIP-664 require different lookup and request patterns:
- DescribeProducers: send to partition leaders
- DescribeTransactions: send to coordinators
- ListTransactions: send to all brokers
Our method of handling complex workflows such as these in `KafkaAdminClient` by chaining together `Call` instances has been clumsy and error-prone at best. I have attempted to introduce a new pattern which separates the lookup stage (e.g. finding partition leaders) from the fulfillment stage (e.g. sending `DescribeProducers`). The lookup stage is implemented by `AdminApiLookupStrategy` and the fulfillment stage is implemented by `AdminApiHandler`. There is a new class `AdminApiDriver` which manages the bookkeeping for these two stages. See the corresponding javadocs for more detail.
This PR provides an example of usage through `DescribeProducersHandler`, which is an implementation of `AdminApiHandler`. It relies on `PartitionLeaderStrategy` which implements `AdminApiLookupStrategy`. In addition to allowing for easier reuse of lookup strategies, this approach provides a more convenient way for testing since all of the logic is not crammed into `KafkaAdminClient`. Follow-up PRs for the rest of KIP-664 will flesh out additional lookup strategies such as for coordinator APIs.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
There were errors while generating javadoc for the streams:test-utils module
because the included TopologyTestDriver imported some excluded classes.
This fixes the errors by inlining the previously excluded packages.
Reviewers: Chia-Ping Tsai <chia7712@apache.org>, Ismael Juma <ijuma@apache.org>
Previously, if we assigned one or more calls to a remote node, but it
never became available, AdminClient would block until the calls hit
their the API timeout. This was particularly unfortunate in the case
where the calls could have been sent to a different node in the cluster.
This PR fixes this behavior by timing out pending connections to
remote nodes if they take longer than the request timeout.
There are a few other small cleanups in this PR: it removes the
unecessary Call#aborted, sets Call#curNode to null after the call has
failed to avoid confusion when debugging or logging, and adds a
"closing" boolean rather than setting newCalls to null when the client
is closed. Also, it increases the log level of the log message that
indicates that we timed out some calls because AdminClient closed,
and simplifies the type of callsInFlight.
Reviewers: Jason Gustafson <jason@confluent.io>
1. replace org.junit.Assert by org.junit.jupiter.api.Assertions
2. replace org.junit by org.junit.jupiter.api
3. replace Before by BeforeEach
4. replace After by AfterEach
5. remove ExternalResource from all scala modules
6. add explicit AfterClass/BeforeClass to stop/start EmbeddedKafkaCluster
Noted that this PR does not migrate stream module to junit 5 so it does not introduce callback of junit 5 to deal with beforeAll/afterAll. The next PR of migrating stream module can replace explicit beforeAll/afterAll by junit 5 extension. Or we can keep the beforeAll/afterAll if it make code more readable.
Reviewers: John Roesler <vvcephei@apache.org>
This patch removes the duplicated `createKafkaMetricsContext` from `KafkaBroker`. It is already present in `Server`. Also added some small style cleanups.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
Currently the Raft leader raises an exception if there is a non-monotonic update to the fetch offset of a replica. In a situation where the replica had lost it disk state, this would prevent the replica from being able to recover. In this patch, we relax the validation to address this problem. It is worth pointing out that this validation could not be relied on to protect from data loss after a voter has lost committed state.
Reviewers: José Armando García Sancio <jsancio@gmail.com>, Boyang Chen <boyang@confluent.io>
This PR was removed by accident in trunk and 2.8, bringing it back.
Co-authored-by: Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>
Introduce "testkit" package which includes KafkaClusterTestKit class for enabling integration tests of self-managed clusters. Also make use of this new integration test harness in the ClusterTestExtentions JUnit extension.
Adds RaftClusterTest for basic self-managed integration test.
Reviewers: Jason Gustafson <jason@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
Co-authored-by: Colin P. McCabe <cmccabe@apache.org>
1. When the 2nd argument is an exception we don't need a placeholder
2. Placeholders should equal to arguments.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Return null for partitionLag if there is no current position.
This was the desired semantics, the lack of the check was an
oversight.
Patches: KIP-695
Patches: a92b986c85
Reviewers: Walker Carlson <wcarlson@confluent.io>, A. Sophie Blee-Goldman <ableegoldman@apache.org>