Commit Graph

10511 Commits

Author SHA1 Message Date
Guozhang Wang d62a42df2e
KAFKA-10199: Integrate Topology Pause/Resume with StateUpdater (#12659)
When a topology is paused / resumed, we also need to pause / resume its corresponding tasks inside state updater.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-09-28 16:26:01 -07:00
Chase Thomas d2f900b055
MINOR: Small update docs/design.html grammar and typo (#12691)
Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-27 11:46:25 -07:00
Orsák Maroš 5e322deb9a
MINOR: Improve unit test coverage of LeaderAndIsr class (#12689) 2022-09-27 16:08:05 +02:00
Jason Gustafson 017868d8ac
MINOR: Add section on listener configuration (including kraft) to security docs (#12682)
This patch adds a section in security.html about listener configuration. This includes the basics of how to define the security mapping of each listener as well as the configurations to control inter-cluster traffic.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Luke Chen <showuon@gmail.com>
2022-09-27 10:33:32 +08:00
Jason Gustafson 1c0f8f90e2
MINOR: Update design docs to avoid zookeeper-specific assumptions (#12690)
Update a few cases in the documentation which do not make sense for KRaft.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-09-26 13:01:07 -07:00
Bruno Cadonna 07a31599c3
KAFKA-10199: Fix switching to updating standbys if standby is removed (#12687)
When the state updater only contains standby tasks and then a
standby task is removed, an IllegalStateException is thrown
because the changelog reader does not allow to switch to standby
updating mode more than once in a row.

This commit fixes this bug by checking that the removed task is
an active one before trying to switch to standby updating mode.
If the task to remove is a standby task then either we are already
in standby updating mode and we should not switch to it again or
we are not in standby updating mode which implies that there are
still active tasks that would prevent us to switch to standby
updating mode.

Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-09-26 20:34:09 +02:00
José Armando García Sancio 4dec656699
KAFKA-14207; KRaft Operations documentation (#12642)
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Chase Thomas <forlack@users.noreply.github.com>
2022-09-26 11:19:48 -07:00
Niket eb8f0bd5e4
MINOR: Adding KRaft Monitoring Related Metrics to docs/ops.html (#12679)
This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).

Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
2022-09-26 14:25:36 +08:00
Ahmed Sobeh b0ace18035
KAFKA-14239: Merge StateRestorationIntegrationTest into RestoreIntegrationTest (#12670)
This PR makes the following changes:

* Moves the only test in StateRestorationIntegrationTest into RestoreIntegrationTest
* Deletes StateRestorationIntegrationTest

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-09-23 17:15:25 -07:00
Colin Patrick McCabe 7496e62434
KAFKA-14259: BrokerRegistration#toString throws an exception, terminating metadata replay (#12681)
Previously, BrokerRegistration#toString sould throw an exception, terminating metadata replay,
because the sorted() method is used on an entry set rather than a key set.

Reviewers: David Arthur <mumrah@gmail.com>
2022-09-23 15:39:50 -07:00
Divij Vaidya 9b2e290423
KAFKA-14132: Replace PowerMock/Easymock with Mockito for WorkerMetricsGroupTest (#12677)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-23 16:14:09 -04:00
Kirk True 8e43548175
KAFKA-13725: KIP-768 OAuth code mixes public and internal classes in same package (#12039)
* KAFKA-13725: KIP-768 OAuth code mixes public and internal classes in same package

Move classes into a sub-package of "internal" named "secured" that
matches the layout more closely of the "unsecured" package.

Replaces the concrete implementations in the former packages with
sub-classes of the new package layout and marks them as deprecated. If
anyone is already using the newer OAuth code, this should still work.

* Fix checkstyle and spotbugs violations

Co-authored-by: Kirk True <kirk@mustardgrain.com>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-09-23 13:15:15 +05:30
Justine Olshan a7caed39b7
KAFKA-14097: Make producer ID expiration a dynamic config (#12643)
Make `producer.id.expiration.ms` a dynamic configuration as described in KIP-854.

Reviewers: David Jacot <djacot@confluent.io>
2022-09-23 09:19:48 +02:00
Vicky Papavasileiou cda5da9b65
KAFKA-14209: Change Topology optimization to accept list of rules 1/3 (#12641)
This PR is part of a series implementing the self-join rewriting. As part of it, we decided to clean up the TOPOLOGY_OPTIMIZATION_CONFIG and make it a list of optimization rules. Acceptable values are: NO_OPTIMIZATION, OPTIMIZE which applies all optimization rules or a comma separated list of specific optimizations.

Reviewers: Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>
2022-09-22 11:20:37 -05:00
Jason Gustafson 3549a5524e
MINOR: Update security docs for kraft Authorizer configuration (#12673)
Update security documentation to describe how to configure the KRaft `Authorizer` implementation and include a note about principal forwarding.

Additionally, this patch renames `KafkaConfig.Defaults.DefaultPrincipalSerde` to `DefaultPrincipalBuilder` since the former is somewhat misleading.

Reviewers: David Arthur <mumrah@gmail.com>
2022-09-21 19:38:59 -07:00
Luke Chen bf7ddf73af
MINOR: use addExact to avoid overflow and some cleanup (#12660)
What changes in this PR:
1. Use addExact to avoid overflow in BatchAccumulator#bytesNeeded. We did use addExact in bytesNeededForRecords method, but forgot that when returning the result.
2. javadoc improvement

Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-22 09:22:58 +08:00
aLeX cb9557a990
KAFKA-14236; ListGroups request produces too much Denied logs in authorizer (#12652)
Avoid `is Denied Operation` logs when calling ListGroups api, by adding `logIfDenied = false` to `authorize` calls.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-21 17:42:30 -07:00
Jason Gustafson 695424fa9d
MINOR: Mention deprecation of authorizer flags in security documentation (#12668)
The following options are deprecated in kafka-acls.sh: `--authorizer`, `--authorizer-properties`, and `--zk-tls-config-file`. This patch updates the security documentation to mention the deprecation and changes examples to use `--bootstrap-server` when possible.

Reviewers: Luke Chen <showuon@gmail.com>
2022-09-21 09:34:17 -07:00
Manikumar Reddy 5587c65fd3 MINOR: Add configurable max receive size for SASL authentication requests
This adds a new configuration `sasl.server.max.receive.size` that sets the maximum receive size for requests before and during authentication.

Reviewers: Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>

Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
2022-09-21 20:58:33 +05:30
Colin Patrick McCabe b401fdaefb MINOR: Add more validation during KRPC deserialization
When deserializing KRPC (which is used for RPCs sent to Kafka, Kafka Metadata records, and some
    other things), check that we have at least N bytes remaining before allocating an array of size N.

    Remove DataInputStreamReadable since it was hard to make this class aware of how many bytes were
    remaining. Instead, when reading an individual record in the Raft layer, simply create a
    ByteBufferAccessor with a ByteBuffer containing just the bytes we're interested in.

    Add SimpleArraysMessageTest and ByteBufferAccessorTest. Also add some additional tests in
    RequestResponseTest.

    Reviewers: Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>, Colin McCabe <colin@cmccabe.xyz>

    Co-authored-by: Colin McCabe <colin@cmccabe.xyz>
    Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com>
    Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
2022-09-21 20:58:23 +05:30
Divij Vaidya 8e522c56d1
MINOR: Fix ConsumerPerformanceTest to work with non-default locales (#12623)
ConsumerPerformanceTest fails when running on a machine with a Locale where decimal is represented by a comma. E.g. in locale of Germany, one point two is written as 1,2 and not 1.2 as with the default locale.

The test fails because it validates that each header has a corresponding value by assuming that comma is a separator between values & headers. This assumption fails for Germany Locale because comma could be part of a float number.

Reviewers: David Jacot <djacot@confluent.io>
2022-09-21 15:54:25 +02:00
Philipp Trulson 9df925bf9e
MINOR: Fix typo in info message (#12665)
Reviewers: Luke Chen <showuon@gmail.com>
2022-09-21 10:05:15 +08:00
Akhilesh C 6c6b8e2f96
KAFKA-14214: Introduce read-write lock to StandardAuthorizer for consistent ACL reads. (#12628)
Fixes an issue with StandardAuthorizer#authorize that allowed inconsistent results. The underlying 
concurrent data structure (ConcurrentSkipListMap) had weak consistency guarantees. This meant
that a concurrent update to the authorizer data could result in the authorize function processing 
ACL updates out of order.

This patch replaces the concurrent data structures with regular non-thread safe equivalents and uses
a read/write lock for thread safety and strong consistency.

Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, Luke Chen <showuon@gmail.com>
2022-09-20 16:54:18 -04:00
Colin Patrick McCabe ae4bb0c6fa
KAFKA-14243: Temporarily disable unsafe downgrade (#12664)
Reviewers: David Arthur <mumrah@gmail.com>
2022-09-20 15:32:52 -04:00
Jordan Bull 8ddc9509cf
KAFKA-13927: Fix sink task offset tracking during exception retries (#12566)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-20 13:46:35 -04:00
David Jacot ff4c4d1365
MINOR: Log controller id/epoch when LeaderAndIsr, StopReplica and UpdateMetadata requests are fenced (#12645)
Reviewers: Luke Chen <showuon@gmail.com>
2022-09-20 15:26:37 +02:00
Lucas Brutschy 8f5c234224
MINOR: Fixes in release.py (#12648)
1. Permissions for mkdir set incorrectly, probably because it used incorrect python3 syntax (octals should use 0o prefix).

2. JAVA_HOME logic didn't seem to hold up to its promise. The original logic was broken when user input was empty. It was supposed to use JAVA_HOME system property to find java, but it wouldn't set jdk_java_home, so the following version check

java_version = cmd_output("%s/bin/java -version" % jdk_java_home, env=jdk_env)

would access /bin/java which does not exist on any system I know.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2022-09-20 12:26:11 +02:00
Sushant Mahajan f8e0a6d924
KAFKA-14212: Enhance HttpAccessTokenRetriever to retrieve error message (#12651)
Currently HttpAccessTokenRetriever client side class does not retrieve error response from the token e/p. As a result, seemingly trivial config issues could take a lot of time to diagnose and fix. For example, client could be sending invalid client secret, id or scope.
This PR aims to remedy the situation by retrieving the error response, if present and logging as well as appending to any exceptions thrown.
New unit tests have also been added.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-09-20 12:33:19 +05:30
Jason Gustafson 8c8b5366a6
KAFKA-14240; Validate kraft snapshot state on startup (#12653)
We should prevent the metadata log from initializing in a known bad state. If the log start offset of the first segment is greater than 0, then must be a snapshot an offset greater than or equal to it order to ensure that the initialized state is complete.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-09-19 11:52:48 -07:00
Bruno Cadonna b4fa3496e1
KAFKA-10199: Adapt restoration integration tests to state updater (#12650)
Transforms the integration test that verifies restoration in a
parametrized test. The parametrized test runs once with
state updater enabled and once with state updater disabled.

Reviewer: Guozhang Wang <wangguoz@gmail.com>
2022-09-19 19:27:17 +02:00
Federico Valeri 6d463c1733
MINOR: Update offset.storage.topic description (#12656)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-19 13:21:59 -04:00
Nikolay 51b079dca7
KAFKA-12878: Support --bootstrap-server in kafka-streams-application-reset tool (#12632)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-19 13:20:41 -04:00
Manikumar Reddy 3e8e082fab MINOR: Bump latest 2.8 version to 2.8.2 2022-09-19 17:18:47 +05:30
Tom Bentley 352c71ffb5
MINOR: Update release versions for upgrade tests with 3.0.2, 3.1.2, 3.2.3 release (#12661)
Updates release versions in files that are used for upgrade test with the 3.0.2, 3.1.2, 3.2.3 release version.
2022-09-19 17:13:40 +05:30
Luke Chen 09011da76d
KAFKA-14233: disable testReloadUpdatedFilesWithoutConfigChange first to fix the build (#12658)
disable testReloadUpdatedFilesWithoutConfigChange first to fix the build

Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-19 12:22:25 +08:00
Jason Gustafson a8fcbcc08f
MINOR: Set display granularity in gradle test logging (#12657)
We sometimes see build failures where the code encounters an exit condition and fails abruptly. For example:
```
[2022-09-18T10:01:25.947Z] * What went wrong:
[2022-09-18T10:01:25.947Z] Execution failed for task ':core:unitTest'.
[2022-09-18T10:01:25.947Z] > Process 'Gradle Test Executor 116' finished with non-zero exit value 1
```
When this happens, it can be difficult to track the failure back to a specific test from the build output because we don't know which test was executing on 'Gradle Test Executor 116.' 

There is a test logging property in gradle called `displayGranularity`, which lets us see the executor for each test run: https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.logging.TestLogging.html#org.gradle.api.tasks.testing.logging.TestLogging:displayGranularity.  When `displayGranularity` is set to 2 (the default), we get the following:
```
AdminZkClientTest > testGetBrokerMetadatas() PASSED
```
When set to 0, it looks like this instead:
```
Gradle Test Run :core:test > Gradle Test Executor 76 > AdminZkClientTest > testGetBrokerMetadatas() PASSED
```
Having this extra information should make it easier to debug failures.

Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-09-18 18:24:27 -07:00
José Armando García Sancio b166ac43cb
KAFKA-14238; KRaft metadata log should not delete segment past the latest snapshot (#12655)
Disable segment deletion based on size and time by setting the KRaft metadata log's `RetentionMsProp` and `RetentionBytesProp` to `-1`. This will cause `UnifiedLog.deleteRetentionMsBreachedSegments` and `UnifiedLog.deleteRetentionSizeBreachedSegments` to short circuit instead of deleting segments.

Without this changes the included test would fail. This happens because `deleteRetentionMsBreachedSegments` is able to delete past the `logStartOffset`. Deleting past the `logStartOffset` would violate the invariant that if the `logStartOffset` is greater than 0 then there is a snapshot with an end offset greater than or equal to the log start offset.

Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-09-17 06:30:50 -07:00
Bruno Cadonna a1f3c6d160
KAFKA-10199: Register and unregister changelog topics in state updater (#12638)
Registering and unregistering the changelog topics in the
changelog reader outside of the state updater leads to
race conditions between the stream thread and the state
updater thread. Thus, this PR moves registering and
unregistering of changelog topics in the changelog
reader into the state updater if the state updater
is enabled.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Hao Li <1127478+lihaosky@users.noreply.github.com>
2022-09-16 09:05:11 +02:00
Matthew Stidham fdcde1fb78
MINOR: replace deprecated egrep in kafka-run-class (#12649)
The egrep is deprecated in 2007 and be replaced with grep -E

Signed-off-by: Matthew Stidham <stidmatt@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>
2022-09-16 14:32:53 +08:00
Rens Groothuijsen b09cadcaa7
KAFKA-13985: Skip committing MirrorSourceTask records without metadata (#12602)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-15 09:55:44 -04:00
Artem Livshits 2b2039f0ba
KAFKA-14156: Built-in partitioner may create suboptimal batches (#12570)
Now the built-in partitioner defers partition switch (while still
accounting produced bytes) if there is no ready batch to send, thus
avoiding switching partitions and creating fractional batches.

Reviewers: Jun Rao <jun@confluent.io>
2022-09-14 17:39:14 -07:00
José Armando García Sancio fc6a814391
MINOR; Add missing li end tag (#12640)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-09-14 08:55:28 -07:00
Ismael Juma fcbc9c49d0
MINOR: Mention that kraft is production ready in upgrade notes (#12635)
Reviewers: José Armando García Sancio <jsancio@apache.org>
2022-09-14 08:30:39 -07:00
Yash Mayya bdf2cdb27f
KAFKA-14132: Migrate some Connect tests from EasyMock/PowerMock to Mockito (#12615)
Reviewers: Chris Egerton <chrise@aiven.io>
2022-09-14 11:00:32 -04:00
Nandini Anagondi 21eae2f29a
MINOR: Use MessageDigest equals when comparing signature (#11516)
The motivation for this change is to guard against timing attacks when using InternalRequestSignature.equals()

Pros of this PR

    if the InternalRequestSignature.equal() method could be used for a timing attack, then this PR fixes a security vulnerability

Cons of this PR

    MessageDigest.isEquals() is slower than Arrays.equal since the former is time constant i.e. it runs for a fixed time irrespective of the length of original signature. The execution time of MessageDigest.isEquals() is a function of length of the byte array that it is being tested against.

Even if InternalRequestSignature.equals() is not being used anywhere in code today where it may cause a timing attack, we should still guard against the possibility where a future change might start using it (especially in an open source project where changes might be contributed & reviewed by multiple group of people). The downside of slower equality comparison over a signature is risk worth accepting given the upside we get to safeguard future use cases.

Co-authored-by: Nandini Krishna Anagondi <nandini@mac.local>

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>,  Karan Kumar <karankumar1100@gmail.com>, Andrew Eugene Choi <andrew.choi@uwaterloo.ca>
2022-09-14 21:06:43 +08:00
Bruno Cadonna 1ab4596ee6
KAFKA-10199: Suspend tasks in the state updater on revocation (#12600)
In the first attempt to handle revoked tasks in the state updater
we removed the revoked tasks from the state updater and added it to
the set of pending tasks to close cleanly. This is not correct since
a revoked task that is immediately reassigned to the same stream thread
would neither be re-added to the state updater nor be created again.
Also a revoked active task might be added to more than one bookkeeping
set in the tasks registry since it might still be returned from
stateUpdater.getTasks() after it was removed from the state updater.
The reason is that the removal from the state updater is done
asynchronously.

This PR solves this issue by introducing a new bookkeeping set
in the tasks registry to bookkeep revoked active tasks (actually
suspended active tasks).

Additionally this PR closes some testing holes around the modified
code.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Hao Li <1127478+lihaosky@users.noreply.github.com>
2022-09-14 09:03:43 +02:00
Artem Livshits 7b49f175b9
MINOR: Add upgrade note regarding the Strictly Uniform Sticky Partitioner (KIP-794) (#12630)
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Jacot <djacot@confluent.io>
2022-09-13 11:56:44 -07:00
Philip Nee 5f01fed206
MINOR: Small cleanups in FetcherTest following KAFKA-14196 (#12629)
Minor cleanups in `FetcherTest` following https://github.com/apache/kafka/pull/12603.

Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-09-13 11:10:41 -07:00
Ashmeet Lamba 86645cb40a
KAFKA-14073; Log the reason for snapshot (#12414)
When a snapshot is taken it is due to either of the following reasons -

    Max bytes were applied
    Metadata version was changed

Once the snapshot process is started, it will log the reason that initiated the process.

Updated existing tests to include code changes required to log the reason. I was not able to check the logs when running tests - could someone guide me on how to enable logs when running a specific test case.

Reviewers: dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2022-09-13 10:03:47 -07:00
José Armando García Sancio c5954175a4
KAFKA-14222; KRaft's memory pool should always allocate a buffer (#12625)
Because the snapshot writer sets a linger ms of Integer.MAX_VALUE it is
possible for the memory pool to run out of memory if the snapshot is
greater than 5 * 8MB.

This change allows the BatchMemoryPool to always allocate a buffer when
requested. The memory pool frees the extra allocated buffer when released if
the number of pooled buffers is greater than the configured maximum
batches.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-13 08:04:40 -07:00