10292 Commits

Author SHA1 Message Date
José Armando García Sancio 453bd44d0e MINOR; Bump to version 3.3.2 (#12708)
Reviewers: Luke Chen <showuon@gmail.com>
2022-10-04 09:49:33 -07:00
José Armando García Sancio 8c98308caf Merge tag '3.3.1' into 3.3 2022-10-03 14:20:21 -07:00
David Arthur c2d7984c8e MINOR: Fix delegation token system test (#12693)
KIP-373 added a "token requester" field to the output of kafka-delegation-tokens.sh. The system test was failing since it was not expecting this new field. This patch adds support for this field and improves the error output if we can't parse.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Manikumar Reddy <manikumar.reddy@gmail.com>
2022-10-01 19:23:42 -07:00
José Armando García Sancio cdb25e10dc MINOR; Add missing code end tag (#12702)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-09-29 15:10:04 -07:00
José Armando García Sancio a4f72c47f1 MINOR; Update upgrade documentation for 3.3.1 (#12701)
Reviewers: David Arthur <mumrah@gmail.com>
2022-09-29 15:09:50 -07:00
José Armando García Sancio e23c59d00e Bump version to 3.3.1 2022-09-29 12:03:49 -07:00
Colin P. McCabe 1780f2660e KAFKA-14265: Prefix ACLs may shadow other prefix ACLs 2022-09-29 09:26:47 -07:00
David Arthur 4b35f247d1 Bump 3.3 branch to 3.3.1-SNAPSHOT 2022-09-28 09:56:44 -04:00
David Arthur da96ac933c Merge tag '3.3.0-rc2' into 3.3
3.3.0-rc2
2022-09-28 09:51:55 -04:00
Chase Thomas 07a448295f MINOR: Small update docs/design.html grammar and typo (#12691)
Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-27 12:10:01 -07:00
Jason Gustafson 03ddb27210 MINOR: Add section on listener configuration (including kraft) to security docs (#12682)
This patch adds a section in security.html about listener configuration. This includes the basics of how to define the security mapping of each listener as well as the configurations to control inter-cluster traffic.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Luke Chen <showuon@gmail.com>
2022-09-27 10:36:27 +08:00
Jason Gustafson 1ce7bd7f29 MINOR: Update design docs to avoid zookeeper-specific assumptions (#12690)
Update a few cases in the documentation which do not make sense for KRaft.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-09-26 13:08:37 -07:00
José Armando García Sancio 9b8a48ca2a KAFKA-14207; KRaft Operations documentation (#12642)
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Chase Thomas <forlack@users.noreply.github.com>
2022-09-26 11:27:24 -07:00
David Arthur 6174f95d61 MINOR: update configuration.html with KRaft details (#12678) 2022-09-26 10:16:12 -04:00
Niket 2e95280543 MINOR: Adding KRaft Monitoring Related Metrics to docs/ops.html (#12679)
This commit adds KRaft monitoring related metrics to the Kafka docs (docs/ops.html).

Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>
2022-09-26 14:26:29 +08:00
Colin Patrick McCabe 88ec4d0d60 KAFKA-14259: BrokerRegistration#toString throws an exception, terminating metadata replay (#12681)
Previously, BrokerRegistration#toString would throw an exception, terminating metadata replay,
because the sorted() method was used on an entry set rather than a key set.

Reviewers: David Arthur <mumrah@gmail.com>
2022-09-23 15:40:32 -07:00
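
A minimal sketch of the failure mode described in KAFKA-14259, assuming a plain HashMap of listener names to ports stands in for the real data (the class and method names below are hypothetical and not taken from Kafka). Calling sorted() on a stream of map entries fails once the stream is evaluated, because Map.Entry does not implement Comparable, whereas sorting the keys works.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the Kafka code base.
class SortedEntrySetDemo {
    // Sorting the key set works because String implements Comparable.
    static List<String> sortedKeys(Map<String, Integer> map) {
        return map.keySet().stream().sorted().collect(Collectors.toList());
    }

    // Sorting the entry set with the natural ordering compiles, but throws
    // ClassCastException when the stream is evaluated with two or more entries.
    static boolean sortingEntriesFails(Map<String, Integer> map) {
        try {
            map.entrySet().stream().sorted().collect(Collectors.toList());
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> listeners = new HashMap<>();
        listeners.put("PLAINTEXT", 9092);
        listeners.put("SSL", 9093);
        System.out.println(sortedKeys(listeners));
        System.out.println(sortingEntriesFails(listeners));
    }
}
```
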
Jason Gustafson 46e6269a5b MINOR: Remove ARM/PowerPC builds from Jenkinsfile (#12380)
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2022-09-23 13:48:39 -04:00
David Arthur 9d1f9f7764 Bump version to 3.3.0 2022-09-20 17:51:17 -04:00
Akhilesh C f5f8ff0d24 KAFKA-14214: Introduce read-write lock to StandardAuthorizer for consistent ACL reads. (#12628)
Fixes an issue with StandardAuthorizer#authorize that allowed inconsistent results. The underlying 
concurrent data structure (ConcurrentSkipListMap) had weak consistency guarantees. This meant
that a concurrent update to the authorizer data could result in the authorize function processing 
ACL updates out of order.

This patch replaces the concurrent data structures with regular non-thread safe equivalents and uses
a read/write lock for thread safety and strong consistency.

Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, Luke Chen <showuon@gmail.com>
2022-09-20 16:54:44 -04:00
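
The approach in KAFKA-14214 (plain collections guarded by a read/write lock) can be sketched as follows. This is an illustration only, assuming a simplified store that maps resource names to rule strings; LockedAclStore and its methods are hypothetical and are not the StandardAuthorizer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for an ACL store.
class LockedAclStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Plain, non-thread-safe map; every access goes through the lock.
    private final Map<String, String> aclsByResource = new TreeMap<>();

    void put(String resource, String rule) {
        lock.writeLock().lock();
        try {
            aclsByResource.put(resource, rule);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void remove(String resource) {
        lock.writeLock().lock();
        try {
            aclsByResource.remove(resource);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers hold the read lock for the whole scan, so no writer can
    // interleave an update while the rules are being collected.
    List<String> rulesForPrefix(String prefix) {
        lock.readLock().lock();
        try {
            List<String> rules = new ArrayList<>();
            for (Map.Entry<String, String> entry : aclsByResource.entrySet()) {
                if (entry.getKey().startsWith(prefix)) {
                    rules.add(entry.getValue());
                }
            }
            return rules;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockedAclStore store = new LockedAclStore();
        store.put("topic-a", "ALLOW alice");
        store.put("topic-b", "DENY bob");
        System.out.println(store.rulesForPrefix("topic-"));
    }
}
```

Many readers can hold the read lock at once, so the cost of the stronger guarantee is paid mainly when updates occur.
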
Colin Patrick McCabe ebf3cfe136 KAFKA-14243: Temporarily disable unsafe downgrade (#12664)
Reviewers: David Arthur <mumrah@gmail.com>
2022-09-20 15:49:25 -04:00
Jason Gustafson 0c08c80afa KAFKA-14240; Validate kraft snapshot state on startup (#12653)
We should prevent the metadata log from initializing in a known bad state. If the log start offset of the first segment is greater than 0, then there must be a snapshot with an end offset greater than or equal to it in order to ensure that the initialized state is complete.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-09-19 11:53:51 -07:00
Luke Chen c2c71efddc KAFKA-14233: disable testReloadUpdatedFilesWithoutConfigChange first to fix the build (#12658)
disable testReloadUpdatedFilesWithoutConfigChange first to fix the build

Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-19 12:28:47 +08:00
José Armando García Sancio 74c4bbfaf9 KAFKA-14238; KRaft metadata log should not delete segment past the latest snapshot (#12655)
Disable segment deletion based on size and time by setting the KRaft metadata log's `RetentionMsProp` and `RetentionBytesProp` to `-1`. This will cause `UnifiedLog.deleteRetentionMsBreachedSegments` and `UnifiedLog.deleteRetentionSizeBreachedSegments` to short circuit instead of deleting segments.

Without this change the included test would fail. This happens because `deleteRetentionMsBreachedSegments` is able to delete past the `logStartOffset`. Deleting past the `logStartOffset` would violate the invariant that if the `logStartOffset` is greater than 0 then there is a snapshot with an end offset greater than or equal to the log start offset.

Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-09-17 06:32:31 -07:00
Artem Livshits 3b080a26fc KAFKA-14156: Built-in partitioner may create suboptimal batches (#12570)
Now the built-in partitioner defers partition switch (while still
accounting produced bytes) if there is no ready batch to send, thus
avoiding switching partitions and creating fractional batches.

Reviewers: Jun Rao <jun@confluent.io>
2022-09-14 17:40:33 -07:00
Alan Sheinberg 2b53b60958 MINOR: Adds KRaft versions of most streams system tests (#12458)
Migrates Streams system tests to either use KRaft brokers or to use both KRaft and ZK in a testing matrix.

This skips tests which use various forms of Kafka versioning since those seem to have issues with KRaft at the moment. Running these tests with KRaft will require a followup PR.

Reviewers: Guozhang Wang <guozhang@apache.org>, John Roesler <vvcephei@apache.org>
2022-09-14 16:09:33 -07:00
José Armando García Sancio 545a0063fa MINOR; Add missing li end tag (#12640)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-09-14 08:58:47 -07:00
Ismael Juma 42bbc95c76 MINOR: Mention that kraft is production ready in upgrade notes (#12635)
Reviewers: José Armando García Sancio <jsancio@apache.org>
2022-09-14 08:31:57 -07:00
Artem Livshits 6c745b53bf MINOR: Add upgrade note regarding the Strictly Uniform Sticky Partitioner (KIP-794) (#12630)
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Jacot <djacot@confluent.io>
2022-09-13 12:18:31 -07:00
José Armando García Sancio 9633c01d2f KAFKA-14222; KRaft's memory pool should always allocate a buffer (#12625)
Because the snapshot writer sets a linger ms of Integer.MAX_VALUE it is
possible for the memory pool to run out of memory if the snapshot is
greater than 5 * 8MB.

This change allows the BatchMemoryPool to always allocate a buffer when
requested. The memory pool frees the extra allocated buffer when released if
the number of pooled buffers is greater than the configured maximum
batches.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-13 08:06:26 -07:00
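
The pooling behaviour described in KAFKA-14222 can be sketched as below. This is a simplified illustration under stated assumptions, not the BatchMemoryPool source: the class name, constructor parameters, and pooledCount() helper are hypothetical. Allocation never fails, and a released buffer is kept only while the pool holds fewer than the configured maximum.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool that always hands out a buffer on request.
class AlwaysAllocatingPool {
    private final int maxPooledBuffers;
    private final int bufferSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    AlwaysAllocatingPool(int maxPooledBuffers, int bufferSize) {
        this.maxPooledBuffers = maxPooledBuffers;
        this.bufferSize = bufferSize;
    }

    // Reuse a pooled buffer if one is available, otherwise allocate a new one.
    synchronized ByteBuffer allocate() {
        ByteBuffer pooled = free.pollFirst();
        return pooled != null ? pooled : ByteBuffer.allocate(bufferSize);
    }

    // Keep the buffer only if the pool is below its limit; extra buffers
    // are dropped here and left to the garbage collector.
    synchronized void release(ByteBuffer buffer) {
        buffer.clear();
        if (free.size() < maxPooledBuffers) {
            free.addFirst(buffer);
        }
    }

    synchronized int pooledCount() {
        return free.size();
    }

    public static void main(String[] args) {
        AlwaysAllocatingPool pool = new AlwaysAllocatingPool(1, 16);
        ByteBuffer a = pool.allocate();
        ByteBuffer b = pool.allocate();
        pool.release(a);
        pool.release(b);
        System.out.println(pool.pooledCount());
    }
}
```
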
Jason Gustafson 389bb2004f KAFKA-14208; Do not raise wakeup in consumer during asynchronous offset commits (#12626)
Asynchronous offset commits may throw an unexpected WakeupException following #11631 and #12244. This patch fixes the problem by passing through a flag to ensureCoordinatorReady to indicate whether wakeups should be disabled. This is used to disable wakeups in the context of asynchronous offset commits. All other uses leave wakeups enabled.

Note: this patch builds on top of #12611.

Co-Authored-By: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>
2022-09-13 15:44:44 +08:00
Philip Nee 14df199af0 KAFKA-14196; Do not continue fetching partitions awaiting auto-commit prior to revocation (#12603)
When auto-commit is enabled with the "eager" rebalance strategy, the consumer will commit all offsets prior to revocation. Following recent changes, this offset commit is done asynchronously, which means there is an opportunity for fetches to continue returning data to the application. When this happens, the progress is lost following revocation, which results in duplicate consumption. This patch fixes the problem by adding a flag in `SubscriptionState` to ensure that partitions which are awaiting revocation will not continue being fetched.

Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-09-12 21:02:52 -07:00
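
The idea behind KAFKA-14196 can be illustrated with a small sketch. It assumes partitions are identified by plain strings and uses a hypothetical RevocationAwareAssignment class; it is not the `SubscriptionState` implementation. Partitions flagged as awaiting revocation stay assigned but are excluded from the set that may be fetched.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical tracker for assigned partitions and their revocation state.
class RevocationAwareAssignment {
    private final Set<String> assigned = new LinkedHashSet<>();
    private final Set<String> pendingRevocation = new HashSet<>();

    void assign(String partition) {
        assigned.add(partition);
    }

    // Called when an offset commit prior to revocation has been started.
    void markPendingRevocation(String partition) {
        if (assigned.contains(partition)) {
            pendingRevocation.add(partition);
        }
    }

    // Called once the partition has actually been revoked.
    void completeRevocation(String partition) {
        assigned.remove(partition);
        pendingRevocation.remove(partition);
    }

    // Only partitions that are not awaiting revocation may be fetched.
    List<String> fetchablePartitions() {
        List<String> result = new ArrayList<>();
        for (String partition : assigned) {
            if (!pendingRevocation.contains(partition)) {
                result.add(partition);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RevocationAwareAssignment state = new RevocationAwareAssignment();
        state.assign("orders-0");
        state.assign("orders-1");
        state.markPendingRevocation("orders-0");
        System.out.println(state.fetchablePartitions());
    }
}
```
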
Jason Gustafson 85fc267100 KAFKA-14215; Ensure forwarded requests are applied to broker request quota (#12624)
Currently forwarded requests are not applied to any quotas on either the controller or the broker. The controller-side throttling requires the controller to apply the quota changes from the log to the quota managers, which will be done separately. In this patch, we change the response logic on the broker side to also apply the broker's request quota. The enforced throttle time is the maximum of the throttle returned from the controller (which is 0 until we fix the aforementioned issue) and the broker's request throttle time.

Reviewers: David Arthur <mumrah@gmail.com>
2022-09-12 20:51:32 -07:00
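
The throttle rule stated in KAFKA-14215 reduces to taking a maximum. A minimal sketch, assuming both throttle times are expressed in milliseconds; the class and method names are hypothetical:

```java
// Hypothetical helper illustrating the rule from the commit message.
class ForwardedRequestThrottle {
    // The enforced throttle is the larger of the controller-side and
    // broker-side throttle times.
    static int enforcedThrottleMs(int controllerThrottleMs, int brokerThrottleMs) {
        return Math.max(controllerThrottleMs, brokerThrottleMs);
    }

    public static void main(String[] args) {
        // The controller throttle is 0 until controller-side quotas are applied.
        System.out.println(enforcedThrottleMs(0, 250));
    }
}
```
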
José Armando García Sancio 6f1c1b04bc MINOR; Remove end html tag from upgrade (#12605)
The </html> tag doesn't have a matching <html> tag. Those tags are added by
the server side include and are not needed in docs/upgrade.html

Reviewers: Ismael Juma <ismael@juma.me.uk>
2022-09-12 17:11:03 -07:00
José Armando García Sancio b2639c8d3e Remove the html end tag from upgrade.html 2022-09-12 17:10:03 -07:00
José Armando García Sancio 96869af7af KAFKA-14205; Document how to replace the disk for the KRaft Controller (#12597)
Document process for recovering and formatting the metadata log directory for the KRaft controller.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
2022-09-12 17:06:31 -07:00
David Arthur 89f7f31ac2 KAFKA-14203 Disable snapshot generation on broker after metadata errors (#12596) 2022-09-12 17:01:01 -07:00
Colin Patrick McCabe e0297e3ba6 KAFKA-14216: Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc (#12617)
Reviewers: Luke Chen <showuon@gmail.com>
2022-09-12 08:35:05 -07:00
Colin Patrick McCabe cb31f9ade0 KAFKA-14217: app-reset-tool.html should not show --zookeeper flag that no longer exists (#12618)
Reviewers: Luke Chen <showuon@gmail.com>
2022-09-12 08:33:32 -07:00
Ismael Juma 1eedaca8fb KAFKA-14198; swagger-jaxrs2 dependency should be compileOnly (#12609)
Verified that the artifact generated by `releaseTarGz` no longer includes
swagger-jaxrs2 or its dependencies (like snakeyaml).

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Chris Egerton <fearthecellos@gmail.com>
2022-09-09 07:50:26 -07:00
Andrew Dean 1df0220dfd KAFKA-14194: Fix NPE in Cluster.nodeIfOnline (#12584)
When the rack-aware consumer configuration is in use and rolling updates are being applied to the Kafka brokers, the metadata updates can be in a transient state and a given topic-partition can be missing from the metadata. This seems to resolve itself after a bit of time, but until it does, the `Cluster.nodeIfOnline` method throws an NPE. This patch checks that a given topic-partition has partition info available before using that partition info.

Reviewers: David Jacot <djacot@confluent.io>
2022-09-09 09:10:16 +02:00
Jason Gustafson c5499a6300 MINOR; Remove redundant version system test (#12612)
This patch removes test_kafka_version.py, which contains two tests at the moment. The first test verifies we can start a 0.8.2 cluster. The second verifies we can start a cluster with one node on 0.8.2 and another on the latest. These tests are covered in greater depth by upgrade_test.py and downgrade_test.py.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-09-08 18:15:15 -07:00
José Armando García Sancio 408a17ad1f KAFKA-14188; Getting started for Kafka with KRaft (#12604)
Update the quickstart HTML pages for Kafka and Kafka Streams to include how to quickly start and
experiment with a Kafka cluster using KRaft in addition to ZooKeeper.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Chase Thomas <forlack@users.noreply.github.com>, Luke Chen <showuon@gmail.com>
2022-09-08 16:23:22 -07:00
David Jacot 7131724819 KAFKA-14201; Consumer should not send group instance ID if committing with empty member ID (#12599)
The consumer group instance ID is used to support a notion of "static" consumer groups. The idea is to be able to identify the same group instance across restarts so that a rebalance is not needed. However, if the user sets `group.instance.id` in the consumer configuration, but uses "simple" assignment with `assign()`, then the instance ID nevertheless is sent in the OffsetCommit request to the coordinator. This may result in a surprising UNKNOWN_MEMBER_ID error.

This PR fixes the issue on the client side by not setting the group instance id if the member id is empty (no generation).

Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-08 15:06:29 -07:00
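
The client-side rule from KAFKA-14201 can be sketched as a single decision. This is an illustration only, assuming an empty member ID means the consumer has no generation; the class and method names are hypothetical and do not come from the consumer code.

```java
import java.util.Optional;

// Hypothetical helper deciding which instance ID to put in a commit request.
class OffsetCommitFields {
    // Send the configured group instance ID only when the consumer has a
    // member ID, i.e. when it participates in group management.
    static Optional<String> groupInstanceIdToSend(String memberId, Optional<String> configuredInstanceId) {
        if (memberId == null || memberId.isEmpty()) {
            return Optional.empty();
        }
        return configuredInstanceId;
    }

    public static void main(String[] args) {
        System.out.println(groupInstanceIdToSend("", Optional.of("instance-1")).isPresent());
        System.out.println(groupInstanceIdToSend("member-7", Optional.of("instance-1")).isPresent());
    }
}
```
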
David Jacot cfa1f098d6 KAFKA-14201; Consumer should not send group instance ID if committing with empty member ID (server side) (#12598)
The consumer group instance ID is used to support a notion of "static" consumer groups. The idea is to be able to identify the same group instance across restarts so that a rebalance is not needed. However, if the user sets `group.instance.id` in the consumer configuration, but uses "simple" assignment with `assign()`, then the instance ID nevertheless is sent in the OffsetCommit request to the coordinator. This may result in a surprising UNKNOWN_MEMBER_ID error.

This PR attempts to fix this issue for existing consumers by relaxing the validation in this case. One way is to simply ignore the member id and the static id when the generation id is -1. A generation id of -1 signals that the request comes from either the admin client or a consumer which does not use group management. This does not apply to transactional offset commits.

Reviewers: Jason Gustafson <jason@confluent.io>
2022-09-08 14:36:57 -07:00
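
The relaxed server-side check can be sketched as a predicate. This is a simplified reading of the commit message, not the group coordinator code; the class, constant, and method names are hypothetical.

```java
// Hypothetical predicate for when member and instance IDs are validated.
class OffsetCommitMemberCheck {
    static final int NO_GENERATION = -1;

    // Transactional commits are always validated. Otherwise, a generation id
    // of -1 means the member id and static id are ignored.
    static boolean mustValidateMember(int generationId, boolean transactional) {
        if (transactional) {
            return true;
        }
        return generationId != NO_GENERATION;
    }

    public static void main(String[] args) {
        System.out.println(mustValidateMember(NO_GENERATION, false));
        System.out.println(mustValidateMember(5, false));
    }
}
```
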
Colin Patrick McCabe e8b0dc96db KAFKA-14204: QuorumController must correctly handle overly large batches (#12595)
Originally, the QuorumController did not try to limit the number of records in a batch that it sent
to the Raft layer.  This caused two problems. Firstly, we were not correctly handling the exception
that was thrown by the Raft layer when a batch of records was too large to apply atomically. This
happened because the Raft layer threw an exception which was a subclass of ApiException. Secondly,
by letting the Raft layer split non-atomic batches, we were not able to create snapshots at each of
the splits. This led to O(N) behavior during controller failovers.

This PR fixes both of these issues by limiting the number of records in a batch. Atomic batches
that are too large will fail with a RuntimeException which will cause the active controller to
become inactive and revert to the last committed state. Non-atomic batches will be split into
multiple batches with a fixed number of records in each.

Reviewers: Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
2022-09-08 14:23:04 -07:00
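
The batching rule described in KAFKA-14204 can be sketched as follows. This is a simplified illustration, assuming records are held in a plain list and the limit is passed in as a parameter; BatchSplitter is hypothetical and is not the QuorumController code. IllegalStateException is used here as one example of a RuntimeException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that enforces a per-batch record limit.
class BatchSplitter {
    static <T> List<List<T>> split(List<T> records, int maxRecordsPerBatch, boolean atomic) {
        if (maxRecordsPerBatch <= 0) {
            throw new IllegalArgumentException("maxRecordsPerBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        if (atomic) {
            // An atomic batch cannot be split, so an oversized one is rejected.
            if (records.size() > maxRecordsPerBatch) {
                throw new IllegalStateException("Atomic batch of " + records.size()
                        + " records exceeds the limit of " + maxRecordsPerBatch);
            }
            batches.add(new ArrayList<>(records));
            return batches;
        }
        // Non-atomic records are cut into consecutive batches of a fixed size;
        // only the last batch may be smaller.
        for (int start = 0; start < records.size(); start += maxRecordsPerBatch) {
            int end = Math.min(start + maxRecordsPerBatch, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(split(Arrays.asList(1, 2, 3, 4, 5), 2, false));
    }
}
```
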
Chris Egerton 51ace6306f KAFKA-14143: Exactly-once source connector system tests (#11783)
Also includes a minor quality-of-life improvement to clarify why some internal REST requests to workers may fail while that worker is still starting up.

Reviewers: Tom Bentley <tbentley@redhat.com>, Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
2022-09-08 15:14:35 -04:00
Manikumar Reddy 015d7aede6 MINOR: Add configurable max receive size for SASL authentication requests
This adds a new configuration `sasl.server.max.receive.size` that sets the maximum receive size for requests before and during authentication.

Reviewers: Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>

Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
2022-09-08 23:44:46 +05:30
Colin Patrick McCabe b2b928338c MINOR: Add more validation during KRPC deserialization
When deserializing KRPC (which is used for RPCs sent to Kafka, Kafka Metadata records, and some
other things), check that we have at least N bytes remaining before allocating an array of size N.

Remove DataInputStreamReadable since it was hard to make this class aware of how many bytes were
remaining. Instead, when reading an individual record in the Raft layer, simply create a
ByteBufferAccessor with a ByteBuffer containing just the bytes we're interested in.

Add SimpleArraysMessageTest and ByteBufferAccessorTest. Also add some additional tests in
RequestResponseTest.

Reviewers: Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>, Colin McCabe <colin@cmccabe.xyz>

Co-authored-by: Colin McCabe <colin@cmccabe.xyz>
Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
2022-09-08 23:44:46 +05:30
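
The check described in this commit (verify that N bytes remain before allocating an array of size N) can be sketched with a plain java.nio.ByteBuffer. SafeArrayReader is a hypothetical helper written for illustration; it is not Kafka's ByteBufferAccessor, and the choice of exception type is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical reader that validates a declared length before allocating.
class SafeArrayReader {
    static byte[] readBytes(ByteBuffer buffer, int length) {
        if (length < 0) {
            throw new IllegalArgumentException("Negative length: " + length);
        }
        // Reject the request before allocating, so a corrupt or malicious
        // length field cannot trigger a huge allocation.
        if (length > buffer.remaining()) {
            throw new IllegalArgumentException("Requested " + length
                    + " bytes but only " + buffer.remaining() + " remain");
        }
        byte[] out = new byte[length];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        System.out.println(readBytes(buffer, 2).length);
    }
}
```
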
Ismael Juma fa3834618c MINOR; Retry on test failure for branch builds and increase max test retry to 10 (#12601)
Originally, we only enabled retries for PR builds to avoid hiding timing
related issues. In practice, however, the results are too noisy without
any retry due to various environmental issues.

Enable 1 retry for all builds and increase the max test retry to 10.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>
2022-09-08 08:01:22 -07:00
Colin Patrick McCabe 9cd54f5407 KAFKA-14200: kafka-features.sh must exit with non-zero error code on error (#12586)
kafka-features.sh must exit with a non-zero error code on error. We must do this in order to catch
regressions like KAFKA-13990.

Reviewers: David Arthur <mumrah@gmail.com>
2022-09-07 09:04:31 -07:00