Commit Graph

9952 Commits

Author SHA1 Message Date
Matthias J. Sax e4ca066680 MINOR: update Kafka Streams docs with 3.2 KIP information (#16313)
Reviewers: Bruno Cadonna <bruno@confluent.io>, Jim Galasyn <jim.galasyn@confluent.io>
2024-06-13 15:09:35 -07:00
Matthias J. Sax 6bbfccd281 HOTIFX: fix Kafka versions for system tests (#14497)
Reviewers: Bill Bejeck <bill@confluent.io>
2023-10-05 10:59:10 -07:00
Omnia G.H Ibrahim 01f56ec7df KAFKA-15102: Add replication.policy.internal.topic.separator.enabled property to MirrorMaker 2 (KIP-949) (#14082)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-08-15 20:42:33 -04:00
Matthias J. Sax f332439f1f MINOR: update Kafka Streams state.dir doc (#14155)
Default state directory was changed in the 2.8.0 release (cf. KAFKA-10604)

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-08-07 10:39:01 -07:00
Victoria Xia 3bf7f93215 MINOR: update docs note about spurious stream-stream join results (#13642)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-25 19:42:29 -07:00
Jeff Kim 1fc9796357 KAFKA-14869: Bump coordinator value records to flexible versions (KIP-915, Part-2) (#13526) (#13602)
This patch implemented the second part of KIP-915. It bumps the versions of the value records used by the group coordinator and the transaction coordinator to make them flexible versions. The new versions are not used when writing to the partitions but only when reading from the partitions. This allows downgrades from future versions that will include tagged fields.

Reviewers: David Jacot <djacot@confluent.io>
2023-04-25 10:41:35 +02:00
Jeff Kim f5a5bc8418 KAFKA-14869: Ignore unknown record types for coordinators (KIP-915, Part-1) (#13598)
This patch implemented the first part of KIP-915. It updates the group coordinator and the transaction coordinator to ignore unknown record types while loading their respective state from the partitions. This allows downgrades from future versions that will include new record types.

Reviewers: Alexandre Dupriez <alexandre.dupriez@gmail.com>, David Jacot <djacot@confluent.io>
2023-04-21 18:28:20 +02:00
Ron Dagostino d62859274a KAFKA-14887: FinalizedFeatureChangeListener should not shut down when ZK session expires
FinalizedFeatureChangeListener shuts the broker down when it encounters an issue trying to process feature change
events. However, it does not distinguish between issues related to feature changes actually failing and other
exceptions like ZooKeeper session expiration. This introduces the possibility that ZooKeeper session expiration
could cause the broker to shut down, which is not intended. This patch updates the code to distinguish between
these two types of exceptions. In the case of something like a ZK session expiration it logs a warning and continues.
We shut down the broker only for FeatureCacheUpdateException.
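
The distinction described above can be sketched roughly as follows. This is a minimal illustration, not the actual listener code: `FeatureChangeLoopSketch`, the `Outcome` enum and the local `FeatureCacheUpdateException` stand-in are hypothetical names introduced only for this example.

```java
/** Hypothetical stand-in for the exception type named in the commit message. */
class FeatureCacheUpdateException extends RuntimeException {
    FeatureCacheUpdateException(String message) {
        super(message);
    }
}

/** Hypothetical sketch: decide what to do after processing one feature change event. */
final class FeatureChangeLoopSketch {
    enum Outcome { PROCESSED, WARNED_AND_CONTINUED, SHUT_DOWN }

    static Outcome handle(Runnable featureUpdate) {
        try {
            featureUpdate.run();
            return Outcome.PROCESSED;
        } catch (FeatureCacheUpdateException e) {
            // The feature change itself failed: this is the only case that stops the broker.
            return Outcome.SHUT_DOWN;
        } catch (RuntimeException e) {
            // Anything else (for example an expired ZK session): log a warning and keep going.
            return Outcome.WARNED_AND_CONTINUED;
        }
    }
}
```

The more specific exception is caught first, so an unrelated runtime failure can never fall into the shutdown branch.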

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Christo Lolov <christololov@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-04-20 20:30:59 -04:00
David Jacot 45992c4673 KAFKA-14880; TransactionMetadata with producer epoch -1 should be expirable (#13499)
We have seen the following error in logs:

```
"Mar 22, 2019 @ 21:57:56.655",Error,"kafka-0-0","transaction-log-manager-0","Uncaught exception in scheduled task 'transactionalId-expiration'","java.lang.IllegalArgumentException: Illegal new producer epoch -1
```

Investigations showed that it is actually possible for a transaction metadata object to still have -1 as producer epoch when it transitions to Dead.

When a transaction metadata object is created for the first time (in handleInitProducerId), it has -1 as its producer epoch. Then a producer epoch is assigned and the transaction coordinator tries to persist the change. If the write fails, for instance because the partition is under min ISR, the transaction metadata keeps -1 as its epoch forever or until the init producer id is retried.

This means that it is possible for transaction metadata to remain with -1 as producer epoch until it gets expired. At the moment, this is not allowed because we enforce a producer epoch greater than or equal to 0 in prepareTransitionTo.

Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>
2023-04-06 08:55:22 +02:00
Jorge Esteban Quilcate Otoya d8b3ed87be KAFKA-14843: Include Connect framework properties when retrieving connector config definitions (#13445)
Reviewers: Yash Mayya <yash.mayya@gmail.com>, Greg Harris <greg.harris@aiven.io>, Chris Egerton <chrise@aiven.io>
2023-03-28 12:42:33 -04:00
Chris Egerton 507afe40ad KAFKA-14645: Use plugin classloader when retrieving connector plugin config definitions (#13148)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Greg Harris <gharris1727@gmail.com>
2023-03-28 12:32:16 -04:00
Hector Geraldino 644a2dcec2 KAFKA-14809 Fix logging conditional on WorkerSourceTask (#13386)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-03-16 08:41:25 -04:00
Anastasia Vela d1d3b5a486 MINOR: Fix flaky testClientDisconnectionUpdatesRequestMetrics() (#11987) (#12957)
Reviewers: David Jacot <djacot@confluent.io>
2023-02-22 10:43:12 -08:00
Ron Dagostino 609e34835c KAFKA-14731: Upgrade ZooKeeper to 3.6.4 (#13273)
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
2023-02-21 08:45:27 -05:00
David Jacot d038cf7919 KAFKA-14704; Follower should truncate before incrementing high watermark (#13245)
When a leader becomes a follower, it is likely that it has uncommitted records in its log. When it reaches out to the leader, the leader will detect that they have diverged and it will return the diverging epoch and offset. The follower truncates its log based on this.

There is a small caveat in this process. When the leader returns the diverging epoch and offset, it also includes its high watermark, low watermark, start offset and end offset. The current code in the `AbstractFetcherThread` works as follows. First it processes the partition data and then it checks whether there is a diverging epoch/offset. The former may accidentally expose uncommitted records as this step updates the local watermark to whatever is received from the leader. As the follower, or the former leader, may have uncommitted records, it will be able to update the high watermark to a larger offset if the leader has a higher watermark than the current local one. This results in exposing uncommitted records until the log is finally truncated. The time window is short but a fetch request arriving at the follower at the right time could read those records. This is especially true for clients which use recent versions of the fetch request but do not implement KIP-320.

When this happens, the follower logs the following messages:
* `Truncating XXX to offset 21434 below high watermark 21437`
* `Non-monotonic update of high watermark from (offset=21437 segment=[20998:98390]) to (offset=21434 segment=[20998:97843])`.

This patch proposes to mitigate the issue by first checking whether a diverging epoch/offset is provided by the leader and skipping the processing of the partition data if it is. This basically means that the first fetch request will result in truncating the log and a subsequent fetch request will update the low/high watermarks.
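
The ordering change can be illustrated with a small model. This is only a sketch under stated assumptions, not the `AbstractFetcherThread` code: `FollowerFetchSketch` and its fields are hypothetical, and the high watermark is assumed to be capped at the follower's own log end offset.

```java
import java.util.OptionalLong;

/** Hypothetical, heavily simplified model of a follower's log state. */
final class FollowerFetchSketch {
    long logEndOffset;
    long highWatermark;

    FollowerFetchSketch(long logEndOffset, long highWatermark) {
        this.logEndOffset = logEndOffset;
        this.highWatermark = highWatermark;
    }

    /** Handles one fetch response from the leader. */
    void onFetchResponse(OptionalLong divergingEndOffset, long leaderHighWatermark) {
        if (divergingEndOffset.isPresent()) {
            // Divergence reported: truncate first and skip the partition data,
            // so the high watermark cannot move over uncommitted records.
            logEndOffset = Math.min(logEndOffset, divergingEndOffset.getAsLong());
            return;
        }
        // Assumption: the follower never exposes a high watermark beyond its log end.
        highWatermark = Math.min(leaderHighWatermark, logEndOffset);
    }
}
```

With a log end offset of 21437 and a diverging offset of 21434, the first response only truncates the log; the high watermark moves on the following response, once the uncommitted records are gone.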

Reviewers: Ritika Reddy <rreddy@confluent.io>, Justine Olshan <jolshan@confluent.io>, Jason Gustafson <jason@confluent.io>
2023-02-15 08:39:07 +01:00
Kirk True a6474714e8 KAFKA-14496: Wrong Base64 encoder used by OIDC OAuthBearerLoginCallbackHandler (#13000)
The OAuth code to generate the Authentication header was incorrectly
using the URL-safe base64 encoder. Client IDs and/or secrets with
dashes and/or plus signs would not be encoded correctly, leading the
OAuth server to reject the credentials.

This change uses the correct base64 encoder, per RFC-7617.
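
For illustration, the difference between the two JDK encoders can be seen in a few lines. This is a sketch, not the code of the patch: `BasicAuthHeaderSketch` and `basicAuthHeader` are hypothetical names, while `Base64.getEncoder()` and `Base64.getUrlEncoder()` are the standard `java.util.Base64` factories.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds HTTP Basic credentials as described in RFC 7617. */
final class BasicAuthHeaderSketch {
    static String basicAuthHeader(String clientId, String clientSecret) {
        byte[] credentials = (clientId + ":" + clientSecret).getBytes(StandardCharsets.UTF_8);
        // Standard alphabet ('+' and '/'), not the URL-safe one ('-' and '_').
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }
}
```

For the client ID `ab` and the secret `>>>`, the standard encoder yields `YWI6Pj4+`, whereas `Base64.getUrlEncoder()` yields `YWI6Pj4-`; a server that decodes with the standard alphabet cannot recover the credentials from the second form.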

Co-authored-by: Endre Vig <vendre@gmail.com>
2022-12-17 12:12:38 +05:30
Matthias J. Sax f999beec04 MINOR: update Streams upgrade guide for 3.1 release (#12926)
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>
2022-12-14 14:32:57 -05:00
Rohan b52d04d4ee MINOR: add docs table entries for new metrics (#12934)
Adds docs for KIP-761.

Reviewers: Anurag Bandyopadhyay (@Anuragkillswitch), Matthias J. Sax <matthias@confluent.io>
2022-12-09 10:30:29 -08:00
Lucas Brutschy ddae7f0f9c KAFKA-14432: RocksDBStore relies on finalizers to not leak memory (#12935)
RocksDBStore relied on finalizers to not leak memory (and leaked memory after the upgrade to RocksDB 7).
The problem was that every call to options.statistics creates a new wrapper object that needs to be finalized.

I simplified the logic a bit and moved the ownership of the statistics from ValueProvider to RocksDBStore.

Reviewers: Bruno Cadonna <cadonna@apache.org>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Christo Lolov <lolovc@amazon.com>
2022-12-08 18:20:10 +01:00
David Jacot fb8c153203 KAFKA-14422; Consumer rebalance stuck after new static member joins a group with members not supporting static members (#12909)
When a consumer group on a version prior to 2.3 is upgraded to a newer version and static membership is enabled in the meantime, the consumer group remains stuck, iff the leader is still on the old version.

The issue is that setting `GroupInstanceId` in the response to the leader is only supported from JoinGroup version >= 5 and that `GroupInstanceId` is neither ignorable nor handled anywhere else. Hence, if there is at least one static member in the group, sending the JoinGroup response to the leader fails with a serialization error.

```
org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default groupInstanceId at version 2
```

When this happens, the member stays around until the group coordinator is bounced because a member with a non-null `awaitingJoinCallback` is never expired.

This patch fixes the issue by making `GroupInstanceId` ignorable. A unit test has been modified to cover this.
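
What "ignorable" means for serialization can be sketched as follows. This is a simplified illustration and not Kafka's generated message code: `IgnorableFieldSketch`, its `write` method and the list-of-strings output are hypothetical, and `IllegalStateException` stands in for `UnsupportedVersionException`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of writing a field that only exists from version 5 onwards. */
final class IgnorableFieldSketch {
    static final int MIN_VERSION_WITH_GROUP_INSTANCE_ID = 5;

    static List<String> write(int version, String memberId, String groupInstanceId, boolean ignorable) {
        List<String> out = new ArrayList<>();
        out.add("memberId=" + memberId);
        if (version >= MIN_VERSION_WITH_GROUP_INSTANCE_ID) {
            out.add("groupInstanceId=" + groupInstanceId);
        } else if (groupInstanceId != null && !ignorable) {
            // A non-default value at an old version is an error unless the field is ignorable.
            throw new IllegalStateException(
                "Attempted to write a non-default groupInstanceId at version " + version);
        }
        // Ignorable field at an old version: silently dropped.
        return out;
    }
}
```

With `ignorable` set to `true`, a version 2 response for a static member is written without the field instead of failing.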

Reviewers: Jason Gustafson <jason@confluent.io>
2022-11-28 20:22:11 +01:00
A. Sophie Blee-Goldman 5302e060fc HOTFIX: re-add line resetting rebalance schedule missing from hotfix 2022-11-18 23:22:03 -08:00
A. Sophie Blee-Goldman 5ee5c6f552 KAFKA-14382: wait for current rebalance to complete before triggering followup (#12869)
Fix for the subtle bug described in KAFKA-14382 that was causing rebalancing loops. If we trigger a new rebalance while the current one is still ongoing, it may cause some members to fail the first rebalance if they weren't able to send the SyncGroup request in time (for example due to processing records during the rebalance). This means those consumers never receive their assignment from the original rebalance, and won't revoke any partitions they might have needed to. This can send the group into a loop as each rebalance schedules a new followup cooperative rebalance due to partitions that need to be revoked, and each followup rebalance causes some consumer(s) to miss the SyncGroup and never revoke those partitions.

Reviewers: John Roesler <vvcephei@apache.org>
2022-11-18 23:07:47 -08:00
Christo Lolov 9a12d68d66 [KAFKA-14324] Upgrade RocksDB to 7.1.2 (#12809)
Reviewers: Bruno Cadonna <cadonna@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2022-11-16 10:03:20 +01:00
Shawn 4f659db7da Revert "KAFKA-13891: reset generation when syncgroup failed with REBALANCE_IN_PROGRESS (#12140)" (#12794)
This reverts commit c23d60d56c.

Reviewers: Luke Chen <showuon@gmail.com>
2022-11-05 20:23:08 +08:00
Shawn 7944d7b328 KAFKA-13891: reset generation when syncgroup failed with REBALANCE_IN_PROGRESS (#12140)
Reviewers: Luke Chen <showuon@gmail.com>
2022-10-26 17:11:13 -07:00
Tom Bentley 7792f45487 MINOR: Update 3.2 branch version to 3.2.4-SNAPSHOT 2022-09-17 07:37:01 +01:00
Tom Bentley 50029d3ed8 Bump version to 3.2.3 2022-09-13 09:12:22 +01:00
Tom Bentley 013d24990d MINOR: Bump version in upgrade guide to 3.2.3 2022-09-13 09:02:02 +01:00
Jason Gustafson e72db09894 KAFKA-14208; Do not raise wakeup in consumer during asynchronous offset commits (#12626)
Asynchronous offset commits may throw an unexpected WakeupException following #11631 and #12244. This patch fixes the problem by passing through a flag to ensureCoordinatorReady to indicate whether wakeups should be disabled. This is used to disable wakeups in the context of asynchronous offset commits. All other uses leave wakeups enabled.

Note: this patch builds on top of #12611.

Co-Authored-By: Guozhang Wang wangguoz@gmail.com

Reviewers: Luke Chen <showuon@gmail.com>
2022-09-13 15:46:48 +08:00
Philip Nee 56baf6448f KAFKA-14196; Do not continue fetching partitions awaiting auto-commit prior to revocation (#12603)
When auto-commit is enabled with the "eager" rebalance strategy, the consumer will commit all offsets prior to revocation. Following recent changes, this offset commit is done asynchronously, which means there is an opportunity for fetches to continue returning data to the application. When this happens, the progress is lost following revocation, which results in duplicate consumption. This patch fixes the problem by adding a flag in `SubscriptionState` to ensure that partitions which are awaiting revocation will not continue being fetched.
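
The idea of the flag can be sketched like this. It is a hypothetical model, not the real `SubscriptionState`: `PendingRevocationSketch` and its method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: partitions awaiting revocation are excluded from fetching. */
final class PendingRevocationSketch {
    // partition name -> true if the partition is awaiting revocation
    private final Map<String, Boolean> pendingRevocation = new LinkedHashMap<>();

    void assign(String partition) {
        pendingRevocation.put(partition, false);
    }

    void markPendingRevocation(String partition) {
        if (pendingRevocation.containsKey(partition)) {
            pendingRevocation.put(partition, true);
        }
    }

    /** Only partitions that are not awaiting revocation may still be fetched. */
    List<String> fetchablePartitions() {
        List<String> fetchable = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : pendingRevocation.entrySet()) {
            if (!entry.getValue()) {
                fetchable.add(entry.getKey());
            }
        }
        return fetchable;
    }
}
```

Once a partition is marked while the asynchronous commit is in flight, no further records from it reach the application, so no progress beyond the committed offset can be lost.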

Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-09-12 21:13:25 -07:00
Mickael Maison fec31e82e5 MINOR: 3.2 branch version to 3.2.3-SNAPSHOT 2022-09-09 17:58:22 +02:00
Mickael Maison d695a90faa Bump version to 3.2.2 2022-09-09 17:58:22 +02:00
Thomas Cooper da72c0db5e Upgrade Netty and Jackson versions for CVE fixes [KAFKA-14044] (#12376)
Reviewers: Luke Chen <showuon@gmail.com>
2022-09-09 12:03:20 +02:00
Andrew Dean 931d98f52c KAFKA-14194: Fix NPE in Cluster.nodeIfOnline (#12584)
When the rack-aware consumer configuration is used and rolling updates are being applied to the Kafka brokers, the metadata can be in a transient state and a given topic-partition can be missing from it. This seems to resolve itself after a bit of time, but before it does, the `Cluster.nodeIfOnline` method throws an NPE. This patch checks that a given topic-partition has partition info available before using that partition info.
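
The guard can be sketched as below. This is an illustration only, not the real `Cluster` class: `NodeLookupSketch`, `PartitionInfoSketch` and the string partition keys are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a lookup that tolerates partitions missing from the metadata. */
final class NodeLookupSketch {
    /** Hypothetical stand-in for per-partition metadata: ids of offline replicas. */
    static final class PartitionInfoSketch {
        final int[] offlineReplicas;

        PartitionInfoSketch(int... offlineReplicas) {
            this.offlineReplicas = offlineReplicas;
        }
    }

    private final Map<String, PartitionInfoSketch> partitions = new HashMap<>();

    void put(String topicPartition, PartitionInfoSketch info) {
        partitions.put(topicPartition, info);
    }

    Optional<Integer> nodeIfOnline(String topicPartition, int nodeId) {
        PartitionInfoSketch info = partitions.get(topicPartition);
        if (info == null) {
            // Partition not (yet) present in the metadata: report "no node" instead of throwing.
            return Optional.empty();
        }
        for (int offline : info.offlineReplicas) {
            if (offline == nodeId) {
                return Optional.empty();
            }
        }
        return Optional.of(nodeId);
    }
}
```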

Reviewers: David Jacot <djacot@confluent.io>
2022-09-09 09:16:19 +02:00
Mickael Maison d7398e619f MINOR: Update LICENSE-binary 2022-09-02 13:17:39 +02:00
Mickael Maison 16c49bdd81 MINOR: Align Scala version to 2.13.8 2022-09-02 13:17:14 +02:00
Mickael Maison d14db1be58 MINOR: Bump version in upgrade guide to 3.2.2 2022-09-02 12:59:13 +02:00
Manikumar Reddy e86512aafd MINOR: Add configurable max receive size for SASL authentication requests
This adds a new configuration `sasl.server.max.receive.size` that sets the maximum receive size for requests before and during authentication.

Reviewers: Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>

Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
2022-09-02 11:09:46 +02:00
Colin Patrick McCabe 2bfa24b2bd MINOR: Add more validation during KRPC deserialization
When deserializing KRPC (which is used for RPCs sent to Kafka, Kafka Metadata records, and some
other things), check that we have at least N bytes remaining before allocating an array of size N.

Remove DataInputStreamReadable since it was hard to make this class aware of how many bytes were
remaining. Instead, when reading an individual record in the Raft layer, simply create a
ByteBufferAccessor with a ByteBuffer containing just the bytes we're interested in.

Add SimpleArraysMessageTest and ByteBufferAccessorTest. Also add some additional tests in
RequestResponseTest.
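
The kind of check described above can be sketched as follows. This is a minimal illustration, not the actual `ByteBufferAccessor` code: `BoundedReadSketch` and `readBytes` are hypothetical names, and the exception type is an assumption.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: never allocate more than the buffer can actually supply. */
final class BoundedReadSketch {
    static byte[] readBytes(ByteBuffer buffer, int length) {
        if (length < 0 || length > buffer.remaining()) {
            // Reject the declared size before allocating anything.
            throw new IllegalArgumentException(
                "Cannot read " + length + " byte(s): only " + buffer.remaining() + " available");
        }
        byte[] result = new byte[length];
        buffer.get(result);
        return result;
    }
}
```

Because the size check happens before `new byte[length]`, a corrupt or malicious length prefix fails fast instead of triggering a very large allocation.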

Reviewers: Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>, Colin McCabe <colin@cmccabe.xyz>

Co-authored-by: Colin McCabe <colin@cmccabe.xyz>
Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
2022-09-02 11:09:37 +02:00
Divij Vaidya 2e229db62d KAFKA-14122: Fix flaky test DynamicBrokerReconfigurationTest#testKeyStoreAlter (#12452)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2022-08-25 18:50:02 +02:00
Colin Patrick McCabe 44e419722e KAFKA-13835: Fix two bugs related to dynamic broker configs in KRaft (#12063)
Fix two bugs related to dynamic broker configs in KRaft. The first bug is that we are calling reloadUpdatedFilesWithoutConfigChange when a topic configuration is changed, but not when a
broker configuration is changed. This is backwards. This function must be called only for broker
configs, and never for topic configs or cluster configs.

The second bug is that there were several configurations such as max.connections which are related
to broker listeners, but which do not involve changing the registered listeners. We should support
these configurations in KRaft. This PR fixes the configuration change validation to support this case.

Reviewers: Jason Gustafson <jason@confluent.io>, Matthew de Detrich <mdedetrich@gmail.com>
2022-08-25 18:50:02 +02:00
Derek Troy-West f60ddc9856 MINOR: Add note on IDEMPOTENT_WRITE ACL to notable changes (#12260)
Update notable changes documentation to mention requiring IDEMPOTENT_WRITE permission
when producing messages with default/idempotent configuration and broker version lower than
2.8.0.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Luke Chen <showuon@gmail.com>
2022-08-12 19:55:02 -07:00
Andrew Borley fa369e7dac KAFKA-14107: Upgrade Jetty version for CVE fixes (#12440)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Aaron Freeland <afreeland@gmail.com>
2022-08-05 23:34:55 +02:00
David Arthur a7369bd52f KAFKA-14136 Generate ConfigRecord for brokers even if the value is unchanged (#12483) 2022-08-04 15:19:49 -04:00
David Arthur 4e049c706f KAFKA-14111 Fix sensitive dynamic broker configs in KRaft (#12455)
Enable some of the dynamic broker reconfiguration tests in KRaft mode
2022-08-04 15:19:38 -04:00
David Arthur 89b2bf257b MINOR: Update 3.2 branch to 3.2.2-SNAPSHOT 2022-07-28 16:42:46 -04:00
David Arthur b172a0a94f Bump version to 3.2.1 2022-07-21 20:33:07 -04:00
Viktor Somogyi-Vass 8464e36682 KAFKA-13917: Avoid calling lookupCoordinator() in tight loop (#12417)
Reviewers: Luke Chen <showuon@gmail.com>
2022-07-21 20:04:39 -04:00
David Arthur cb14b100ad Add 3.2.1 upgrade docs (#12424)
Reviewers: Randall Hauch <rhauch@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2022-07-20 15:26:26 -04:00
Shawn d8541b20a1 KAFKA-14024: Consumer keeps Commit offset in onJoinPrepare in Cooperative rebalance (#12349)
In KAFKA-13310, we tried to fix an issue where consumer#poll(duration) returned after the provided duration. This happened because, if a rebalance was needed, we would first commit the current offsets synchronously before rebalancing. If the offset commit took too long, consumer#poll would spend more time than the provided duration. To fix that, we changed the synchronous commit to an asynchronous commit before the rebalance (i.e. in onJoinPrepare).

However, in this ticket, we found that the async commit keeps sending a new commit request during each Consumer#poll, because the offset commit never completes in time. The impact is that the existing consumer will be kicked out of the group after the rebalance timeout without joining the group. That is, suppose we have consumer A in group G and consumer B now joins the group; after the rebalance, only consumer B is in the group.

Besides, another bug was found while fixing this one. Before KAFKA-13310, we committed offsets synchronously with rebalanceTimeout, retrying on retriable errors until the timeout. After KAFKA-13310, we thought we still had retries, but the retry happens after the partitions are revoked. That is, even if the retried offset commit succeeds, some partition offsets are still left uncommitted, and after the rebalance, other consumers will consume overlapping records.

Reviewers: RivenSun <riven.sun@zoom.us>, Luke Chen <showuon@gmail.com>
2022-07-20 10:05:23 +08:00