Commit Graph

11221 Commits

Author SHA1 Message Date
Luke Chen be5784f20a fix 2023-12-05 19:38:34 +08:00
Luke Chen 432d2cbbc7 use 341 2023-12-05 17:10:39 +08:00
Robert Wagner c9a6488805 KAFKA-15755: LeaveGroupResponse v0 - v2 loses its member under certain error conditions (#14635)
This patch fixes a bug in the LeaveGroupResponse construction. Basically, when a top level error is set, no members are expected but the current check always requires one for versions prior to version 3.

Reviewers: David Jacot <djacot@confluent.io>
(cherry picked from commit 3fd6293449)
2023-11-16 10:22:12 +01:00
Greg Harris 9aeaa5dc18 KAFKA-15800: Prevent DataExceptions from corrupting KafkaOffsetBackingStore (#14718)
Signed-off-by: Greg Harris <greg.harris@aiven.io>

Reviewers: Yash Mayya <yash.mayya@gmail.com>
2023-11-10 08:42:35 -08:00
Chris Egerton 3c8ca01cef
KAFKA-15693: Immediately reassign lost connectors and tasks when scheduled rebalance delay is disabled (#14647)
Reviewers: Sagar Rao <sagarmeansocean@gmail.com>, Yash Mayya <yash.mayya@gmail.com>
2023-11-09 11:28:35 -05:00
Luke Chen 79de845bd5 fix build error 2023-11-09 17:59:01 +08:00
David Arthur 3e5ec6fd71 KAFKA-15552 Fix Producer ID ZK migration (#14506)
This patch fixes a problem where we migrate the current producer ID batch to KRaft instead of the next producer ID batch. Since KRaft stores the next batch in the log, we end up serving up a duplicate batch to the first caller of AllocateProducerIds once the KRaft controller has taken over.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-11-09 17:54:51 +08:00
Xiaobing Fang 5829fca0a7 KAFKA-15771: fix concurrency bug in ProduceRequest#partitionSizes() (#14674)
A commit fixes a bug in ProduceRequest#partitionSizes() that may cause this method to incorrectly returning an empty or incomplete response for a thread when another thread is in the process of initialising it. 

Reviewers: Divij Vaidya <diviv@amazon.com>, hudeqi <1217150961@qq.com>, vamossagar12 <sagarmeansocean@gmail.com>

--------------------------------
Co-authored-by: fangxiaobing <fangxiaobing@kuaishou.com>
2023-11-07 08:53:18 +00:00
Matthias J. Sax c7eae56dfa HOTFIX: remove unused import to fix checkstyle error 2023-10-30 14:39:05 -07:00
Matthias J. Sax 86d4022d48 KAFKA-15602: revert KAFKA-4852 (#14617)
This PR reverts
 - 51dbd175b0
 - 496ae054c2

Reviewers:  Philip Nee <pnee@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2023-10-30 13:28:22 -07:00
Matthias J. Sax 749df07163 HOTFIX: close iterator to avoid resource leak (#14624)
Reviewers: Hao Li <hli@confluent.io>, Bill Bejeck <bill@confluent.io>
2023-10-26 11:00:45 -07:00
Lucas Brutschy f682ecf9db MINOR: Fix misleading log-line (#14643)
After finishing restoration, we should only log the active tasks. Standby tasks are not part of restoration and it can be confusing to see them show up on this log message.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-10-26 08:39:18 -07:00
Lucas Brutschy 9e10f8959a KAFKA-15319: Upgrade rocksdb to fix CVE-2022-37434 (#14216)
Rocksdbjni<7.9.2 is vulnerable to CVE-2022-37434 due to zlib 1.2.12

Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2023-10-24 11:14:55 -07:00
Mickael Maison ad925d2582 Revert "KAFKA-15093: Add 3.5 Streams upgrade system tests (#14602)"
This reverts commit d769f1dd87.
It is not needed to explictly add 3.5 as the system tests automatically attempt upgrades to DEV_VERSION which is 3.5.x-SNAPSHOT in this branch.
2023-10-23 21:34:11 +02:00
Greg Harris a6a893796e KAFKA-14767: Fix missing commitId build error after git gc (#13315)
git gc moves commit hashes from individual .git/refs/heads/ to .git/packed-refs which is not read
by the determineCommitId function.

Replace the existing lookup within the .git directory with a GrGit lookup that handles packed and
unpacked refs transparently.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-10-23 09:41:30 -07:00
Mickael Maison d769f1dd87 KAFKA-15093: Add 3.5 Streams upgrade system tests (#14602)
Reviewers: Matthias J. Sax <mjsax@apache.org>
2023-10-23 13:45:21 +02:00
Mickael Maison 303c86f7a5 KAFKA-15664: Add 3.4 Streams upgrade system tests (#14601)
Reviewers: Luke Chen <showuon@gmail.com>,  Matthias J. Sax <mjsax@apache.org>
2023-10-23 11:02:28 +02:00
Matthias J. Sax d260329056 KAFKA-15378: fix streams upgrade system test (#14539)
Fixing bad test setup. We tried to fix an upgrade bug for FK-joins in 3.1 release, but it later turned out that the PR was not sufficient to fix it. We finally fixed in 3.4 release.

This PR updates the system test matrix to only test working versions with FK-joins, limited to available test versions.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Hao Li <hli@confluent.io>, Mickael Maison <mickael.maison@gmail.com>
2023-10-20 16:37:44 -07:00
hudeqi add9dc3340
KAFKA-15607: Fix NPE in MirrorCheckpointTask::syncGroupOffset (#14587)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-10-20 12:23:44 -04:00
flashmouse 8b4369c573 KAFKA-15106: Fix AbstractStickyAssignor isBalanced predict (#13920)
in 3.5.0 AbstractStickyAssignor may run useless loop in performReassignments  because isBalanced have a trivial mistake, and result in rebalance timeout in some situation.

Co-authored-by: lixy <lixy@tuya.com>
Reviewers: Ritika Reddy <rreddy@confluent.io>, Philip Nee <pnee@confluent.io>, Kirk True <kirk@mustardgrain.com>, Guozhang Wang <wangguoz@gmail.com>
2023-10-20 20:43:38 +08:00
Nick Telford ccdffd6e4f KAFKA-13973: Fix inflated block cache metrics (#14317)
All block cache metrics are being multiplied by the total number of
column families. In a `RocksDBTimestampedStore`, we have 2 column
families (the default, and the timestamped values), which causes all
block cache metrics in these stores to become doubled.

The cause is that our metrics recorder uses `getAggregatedLongProperty`
to fetch block cache metrics. `getAggregatedLongProperty` queries the
property on each column family in the database, and sums the results.

Since we always configure all column families to share the same block
cache, that causes the same block cache to be queried multiple times for
its metrics, with the results added togehter, effectively multiplying
the real value by the total number of column families.

To fix this, we should simply use `getLongProperty`, which queries a
single column family (the default one). Since all column families share
the same block cache, querying just one of them will give us the correct
metrics for that shared block cache.

Note: the same block cache is shared among all column families of a store
irrespective of whether the user has configured a shared block cache
across multiple stores.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2023-10-20 10:49:24 +02:00
bachmanity1 0c90b6557e KAFKA-7438: Replace Easymock & Powermock with Mockito in RocksDBMetricsRecorderGaugesTest (#14190)
Reviewers: Christo Lolov <christololov@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-10-20 10:41:57 +02:00
David Arthur 319dc61de7 Fix a Scala 2.12 compile issue (#14126)
Reviewers: Luke Chen <showuon@gmail.com>, Qichao Chu
2023-10-20 10:28:42 +02:00
Luke Chen b3b457bf1b KAFKA-15498: upgrade to snappy 1.1.10.5 (#14458)
Release notes - https://github.com/xerial/snappy-java/releases/tag/v1.1.10.5

This release contains adds support for Windows ARM and fixes some dependencies associated with Linux ppc64. 

Reviewers: Josep Prat <josep.prat@aiven.io>
2023-10-12 11:19:00 +02:00
Luke Chen 0cadf0db71 KAFKA-15498: bump snappy-java version to 1.1.10.4 (#14434)
bump snappy-java version to 1.1.10.4, and add more tests to verify the compressed data can be correctly decompressed and read.

For LogCleanerParameterizedIntegrationTest, we increased the message size for snappy decompression since in the new version of snappy, the decompressed size is increasing compared with the previous version. But since the compression algorithm is not kafka's scope, all we need to do is to make sure the compressed data can be successfully decompressed and parsed/read.

Reviewers: Divij Vaidya <diviv@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Josep Prat <josep.prat@aiven.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-10-12 11:18:43 +02:00
Levani Kokhreidze c55e89a600 KAFKA-15571: `StateRestoreListener#onRestoreSuspended` is never called because `DelegatingStateRestoreListener` doesn't implement `onRestoreSuspended` (#14519)
With https://issues.apache.org/jira/browse/KAFKA-10575 StateRestoreListener#onRestoreSuspended was added. But local tests show that it is never called because DelegatingStateRestoreListener was not updated to call a new method

Reviewers: Anna Sophie Blee-Goldman <sophie@responsive.dev>, Bruno Cadonna <cadonna@confluent.io>
2023-10-11 16:13:37 -07:00
Matthias J. Sax 1fc067ff2e
HOTIFX: fix Kafka versions for system tests (#14500)
Reviewers: Bill Bejeck <bill@confluent.io>
2023-10-05 11:00:19 -07:00
Lucas Brutschy 8f7310eab6 MINOR: Logging fix in StreamsPartitionAssignor (#14435)
Fix broken log message

Reviewer: A. Sophie Blee-Goldman <ableegoldman@apache.org>
2023-10-02 12:32:27 +02:00
Divij Vaidya 00a1b9f769 Upgrade Jetty to 9.4.52.v20230823 (#14438)
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2023-09-26 08:43:44 +00:00
Manikumar Reddy 51a7acda25 KAFKA-15502: Update SslEngineValidator to handle large stores (#14445)
We have observed an issue where inter broker SSL listener is not coming up when running with TLSv3/JDK 17 .
SSL debug logs shows that TLSv3 post handshake messages >16K are not getting read and causing SslEngineValidator process to stuck while validating the provided trust/key store.

- Right now, WRAP returns if there is already data in the buffer. But if we need more data to be wrapped for UNWRAP to succeed, we end up looping forever. To fix this, now we always attempt WRAP and only return early on BUFFER_OVERFLOW.
- Update SslEngineValidator to unwrap post-handshake messages from peer when local handshake status is FINISHED.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2023-09-26 12:21:13 +05:30
Federico Valeri e28c1befc3 MINOR: Fix metadata.version reference in "ZooKeeper to KRaft Migration" documentation (#14366)
In "ZooKeeper to KRaft Migration" documentation, we are still reporting 3.4 as metadata version. Reworking that phrase to make it more clear and avoid the need to update it in the future.

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>
2023-09-13 17:20:47 +08:00
David Arthur 6bc36eaf23 KAFKA-15450 Don't allow ZK migration with JBOD (#14367)
Reviewers: Ron Dagostino <rndgstn@gmail.com>
2023-09-12 12:41:36 -04:00
atu-sharm 67032b8ecf KAFKA-15338: The metric group documentation for metrics added in KAFKA-13945 is incorrect (#14221)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-09-07 19:06:35 -07:00
Chris Egerton b8cf3e3174
KAFKA-14876: Add stopped state to Kafka Connect Administration docs section (#14336)
Original author (before modifications for backporting: Yash Mayya <yash.mayya@gmail.com>

Reviewers: Chris Egerton <chrise@aiven.io>
2023-09-05 14:48:01 -04:00
Yash Mayya fb85e9d4aa
MINOR: Update the documentation's table of contents to add missing headings for Kafka Connect (#14337)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-09-05 14:00:22 -04:00
Rohan 9a818d2ca7 KAFKA-15429: reset transactionInFlight on StreamsProducer close (#14326)
Resets the value of transactionInFlight to false when closing the
StreamsProducer. This ensures we don't try to commit against a
closed producer

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2023-09-02 18:17:06 -07:00
Rohan ea206a3d36 KAFKA-15429: catch+log errors from unsubscribe in streamthread shutdown (#14325)
Preliminary fix for KAFKA-15429 which updates StreamThread.completeShutdown to
catch-and-log errors from consumer.unsubscribe. Though this does not prevent
the exception, it does preserve the original exception that caused the stream
thread to exit.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2023-09-02 18:17:06 -07:00
A. Sophie Blee-Goldman 1966f51d62 HOTFIX: avoid placement of unnecessary transient standby tasks & improve assignor logging (#14149)
Minor fix to avoid creating unnecessary standby tasks, especially when these may be surprising or unexpected as in the case of an application with num.standby.replicas = 0 and warmup replicas disabled.

The "bug" here was introduced during the fix for an issue with cooperative rebalancing and in-memory stores. The fundamental problem is that in-memory stores cannot be unassigned from a consumer for any period, however temporary, without being closed and losing all the accumulated state. This caused some grief when the new HA task assignor would assign an active task to a node based on the readiness of the standby version of that task, but would have to remove the active task from the initial assignment so it could first be revoked from its previous owner, as per the cooperative rebalancing protocol. This temporary gap in any version of that task among the consumer's assignment for that one intermediate rebalance would end up causing the consumer to lose all state for it, in the case of in-memory stores.

To fix this, we simply began to place standby tasks on the intended recipient of an active task awaiting revocation by another consumer. However, the fix was a bit of an overreach, as we assigned these temporary standby tasks in all cases, regardless of whether there had previously been a standby version of that task. We can narrow this down without sacrificing any of the intended functionality by only assigning this kind of standby task where the consumer had previously owned some version of it that would otherwise potentially be lost.

Also breaks up some of the long log lines in the StreamsPartitionAssignor and expands the summary info while moving it all to the front of the line (following reports of missing info due to truncation of long log lines in larger applications)
2023-08-30 14:50:46 -07:00
Vincent Jiang c252e930fb KAFKA-15375: fix broken clean shutdown detection logic in LogManager
When running in kraft mode, LogManager.startup is called in a different thread than the main broker (#14239)
startup thread (by BrokerMetadataPublisher when the first metadata update is received.) If a fatal
error happens during broker startup, before LogManager.startup is completed, LogManager.shutdown may
 mark log dirs as clean shutdown improperly.

This PR includes following change:
1. During LogManager startup time:
  - track hadCleanShutdwon info for each log dir
  - track loadLogsCompleted status for each log dir
2. During LogManager shutdown time:
  - do not write clean shutdown marker file for log dirs which have hadCleanShutdown==false and loadLogsCompleted==false

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-08-30 12:20:02 -07:00
Calvin Liu 6f55160175 KAFKA-15353: make sure AlterPartitionRequest.build() is idempotent (#14236)
As described in https://issues.apache.org/jira/browse/KAFKA-15353
When the AlterPartitionRequest version is < 3 and its builder.build is called multiple times, both newIsrWithEpochs and newIsr will all be empty. This can happen if the sender retires on errors.

Reviewers: Luke Chen <showuon@gmail.com>
2023-08-28 18:02:29 +08:00
Greg Harris 0492a3bc87 KAFKA-15211: Mock InvalidParameterException in DistributedConfigTest (#14039)
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewed-by: Chris Egerton <chris.egerton@aiven.io>
2023-08-24 16:04:20 -07:00
Yash Mayya 3b9c3da978 KAFKA-15377: Don't expose externalized secret values in tasks-config API endpoint (#14244)
Reviewers: Greg Harris <greg.harris@aiven.io>
2023-08-24 15:54:29 -07:00
Greg Harris 7e3f1c198d KAFKA-15393: Improve shutdown behavior in MM2 integration tests (#14278)
Reviewers: Yash Mayya <yash.mayya@gmail.com>, Chris Egerton <chrise@aiven.io>
2023-08-24 12:28:35 -07:00
Okada Haruki fffbab7951 KAFKA-15391: Handle concurrent dir rename which makes log-dir to be offline unexpectedly (#14280)
A race condition between async flush and segment rename (for deletion purpose) might cause the entire log directory to be marked offline when we delete a topic. This PR fixes the bug by ignoring NoSuchFileException when we flush a directory.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-08-24 08:51:53 +00:00
Matthias J. Sax 714b4ebeeb HOTIFX: fix Kafka Streams upgrade path from 3.4 to 3.5 (#14103)
KIP-904 introduced a backward incompatible change that requires a 2-bounce rolling upgrade.
The new "3.4" upgrade config value is not recognized by `AssignorConfiguration` though and thus crashed Kafka Streams if use.

Reviewers: Farooq Qaiser <fqaiser94@gmail.com>, Bruno Cadonna <bruno@confluent.io>
2023-08-18 11:07:52 -07:00
David Arthur a9ccb8562e KAFKA-15374 Handle case of default broker in config migration (#14237)
When collecting the set of broker IDs during the migration, don't try to parse the default broker resource `""` as a broker ID.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-08-18 12:50:00 -04:00
Omnia G.H Ibrahim abd1c8e46f
KAFKA-15102: Add replication.policy.internal.topic.separator.enabled property to MirrorMaker 2 (KIP-949) (#14082)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-08-15 20:06:40 -04:00
Florin Akermann 12fc0d04d7 KAFKA-13197: fix GlobalKTable join/left-join semantics documentation. (#14187)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-08-11 11:46:10 -07:00
Greg Harris 34a30fff57 KAFKA-15202: Fix MM2 offset translation when syncs are variably spaced (#14156)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-08-10 13:48:34 -07:00
Greg Harris 1c56806932 MINOR: Optimize runtime of MM2 integration tests by batching transactions (#13816)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-08-10 13:48:11 -07:00