Commit Graph

15975 Commits

Author SHA1 Message Date
Matthias J. Sax ed6472bcf3 KAFKA-19668: processValue() must be declared as value-changing operation (#20470)
With "merge.repartition.topic" optimization enabled, Kafka Streams tries
to push repartition topics upstream, to be able to merge multiple
repartition topics from different downstream branches together.

However, it is not safe to push a repartition topic if the parent node
is value-changing: because of potentially changing data types, the
topology might become invalid, and fail with serde error at runtime.

The optimization itself works correctly; however, processValues() is not
correctly declared as value-changing, which can lead to invalid
topologies.
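
The safety rule described above can be sketched as follows. This is a toy model, not the Kafka Streams optimizer: the `Node` class, the `value_changing` flag and `can_push_repartition_above` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a node in a topology graph."""
    name: str
    value_changing: bool = False  # assumed flag: True if the node may change the value type


def can_push_repartition_above(parent: Node) -> bool:
    """A repartition topic may only move upstream past a parent that leaves values untouched."""
    return not parent.value_changing


filter_node = Node("filter")                                      # does not change values
process_values_node = Node("processValues", value_changing=True)  # must be flagged

print(can_push_repartition_above(filter_node))          # pushing upstream is allowed
print(can_push_repartition_above(process_values_node))  # pushing upstream is blocked
```

If the flag were missing on the second node, the check would wrongly allow the push, which is the kind of mistake the commit describes.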

Reviewers: Bill Bejeck <bill@confluent.io>, Lucas Brutschy
 <lbrutschy@confluent.io>
2025-09-05 18:02:22 -07:00
Mickael Maison 48f97a480c MINOR: Update 4.1 branch version to 4.1.1-SNAPSHOT 2025-09-02 14:52:33 +02:00
Mickael Maison 13f70256db Bump version to 4.1.0 2025-08-27 10:19:21 +02:00
Mickael Maison 70dd1ca2ca Revert "Bump version to 4.1.0"
This reverts commit 23b64404ae.
2025-08-27 10:15:18 +02:00
Chang-Chi Hsu 8de88db65a KAFKA-19642 Replace dynamicPerBrokerConfigs with dynamicDefaultConfigs (#20405)
- **Changes**: Replace misused dynamicPerBrokerConfigs with
dynamicDefaultConfigs
- **Reasons**: KRaft servers don't handle the cluster-level configs at
startup

from: https://github.com/apache/kafka/pull/18949/files#r2296809389

Reviewers: Jun Rao <junrao@gmail.com>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>

---------

Co-authored-by: PoAn Yang <payang@apache.org>
2025-08-27 14:34:58 +08:00
Lucas Brutschy 500bd70a29 KAFKA-19429: Deflake streams_smoke_test, again (#20070)
It looks like we are checking for properties that are not guaranteed
under at_least_once, for example, exact counting (not allowing for
overcounting).

This change relaxes the validation constraint:

The TAGG topic contains effectively count-by-count results. So for
example, if we have the input without duplication

0 -> 1,2,3 we will get in TAGG 3 -> 1, since 1 key had 3 values.

with duplication:

0 -> 1,1,2,3 we will get in TAGG 4 -> 1, since 1 key had 4 values.

This makes the result difficult to compare. Since we run the smoke test
also with Exactly_Once, I propose to disable validation of TAGG under
ALOS.
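
The count-by-count behaviour described above can be reproduced with a small sketch. The function name `count_by_count` and the sample records are illustrative assumptions, not code from the smoke test.

```python
from collections import Counter


def count_by_count(records):
    """Map 'number of values seen for a key' to 'number of keys with that count'."""
    per_key = Counter(key for key, _ in records)
    return dict(Counter(per_key.values()))


clean = [(0, 1), (0, 2), (0, 3)]               # key 0 with values 1,2,3
duplicated = [(0, 1), (0, 1), (0, 2), (0, 3)]  # value 1 delivered twice

print(count_by_count(clean))       # {3: 1}
print(count_by_count(duplicated))  # {4: 1}
```

A single duplicate moves the key into a different bucket, so the two results share no entry that could be compared with a tolerance.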

Similarly, the topic AVG may overcount or undercount. The test case is
extremely similar to DIF: both perform a join of two streams, the
only difference being the mathematical operation performed, so we can
also disable this validation under ALOS with minimal loss of coverage.

Finally, the change fixes a bug that would throw an NPE when validation
of a windowed stream fails.

Reviewers: Kirk True <kirk@kirktrue.pro>, Matthias J. Sax
 <matthias@confluent.io>
2025-08-25 16:08:10 -07:00
Mickael Maison abeebb3b3f MINOR: Update version to 4.1 in docs 2025-08-25 15:50:55 +02:00
Matthias J. Sax 35e5942743 Revert "KAFKA-13722: remove usage of old ProcessorContext (#18292)" (#20398)
This reverts commit f13a22af0b.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Eduwer Camacaro <eduwerc@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
2025-08-25 12:43:39 +02:00
Shashank 2dbed66c63 KAFKA-15307: Kafka Streams configuration docs outdated (#20329)
Updated Kafka Streams configuration documentation to bring it up to date
with version 4.0.0.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-08-17 13:14:42 -07:00
Clemens Hutter f61a5a5b41 MINOR: Remove SPAM URL in Streams Documentation (#20321)
The previous URL http://lambda-architecture.net/ seems to now be controlled by spammers

Co-authored-by: Shashank <hsshashank.grad@gmail.com>
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-08-13 12:06:22 -07:00
Matthias J. Sax d9be929f4a MINOR: add missing section to TOC (#20305)
Add new group coordinator metrics section to TOC.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-08-05 14:29:21 -07:00
Mickael Maison 23b64404ae Bump version to 4.1.0 2025-08-05 14:29:00 +02:00
Mickael Maison 6340f437cd Revert "Bump version to 4.1.0"
This reverts commit e14d849cbf.
2025-08-05 13:01:40 +02:00
Mickael Maison de16dd103a KAFKA-19581: Temporary fix for Streams system tests 2025-08-05 12:13:27 +02:00
Jared Harley fc030b411c KAFKA-19576 Fix typo in state-change log filename after rotate (#20269)
The `state-change.log` file is being incorrectly rotated to
`stage-change.log.[date]`. This change fixes the typo to have the log
file correctly rotated to `state-change.log.[date]`

_No functional changes._

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Christo Lolov
 <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Ken Huang
 <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-08-05 12:32:16 +08:00
Ken Huang 7722cff6ce MINOR: The upgrade.html file contains duplicate IDs on the same page (#19996)
According to correct HTML syntax, IDs on the same page should be unique,
so we should fix this.

Reviewers: TengYao Chi <frankvicky@apache.org>
2025-08-05 12:30:59 +08:00
Luke Chen cdc7a4e2b7 MINOR: improve the min.insync.replicas doc (#20237)
Along with the change in https://github.com/apache/kafka/pull/17952
([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)),
the semantics of the `min.insync.replicas` config changed slightly and
some constraints were added. We should document them clearly.

Reviewers: Jun Rao <junrao@gmail.com>, Calvin Liu <caliu@confluent.io>,
 Mickael Maison <mickael.maison@gmail.com>, Paolo Patierno
 <ppatierno@live.com>, Federico Valeri <fedevaleri@gmail.com>, Chia-Ping
 Tsai <chia7712@gmail.com>
2025-08-05 00:27:04 +08:00
Lucas Brutschy 0179193b75 KAFKA-19529: State updater sensor names should be unique (#20262) (#20274)
All state updater threads use the same metrics instance, but do not use
unique names for their sensors. This can have the following symptoms:

1) Data inserted into one sensor by one thread can affect the metrics of
all state updater threads.
2) If one state updater thread is shut down, the metrics associated with
all state updater threads are removed.
3) If one state updater thread is started, while another one is removed,
it can happen that a metric is registered with the `Metrics` instance,
but not associated with any `Sensor` (because it is concurrently removed),
which means that the metric will not be removed upon shutdown. If a
thread with the same name later tries to register the same metric, we
may run into a `java.lang.IllegalArgumentException: A metric named ...
already exists`, as described in the ticket.

This change fixes the bug by giving unique names to the sensors. A test is
added to verify that removing sensors and metrics during shutdown causes
no interference.
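
Symptom 2 above can be illustrated with a toy registry. This is a sketch, not the Kafka `Metrics` API: the `Registry` class, the thread ids and the `restore-rate` metric name are all hypothetical.

```python
class Registry:
    """Toy stand-in for a metrics registry shared by all threads, keyed by sensor name."""

    def __init__(self):
        self.sensors = {}

    def sensor(self, name):
        # Returns the existing sensor if the name is already taken.
        return self.sensors.setdefault(name, [])

    def remove_sensor(self, name):
        self.sensors.pop(name, None)


def sensor_name(thread_id, metric, unique):
    """With unique=True the thread id is part of the name (the fix); otherwise names collide."""
    return f"{thread_id}.{metric}" if unique else metric


def remaining_after_shutdown(unique):
    registry = Registry()
    for thread_id in ("updater-1", "updater-2"):
        registry.sensor(sensor_name(thread_id, "restore-rate", unique))
    # updater-1 shuts down and removes what it believes is its own sensor.
    registry.remove_sensor(sensor_name("updater-1", "restore-rate", unique))
    return sorted(registry.sensors)


print(remaining_after_shutdown(unique=False))  # []
print(remaining_after_shutdown(unique=True))   # ['updater-2.restore-rate']
```

With colliding names, shutting down one thread also removes the sensor the other thread is still using.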

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-08-01 14:57:33 +02:00
lucliu1108 9c83c6d1f3 MINOR: Delete the redundant feature upgrade instruction for running KIP-1071 EA (#20250)
Follow up on https://github.com/apache/kafka/pull/20241.

Delete the instruction to manually set `streams.version=1` for running
Kafka 4.1, since this is already achieved in previous setup steps.

Reviewers: Lucas Brutschy <lucasbru@apache.org>
2025-07-29 14:29:51 +02:00
lucliu1108 1d4b22bc3e MINOR: Improve Kafka Streams Protocol Upgrade Doc (#20241)
As a follow-up minor addition to the PR for
[KSTREAMS-7735](https://confluentinc.atlassian.net/browse/KSTREAMS-7735)
(https://github.com/apache/kafka/pull/20029), add the instructions
for upgrading the `streams.version` parameter for KIP-1071 EA.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-26 10:33:48 +01:00
Tsung-Han Ho (Miles Ho) d5a3acda89 KAFKA-19501 Update OpenJDK base image from buster to bullseye (#20165)
The changes update the OpenJDK base image from 17-buster to 17-bullseye:
- Updates tests/docker/Dockerfile to use openjdk:17-bullseye instead of
openjdk:17-buster
- Updates tests/docker/ducker-ak script to use the new default image
- Updates documentation in tests/README.md with the new image name
examples

Reviewers: Federico Valeri <fedevaleri@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-07-22 16:25:53 +02:00
Mickael Maison e14d849cbf Bump version to 4.1.0
2025-07-21 12:01:15 +02:00
Mickael Maison ca2409695d MINOR: Revert "Bump version to 4.1.0"
This reverts commit 57e81f2010.

This is an attempt at working around https://issues.apache.org/jira/browse/KAFKA-19528
2025-07-21 11:08:50 +02:00
Calvin Liu e4e2dce2eb KAFKA-19522: avoid electing fenced lastKnownLeader (#20200)
This patch fixes the bug that allows the last known leader to be elected as a partition leader while still in a fenced state, before the next heartbeat removes the fence.
https://issues.apache.org/jira/browse/KAFKA-19522

Reviewers: Jun Rao <junrao@gmail.com>, TengYao Chi
<frankvicky@apache.org>
2025-07-20 16:55:45 +08:00
Dmitry Werner 01d8154b6e KAFKA-19520 Bump Commons-Lang for CVE-2025-48924 (#20196)
Bump Commons-Lang for CVE-2025-48924.

Reviewers: Luke Chen <showuon@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2025-07-19 15:07:28 +08:00
Lucas Brutschy eb155a2113 MINOR: Revert "KAFKA-18913: Start state updater in task manager (#198… (#20186)
This reverts commit 4d6cf3efef. It seemed
to trigger a race condition in the state updater initialization.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-07-17 17:28:08 +02:00
Ming-Yen Chung 05f012c7f1 KAFKA-19427 Allow the coordinator to grow its buffer dynamically (#20040)
* Coordinator starts with a smaller buffer, which can grow as needed.

* In freeCurrentBatch, release the appropriate buffer:
  * The Coordinator recycles the expanded buffer
(`currentBatch.builder.buffer()`), not `currentBatch.buffer`, because
`MemoryBuilder` may allocate a new `ByteBuffer` if the existing one
isn't large enough.

  * There are two cases in which the buffer may exceed `maxMessageSize`:
    1. If there's a single record whose size exceeds `maxMessageSize` (which,
so far, is derived from `max.message.bytes`) and the write is in
`non-atomic` mode, it's still possible for the buffer to grow beyond
`maxMessageSize`. In this case, the Coordinator should revert to using a
smaller buffer afterward.
    2. The Coordinator does not recycle a buffer that is larger than
`maxMessageSize`. If the user dynamically reduces
`maxMessageSize` to a value even smaller than `INITIAL_BUFFER_SIZE`, the
Coordinator should avoid recycling any buffer larger than
`maxMessageSize` so that the Coordinator can allocate the smaller buffer in
the next round.

* Add tests to verify the above scenarios.
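
The recycling rule in the bullets above can be sketched as a small pool. This is a simplified model under stated assumptions, not the coordinator runtime: the `BufferPool` class, the value of `INITIAL_BUFFER_SIZE`, and sizing a fresh buffer as the smaller of the two limits are all hypothetical.

```python
INITIAL_BUFFER_SIZE = 512  # hypothetical value, chosen only for the example


class BufferPool:
    """Keeps at most one released buffer for reuse."""

    def __init__(self, max_message_size):
        self.max_message_size = max_message_size
        self.cached = None

    def acquire(self):
        buf, self.cached = self.cached, None
        if buf is None:
            # Assumption: start small, but never above the configured maximum.
            buf = bytearray(min(INITIAL_BUFFER_SIZE, self.max_message_size))
        return buf

    def release(self, buf):
        # Recycle only buffers that do not exceed max_message_size;
        # oversized ones are dropped so the next round starts small again.
        if len(buf) <= self.max_message_size:
            self.cached = buf


pool = BufferPool(max_message_size=1024)
pool.release(bytearray(2048))   # grew past the limit: not recycled
print(len(pool.acquire()))      # a fresh small buffer
pool.release(bytearray(800))    # within the limit: recycled
print(len(pool.acquire()))      # the recycled buffer
```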

Reviewers: David Jacot <djacot@confluent.io>, Sean Quah
<squah@confluent.io>, Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, TaiJuWu <tjwu1217@gmail.com>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-17 00:54:34 +08:00
Calvin Liu 98cb8df7a5 MINOR: Bump LATEST_PRODUCTION to 4.1IV1 and Use MV to enable ELR (#20174)
Remove isEligibleLeaderReplicasV1Enabled so that ELR is enabled whenever
the MV is at least 4.1IV1. Also bump the latest production MV to 4.1IV1.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-15 20:23:53 -07:00
Bill Bejeck f35f94b3e6 KAFKA-19504: Remove unused metrics reporter initialization in KafkaAdminClient (#20166)
The `AdminClient` adds a telemetry reporter to the metrics reporters
list in the constructor. The problem is that the reporter was already
added in the `createInternal` method. In the `createInternal` method
call, the `clientTelemetryReporter` is added to a
`List<MetricReporters>` which is passed to the `Metrics` object and will
get closed when `Metrics.close()` is called. But the reporters list
built in the constructor is not used by the `Metrics` object, so the
reporter added there doesn't get closed, causing a memory leak.

All related tests pass after this change.

Reviewers: Apoorv Mittal <apoorvmittal10@apache.org>, Matthias J. Sax
 <matthias@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>,
 Jhen-Yung Hsu <jhenyunghsu@gmail.com>
2025-07-14 20:21:12 -04:00
Luke Chen 793294bd2e KAFKA-19495: Update config for native image (v4.1.0) (#20151)
Backport of https://github.com/apache/kafka/pull/20150

Reviewers: Mickael Maison <mickael.maison@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-07-11 14:38:52 +02:00
Mickael Maison d0a308e4f6 Merge tag '4.1.0-rc0' into 4.1
4.1.0-rc0
2025-07-09 14:50:03 +02:00
Mickael Maison 610f076542 Bump version to 4.1.0 2025-07-09 14:50:03 +02:00
Mickael Maison 57e81f2010 Bump version to 4.1.0 2025-07-09 11:48:41 +02:00
José Armando García Sancio b0ff9ba161 KAFKA-19184: Add documentation for upgrading the kraft version (#20071)
Update the documentation to describe how to upgrade the kraft feature
version from 0 to 1.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Alyssa Huang
<ahuang@confluent.io>
2025-07-09 11:21:36 +02:00
Ismael Juma 487af011ca KAFKA-19444: Add back JoinGroup v0 & v1 (#20116)
This fixes librdkafka older than the recently released 2.11.0 with
Kerberos authentication and Apache Kafka 4.x.

Even though this is a bug in librdkafka, a key goal of KIP-896 is not to
break the popular client libraries listed in it. Adding back JoinGroup
v0 & v1 is a very small change and worth it from that perspective.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-07 15:15:29 -07:00
Omnia Ibrahim e6b78ae9e5 KAFKA-19397: Ensure consistent metadata usage in produce request and response (#19964)
- Metadata doesn't have the full view of topicNames to ids during
rebootstrap of client or when topic has been deleted/recreated. The
solution is to pass down topic id and stop trying to figure it out later
in the logic.

---------

Co-authored-by: Kirk True <kirk@kirktrue.pro>
2025-07-07 19:52:15 +08:00
Ken Huang f14e60fc8f KAFKA-19042 Move ProducerSendWhileDeletionTest to client-integration-tests module (#19971)
Use Java to rewrite ProducerSendWhileDeletionTest by new test infra and
move it to client-integration-tests module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-07 16:26:51 +08:00
Jhen-Yung Hsu b4875501e1 MINOR: Add 4.1 branch to CI (#20112)
Add 4.1 branch to CI per https://github.com/apache/kafka/pull/18215

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-07 16:15:16 +08:00
Sushant Mahajan 71f5600283 KAFKA-19453: Ignore group not found in share group record replay (#20100)
* When a `ShareGroup*` record is replayed in the group metadata manager,
there is a call to check if the group exists. If the group does not
exist, we throw an exception, which is unnecessary.
* In this PR, we have added a check to ignore this exception.
* A new test to validate the logic has been added.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>

Note: cherry pick from PR https://github.com/apache/kafka/pull/20076 in
trunk.
2025-07-03 19:46:59 +01:00
Sushant Mahajan d02028d773 MINOR: Code change to prevent NPE due to share delete. (#20092)
* The `GroupCoordinatorService.onPartitionsDeleted` code and
`GroupMetadataManager.shareGroupBuildPartitionDeleteRequest` code look
up the metadata image to find topic name/partitions for topic ids.
* If the topic id is not present in the image, it will throw an NPE
resulting in a crash.
* This PR aims to solve the issue.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-02 17:16:18 +01:00
Kirk True 38f9cf6188 KAFKA-19152: Add top-level documentation for OAuth flows (#20025)
Adds documentation to support the OAuth additions from KIP-768 and
KIP-1139.

The existing documentation is heavily geared toward Kafka's support for
non-production OAuth usage. Since this mode is still supported, it
should not be removed. However, with the addition of the production
OAuth usage, the documentation is less than succinct because it has a
bit of a split personality issue.
2025-07-01 12:41:10 +05:30
Lucas Brutschy 32c8cfa87f KAFKA-19379: Basic upgrade guide for KIP-1071 EA (#20029)
Basic documentation describing:
 - That it's in EA now
 - What it does
 - What features are not yet supported
 - How to enable it / disable it
 - Any changes in the interfaces
   - kafka-streams-groups.sh
   - StreamsGroupDescribe
 - How to provide feedback

Reviewers: Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax
 <matthias@confluent.io>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Matthias J. Sax <mjsax@apache.org>
2025-06-30 13:56:37 +02:00
Sean Quah b55c59a661 KAFKA-19445: Fix coordinator runtime metrics sharing sensors (#20062)
When sensors are shared between different metric groups, data from all
groups is combined and added to all metrics under each sensor. This
means that different metric groups will report the same values for their
metrics.

Prefix sensor names with metric group names to isolate metric groups.
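
The effect of prefixing can be shown with a toy sensor store. This is only a sketch: the `Sensors` class, the group names and the `event-queue-time` metric name are hypothetical and not taken from the coordinator runtime.

```python
from collections import defaultdict


class Sensors:
    """Toy sensor store: every value recorded under a name lands in the same bucket."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, name, value):
        self.samples[name].append(value)


def sensor_name(group, metric, prefixed):
    """With prefixed=True the metric group name isolates the sensor (the fix)."""
    return f"{group}-{metric}" if prefixed else metric


def totals(prefixed):
    sensors = Sensors()
    sensors.record(sensor_name("group-coordinator", "event-queue-time", prefixed), 5)
    sensors.record(sensor_name("share-coordinator", "event-queue-time", prefixed), 7)
    return {name: sum(values) for name, values in sensors.samples.items()}


print(totals(prefixed=False))  # one combined bucket for both groups
print(totals(prefixed=True))   # one bucket per metric group
```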

Reviewers: Yung <yungyung7654321@gmail.com>, Sushant Mahajan
<smahajan@confluent.io>, Dongnuo Lyu <dlyu@confluent.io>, TengYao Chi
<frankvicky@apache.org>
# Conflicts:
#	coordinator-common/src/test/java/org/apache/kafka/coordinator/common/runtime/CoordinatorRuntimeMetricsImplTest.java
2025-06-30 17:04:03 +08:00
Matthias J. Sax adfcc9ed3f MINOR: Improve ProcessorContext JavaDocs (#20042)
Clarify that state stores are sharded, and shards cannot be shared
across Processors.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-06-26 10:08:02 -07:00
David Jacot 7bdeb36a52 KAFKA-19246; OffsetFetch API does not return group level errors correctly with version 1 (#19704)
The OffsetFetch API does not support top level errors in version 1.
Hence, the top level error must be returned at the partition level.
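
A minimal sketch of the rule above, using plain dictionaries rather than the real response classes. The function name, the dictionary layout, and the assumption that top-level errors are available from version 2 onward are illustrative only.

```python
def build_offset_fetch_error(version, error, partitions):
    """Place a group-level error where the given API version can carry it.

    partitions: list of (topic, partition) tuples requested by the client.
    """
    if version >= 2:  # assumption: versions 2 and above have a top-level error field
        return {"error": error, "partitions": []}
    # Older versions: repeat the error on every requested partition.
    return {
        "partitions": [
            {"topic": topic, "partition": partition, "error": error}
            for topic, partition in partitions
        ]
    }


requested = [("orders", 0), ("orders", 1)]
print(build_offset_fetch_error(1, "NOT_COORDINATOR", requested))
print(build_offset_fetch_error(2, "NOT_COORDINATOR", requested))
```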

Side note: It is a tad annoying that we create error responses in
multiple places (e.g. KafkaApis, GroupCoordinatorService). There was a
reason for this but I cannot remember it.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Sean Quah <squah@confluent.io>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
2025-06-26 15:30:44 +02:00
Ritika Reddy c4cac07819 KAFKA-19414: Remove 2PC public APIs from 4.1 until release (KIP-939) (#19985)
We are removing some of the previously added public APIs until KIP-939
is ready to use.

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-06-25 09:06:21 -07:00
Lucas Brutschy aa0d1f5000 MINOR: Reject requests using unsupported features in KIP-1071 (#20031)
KIP-1071 does not currently support all features planned in the KIP. We
should reject any requests that are using features that are currently
not implemented.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Matthias J. Sax
 <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
2025-06-25 14:49:53 +02:00
Rajini Sivaram 85f9e93933 MINOR: Fix response for consumer group describe with empty group id (#20030)
ConsumerGroupDescribe with an empty group id returns a response containing `null` groupId in a non-nullable field. Since the response cannot be serialized, this results in UNKNOWN_SERVER_ERROR being returned to the client. This PR sets the group id in the response to an empty string instead and adds request tests for empty group id.

Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-25 10:50:43 +01:00
Matthias J. Sax fb054b590e KAFKA-19398: (De)Register oldest-iterator-open-since-ms metric dynamically (#20022)
The metric for oldest-iterator-open-since-ms might report a null value
if there is no open iterator.

This PR changes the behavior to dynamically register/deregister the
entire metric instead of allowing it to return a null value.
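
The register/deregister behaviour can be sketched as follows. This is an illustrative model, not the Kafka Streams metrics code: the `IteratorMetrics` class and its `on_open`/`on_close` hooks are hypothetical.

```python
class IteratorMetrics:
    """Registers the metric only while at least one iterator is open."""

    METRIC = "oldest-iterator-open-since-ms"

    def __init__(self):
        self.registry = {}    # metric name -> callable returning the current value
        self.open_since = {}  # iterator id -> timestamp (ms) at which it was opened

    def on_open(self, iterator_id, now_ms):
        self.open_since[iterator_id] = now_ms
        # Registering again is harmless: the gauge always reads the live map.
        self.registry[self.METRIC] = lambda: min(self.open_since.values())

    def on_close(self, iterator_id):
        self.open_since.pop(iterator_id, None)
        if not self.open_since:
            # No open iterators left: remove the metric instead of reporting null.
            self.registry.pop(self.METRIC, None)


metrics = IteratorMetrics()
metrics.on_open("a", now_ms=100)
metrics.on_open("b", now_ms=50)
print(metrics.registry[IteratorMetrics.METRIC]())  # 50
metrics.on_close("b")
metrics.on_close("a")
print(IteratorMetrics.METRIC in metrics.registry)  # False
```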

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-06-24 17:21:53 -07:00
Calvin Liu b80aa15c17 KAFKA-19383: Handle the deleted topics when applying ClearElrRecord (#20033)
https://issues.apache.org/jira/browse/KAFKA-19383 When applying the
ClearElrRecord, it may pick up the topicId in the image without checking
if the topic has been deleted. This can cause the creation of a new
TopicRecord with an old topic ID.
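
The missing check can be sketched like this. It is a simplified illustration only: the function name and the use of plain lists and sets in place of the metadata image and delta are assumptions.

```python
def topics_to_clear(image_topic_ids, deleted_topic_ids):
    """Return the topic ids whose ELR should be cleared, skipping deleted topics.

    image_topic_ids:   topic ids present in the current image (hypothetical shape)
    deleted_topic_ids: topic ids already deleted in the pending changes
    """
    return [tid for tid in image_topic_ids if tid not in deleted_topic_ids]


print(topics_to_clear(["t1", "t2"], deleted_topic_ids={"t1"}))  # ['t2']
```

Without the filter, the deleted topic `t1` would be touched again, which matches the failure mode described in the commit message.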

Reviewers: Alyssa Huang <ahuang@confluent.io>, Artem Livshits <alivshits@confluent.io>, Colin P. McCabe <cmccabe@apache.org>

No conflicts.
2025-06-24 17:04:45 -07:00