Commit Graph

16047 Commits

Author SHA1 Message Date
Alieh Saeedi c058c134d2
KAFKA-19496: Deflake streams admin api describe test (#20154)
This fixes the flaky

`DescribeStreamsGroupTest.testDescribeMultipleStreamsGroupWithMembersAndVerboseOptions()`,
which sometimes fails due to `ERROR stream-thread Missing source topics:
Source topics customInputTopic2 are missing`

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-07-11 15:48:24 +02:00
Luke Chen 2346c0e737
KAFKA-19495: update native image config for native images (#20150)
We failed the native image build and test workflow

[here](https://github.com/apache/kafka/actions/runs/16211393417/job/45772104969).
The failed messages are:
```
Exception in thread "main" java.lang.ExceptionInInitializerError at
org.apache.kafka.server.config.AbstractKafkaConfig.<clinit>(AbstractKafkaConfig.java:56)
at
java.base@21.0.2/java.lang.Class.ensureInitialized(DynamicHub.java:601)
at kafka.tools.StorageTool$.$anonfun$execute$1(StorageTool.scala:79) at
scala.Option.flatMap(Option.scala:283) at
kafka.tools.StorageTool$.execute(StorageTool.scala:79) at
kafka.tools.StorageTool$.main(StorageTool.scala:46) at
kafka.docker.KafkaDockerWrapper$.main(KafkaDockerWrapper.scala:57) at
kafka.docker.KafkaDockerWrapper.main(KafkaDockerWrapper.scala) at
java.base@21.0.2/java.lang.invoke.LambdaForm$DMH/sa346b79c.invokeStaticInit(LambdaForm$DMH)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value
org.apache.kafka.common.security.oauthbearer.DefaultJwtRetriever for
configuration sasl.oauthbearer.jwt.retriever.class: Class
org.apache.kafka.common.security.oauthbearer.DefaultJwtRetriever could
not be found. at
org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:778)
at
org.apache.kafka.common.config.ConfigDef$ConfigKey.<init>(ConfigDef.java:1271)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:155)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:198)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:237)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:399)
at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:412)
at
org.apache.kafka.common.config.internals.BrokerSecurityConfigs.<clinit>(BrokerSecurityConfigs.java:197)
... 9 more
```
After investigation, I found we have to update the native image configs
to support the new code change as described

[here](https://github.com/apache/kafka/blob/trunk/docker/native/README.md#native-image-reachability-metadata).
This PR fixes this issue and verified that the same workflow for native
image passed

[here](https://github.com/apache/kafka/actions/runs/16215454627/job/45783738496).

The PR for v4.1.0 is https://github.com/apache/kafka/pull/20151 .

Reviewers: TengYao Chi <frankvicky@apache.org>
2025-07-11 17:26:28 +08:00
Sanskar Jhajharia 27383970b6
MINOR: Cleanup Connect Module (1/n) (#19869)
CI / build (push) Waiting to run Details
Now that Kafka support Java 17, this PR makes some changes in connect
module. The changes in this PR are limited to only some files. A future
PR(s) shall follow.
The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()

Sub modules targeted: api, basic-auth-extensions, file, json, mirror,
mirror-client

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-11 00:35:41 +08:00
Shivsundar R 56a3c6dde9
KAFKA-19485: Added check before sending acknowledgements on initial epoch. (#20135)
CI / build (push) Waiting to run Details
https://issues.apache.org/jira/browse/KAFKA-19485

**Bug :**
There is a bug in `ShareConsumeRequestManager` where we are adding
acknowledgements on initial `ShareSession` epoch even after checking for
it.
Added fix to only include acknowledgements in the request if we have to,

PR also adds the check at another point in the code where we could
potentially be sending such acknowledgements.  One of the cases could be
when metadata is refreshed with empty topic IDs after a broker restart.
This means leader information would not be available on the node.

- Consumer subscribed to a partition whose leader was node-0.
- Broker restart happens and node-0 is elected leader again. Broker
starts a new `ShareSession`.
- Background thread sends a fetch request with **non-zero** epoch.
- Broker responds with `SHARE_SESSION_NOT_FOUND`.
- Client updates session epoch to 0 once it receives this error.
- Client updates metadata but receives empty metadata response. (Leader
unavailable)
- Application thread processing the previous fetch, completes and sends
acks to piggyback on next fetch.
- Next fetch will send the piggyback acknowledgements on the fetch for
previously subscribed partitions resulting in error from broker
("`Acknowledge data present on initial epoch`"). (Currently we attempt
to send even if leader is unavailable).

**Fix** :  Add a check before sending out acknowledgments if we are on
an initial epoch.
Added unit test covering the above scenario.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-10 09:06:19 +01:00
Xiao Yang ded7df9707
MINOR: fix docker_release example (#19427)
CI / build (push) Waiting to run Details
Fix docker_release example.
Currently, the command doesn't display correctly

Reviewers: TengYao Chi <frankvicky@apache.org>, PoAn Yang
 <payang@apache.org>, Yung <yungyung7654321@gmail.com>, Ken Huang
 <s7133700@gmail.com>
2025-07-10 12:41:21 +08:00
Xuan-Zhang Gong 2f6ea81d0a
KAFKA-19488: Update the docs of "if-not-exists" (#20133)
"the action will only execute" is incorrect, as the admin still sends
the request. The "if-not-exists" flag is actually used to swallow the
exception

Reviewers: TengYao Chi <frankvicky@apache.org>, Nick Guo
<lansg0504@gmail.com>, Ken Huang <s7133700@gmail.com>
2025-07-10 10:26:06 +08:00
Jinhe Zhang c625b44d8c
MINOR: Throw exceptions if source topic is missing (#20123)
CI / build (push) Waiting to run Details
In the old protocol, Kafka Streams used to throw a
`MissingSourceTopicException` when a source topic is missing. In the new
protocol, it doesn’t do that anymore, while only log the status that is
returned from the broker, which contains a status that indicates that a
source topic is missing.

This change:
1. Throws an `MissingSourceTopicException` when source topic is missing
2. Adds unit tests
3. Modifies integration tests to fit both old and new protocols

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-07-09 21:19:12 +02:00
Andrew Schofield 7b8a594a22
MINOR: Tidy up in AlterShareGroupOffsetsHandler (#20130)
Minor tidying up in AlterShareGroupOffsetsHandler based on review
comment
https://github.com/apache/kafka/pull/20049#discussion_r2192904850.

Reviewers: Jimmy Wang <wangzhiwang611@gmail.com>, Lan Ding
 <isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping
 Tsai <chia7712@gmail.com>
2025-07-10 01:24:13 +08:00
Chang-Chi Hsu 22698493e9
MINOR: Move partitions == 0 logic from waitForTopic to waitTopicDeletion (#20108)
## Changes

- The partitions == 0 branch has been moved from **waitForTopic** to
**waitTopicDeletion**.

## Reasons

- Clarify the responsibility of each helper method makes the test code
easier to reason by moving the partitions == 0 logic from
**waitForTopic** into a dedicated method **waitTopicDeletion**.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu
 <tjwu1217@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-10 01:10:20 +08:00
Jhen-Yung Hsu 007fe6e92a
KAFKA-19466 LogConcurrencyTest should close the log when the test completes (#20110)
- Fix testUncommittedDataNotConsumedFrequentSegmentRolls() and
testUncommittedDataNotConsumed(), which call createLog() but never close
the log when the tests complete.
- Move LogConcurrencyTest to the Storage module and rewrite it in Java.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-10 01:01:42 +08:00
Lucas Brutschy dabde76ebf
KAFKA-19477: Sticky Assignor JMH Benchmark (#20118)
CI / build (push) Waiting to run Details
The current assignor used in KIP-1071 is verbatim the assignor used on
the client-side. The assignor performance was not a big concern on the
client-side, and it seems some additional performance overhead has crept
in during the adaptation to the broker-side interfaces, so we expect it
to be too slow for groups of non-trivial size.

We base ourselves on the share-group parameters for these benchmarks:

 - Up to 1000 members      - Up to 100 topics      - Up to 100
partitions per topic

Note, however, that the parameters influencing the Streams assignment
are different and more complicated compared to regular consumer groups /
share consumer groups. The assignment logic is independent of the number
of topics, but depends on the number of subtopologies. A subtopology may
read from multiple topics. We simplify this relationship by assuming one
topic per subtopology Members may be part of the same process or
separate processes. We introduce a parameter membersPerProcess to tune
two extreme configurations (1, 50).

We define 50% of the subtopologies to be stateful. Stateful
subtopologies get standby replicas assigned, if enabled. For example, if
we have 100 subtopologies with 100 partitions, we get 10,000 active
tasks and 5,000 standby tasks. 

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-07-09 13:58:03 +02:00
José Armando García Sancio e42e01eec3
KAFKA-19184: Add documentation for upgrading the kraft version (#20071)
Update the documentation to describe how to upgrade the kraft feature
version from 0 to 1.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Alyssa Huang
<ahuang@confluent.io>
2025-07-09 11:20:47 +02:00
Ming-Yen Chung ff4d951027
KAFKA-17715 Remove argument force_use_zk_connection from kafka_acls_cmd_with_optional_security_settings (#19209)
The e2e tests currently cover version 2.1.0 and above. Thus, we can
remove `force_use_zk_connection` in
`kafka_acls_cmd_with_optional_security_settings`

In contrast, the `force_use_zk_connection` in
`kafka_topics_cmd_with_optional_security_settings` and
`kafka_configs_cmd_with_optional_security_settings` still needs to be
kept as `kafka-topics.sh` does not support `--bootstrap-server` in 2.1
and 2.2

e2e test result:
```
===========================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-07-02--001
run time:         200 minutes 28.399 seconds
tests run:        90
passed:           90
flaky:            0
failed:           0
ignored:          0
===========================================
```

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-09 17:07:56 +08:00
Sushant Mahajan 8aa5eae2f9
KAFKA-19457: Make share group init retry interval configurable. (#20104)
* While creating share group init requests  in
`GroupMetadataManager.shareGroupHeartbeat`,  we check for topics in
`initializing` state and if they are a certain amount of time old, we
issue retry requests for the same.
* The interval for considering initializing topics as old was based of
`offsetsCommitTimeoutMs` and was not configurable.
* In this PR, we remedy the situation by introducing a new config to
supply the value. The default is `30_000` which is a
heuristic based on the fact that the share coordinator `persister`
retries request with exponential backoff, with upper cap of `30_000`
seconds.
* Tests have been updated wherever applicable.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Lan Ding
 <isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>, Andrew Schofield
 <aschofield@confluent.io>
2025-07-09 09:52:58 +01:00
Abhinav Dixit e489682c45
KAFKA-19450: ShareConsumerPerformance does not handle exceptions from consumeMessagesForSingleShareConsumer (#20126)
### About
Within `ShareConsumerPerformance.java`, all the share consumers run with
within an executorService object and when we
perform `executorService.submit()`, we do not store this future and
exception would be recovered only when we do a future.get() in this
case. I believe this is a shortcoming
in `ShareConsumerPerformance.java` which needs to be improved.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-09 09:51:05 +01:00
Gaurav Narula 36b9bb94f1
KAFKA-19474 Move WARN log on log truncation below HWM (#20106)
CI / build (push) Waiting to run Details
#5608 introduced a regression where the check for `targetOffset <
log.highWatermark`
to emit a `WARN` log was made incorrectly after truncating the log.

This change moves the check for `targetOffset < log.highWatermark`  to
`UnifiedLog#truncateTo` and ensures we emit a `WARN` log on truncation
below  the replica's HWM by both the `ReplicaFetcherThread` and
`ReplicaAlterLogDirsThread`

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-07-09 09:55:02 +08:00
Jonah Hooper d86ba7f54a
KAFKA-18681: Created GetReplicaLogInfo RPCs (#19664)
CI / build (push) Waiting to run Details
Creates GetReplicaLogInfoRequest and GetReplicaLogInfoResponse RPCs
Information returned by these brokers will be used to aid
unclean-recovery by selecting longest logs.

Reviewers: Alyssa Huang <ahuang@confluent.io>, Calvin Liu <caliu@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, TaiJuWu <tjwu1217@gmail.com>
2025-07-08 10:41:01 -07:00
Masahiro Mori ea7b145860
KAFKA-19390: Call safeForceUnmap() in AbstractIndex.resize() on Linux to prevent stale mmap of index files (#19961)
https://issues.apache.org/jira/browse/KAFKA-19390

The AbstractIndex.resize() method does not release the old memory map
for both index and time index files.  In some cases, Mixed GC may not
run for a long time, which can cause the broker to crash when the
vm.max_map_count limit is reached.

The root cause is that safeForceUnmap() is not being called on Linux
within resize(), so we have changed the code to unmap old mmap on all
operating systems.

The same problem was reported in
[KAFKA-7442](https://issues.apache.org/jira/browse/KAFKA-7442), but the
PR submitted at that time did not acquire all necessary locks around the
mmap accesses and was closed without fixing the issue.

Reviewers: Jun Rao <junrao@gmail.com>
2025-07-08 09:15:32 -07:00
Alieh Saeedi db1c6f31a3
KAFKA-18288: Fix Streams CLI describe (#20099)
CI / build (push) Waiting to run Details
This PR includes the following fixes:

- Streams CLI used to list and return the description of the first group
which is a bug. With this fix, it returns the descriptions of the groups
specified by the `--group` or `all-groups`. Integration test are added
to verify the fix.
- `timeoutOption` is missing in describe groups. This fix adds and tests
it with short timeout.
- `DescribeStreamsGroupsHandler` used to return an empty group in `DEAD`
state when the group id was not found, but with this fix, it throws
`GroupIdNotFoundException`
2025-07-08 15:28:56 +02:00
Lucas Brutschy a88fd01e74
KAFKA-19478 [1/N]: Precompute values in ProcessState (#20120)
This is a very mechanical and obvious change that is making most
accessors in ProcessState constant time O(1), instead of linear time
O(n), by computing the collections and aggregations at insertion time,
instead of every time the value is accessed.

Since the accessors are used in deeply nested loops, this reduces the
runtime of our worst case benchmarks by ~14x.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-07-08 13:32:47 +02:00
Jhen-Yung Hsu dde0b8cd92
MINOR: Prevent unnecessary test runs - KAFKA-19042 follow-up (#20122)
CI / build (push) Waiting to run Details
PlaintextConsumerTest should extend AbstractConsumerTest instead
BaseConsumerTest. Otherwise, those tests will be executed on both
`clients-integration-tests` and `core` (see
https://github.com/apache/kafka/pull/20081/files#r2190749592).

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-08 07:42:15 +08:00
Mickael Maison a3ed705092
MINOR: Fix build warning in Streams (#20098)
CI / build (push) Waiting to run Details
When building Streams I get this warning:
```
> Task :streams:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note:
<PATH>/kafka/streams/src/test/java/org/apache/kafka/streams/processor/internals/RecordCollectorTest.java
uses unchecked or unsafe operations.
```

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-08 02:44:12 +08:00
Ken Huang a399852ced
KAFKA-19042 Move PlaintextConsumerTest to client-integration-tests module (#20081)
Use Java to rewrite PlaintextConsumerTest by new test infra and  move it
to client-integration-tests module.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-08 01:41:59 +08:00
Ismael Juma 4b607616c7
KAFKA-19444: Add back JoinGroup v0 & v1 (#20116)
This fixes librdkafka older than the recently released 2.11.0 with
Kerberos authentication and Apache Kafka 4.x.

Even though this is a bug in librdkafka, a key goal of KIP-896 is not to
break the popular client libraries listed in it. Adding back JoinGroup
v0 & v1 is a very small change and worth it from that perspective.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-07 08:44:24 -07:00
Bolin Lin e8ee7fc210
KAFKA-19315 Move ControllerMutationQuotaManager to server module (#19807)
CI / build (push) Has been cancelled Details
Migrate ControllerMutationQuotaManager to Java implementation and move
to server module, including ClientQuotaManager and associated files.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-07 01:55:38 +08:00
Ken Huang d31885d33c
MINOR: Use <code> block instead of backtick (#20107)
CI / build (push) Waiting to run Details
When writing HTML, it's recommended to use the <code> element instead of
backticks for inline code formatting.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi
<frankvicky@apache.org>
2025-07-06 14:49:51 +08:00
Omnia Ibrahim 9df616da76
KAFKA-19397: Ensure consistent metadata usage in produce request and response (#19964)
CI / build (push) Has been cancelled Details
Flaky Test Report / Flaky Test Report (push) Has been cancelled Details
- Metadata doesn't have the full view of topicNames to ids during
rebootstrap of client or when topic has been deleted/recreated. The
solution is to pass down topic id and stop trying to figure it out later
in the logic.

---------

Co-authored-by: Kirk True <kirk@kirktrue.pro>
2025-07-04 17:44:09 +01:00
Andrew Schofield da4fbba279
KAFKA-19468: Ignore unsubscribed topics when computing share assignment (#20101)
When the group coordinator is processing a heartbeat from a share
consumer, it must decide whether the recompute the assignment. Part of
this decision hinges on whether the assigned partitions match the
partitions initialised by the share coordinator. However, when the set
of subscribed topics changes, there may be initialised partitions which
are not currently assigned. Topics which are not subscribed should be
omitted from the calculation about whether to recompute the assignment.

Co-authored-by: Sushant Mahajan <smahajan@confluent.io>

Reviewers: Lan Ding <53332773+DL1231@users.noreply.github.com>, Sushant
 Mahajan <smahajan@confluent.io>, Apoorv Mittal
 <apoorvmittal10@gmail.com>
2025-07-04 14:55:19 +01:00
Andrew Schofield 860853dba2
KAFKA-19363: Finalize heterogeneous simple share assignor (#20074)
CI / build (push) Waiting to run Details
Finalise the share group SimpleAssignor for heterogeneous subscriptions.
The assignor code is much more accurate about the number of partitions
assigned to each member, and the number of members assigned for each
partition. It eliminates the idea of hash-based assignment because that
has been shown to the unhelpful. The revised code is very much more
effective at assigning evenly as the number of members grows and shrinks
over time.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-07-04 10:35:31 +01:00
Jhen-Yung Hsu 4e31e270ba
MINOR: Update the docs for Metrics (#20094)
CI / build (push) Waiting to run Details
Update the outdated Javadocs in Metrics.java. The `MetricName(String
name, String group)` constructor in MetricName.java was removed in

59b918ec2b
Minor typo fixes included.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-04 02:01:29 +08:00
Jhen-Yung Hsu 2e3ddb22ae
MINOR: Fix the tests in LogValidatorTest (#20093)
CI / build (push) Waiting to run Details
Fix incorrect tests introduced in the refactor

5b9cbcf886

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
2025-07-03 19:04:43 +08:00
Sushant Mahajan 268cf664c3
KAFKA-19454: Handle topics missing in metadata in share delete. (#20090)
* There are instances where share group delete calls in group
coordinator (`onPartitionsDelete`, `deleteShareGroups`) where we lookup
the metadata image to fetch the topic id/partitions/topic name for a
topic name/id. However, there have been
instances where the looked up info was not found due to cluster being
under load or the underlying topic being deleted and information not
propagated correctly.
* To remedy the same, this PR adds checks to determine that topic is
indeed present in the image before the lookups thus preventing NPEs. The
problematic situations are logged.
* New tests have been added for `GroupMetadataManger` and
`GroupCoordinatorService`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-03 11:19:24 +01:00
Andrew Schofield 729f9ccf06
KAFKA-19440: Handle top-level errors in AlterShareGroupOffsets RPC (#20049)
While testing the code in https://github.com/apache/kafka/pull/19820, it
became clear that the error handling problems were due to the underlying
Admin API. This PR fixes the error handling for top-level errors in the
AlterShareGroupOffsets RPC.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Lan Ding
 <isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>
2025-07-03 11:00:56 +01:00
Luke Chen eb378da99c
KAFKA-19462: Count fetch size when remote fetch (#20088)
CI / build (push) Waiting to run Details
Estimate the fetch size for remote fetch to avoid to exceed the
`fetch.max.bytes` config. We don't want to query the remoteLogMetadata
during API handling, thus we assume the remote fetch can get
`max.partition.fetch.bytes` size. Tests added.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-07-03 10:45:59 +08:00
Abhinav Dixit 7cb370b786
KAFKA-19463: nextFetchOffset does not take ongoing state transition into account (#20080)
CI / build (push) Waiting to run Details
### About
`nextFetchOffset` function in `SharePartition` updates the fetch offsets
without considering batches/offsets which might be undergoing state
transition. This can cause problems in updating to the right fetch
offset.

### Testing
The new code added has been tested with the help of unit tests.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
2025-07-02 18:09:43 +01:00
Yunchi Pang 42041f4772
MINOR: Refactor createResponseConfig to avoid collection copy and conversion (#19867)
issue: https://github.com/apache/kafka/pull/19687/files#r2094574178

Why:
- To improve performance by avoiding redundant temporary collections and
repeated method calls.
- To make the utility more flexible for inputs from both Java and Scala.

What:
- Refactored `createResponseConfig` in `ConfigHelper.scala` by
overloading the method to accept both Java maps and `AbstractConfig`.
- Extracted helper functions to `ConfigHelperUtils` in the server
module.

Reviewers: Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
2025-07-02 21:32:11 +08:00
Sanskar Jhajharia 220ff4f774
MINOR: Cleanup JMH-Benchmarks Module (#19791)
Now that Kafka supports Java 17, this PR makes some changes in
jmh-benchmarks module. The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-07-02 20:53:57 +08:00
Sushant Mahajan 28c53ba09a
KAFKA-19453: Ignore group not found in share group record replay. (#20076)
CI / build (push) Waiting to run Details
* When a `ShareGroup*` record is replayed in group
metadata manager, there is a call to check if the group exists. If the
group does not exist - we are throwing an exception which is
unnecessary.
* In this PR, we have added check to ignore this exception.
* New test to validate the logic has been added.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Dongnuo Lyu
<139248811+dongnuo123@users.noreply.github.com>
2025-07-02 10:10:14 +01:00
stroller 14ea11dc31
KAFKA-19371: Don't create the __remote_log_metadata topic when it already exists during broker restarts (#19899)
* The CREATE_TOPIC request gets issued only when it is clear that the
topic does not exist in the cluster.
* When the request to describe the topic gets timed-out or any exception
thrown other than UnknownTopicOrPartitionException, then the same gets
re-thrown and the describe/create topic request gets retried in the next
iteration until the initializationRetryMaxTimeoutMs gets breached.

Fixes: https://issues.apache.org/jira/browse/KAFKA-19371

Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash
<kamal.chandraprakash@gmail.com>

---------

Co-authored-by: stroller.fu <stroller.fu@zoom.us>
2025-07-02 11:22:26 +05:30
Matthias J. Sax eaa55c420b
MINOR: simplify OpenIterator (#20060)
CI / build (push) Waiting to run Details
We can remove the explicit counter for open iterators, and just use
size() on the set of open iterators we track anyway.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-07-01 12:54:49 -07:00
Lucas Brutschy 2ae85ef214
KAFKA-19429: Deflake streams_smoke_test, again (#20070)
It looks like we are checking for properties that are not guaranteed
under at_least_once, for example, exact counting (not allowing for
overcounting).

This change relaxes the validation constraint:

The TAGG topic contains effectively count-by-count results. So for
example, if we have the input without duplication

0 -> 1,2,3 we will get in TAGG 3 -> 1, since 1 key had 3 values.

with duplication:

0 -> 1,1,2,3 we will get in TAGG 4 -> 1, since 1 key had 4 values.

This makes the result difficult to compare. Since we run the smoke test
also with Exactly_Once, I propose to disable validation off TAGG under
ALOS.

Similarly, the topic AVG may overcount or undercount. The test case is
extremely similar to DIF, both performing a join and two streams, the
only difference being the mathematical operation performed, so we can
also disable this validation under ALOS with minimal loss of coverage.

Finally, the change fixes a bug that would throw a NPE when validation
of a windowed stream would fail.

Reviewers: Kirk True <kirk@kirktrue.pro>, Matthias J. Sax
 <matthias@confluent.io>
2025-07-01 21:48:07 +02:00
Sushant Mahajan 05b2601dde
KAFKA-19456: State and leader epoch should not be updated on writes. (#20079)
* If a write request with higher state than seen so far for a
specific share partition arrives at the share coordinator, the code will
create a new share snapshot and also update the internal view of the
state epoch.
* For writes with higher leader epoch, the current records are updated
with that value as well.
* The above is not the expected behavior and only initialize RPCs should
set and alter the state epoch and read RPC should set the leader epoch.
* This PR rectifies the behavior.
* Few tests have been removed.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-01 19:57:57 +01:00
Sushant Mahajan ac583ad2c0
KAFKA-19455: Retry persister request on metadata image issues. (#20078)
* If we get an `UNKNOWN_TOPIC_OR_PARTITION` response from the
`ShareCoordinator` is could imply a transient issue where the metadata
image is not upto date.
* In this case it makes sense to retry the request to give time for data
to be available.
* In this PR, we are making the required change.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-07-01 19:47:59 +01:00
Tsung-Han Ho (Miles Ho) ad934d3202
MINOR: Remove threadNamePrefix parameter from ReplicaManager and ReplicaFetcherManager (#20069)
CI / build (push) Waiting to run Details
- remove `threadNamePrefix` from `ReplicaManager` constructor
- update `BrokerServer` to use updated constructor
- remove `threadNamePrefix` from `ReplicaFetcherManager`

Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi
 <frankvicky@apache.org>
2025-07-01 20:36:50 +08:00
Kirk True 3c902bacc0
KAFKA-19152: Add top-level documentation for OAuth flows (#20025)
CI / build (push) Waiting to run Details
Adds documentation to support the OAuth additions from KIP-768 and
KIP-1139.

The existing documentation is heavily geared toward Kafka's support for
non-production OAuth usage. Since this mode is still supported, it
should not be removed. However, with the addition of the production
OAuth usage, the documentation is less than succinct because it has a
bit of a split personality issue.
2025-07-01 12:39:37 +05:30
Matthias J. Sax c8f83592b2
MINOR: improve StreamsProducer error handling (#20058)
CI / build (push) Waiting to run Details
StreamProducer may timeout in sendOffsetsToTransaction() or
commitTransaction() call. To distinguish both cases, we should make both
calls in individual try-catch blocks.

Reviewers: Bill Bejeck<bbejeck@apache.org>
2025-06-30 15:03:35 -07:00
Jhen-Yung Hsu 64aebb5621
MINOR: remove unused FlattenedIterator (#20067)
CI / build (push) Waiting to run Details
Remove FlattenedIterator. It’s no longer used anywhere after
https://github.com/apache/kafka/pull/20037.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Lan Ding
 <isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-06-30 20:02:08 +08:00
Lucas Brutschy 53d654ab6e
KAFKA-19379: Basic upgrade guide for KIP-1071 EA (#20029)
CI / build (push) Waiting to run Details
Basic documentation describing:   - That it's in EA now

 - What it does

 - What features are not yet supported

 - How to enable it / disable it

 - Any changes in the interfaces

          - kafka-streams-groups.sh

          - StreamsGroupDescribe

 - How to provide feedback

Reviewers: Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax
 <matthias@confluent.io>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Matthias J. Sax <mjsax@apache.org>
2025-06-30 09:28:22 +02:00
Sean Quah 08eda2ebed
KAFKA-19445: Fix coordinator runtime metrics sharing sensors (#20062)
When sensors are shared between different metric groups, data from all
groups is combined and added to all metrics under each sensor. This
means that different metric groups will report the same values for their
metrics.

Prefix sensor names with metric group names to isolate metric groups.

Reviewers: Yung <yungyung7654321@gmail.com>, Sushant Mahajan
<smahajan@confluent.io>, Dongnuo Lyu <dlyu@confluent.io>, TengYao Chi
<frankvicky@apache.org>
2025-06-30 15:14:39 +08:00
Yunchi Pang 975fe75d25
MINOR: Make feature lists immutable (#20052)
Replaces `.collect(Collectors.toList())` with `.toList()` for feature
collections, ensuring they are immutable and preventing accidental
modification.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Yung
<yungyung7654321@gmail.com>, Ken Huang <s7133700@gmail.com>, TengYao Chi
<frankvicky@apache.org>
2025-06-30 12:35:46 +08:00