Commit Graph

14961 Commits

Author SHA1 Message Date
Luke Chen 00a1b1e8ce Bump the commons-beanutils for CVE-2025-48734. Since `commons-validator`
CI / build (push) Has been cancelled Details
hasn't had new release with newer `commons-beanutils` versions, we manually bump it in kafka.

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-06-11 15:27:22 +08:00
Okada Haruki 1cc14f6343 KAFKA-19334 MetadataShell execution unintentionally deletes lock file (#19817)
CI / build (push) Has been cancelled Details
## Summary
- MetadataShell may deletes lock file unintentionally when it exists or
fails to acquire lock. If there's running server, this causes unexpected
result as below:
  * MetadataShell succeeds on 2nd run unexpectedly
  * Even worse, LogManager/RaftManager's lock also no longer work from
concurrent Kafka process startup

Reviewers: TengYao Chi <frankvicky@apache.org>
# Conflicts:
#	shell/src/test/java/org/apache/kafka/shell/MetadataShellIntegrationTest.java
2025-06-09 12:48:19 +08:00
Alyssa Huang ded7653066
MINOR: Fix some Request toString methods (#19655) (#19689)
CI / build (push) Has been cancelled Details
Reviewers: Colin P. McCabe <cmccabe@apache.org>
```
Conflicts:
    clients/src/main/java/org/apache/kafka/common/requests/AlterUserScramCredentialsRequest.java - import statement
    clients/src/main/java/org/apache/kafka/common/requests/IncrementalAlterConfigsRequest.java - import statement
    core/src/test/scala/unit/kafka/server/KafkaApisTest.scala - different logging and metadatacache instantiation
```

Cherry-Picked-From: 042be5b9ac
Cherry-Picked-By: Alyssa Huang <ahuang@confluent.io>
Cherry-Picked-At: Mon May 12 11:01:47 2025 -0700
2025-05-27 15:37:29 -07:00
Dongnuo Lyu e9c5069be7 KAFKA-18687: Setting the subscriptionMetadata during conversion to consumer group (#19790)
CI / build (push) Waiting to run Details
When a consumer protocol static member replaces an existing member in a
classic group, it's not necessary to recompute the assignment. However,
it happens anyway.

In

[ConsumerGroup.fromClassicGroup](0ff4dafb7d/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/modern/consumer/ConsumerGroup.java (L1140)),
we don't set the group's subscriptionMetadata.  Later in the consumer
group heartbeat, we [call

updateSubscriptionMetadata](0ff4dafb7d/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java (L1748)),
which [notices that the group's subscriptionMetadata needs an

update](0ff4dafb7d/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java (L2757))
and bumps the epoch. Since the epoch is bumped, we [recompute the

assignment](0ff4dafb7d/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java (L1766)).

As a fix, this patch sets the subscriptionMetadata in
ConsumerGroup.fromClassicGroup.

Reviewers: Sean Quah <squah@confluent.io>, David Jacot <djacot@confluent.io>
2025-05-27 11:31:13 +02:00
Alyssa Huang 3170e1130c KAFKA-18345; Prevent livelocked elections (#19658)
CI / build (push) Has been cancelled Details
At the retry limit binaryExponentialElectionBackoffMs it becomes
statistically likely that the exponential backoff returned
electionBackoffMaxMs. This is an issue as multiple replicas can get
stuck starting elections at the same cadence.

This change fixes that by added a random jitter to the max election
backoff.

Reviewers: José Armando García Sancio <jsancio@apache.org>, TaiJuWu
 <tjwu1217@gmail.com>, Yung <yungyung7654321@gmail.com>
2025-05-20 16:10:23 -04:00
Andy Li 14fd498ed0 MINOR: API Responses missing latest version in Kafka protocol guide (#19769)
### Issue: 

API Responses missing latest version in [Kafka protocol
guide](https://kafka.apache.org/protocol.html)

#### For example:

These are missing:

- ApiVersions Response (Version: 4) — Only versions 0–3 are documented,
though version 4 of the request is included.

- DescribeTopicPartitions Response — Not listed at all.

- Fetch Response (Version: 17) — Only versions 4–16 are documented,
though version 17 of the request is included.

#### After the fix:

docs/generated/protocol_messages.html

<img width="1045" alt="image"
src="https://github.com/user-attachments/assets/5ea79ced-aab5-4c47-8e09-9956047c9bf1"
/>

Reviewers: dengziming <dengziming1993@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-05-21 00:27:32 +08:00
Matthias J. Sax 923086dba2 KAFKA-19171: Kafka Streams crashes with UnsupportedOperationException (#19507)
CI / build (push) Has been cancelled Details
This PR fixes a regression bug introduced with KAFKA-17203. We need to
pass in mutable collections into `closeTaskClean(...)`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Bruno Cadonna <bruno@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2025-05-15 21:40:08 -07:00
Matthias J. Sax 62c6697ac9 KAFKA-19208: KStream-GlobalKTable join should not drop left-null-key record (#19580)
CI / build (push) Waiting to run Details
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-05-15 18:37:39 -07:00
David Jacot d7d7876989 KAFKA-19274; Group Coordinator Shards are not unloaded when `__consumer_offsets` topic is deleted (#19713)
CI / build (push) Waiting to run Details
Group Coordinator Shards are not unloaded when `__consumer_offsets`
topic is deleted. The unloading is scheduled but it is ignored because
the epoch is equal to the current epoch:

```
[2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1]
Scheduling  unloading of metadata for __consumer_offsets-0 with epoch
OptionalInt[0]
(org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime)
[2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Scheduling
unloading of metadata for __consumer_offsets-1 with epoch OptionalInt[0]
(org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime)
[2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Ignored unloading
metadata for __consumer_offsets-0 in epoch OptionalInt[0] since current
epoch is 0.
(org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime)
[2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Ignored unloading
metadata for __consumer_offsets-1 in epoch OptionalInt[0] since current
epoch is 0.
(org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime)
```

This patch fixes the issue by not setting the leader epoch in this case.
The coordinator expects the leader epoch to be incremented when the
resignation code is called. When the topic is deleted, the epoch is not
incremented. Therefore, we must not use it. Note that this is aligned
with deleted partitions are handled too.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, José Armando García Sancio <jsancio@apache.org>
2025-05-15 19:12:42 +02:00
Kuan-Po Tseng f99db0804e KAFKA-19275 client-state and thread-state metrics are always "Unavailable" (#19712)
CI / build (push) Has been cancelled Details
Fix the issue where JMC is unable to correctly display client-state and
thread-state metrics. The root cause is that these two metrics directly
return the `State` class to JMX. If the user has not set up the RMI
server, JMC or other monitoring tools will be unable to interpret the
`State` class. To resolve this, we should return a string representation
of the state instead of the State class in these two metrics.

Reviewers: Luke Chen <showuon@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-05-14 14:08:01 +08:00
Sean Quah d64a97099d KAFKA-18688: Fix uniform homogeneous assignor stability (#19677)
CI / build (push) Waiting to run Details
When the number of partitions is not divisible by the number of members,
some members will end up with one more partition than others.
Previously, we required these to be the members at the start of the
iteration order, which meant that partitions could be reassigned even
when the previous assignment was already balanced.

Allow any member to have the extra partition, so that we do not move
partitions around when the previous assignment is already balanced.

Before the PR
```
Benchmark                                 (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt    Score   Error  Units
ServerSideAssignorBenchmark.doAssignment              FULL           RANGE          false          10000                         50         HOMOGENEOUS          1000  avgt    2   26.175          ms/op
ServerSideAssignorBenchmark.doAssignment              FULL           RANGE          false          10000                         50       HETEROGENEOUS          1000  avgt    2  123.955          ms/op
ServerSideAssignorBenchmark.doAssignment       INCREMENTAL           RANGE          false          10000                         50         HOMOGENEOUS          1000  avgt    2   24.408          ms/op
ServerSideAssignorBenchmark.doAssignment       INCREMENTAL           RANGE          false          10000                         50       HETEROGENEOUS          1000  avgt    2  114.873          ms/op
```
After the PR
```
Benchmark                                 (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt    Score   Error  Units
ServerSideAssignorBenchmark.doAssignment              FULL           RANGE          false          10000                         50         HOMOGENEOUS          1000  avgt    2   24.259          ms/op
ServerSideAssignorBenchmark.doAssignment              FULL           RANGE          false          10000                         50       HETEROGENEOUS          1000  avgt    2  118.513          ms/op
ServerSideAssignorBenchmark.doAssignment       INCREMENTAL           RANGE          false          10000                         50         HOMOGENEOUS          1000  avgt    2   24.636          ms/op
ServerSideAssignorBenchmark.doAssignment       INCREMENTAL           RANGE          false          10000                         50       HETEROGENEOUS          1000  avgt    2  115.503          ms/op
```

Reviewers: David Jacot <djacot@confluent.io>
2025-05-13 17:03:19 +02:00
Sean Quah 48f75616d7 KAFKA-19163: Avoid deleting groups with pending transactional offsets (#19496)
When a group has pending transactional offsets but no committed offsets,
we can accidentally delete it while cleaning up expired offsets. Add a
check to avoid this case.

Reviewers: David Jacot <djacot@confluent.io>
2025-05-13 14:10:55 +02:00
David Jacot 9aad1849e2
MINOR: Fix version in 4.0 branch (#19686)
CI / build (push) Waiting to run Details
This patch fixes the version used in the `4.0` branch. It should be
`4.0.1` instead of `4.1.0`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-05-12 20:17:30 +02:00
ChickenchickenLove 5015183c1c KAFKA-19242: Fix commit bugs caused by race condition during rebalancing. (#19631)
### Motivation
While investigating “events skipped in group

rebalancing” ([spring‑projects/spring‑kafka#3703](https://github.com/spring-projects/spring-kafka/issues/3703))
I discovered a race
condition between
- the main poll/commit thread, and
- the consumer‑coordinator heartbeat thread.

If the main thread enters
`ConsumerCoordinator.sendOffsetCommitRequest()` while the heartbeat
thread is finishing a rebalance (`SyncGroupResponseHandler.handle()`),
the group state transitions in the following order:

```
COMPLETING_REBALANCE  →  (race window)  →  STABLE
```
Because we read the state twice without a lock:
1. `generationIfStable()` returns `null` (state still
`COMPLETING_REBALANCE`),
2. the heartbeat thread flips the state to `STABLE`,
3. the main thread re‑checks with `rebalanceInProgress()` and wrongly
decides that a rebalance is still active,
4. a spurious `CommitFailedException` is returned even though the commit
could succeed.

For more details, please refer to sequence diagram below.  <img
width="1494" alt="image"

src="https://github.com/user-attachments/assets/90f19af5-5e2d-4566-aece-ef764df2d89c"
/>

### Impact
- The exception is semantically wrong: the consumer is in a stable
group, but reports failure.
- Frameworks and applications that rely on the semantics of
`CommitFailedException` and `RetryableCommitException` (for example
`Spring Kafka`) take the wrong code path, which can ultimately skip the
events and break “at‑most‑once” guarantees.

### Fix
We enlarge the synchronized block in
`ConsumerCoordinator.sendOffsetCommitRequest()` so that the consumer
group state is examined atomically with respect to the heartbeat thread:

### Jira
https://issues.apache.org/jira/browse/KAFKA-19242

https: //github.com/spring-projects/spring-kafka/issues/3703

Signed-off-by: chickenchickenlove <ojt90902@naver.com>

Reviewers: David Jacot <david.jacot@gmail.com>
2025-05-12 17:01:54 +02:00
Sean Quah cf3c177936 KAFKA-19160;KAFKA-19164; Improve performance of fetching stable offsets (#19497)
CI / build (push) Waiting to run Details
When fetching stable offsets in the group coordinator, we iterate over
all requested partitions. For each partition, we iterate over the
group's ongoing transactions to check if there is a pending
transactional offset commit for that partition.

This can get slow when there are a large number of partitions and a
large number of pending transactions. Instead, maintain a list of
pending transactions per partition to speed up lookups.

Reviewers: Shaan, Dongnuo Lyu <dlyu@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, David Jaco <djacot@confluent.io>
2025-05-12 09:34:43 +02:00
Luke Chen 1f856d437d MINOR: exclude error_prone_annotations lib from caffeine dependency (#19638)
CI / build (push) Has been cancelled Details
In https://github.com/apache/kafka/pull/16578 , we tried to exclude both
`checker-qual` and `error_prone_annotations`, but when excluding
`error_prone_annotations`, the compilation failed. So in the end, we
only excluded `checker-qual` and shipped `error_prone_annotations.jar`
to users. In Kafka v4.0.0, thanks to jdk 8 removal, we upgraded caffeine
to the latest v3.1.8, instead of v2.x.x, and now, we can successfully
pass the compilation without error after excluding
`error_prone_annotations` from `caffeine`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ken Huang <s7133700@gmail.com>
2025-05-08 10:11:20 +08:00
Kamal Chandraprakash c2068878c9
KAFKA-19131: Adjust remote storage reader thread maximum pool size to avoid illegal argument (#19629)
CI / build (push) Has been cancelled Details
The remote storage reader thread pool use same count for both maximum
and core size. If users adjust the pool size larger than original value,
it throws `IllegalArgumentException`. Updated both value to fix the
issue.

cherry-pick PR: #19532
cherry-pick commit:
965743c35b

---------

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang
<payang@apache.org>

Co-authored-by: PoAn Yang <payang@apache.org>
2025-05-04 18:53:16 +05:30
Lucas Brutschy 0832c2ceb1
KAFKA-19195: Only send the right group ID subset to each GC shard (#19555)
CI / build (push) Has been cancelled Details
Cherry-picked from
[e79f5f0](e79f5f0f65)

If a share or consumer group is described, all group IDs sent to all
shards of the group coordinator. This change fixes it. It tested in the
unit tests, since it's somewhat inconvenient to test the passed read
operation lambda.
2025-04-28 15:33:20 +02:00
Colin Patrick McCabe 0297ba2c67 KAFKA-19192; Old bootstrap checkpoint files cause problems updated servers (#19545)
Old bootstrap.metadata files cause problems with server that include
KAFKA-18601. When the server tries to read the bootstrap.checkpoint
file, it will fail if the metadata.version is older than 3.3-IV3
(feature level 7). This causes problems when these clusters are
upgraded.

This PR makes it possible to represent older MVs in BootstrapMetadata
objects without causing an exception. An exception is thrown only if we
attempt to access the BootstrapMetadata. This ensures that only the code
path in which we start with an empty metadata log checks that the
metadata version is 7 or newer.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Ismael Juma
 <ismael@juma.me.uk>, PoAn Yang <payang@apache.org>, Liu Zeyu
 <zeyu.luke@gmail.com>, Alyssa Huang <ahuang@confluent.io>
2025-04-24 16:55:22 -04:00
David Arthur 3901c8519e KAFKA-19166: Fix RC tag in release script (#19518)
The release script was pushing the RC tag off of a temporary branch that
was never merged back into the release branch. This meant that our RC
and release tags were detached from the rest of the repository.

This patch changes the release script to merge the RC tag back into the
release branch and pushes both the tag and the branch.

Reviewers: Luke Chen <showuon@gmail.com>
2025-04-22 17:57:31 +08:00
Manikumar Reddy 56743b37f6 MINOR: Supress stdout when checking Log4j 1.x configuration compatibility mode (#19502)
when using log41 config, we are printing addtional line like below. This
PR is to fix that.
2025-04-17 14:30:47 +05:30
Hong-Yi Chen 8a515da2c8 KAFKA-19054: StreamThread exception handling with SHUTDOWN_APPLICATION may trigger a tight loop with MANY logs (#19394)
Under the `SHUTDOWN_APPLICATION` configuration in Kafka Streams, a tight
loop in the shutdown process can flood logs with repeated messages. This
PR introduces a check to ensure that the shutdown log is emitted only
once every 10 seconds, thereby preventing log flooding.

Reviewers: PoAn Yang <payang@apache.org>, Matthias J. Sax <matthias@confluent.io>
2025-04-16 20:46:17 -07:00
Rajini Sivaram f98dec9440 KAFKA-19147: Start authorizer before group coordinator to ensure coordinator authorizes regex topics (#19488)
[KAFKA-18813](https://issues.apache.org/jira/browse/KAFKA-18813) added
`Topic:Describe` authorization of topics matching regex patterns to the
group coordinator since it was difficult to authorize these in the
broker when processing consumer heartbeats using the new protocol. But
group coordinator is started in `BrokerServer` before the authorizer is
created. And hence group coordinator doesn't have an authorizer and
never performs authorization. As a result, topics that are not
authorized for `Describe` may be assigned to consumers. This potentially
leaks information about topic existence, topic id and partition count to
users who are not authorized to describe a topic. This PR starts
authorizer earlier to ensure that authorization is performed by the
group coordinator. Also adds integration tests for verification.

Note that we still have a second issue when members have different
permissions. If regex is resolved by a member with permission to more
topics, unauthorized topics may be assigned to members with lower
permissions. In this case, we still return assignment containing topic
id and partitions to the member without `Topic:Describe` access. This is
not addressed by this PR, but an integration test that illustrates the
issue has been added so that we can verify when the issue is fixed.

Reviewers: David Jacot <david.jacot@gmail.com>
2025-04-16 13:08:37 +01:00
Azhar Ahmed 143fcb1d7c KAFKA-19071: Fix doc for remote.storage.enable (#19345)
As of 3.9, Kafka allows disabling remote storage on a topic after it was
enabled. It allows subsequent enabling and disabling too.

However the documentation says otherwise and needs to be corrected.

Doc:
https://kafka.apache.org/39/documentation/#topicconfigs_remote.storage.enable

Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>
2025-04-14 11:10:22 +08:00
Florian Hussonnois de27409e30 KAFKA-18962: Fix onBatchRestored call in GlobalStateManagerImpl (#19188)
Call the StateRestoreListener#onBatchRestored with numRestored and not
the totalRestored when reprocessing state

See: https://issues.apache.org/jira/browse/KAFKA-18962

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias
Sax <mjsax@apache.org>
2025-04-09 13:38:03 -07:00
José Armando García Sancio 83f6a1d7e6 KAFKA-18991; Missing change for cherry-pick 2025-04-09 12:52:15 -04:00
TengYao Chi 952c8a5e94 KAFKA-18991: FetcherThread should match leader epochs between fetch request and fetch state (#19223)
This PR fixes a potential issue where the `FetchResponse` returns
`divergingEndOffsets` with an older leader epoch. This can lead to
committed records being removed from the follower's log, potentially
causing data loss.

In detail:
`processFetchRequest` gets the requested leader epoch of partition data
by `topicPartition` and compares it with the leader epoch of the current
fetch state. If they don't match, the response is ignored.

Reviewers: Jun Rao <junrao@gmail.com>
2025-04-09 12:32:22 -04:00
José Armando García Sancio 4dbe4739bd KAFKA-18723; Better handle invalid records during replication (#18852)
For the KRaft implementation there is a race between the network thread,
which read bytes in the log segments, and the KRaft driver thread, which
truncates the log and appends records to the log. This race can cause
the network thread to send corrupted records or inconsistent records.
The corrupted records case is handle by catching and logging the
CorruptRecordException. The inconsistent records case is handle by only
appending record batches who's partition leader epoch is less than or
equal to the fetching replica's epoch and the epoch didn't change
between the request and response.

For the ISR implementation there is also a race between the network
thread and the replica fetcher thread, which truncates the log and
appends records to the log. This race can cause the network thread send
corrupted records or inconsistent records. The replica fetcher thread
already handles the corrupted record case. The inconsistent records case
is handle by only appending record batches who's partition leader epoch
is less than or equal to the leader epoch in the FETCH request.

Reviewers: Jun Rao <junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai <chia7712@apache.org>
2025-04-09 11:15:52 -04:00
Parker Chang 4844bc3067 KAFKA-18984: Reset interval.ms By Using kafka-client-metrics.sh (#19213)
kafka-client-metrics.sh cannot reset the interval using `--interval=`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-04-09 15:42:04 +08:00
PoAn Yang 33fa572a1c MINOR: remove transform and through from repartition description (#19291)
`transform` and `through` are removed in 4.0. Since users cannot
reference them in 4.0 document, it's not good to keep using them as
example in `repartition` description.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-04-08 18:13:26 -07:00
Xuan-Zhang Gong 8de7b69ced MINOR: small optimization by judgment (#19386)
judgments can help avoid unnecessary `segments.sizeInBytes()`  loops

from https://github.com/apache/kafka/pull/18393/files#r2029925512

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-04-07 00:32:33 +08:00
Ayoub Omari 71c9d83b20 KAFKA-16407: Fix foreign key INNER join on change of FK from/to a null value (#19303)
Fixes both KAFKA-16407 and KAFKA-16434.

Summary of existing issues:

- We are ignoring new left record when its previous FK value is null
- We do not unset foreign key join result when FK becomes null

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-04-05 20:46:56 -07:00
nilmadhab mondal 2c48809fad KAFKA-18713: Fix FK Left-Join result race condition (#19005)
When a row in a FK-join left table is updated, we should send a "delete
subscription with no response" for the old FK to the right hand side, to
avoid getting two responses from the right hand side. Only the "new
subscription" for the new FK should request a response. If two responses
are requested, there is a race condition for which both responses could
be processed in the wrong order, leading to an incorrect join result.

This PR fixes the "delete subscription" case accordingly, to no request
a response.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-04-03 16:53:01 -07:00
TengYao Chi b0b4f42f4c KAFKA-18067: Add a flag to disable producer reset during active task creator shutting down (#19269)
JIRA: KAFKA-18067

Fix producer client double-closing issue in Kafka Streams. 
During StreamThread shutdown, TaskManager closes first, which closes the
producer client. Later, calling `unsubscribe` on the main consumer may
trigger the `onPartitionsLost` callback, attempting to reset
StreamsProducer when EOS is enabled. This causes an already closed
producer to be closed twice while the newly created producer is never
closed.

In detail:
This patch adds a flag to control the producer reset and has a new
method to change this flag, which is only invoked in
`ActiveTaskCreator#close`.
This would guarantee that the disable reset producer will only occur
when StreamThread shuts down.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias Sax <mjsax@apache.org>
2025-04-03 14:14:26 -07:00
Jorge Esteban Quilcate Otoya 617c96cea4
KAFKA-15931: Cancel RemoteLogReader gracefully (#19331)
Backports f24945b519 to 4.0

Instead of reopening the transaction index, it cancels the RemoteFetchTask without interrupting it--avoiding to close the TransactionIndex channel.

This will lead to complete the execution of the remote fetch but ignoring the results. Given that this is considered a rare case, we could live with this. If it becomes a performance issue, it could be optimized.

Reviewers: Jun Rao <junrao@gmail.com>
2025-04-01 16:22:53 -07:00
PoAn Yang 4dd893ba21
KAFKA-806 Index may not always observe log.index.interval.bytes (#18842)
Currently, each log.append() will add at most 1 index entry, even when
the appended data is larger than log.index.interval.bytes. One potential
issue is that if a follower restarts after being down for a long time,
it may fetch data much bigger than log.index.interval.bytes at a time.
This means that fewer index entries are created, which can increase the
fetch time from the consumers.

(cherry picked from commit e124d3975b)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-20 16:38:54 +08:00
David Jacot 16da5a885e
MINOR: Bump to 4.0.1-SNAPSHOT (#19224)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-18 11:56:02 +01:00
TengYao Chi 4ae681446f KAFKA-18993 Remove confusing notable change section from upgrade.html (#19212)
Currently, the "Notable changes in 4.0.0" for the client is very confusing. We should remove it.

Reviewers: mingdaoy <mingdaoy@gmail.com>, Luke Chen <showuon@gmail.com>, Ken Huang <s7133700@gmail.com>, David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-16 02:53:14 +08:00
Matthias J. Sax 3e791d7d48 KAFKA-18943: Kafka Streams incorrectly commits TX during task revokation (#19164)
Fixes two issues:
 - only commit TX if no revoked tasks need to be committed
 - commit revoked tasks after punctuation triggered

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Anna Sophie Blee-Goldman <sophie@responsive.dev>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bill@confluent.io>
2025-03-13 16:56:31 -07:00
José Armando García Sancio 4990c970c1 KAFKA-18979; Report correct kraft.version in ApiVersions (#19205)
Skip kraft.version when applying FeatureLevelRecord records. The kraft.version is stored as control records and not as metadata records. This solution has the benefits of removing from snapshots any FeatureLevelRecord for kraft.version that was incorrectly written to the log and allows ApiVersions to report the correct finalized kraft.version.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-03-13 18:44:52 -04:00
mingdaoy 8f765a1886 KAFKA-18422 Adjust Kafka client upgrade path section (#19119)
This patch adds a section about upgrading clients to the upgrade notes.

Reviewers: Ismael Juma <ismael@juma.me.uk>, David Jacot <djacot@confluent.io>
2025-03-12 14:57:51 +01:00
Ken Huang aba4f5e2e6 KAFKA-18074: Add kafka client compatibility matrix (#18091)
Add new section for Kafka 4.0 compatibility metrics for user, so user
can check server and client in this section.

Reviewers: Divij Vaidya <diviv@amazon.com>, Matthias J. Sax <matthias@confluent.io>
2025-03-11 18:44:19 -07:00
PoAn Yang 8f0ce72cfb KAFKA-18195: Fix Kafka Streams broker compatibility matrix (#18258)
Use "incompatible" instead of an empty cell in Kafka Streams broker
compatibility docs. Also update compatibility matrix for 4.0.0 release.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-03-11 18:44:10 -07:00
Luke Chen 5df681bf8a MINOR: improve upgrade to v4.0.0 doc (#19172)
In "Upgrading to 4.0.0 from any version 0.8.x through 3.9.x" section, we
directly give instructions about [Upgrading to KRaft-based
clusters](https://kafka.apache.org/documentation/#upgrade_390_kraft),
but there might still be some users under ZK cluster before upgrading to
v4.0.0. We need to make it clear that they need to upgrade to KRaft mode
first before upgrading to v4.0.0 in "Upgrading to 4.0.0 from any version
0.8.x through 3.9.x" section.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-11 19:17:48 +08:00
TengYao Chi 42cac5a555 MINOR: Adjust ToC of zk2kraft and fix wrong section number of docker (#19146)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-09 08:57:02 +08:00
Colin P. McCabe 2e6fe07f0d HOTFIX: Add missing file from KAFKA-18920 commit. 2025-03-07 13:58:46 -08:00
Mahsa Seifikar ed43537410 MINOR: Fix missing argument in kafka-features.sh upgrade doc (#19160)
Running `bin/kafka-features.sh upgrade --release-version 4.0` results in
the following error. This PR fixes the issue by adding the required
argument.

`kafka-features: error: one of the arguments --bootstrap-server
--bootstrap-controller is required.`

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-03-07 13:58:30 -08:00
Calvin Liu d671e7dbc5 KAFKA-18940: fix electionWasClean (#19156)
The electionWasClean should also consider if the election is done
through ELR. Otherwise, the metric uncleanLeaderElection will wrongly
count the ELR election
https://issues.apache.org/jira/browse/KAFKA-18940

Reviewers: Jun Rao <junrao@gmail.com>
2025-03-07 13:47:50 -08:00
Colin Patrick McCabe 5c0b8bef07 KAFKA-18920: The kcontrollers must set kraft.version in ApiVersionsResponse (#19127)
The kafka controllers need to set kraft.version in their
ApiVersionsResponse messages according to the current kraft.version
reported by the Raft layer. Instead, currently they always set it to 0.

Also remove FeatureControlManager.latestFinalizedFeatures. It is not
needed and it does a lot of copying.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-07 13:47:05 -08:00
David Jacot cf269bee8c KAFKA-18046; High CPU usage when using Log4j2 (#19138)
This patch is a first step towards resolving KAFKA-18046. Apache Kafka
4.0 ships with log4j2 so the issue raised in the ticket causing high CPU
usage on the fetch path due to LoggerFactory.getLogger() being called on
the handling of all fetch responses is not good. Hence, I propose to fix
that one by caching the Logger used by the `CompletedFetch` class.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2025-03-07 09:04:03 +01:00