The changes update the OpenJDK base image from 17-buster to 17-bullseye:
- Updates tests/docker/Dockerfile to use openjdk:17-bullseye instead of
openjdk:17-buster
- Updates tests/docker/ducker-ak script to use the new default image
- Updates documentation in tests/README.md with the new image name
examples
Reviewers: Federico Valeri <fedevaleri@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
This patch fixes the bug that allows the last known leader to be elected as a partition leader while still in a fenced state, before the next heartbeat removes the fence.
https://issues.apache.org/jira/browse/KAFKA-19522
Reviewers: Jun Rao <junrao@gmail.com>, TengYao Chi
<frankvicky@apache.org>
* Coordinator starts with a smaller buffer, which can grow as needed.
* In freeCurrentBatch, release the appropriate buffer:
* The Coordinator recycles the expanded buffer
(`currentBatch.builder.buffer()`), not `currentBatch.buffer`, because
`MemoryBuilder` may allocate a new `ByteBuffer` if the existing one
isn't large enough.
* There are two cases that buffer may exceeds `maxMessageSize` 1.
If there's a single record whose size exceeds `maxMessageSize` (which,
so far, is derived from `max.message.bytes`) and the write is in
`non-atomic` mode, it's still possible for the buffer to grow beyond
`maxMessageSize`. In this case, the Coordinator should revert to using a
smaller buffer afterward. 2. Coordinator do not recycles the buffer
that larger than `maxMessageSize`. If the user dynamically reduces
`maxMessageSize` to a value even smaller than `INITIAL_BUFFER_SIZE`, the
Coordinator should avoid recycling any buffer larger than
`maxMessageSize` so that Coordinator can allocate the smaller buffer in
the next round.
* Add tests to verify the above scenarios.
Reviewers: David Jacot <djacot@confluent.io>, Sean Quah
<squah@confluent.io>, Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, TaiJuWu <tjwu1217@gmail.com>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Removing the isEligibleLeaderReplicasV1Enabled to let ELR be enabled if
MV is at least 4.1IV1. Also bump the Latest Prod MV to 4.1IV1
Reviewers: Jun Rao <junrao@gmail.com>
The `AdminClient` adds a telemetry reporter to the metrics reporters
list in the constructor. The problem is that the reporter was already
added in the `createInternal` method. In the `createInternal` method
call, the `clientTelemetryReporter` is added to a
`List<MetricReporters>` which is passed to the `Metrics` object, will
get closed when `Metrics.close()` is called. But adding a reporter to
the reporters list in the constructor is not used by the `Metrics`
object and hence doesn't get closed, causing a memory leak.
All related tests pass after this change.
Reviewers: Apoorv Mittal <apoorvmittal10@apache.org>, Matthias J. Sax
<matthias@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>,
Jhen-Yung Hsu <jhenyunghsu@gmail.com>
Update the documentation to describe how to upgrade the kraft feature
version from 0 to 1.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Alyssa Huang
<ahuang@confluent.io>
This fixes librdkafka older than the recently released 2.11.0 with
Kerberos authentication and Apache Kafka 4.x.
Even though this is a bug in librdkafka, a key goal of KIP-896 is not to
break the popular client libraries listed in it. Adding back JoinGroup
v0 & v1 is a very small change and worth it from that perspective.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
- Metadata doesn't have the full view of topicNames to ids during
rebootstrap of client or when topic has been deleted/recreated. The
solution is to pass down topic id and stop trying to figure it out later
in the logic.
---------
Co-authored-by: Kirk True <kirk@kirktrue.pro>
Use Java to rewrite ProducerSendWhileDeletionTest by new test infra and
move it to client-integration-tests module.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
* When a `ShareGroup*` record is replayed in group metadata manager,
there is a call to check if the group exists. If the group does not
exist - we are throwing an exception which is unnecessary.
* In this PR, we have added check to ignore this exception.
* New test to validate the logic has been added.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
Note: cherry pick from PR https://github.com/apache/kafka/pull/20076 in
trunk.
* The `GroupCoordinatorService.onPartitionsDeleted` code and
`GroupMetadataManager.shareGroupBuildPartitionDeleteRequest` code looks
up the metadata image to find topic name/partitions for topic ids.
* If the topic id is not present in the image, it will throw an NPE
resulting in crash.
* This PR aims to solve the issue.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Adds documentation to support the OAuth additions from KIP-768 and
KIP-1139.
The existing documentation is heavily geared toward Kafka's support for
non-production OAuth usage. Since this mode is still supported, it
should not be removed. However, with the addition of the production
OAuth usage, the documentation is less than succinct because it has a
bit of a split personality issue.
Basic documentation describing: - That it's in EA now
- What it does
- What features are not yet supported
- How to enable it / disable it
- Any changes in the interfaces
- kafka-streams-groups.sh
- StreamsGroupDescribe
- How to provide feedback
Reviewers: Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax
<matthias@confluent.io>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Matthias J. Sax <mjsax@apache.org>
When sensors are shared between different metric groups, data from all
groups is combined and added to all metrics under each sensor. This
means that different metric groups will report the same values for their
metrics.
Prefix sensor names with metric group names to isolate metric groups.
Reviewers: Yung <yungyung7654321@gmail.com>, Sushant Mahajan
<smahajan@confluent.io>, Dongnuo Lyu <dlyu@confluent.io>, TengYao Chi
<frankvicky@apache.org>
# Conflicts:
# coordinator-common/src/test/java/org/apache/kafka/coordinator/common/runtime/CoordinatorRuntimeMetricsImplTest.java
The OffsetFetch API does not support top level errors in version 1.
Hence, the top level error must be returned at the partition level.
Side note: It is a tad annoying that we create error response in
multiple places (e.g. KafkaApis, Group CoordinatorService). There were a
reason for this but I cannot remember.
Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Sean Quah <squah@confluent.io>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
KIP-1071 does not currently support all features planned in the KIP. We
should reject any requests that are using features that are currently
not implemented.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Matthias J. Sax
<matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
ConsumerGroupDescribe with an empty group id returns a response containing `null` groupId in a non-nullable field. Since the response cannot be serialized, this results in UNKNOWN_SERVER_ERROR being returned to the client. This PR sets the group id in the response to an empty string instead and adds request tests for empty group id.
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
The metric for oldest-iterator-open-since-ms might report a null value
if there is not open iterator.
This PR changes the behavior to dynamically register/deregister the
entire metric instead of allowing it to return a null value.
Reviewers: Bill Bejeck <bbejeck@apache.org>
https://issues.apache.org/jira/browse/KAFKA-19383 When applying the
ClearElrRecord, it may pick up the topicId in the image without checking
if the topic has been deleted. This can cause the creation of a new
TopicRecord with an old topic ID.
Reviewers: Alyssa Huang <ahuang@confluent.io>, Artem Livshits <alivshits@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
No conflicts.
Previously, we could wait for up to half of the broker session timeout
for an RPC to complete, and then delay by up to half of the broker
session timeout. When taken together, these two delays could lead to
brokers erroneously missing heartbeats.
This change removes exponential backoff for heartbeats sent from the
broker to the controller. The load caused by heartbeats is not heavy,
and controllers can easily time out heartbeats when the queue length is
too long. Additionally, we now set the maximum RPC time to the length of
the broker period. This minimizes the impact of heavy load.
Reviewers: José Armando García Sancio <jsancio@apache.org>, David Arthur <mumrah@gmail.com>
If there are more deletion filters after we initially hit the
`MAX_RECORDS_PER_USER_OP` bound, we will add an additional deletion
record ontop of that for each additional filter.
The current error message returned to the client is not useful either,
adding logic so client doesn't just get `UNKNOWN_SERVER_EXCEPTION` with
no details returned.
Reverts #16922 and #18732 of incomplete feature.
PRs #16922 and #18732 are part of
[KIP-1035](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1035%3A+StateStore+managed+changelog+offsets).
In particular,
on starting a Kafka Streams instance, if it has pre-existing state, the
state stores are initialized on the main thread. Part of this
initialization registers the stateful metrics with the JMX thread-id tag
of `main`. This breaks the KIP-1076 implementation where need to
register metrics with thread-id tags of `xxxStreamThread-N`. This is
necessary due to the fact that the `StreamsMetric` is a singleton shared
by all `StreamThread` instances, so we need to make sure only add
metrics for the current `StreamThread` otherwise duplicate metrics are
registered. This PR reverts the changes until a fix is implemented,
allowing the individual `StreamThread`s to register the metrics.
Reviewers: Matthias J. Sax <matthias@confluent.io>
## Summary
- Fix potential race condition in
LogSegment#readMaxTimestampAndOffsetSoFar(), which may result in
non-monotonic offsets and causes replication to stop.
- See https://issues.apache.org/jira/browse/KAFKA-19407 for the details
how it happen.
Reviewers: Vincent PÉRICART <mauhiz@gmail.com>, Jun Rao
<junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
The PR cherrypicks below commit:
```
commit 1ca8779bee
Author: Apoorv Mittal <apoorvmittal10@gmail.com>
Date: Tue Jun 24 09:46:14 2025 +0100
MINOR: Correcting client error for fenced share partition (#20023)
Correct the error when SharePartition is fenced.
Reviewers: Abhinav Dixit <adixit@confluent.io>, Sushant Mahajan
<smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>
```
And selective changes from following commit, which are critical:
```
commit 3d4407ff9d
Author: Sushant Mahajan <smahajan@confluent.io>
Date: Mon Jun 23 23:57:15 2025 +0530
MINOR: Change exceptions for few error codes in SharePartition.
(#20020)
* The `SharePartition` class wraps the errors received from
`PersisterStateManager` to be sent to the client. * In this PR, we
are categorizing the errors a bit better. * Some exception messages
in `PersisterStateManager` have been updated to show the share
partition key. * Tests have been updated wherever needed.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal
<apoorvmittal10@gmail.com>
```
Reviewers: Andrew Schofield <aschofield@confluent.io>, Sushant Mahajan
<smahajan@confluent.io>
* The share group rebalance metric was not being invoked at the
appropriate group id bump position.
* This PR solves the issue.
* The metric name has been updated
(s/rebalance-rate/share-group-rebalance-rate,
s/rebalance-count/share-group-rebalance-count/)
* Updated tests in `GroupMetadataManagerTest` and
`GroupCoordinatorMetricsTest`
Reviewers: Andrew Schofield <aschofield@confluent.io>
It looks like we are checking for properties that are not guaranteed
under at_least_once, for example, exact counting (not allowing for
overcounting).
This change relaxes the validation constraint to only check that we
counted _at least_ N messages, and our sums come out as _at least_ the
expected sum.
Reviewers: Matthias J. Sax <matthias@confluent.io>
`streams_broker_down_resilience_test` produce messages with `null` key
to a topic with three partitions and expect each partition to be
non-empty afterward. But I don't think this is a correct assumption, as
a producer may try to be sticky and only produce to two partitions.
This cause occasional flakiness in the test.
The fix is to produce records with keys.
Reviewers: Matthias J. Sax <matthias@confluent.io>, PoAn Yang
<payang@apache.org>
In this upgrade test, applications sometimes crash before the upgrade,
so it's actually triggering a bug in several older versions (2.x and
possibly others). It seems to be a rare race condition that has been
happening since 2022. Since we are not going to roll out a patch release
for Kafka Streams 2.x, we should just allow applications to crash before
the upgrade.
Reviewers: Matthias J. Sax <matthias@confluent.io>
* `SnapshotEpoch` type in `ShareSnapshotValue.json` and
`ShareUpdateValue.json` is currently
`uint16` which might overflow under heavy traffic.
* To be consistent with other epochs, this PR updates the type to
`int32`.
backport from trunk (cb809e2574)
Reviewers: Andrew Schofield <aschofield@confluent.io>
* https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
states the `retention.ms` property for the `__share_group_state` to be
`-1`.
* This PR makes it explicit when defining the values of those configs.
* Existing test has been updated.
```
$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe
--topic __share_group_state
Topic: __share_group_state TopicId: XCwzZjEGSjm5lUc_BeCrqA
PartitionCount: 50 ReplicationFactor: 1
Configs:
compression.type=producer,
min.insync.replicas=1,
cleanup.policy=delete,
segment.bytes=104857600,
retention.ms=-1
...
```
Cherry-pick from trunk (56a6ba2d2e)
Reviewers: Andrew Schofield <aschofield@confluent.io>
The test is resizing the `__consumer_offset` topic after broker start.
This seems to be completely unsupported. The group coordinator fetches
the number of partitions for the consumer offset topic once and never
updates it. So we can be in a state where two brokers have a different
understanding of how `__consumer_offsets` are partitioned.
The result in this test can be that two group coordinators both think
they own a certain group. The test is resizing `__consumer_offsets`
right after start-up from 3 to 50. Before the broker bounce, the GC
operates on only three partitions (0-2). During the bounce, we get new
brokers that operate on (0-49). This means that two brokers can both
think, at the same time, that they own a group.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Fix to ensure assigned partitions whose topics are not in the consumer
explicit subscription are considered not fetchable (so that no records
are returned on poll for them)
This scenario could happen in the new async consumer (using the Consumer
rebalance protocol) when the subscription changes, because the consumer
will keep its assignment until the coordinator sends a new one (broker
drives assignments).
This does not happen in the classic consumer because the assignment
logic lives on the client-side, so the consumer pro-actively updates
assignment as needed.
This PR validates assignment vs subscription on fetch for explicit
subscription only. Regular expressions, shared subscription remain
unchanged (regex case still under discussion, will be handled separately
if needed)
Reviewers: Andrew Schofield <aschofield@confluent.io>, TengYao Chi
<frankvicky@apache.org>, Kirk True <ktrue@confluent.io>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>
Profiling has shown that using the Collections Streams API approach adds
unnecessary overhead compared to a traditional for loop. Minor revisions
to the code have been made to use simpler constructs to improve
performance.
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Andrew Schofield
<aschofield@confluent.io>
Extending test coverage of authorization for streams group RPC
StreamsGroupDescribe. The RPC requires DESCRIBE GROUP and DESCRIBE TOPIC
permissions for all topics.
Reviewers: Bill Bejeck <bbejeck@apache.org>
Extending test coverage of authorization for streams group RPC
StreamsGroupHeartbeat. The RPC requires READ GROUP and DESCRIBE TOPIC
permissions for all topics. For creating internal topics, we require
CREATE TOPIC permission. If internal topic creation fails, the request
does not fail, but the status reflects this problem.
Reviewers: Bill Bejeck <bbejeck@apache.org>
These can be used to implement transformations on top of the RPC
definitions. Group IDs were already marked. This PR additionally adds
the entityType for all topic names.
Reviewers: Matthias J. Sax <matthias@confluent.io>
This is a follow up to [https://github.com/apache/kafka/pull/19910](url)
`The coordinator failed to write an epoch fence transition for producer
jt142 to the transaction log with error COORDINATOR_NOT_AVAILABLE. The
epoch was increased to 2 but not returned to the client
(kafka.coordinator.transaction.TransactionCoordinator)` -- as we don't
bump the epoch with this change, we should also update the message to
not say "increased" and remove the
**epochAndMetadata.transactionMetadata.hasFailedEpochFence = true** line
In the test, the expected behavior is:
1) First append transaction to the log fails with
COORDINATOR_NOT_AVAILABLE (epoch 1)
2) We try init_pid again, this time the SINGLE epoch bump succeeds, and
the following things happen simultaneously (epoch 2)
-> Transition to COMPLETE_ABORT
-> Return CONCURRENT_TRANSACTION error to the client
3) The client retries, and there is another epoch bump; state
transitions to EMPTY (epoch 3)
Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
<alivshits@confluent.io>
This change compares the remote replica's HWM with the leader's HWM and
completes the FETCH request if the remote HWM is less than the leader's
HWM. When the leader's HWM is updated any pending FETCH RPC is
completed.
Reviewers: Alyssa Huang <ahuang@confluent.io>, David Arthur
<mumrah@gmail.com>, Andrew Schofield <aschofield@confluent.io>
(cherry picked from commit 742b327025)