With 4.0 release, we remove pageview demo because it depends on
`:connect:json` which requires JDK 17. This PR removes the connect
dependency and adds a customized serializer and deserializer, to make
pageview demo works with JDK 11.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Optimize `RemoteLogManager#buildFilteredLeaderEpochMap` .
Add a temporary unit test `testBuildFilteredLeaderEpochMapModify` in
`RemoteLogManagerTest` to verify the output consistency of the method
before and after optimization.
Randomly generate leaderEpochs and iterate 100000 times for
verification.
```
@Test
public void testBuildFilteredLeaderEpochMapModify() {
int testIterations = 100000;
for (int i = 0; i < testIterations; i++) { TreeMap<Integer,
Long> leaderEpochToStartOffset =
generateRandomLeaderEpochAndStartOffset();
// before optimize NavigableMap<Integer, Long>
optimizeBefore =
RemoteLogManager.buildFilteredLeaderEpochMap(leaderEpochToStartOffset);
// after optimize NavigableMap<Integer, Long>
optimizeAfter =
RemoteLogManager.buildFilteredLeaderEpochMap2(leaderEpochToStartOffset);
assertEquals(optimizeBefore, optimizeAfter); } }
private static TreeMap<Integer, Long>
generateRandomLeaderEpochAndStartOffset() { TreeMap<Integer,
Long> map = new TreeMap<>(); Random random = new Random();
int numEntries = random.nextInt(100000); long lastStartOffset =
0;
for (int i = 0; i < numEntries; i++) { // generate
a leader epoch int leaderEpoch = random.nextInt(100000);
long startOffset;
// generate a random start offset , or use the last start
offset if (i > 0 && random.nextDouble() < 0.2) {
startOffset = lastStartOffset; } else { startOffset =
Math.abs(random.nextLong()) % 100000; } lastStartOffset =
startOffset;
map.put(leaderEpoch, startOffset);
}
return map;
}
```
Command:
``` ./gradlew storage:test --tests RemoteLogManagerTest```
Result: All unit tests passed.
<img width="1258" height="424" alt="image"
src="https://github.com/user-attachments/assets/7d9fc3b5-3bbc-440f-b1cf-3a2a5f97557a"
/> <img width="411" height="66" alt="image"
src="https://github.com/user-attachments/assets/22a0b443-88e8-43d2-a3f2-51266935ed34"
/>
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
Add the ackWhenCommitted boolean field to the AddRaftVoter request, and
bump the RPC's version to 1.
The default value of the ackWhenCommitted field is true, and in this
case the leader will return a response after committing the VotersRecord
generated by the RPC. If ackWhenCommitted is false, the leader will
return a response after generating the new VotersRecord and adding it to
the batch accumulator.
Reviewers: Alyssa Huang <ahuang@confluent.io>, José Armando García
Sancio <jsancio@apache.org>
The `MetadataVersionValidator` class became unused after commit
b6e6a3c68a (KAFKA-18360: Remove zookeeper configurations) removed the
`inter.broker.protocol.version` configuration, which was the only
component using this validator.
This commit removes both the validator class and its associated test as
they serve no purpose in the current implementation.
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
* The persister receives the response from the share coordinator and
performs various actions based on the response error code.
* There was an issue in handling the error code and error message. We
were creating an Error object from the error code in the response but
not looking at the error message in the response.
* In some cases where a standard code has a custom associated message -
this will not provide complete information in the logs.
* In this PR, we rectify the situation by first checking the error
message in response and if empty, getting the standard message from
Errors object.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal
<apoorvmittal10@gmail.com>
This PR corrects the name of logger in the inner class
PersisterStateManagerHandler. After this change it will be possible to
change the log level dynamically in the kafka brokers. This PR also
includes a small change in a log line in GroupCoordinatorService, making
it clearer.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Sushant Mahajan
<smahajan@confluent.io>, Lan Ding <isDing_L@163.com>, TengYao Chi
<frankvicky@apache.org>
In order to better analyze steady-state performance of Kafka, this PR
enables a warmup in the Producer Performance test. The warmup duration
is specified as a number of records that are a subset of the total
numRecords. If warmup records is greater than 0, the warmup is
represented by a second Stats object which holds warmup results. Once
warmup records have been exhausted, the test switches to using the
existing Stats object. At end of test, if warmup was enabled, the
summary of the whole test (warump + steady state) is printed followed by
the summary of the steady-state portion of the test. If no warmup is
used, summary prints don't change from existing behavior. This
contribution is an original work and is licensed to the Kafka project
under the Apache license
Testing strategy comprises new Java unit tests added to
ProducerPerformanceTests.java.
Reviewers: Kirk True <kirk@kirktrue.pro>, Federico Valeri
<fedevaleri@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Another refactor PR to move in-flight batch and state out of
SharePartition. This PR concludes the refactoring and subsequent PRs for
this ticket will involve code cleanups and better lock handling. However
the intent is to keep PRs small so they can be reviewed easily.
Reviewers: Andrew Schofield <aschofield@confluent.io>
While working on KAFKA-19476, I realized that we need to refactor
SharePartition for read/write lock handling. I have started some work in
the area. For the initial PR, I have moved AcquisitionLockTimeout class
outside of SharePartition.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Improve the error handling while building the remote-log-auxiliary state
when a follower node with an empty disk begin to synchronise with the
leader. If the topic has remote storage enabled, then the
ReplicaFetcherThread attempt to build the remote-log-auxiliary state.
Note that the remote-log-auxiliary state gets invoked only when the
leader-log-start-offset is non-zero and leader-log-start-offset is not
equal to leader-local-log-start-offset.
When the LeaderAndISR request is received, then the
ReplicaManager#becomeLeaderOrFollower invokes 'makeFollowers' initially,
followed by the RemoteLogManager#onLeadershipChange call. As a result,
when ReplicaFetcherThread initiates the
RemoteLogManager#fetchRemoteLogSegmentMetadata, the partition may not
have been initialized at that time and throws retriable exception.
Introduced RetriableRemoteStorageException to gracefully handle the
error.
After the patch:
```
[2025-07-19 19:28:20,934] INFO [ReplicaFetcher replicaId=3, leaderId=1,
fetcherId=0] Could not build remote log auxiliary state for orange-1 due
to error: RemoteLogManager is not ready for partition: orange-1
(kafka.server.ReplicaFetcherThread)
[2025-07-19 19:28:20,934] INFO [ReplicaFetcher replicaId=3, leaderId=2,
fetcherId=0] Could not build remote log auxiliary state for orange-0 due
to error: RemoteLogManager is not ready for partition: orange-0
(kafka.server.ReplicaFetcherThread)
```
Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
- startConsumerThread is always true so removed the variable.
- Replaced the repetitive lock handling logic with
`withReadLockAndEnsureInitialized` to reduce duplication and improve
readability.
- Consolidated the logic in `initializeResources` and. simplified method
arguments to better encapsulate configuration.
- Extracted common code and reduced the usage of global variables.
- Named the variables properly.
Tests:
- Existing UTs since this patch refactored the code.
Reviewers: PoAn Yang <payang@apache.org>
### Problem
The
`ShareGroupCommandTest.testDeleteShareGroupOffsetsArgsWithoutTopic()`,
`ShareGroupCommandTest.testDeleteShareGroupOffsetsArgsWithoutGroup()`,
`ResetStreamsGroupOffsetTest.testResetOffsetsWithoutGroupOption()`,
`DeleteStreamsGroupTest.testDeleteWithoutGroupOption()`,
`DescribeStreamsGroupTest.testDescribeWithoutGroupOption()` tests were
flaky due to a dependency on Set iteration order in error message
generation.
### Root Cause
The cleanup [commit](https://github.com/apache/kafka/pull/20091) that
replaced `new HashSet<>(Arrays.asList(...))` with `Set.of(...)` in
ShareGroupCommandOptions and StreamsGroupCommandOptions changed the
iteration characteristics of collections used for error message
generation:
This produces different orders like `[topic], [group]` vs `[group],
[topic]`, but the tests expected a specific order, causing intermittent
failures.
### Solution
Fix the root cause by ensuring deterministic error message generation
through alphabetical sorting of option names.
Reviewers: ShivsundarR <shr@confluent.io>, Ken Huang
<s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
When initPid(keepPrepared = true) is called after a client crashes,
several situations should be considered.
When there's an ongoing transaction, we can transition it to the newly
added PREPARED_TRANSACTION state. However, what if there's no ongoing
transaction?
Another scenario could be:
- Issued a commit, to commit prepared
- The commit succeeded on the TC, but the client crashed
- Client restarted with keepPreparedTxn=true (because it doesn't know if
the commit succeeded or not and needs to keep retrying the commit until
it's successful)
- Issued a commit, but the transaction is not ongoing, because it's
committed
**Solution:**
This is a perfectly valid scenario as the external transaction
coordinator for the 2PC transaction will keep committing participants,
and the participants need to eventually return success (that's a
guarantee for a prepared transaction).
_Rejected Alt 1_ -> Return an InvalidTxnStateException : Returning an
error would break the above scenario.
_Rejected Alt 2_ -> Then the next thought is that we should somehow
validate if the state is expected, but we don't have data to validate
the result against.
**Final Solution:** Just returning the success and transitioning to
READY is the proper handling of this condition.
Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
<alivshits@confluent.io>
https://issues.apache.org/jira/browse/KAFKA-19213
Fixes a bug where creating a producer/consumer using a `Properties`
object created using the `Properties(Properties defaults)` constructor
will ignore the default properties.
Reviewers: Kirk True <kirk@kirktrue.pro>, TaiJuWu <tjwu1217@gmail.com>,
Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
The descrption "REQUIRED: The number of messages to send or consume" is
not correct, since those tools do NOT send any records.
Reviewers: TengYao Chi <frankvicky@apache.org>
Now that Kafka support Java 17, this PR makes some changes in connect
module. The changes in this PR are limited to only some files. A future
PR(s) shall follow.
The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()
Modules target: test-plugins, transforms
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Changes: Rename `waitForTopic` to `waitTopicCreation` for better clarity
Reasons: To align with `waitTopicDeletion` Reference:
https://github.com/apache/kafka/pull/20108/files#r2221659660
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<frankvicky@apache.org>
This feature adds maintenance burden and potential security concerns
while providing no apparent value to the Kafka community. See
[KIP-1193](https://cwiki.apache.org/confluence/x/dAxJFg) for more
details.
Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang
<s7133700@gmail.com>
---------
Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Now that Kafka support Java 17, this PR makes some changes in tools
module. The changes in this PR are limited to only some files. A future
PR(s) shall follow.
The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()
Sub modules targeted: tools/src/main
Reviewers: Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
1. JMH test should return value against return void (compiler can
eliminate returned value and benchmark would be incorrect).
2. Also move constant variable from method to class, to prevent JIT to
unfold.
3. Increase warm up iterations
Reviewers: Lucas Brutschy <lucasbru@apache.org>
There is a typo in the unit test, it calls
`runOnceWithoutProcessingThreads` while it should call
`runOnceWithProcessingThreads` instead.
Reviewers: Lucas Brutschy <lucasbru@apache.org>
* We INFO log a message, if a share partition could be cold snapshotted.
* However, this may create noise if we have highly partitioned topic
backing the share partition. This will be further exacerbated by
multiple share groups using that topic.
* To reduce log pollution, this PR changes the level to DEBUG.
Reviewers: ShivsundarR <shr@confluent.io>, Andrew Schofield
<aschofield@confluent.io>
Implements KIP-1034 to add support of Dead Letter
Queue in Kafka Streams.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Bruno Cadonna
<cadonna@apache.org>
Co-authored-by: Sebastien Viale <sebastien.viale@michelin.com>
The PR refactors the findNextFetchOffset variable from AtomicBoolean to
boolean itself as the access is always done while holding a lock. This
also improves handling of `writeShareGroupState` method response where
now complete lock is not required, rather on sub-section.
Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield
<aschofield@confluent.io>
1. Add check leader missing logic in method
`ConsumerGroupCommand.ConsumerGroupService#prepareOffsetsToReset` in
order to fail quickly
2. Add some tests
Reviewers: TaiJuWu <tjwu1217@gmail.com>, Lan Ding <isDing_L@163.com>,
Ken Huang <s7133700@gmail.com>, Andrew Schofield
<aschofield@confluent.io>
Add its for `Admin.deleteShareGroupOffsets`,
`Admin.alterShareGroupOffsets` and `Admin.listShareGroupOffsets` to
`PlaintextAdminIntegrationTest`.
Reviewers: Andrew Schofield <aschofield@confluent.io>
## Summary
jira: https://issues.apache.org/jira/browse/KAFKA-19517
Ensure `LoadSummary#numRecords` counts all records, including control
batches, to maintain consistency with numBytes.
## Test
`testLoading` now verifies `numRecords`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi
<frankvicky@apache.org>
This patch adds an API level integration test for the producer epoch
verification when processing transactional offset commit and end txn
markers.
Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi
<kitingiao@gmail.com>, Sean Quah <squah@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>
This patch fixes the bug that allows the last known leader to be elected as a partition leader while still in a fenced state, before the next heartbeat removes the fence.
https://issues.apache.org/jira/browse/KAFKA-19522
Reviewers: Jun Rao <junrao@gmail.com>, TengYao Chi
<frankvicky@apache.org>
see https://github.com/apache/kafka/pull/19769#issuecomment-3065869429
This patch adds a test to `ProtocolTest` to ensure the Protocol page displays the correct API version range.
Reviewers: Yung <yungyung7654321@gmail.com>, TengYao Chi
<frankvicky@apache.org>, Gaurav Narula <gaurav_narula2@apple.com>, Ken
Huang <s7133700@gmail.com>, Jimmy Wang <wangzhiwang@qq.com>
This PR removes the dependencies on `core` and `scala-library` from the
`coordinator-common` module, as a follow-up to
https://github.com/apache/kafka/pull/20089.
These dependencies have been removed from tests, and the previously
added import-control relaxations have been reverted accordingly.
Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang
<s7133700@gmail.com>
Temporarily fix it by disable the new protocol, will take a deeper look
at it in the consumer protocol.
Reviewers: Matthias J. Sax <matthias@confluent.io>
The comment on the RemoteLogManager.getLeaderEpochEntries method has a
small error description,it should be start(inclusive)and end(exclusive).
Reviewers: Ken Huang <s7133700@gmail.com>, Lan Ding <isDing_L@163.com>,
Chia-Ping Tsai <chia7712@gmail.com>
Improve the error message in the kafka-storage.sh when an incorrect
release-version is given. Specifically, following the behavior of
kafka-feature.sh, when an incorrect release-version is entered, it
returns the currently supported versions to the user.
Reviewers: TengYao Chi <frankvicky@apache.org>, Yung
<yungyung7654321@gmail.com>
The MetadataImage has a lot of stuff in it and it gets passed around in
many places in the new GroupCoordinator. This makes it difficult to
understand what metadata the group coordinator actually relies on and
makes it too easy to use metadata in ways it wasn't meant to be used.
This change encapsulate the MetadataImage in an interface
(`CoordinatorMetadataImage`) that indicates and controls what metadata
the group coordinator actually uses. Now it is much easier at a glance
to see what dependencies the GroupCoordinator has on the metadata. Also,
now we have a level of indirection that allows more flexibility in how
the GroupCoordinator is provided the metadata it needs.
`ReplicaManager#alterReplicaLogDirs` does not resume log cleaner while
handling an `AlterReplicaLogDirs` request for a topic partition which
already has an `AlterReplicaLogDirs` in progress, leading to a resource
leak where the cleaning for topic partitions remains paused even after
the log directory has been altered.
This change ensures we invoke `LogManager#resumeCleaning` if the future
replica directory has changed.
Reviewers: Jun Rao <junrao@gmail.com>
Removing the isEligibleLeaderReplicasV1Enabled to let ELR be enabled if
MV is at least 4.1IV1. Also bump the Latest Prod MV to 4.1IV1
Reviewers: Paolo Patierno <ppatierno@live.com>, Jun Rao <junrao@gmail.com>
### Summary of Changes
- Rewrote both `CoordinatorLoaderImpl` and `CoordinatorLoaderImplTest`
in Java, replacing their original Scala implementations.
- Removed the direct dependency on `ReplicaManager` and replaced it with
functional interfaces for `partitionLogSupplier` and
`partitionLogEndOffsetSupplier`
- Preserved original logic and test coverage during migration.
Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
TengYao Chi <frankvicky@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
Added required ACLs for new streams operations:
- STREAMS_GROUP_HEARTBEAT (88) requires:
• READ on Group
• DESCRIBE on Topics
• [Conditional] CREATE on Cluster or Topics
- STREAMS_GROUP_DESCRIBE (89) requires:
• DESCRIBE on Group
• DESCRIBE on Topic
Here is the rendering of the modified document.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
Co-authored-by: Lucas Brutschy <lbrutschy@gmail.com>
The mocked value for `UnifiedLog#topicId` was incorrectly set up which
caused test failure.
Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Satish Duggana <satishd@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>