In the current implementation of archiveRecords, when the LSO is
updated and multiple record batches precede the new LSO, only the
first one gets archived. This is because of the following lines of
code:
`isAnyOffsetArchived = isAnyOffsetArchived ||
archivePerOffsetBatchRecords(inFlightBatch, startOffset, endOffset - 1,
initialState);`
`isAnyBatchArchived = isAnyBatchArchived ||
archiveCompleteBatch(inFlightBatch, initialState);`
The first record / batch makes `isAnyOffsetArchived` /
`isAnyBatchArchived` true, after which the `||` expression
short-circuits and the methods `archivePerOffsetBatchRecords` /
`archiveCompleteBatch` are not called again. This PR changes the
order of the operands so that the short-circuit does not prevent
archiving all the required batches.
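The effect of the operand order can be reproduced in isolation. The sketch below is not the share-partition code; `archive` is a hypothetical stand-in for `archiveCompleteBatch` that always succeeds, and it only shows how `flag || call()` skips the call once the flag is true while `call() || flag` does not.

```java
final class ShortCircuitDemo {
    // Hypothetical stand-in for archiveCompleteBatch: archives batch i and reports success.
    static boolean archive(boolean[] archived, int i) {
        archived[i] = true;
        return true;
    }

    // Flag on the left: once the flag is true, `||` never evaluates the call.
    static int archiveWithFlagFirst(int batches) {
        boolean[] archived = new boolean[batches];
        boolean anyArchived = false;
        for (int i = 0; i < batches; i++) {
            anyArchived = anyArchived || archive(archived, i);
        }
        return count(archived);
    }

    // Call on the left: the method runs for every batch, then the flag is OR-ed in.
    static int archiveWithCallFirst(int batches) {
        boolean[] archived = new boolean[batches];
        boolean anyArchived = false;
        for (int i = 0; i < batches; i++) {
            anyArchived = archive(archived, i) || anyArchived;
        }
        return count(archived);
    }

    static int count(boolean[] flags) {
        int n = 0;
        for (boolean f : flags) {
            if (f) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(archiveWithFlagFirst(3)); // prints 1
        System.out.println(archiveWithCallFirst(3)); // prints 3
    }
}
```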
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
The `record-size` and `throughput` arguments don’t work in
`TestRaftServer`. The `recordsPerSec` and `recordSize` values are always
hard-coded.
- Fix the hard-coded `recordsPerSec` and `recordSize` values
- Add a "Required" description to the command-line options to make this
clear to users.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR aims at cleaning up the `jmh-benchmarks` module further by
getting rid of some extra code which can be replaced by `record`.
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
During broker restarts, the topic-based RemoteLogMetadataManager (RLMM)
constructs its state by reading the internal `__remote_log_metadata`
topic. When a partition is not ready to perform remote storage
operations, a ReplicaNotAvailableException is thrown back to the
consumer, and the client retries the request immediately.
This results in a lot of FETCH requests on the broker and ties up the
request handler threads. This patch uses a CountDownLatch to reduce the
frequency of ReplicaNotAvailableException being thrown back to the
clients, which improves request handler thread usage on the broker.
Previously, for one consumer, when the RLMM was not ready for a
partition, ~9K FetchConsumer requests / sec were received on the broker.
With this patch, the number of FETCH requests is reduced by 95% to 600 / sec.
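The general idea of waiting on a latch before failing can be sketched as follows. This is only an illustration of the technique, not the RLMM code: `ReadinessGate`, `markReady`, `awaitReady` and the use of `IllegalStateException` in place of ReplicaNotAvailableException are all assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical gate: callers wait a bounded time for readiness instead of failing at once.
final class ReadinessGate {
    private final CountDownLatch ready = new CountDownLatch(1);
    private final long maxWaitMs;

    ReadinessGate(long maxWaitMs) {
        this.maxWaitMs = maxWaitMs;
    }

    // Called once the partition's metadata has been loaded.
    void markReady() {
        ready.countDown();
    }

    // Returns true if the gate opened within maxWaitMs, false on timeout or interruption.
    boolean awaitReady() {
        try {
            return ready.await(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // The error is raised at most once per maxWaitMs per caller, which throttles immediate retries.
    void ensureReady() {
        if (!awaitReady()) {
            throw new IllegalStateException("partition is not ready for remote storage operations");
        }
    }

    public static void main(String[] args) {
        ReadinessGate gate = new ReadinessGate(50);
        System.out.println(gate.awaitReady()); // false: nothing marked the gate ready
        gate.markReady();
        System.out.println(gate.awaitReady()); // true: returns without waiting
    }
}
```

Because a retrying client now gets an error at most once per wait interval, the request rate drops roughly in proportion to the chosen wait.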
Reviewers: Lan Ding <isDing_L@163.com>, Satish Duggana
<satishd@apache.org>
Now that Kafka supports Java 17, this PR makes some changes in the
connect module. The changes in this PR are limited to only some files.
Future PRs will follow.
The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()
Modules target: runtime/src/test
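For reference, the replacements listed above are equal by value but not identical in behavior. The sketch below (class and method names are illustrative only) shows two differences worth keeping in mind when reviewing such changes: `Arrays.asList()` permits `set()` and null elements, whereas `List.of()` rejects both.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CollectionFactoriesDemo {
    // True if the list accepts in-place replacement of an element.
    static boolean allowsSet(List<String> list) {
        try {
            list.set(0, "changed");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // True if List.of accepts a null element (it does not: it throws NullPointerException).
    static boolean listOfAllowsNull() {
        String missing = null;
        try {
            List.of("a", missing);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Value equality holds between the old and new factories.
        System.out.println(Arrays.asList("a", "b").equals(List.of("a", "b")));   // true
        System.out.println(Collections.emptyMap().equals(Map.of()));             // true
        System.out.println(Collections.singleton("a").equals(Set.of("a")));      // true

        // Mutability and null handling differ.
        System.out.println(allowsSet(Arrays.asList("a", "b"))); // true
        System.out.println(allowsSet(List.of("a", "b")));       // false
        System.out.println(listOfAllowsNull());                 // false
    }
}
```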
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
During shutdown, if the RSM closes first, the ongoing requests
might throw an error. To handle the ongoing requests gracefully, this PR
closes the RSM after closing the remote-log reader thread pools.
Reviewers: Satish Duggana <satishd@apache.org>
This PR aims at cleaning up the tools module further by getting rid of
some extra code which can be replaced by `record`
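As an illustration of the kind of code a `record` replaces, here is a minimal before/after sketch. `OldResult` and `NewResult` are hypothetical types invented for the example and do not correspond to classes in the tools module.

```java
import java.util.Objects;

final class RecordDemo {
    // Before: a hand-written immutable value class.
    static final class OldResult {
        private final String topic;
        private final int partition;

        OldResult(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        String topic() { return topic; }
        int partition() { return partition; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof OldResult)) return false;
            OldResult other = (OldResult) o;
            return partition == other.partition && Objects.equals(topic, other.topic);
        }

        @Override
        public int hashCode() { return Objects.hash(topic, partition); }
    }

    // After: the record generates the constructor, accessors, equals, hashCode and toString.
    record NewResult(String topic, int partition) { }

    public static void main(String[] args) {
        System.out.println(new OldResult("t", 0).equals(new OldResult("t", 0))); // true
        System.out.println(new NewResult("t", 0).equals(new NewResult("t", 0))); // true
        System.out.println(new NewResult("t", 0).topic());                       // t
    }
}
```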
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Fix typos and docs in the following files:
```
clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRebalanceListener.java
clients/src/main/resources/common/message/FetchRequest.json
raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java
```
Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Lan Ding
<isDing_L@163.com>, Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang
<payang@apache.org>
This PR fixes a problem related to `TestLinearWriteSpeed`. During my
work on KIP-780, I discovered that the `TestLinearWriteSpeed` benchmarks
do not account for compression algorithms: they always use
`Compression.NONE` when creating records. The problem was introduced in
PR [1].
[1] - https://github.com/apache/kafka/pull/17736
Reviewers: Ken Huang <s7133700@gmail.com>, Mickael Maison
<mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Follow-up to
[KAFKA-18486](https://issues.apache.org/jira/browse/KAFKA-18486)
* Replace PartitionState with PartitionRegistration in
makeFollower/makeLeader
* Remove PartitionState.java since it is no longer referenced
Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
Fixes a false positive in `BrokerRegistrationRequestTest` caused by
`isMigratingZkBroker`, and migrates the test from Scala to Java.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
1. Move TransactionMetadata to transaction-coordinator module.
2. Rewrite TransactionMetadata in Java.
3. The `topicPartitions` field uses `HashSet` instead of `Set`, because
it is a mutable field.
4. In Scala, the `prepare*` methods can use the current values as
default arguments to `prepareTransitionTo`. Java does not support
default argument values, so to avoid a lot of duplicated code or
assigning a value to the wrong field, we add a private class
`TransitionData`. It takes the current `TransactionMetadata` values as
defaults, and the `prepare*` methods only need to assign the updated
values.
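The pattern described in item 4 can be sketched roughly as follows. This is a simplified illustration, not the actual `TransactionMetadata` code: the fields, the `prepareOngoing` / `prepareEpochBump` methods and the `apply` helper are hypothetical, and only the idea of a holder pre-filled with the current values is taken from the description above.

```java
final class TxnMetadataSketch {
    enum State { EMPTY, ONGOING }

    private State state = State.EMPTY;
    private short producerEpoch = 0;
    private int txnTimeoutMs = 1000;

    // Holder pre-filled with the current values, emulating Scala default arguments.
    private static final class TransitionData {
        State state;
        short producerEpoch;
        int txnTimeoutMs;

        TransitionData(TxnMetadataSketch current) {
            this.state = current.state;
            this.producerEpoch = current.producerEpoch;
            this.txnTimeoutMs = current.txnTimeoutMs;
        }
    }

    // Each prepare* method assigns only the fields that actually change.
    void prepareOngoing() {
        TransitionData data = new TransitionData(this);
        data.state = State.ONGOING;
        apply(data);
    }

    void prepareEpochBump() {
        TransitionData data = new TransitionData(this);
        data.producerEpoch = (short) (producerEpoch + 1);
        apply(data);
    }

    private void apply(TransitionData data) {
        this.state = data.state;
        this.producerEpoch = data.producerEpoch;
        this.txnTimeoutMs = data.txnTimeoutMs;
    }

    State state() { return state; }
    short producerEpoch() { return producerEpoch; }
    int txnTimeoutMs() { return txnTimeoutMs; }

    public static void main(String[] args) {
        TxnMetadataSketch metadata = new TxnMetadataSketch();
        metadata.prepareEpochBump();
        metadata.prepareOngoing();
        System.out.println(metadata.state() + " " + metadata.producerEpoch()); // ONGOING 1
    }
}
```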
Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
<alivshits@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
CoordinatorMetricsShard was split into a separate module in
(https://github.com/apache/kafka/pull/16883), causing the link in the
javadoc to become invalid.
So, remove broken link in CoordinatorMetricsShard javadoc.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Sanskar Jhajharia
<sjhajharia@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
The broker observer should not read the update voter set timer value
when polling to determine its backoff, since brokers cannot auto-join
the KRaft voter set. If auto-join or kraft.version=1 is not supported,
controller observers should not read this timer either when polling.
The updateVoterSetPeriodMs timer is not something that should be
considered when calculating the backoff returned by polling, since this
timer does not represent the same thing as the fetchTimeMs timer.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, José Armando García
Sancio <jsancio@apache.org>, Alyssa Huang <ahuang@confluent.io>,
Kuan-Po Tseng <brandboat@gmail.com>
This PR does the following:
- Rewrite to the new test infra.
- Rewrite in Java.
- Move to clients-integration-tests.
- Add an `ensureConsistentMetadata` method to `ClusterInstance`,
similar to `ensureConsistentKRaftMetadata` in the old infra, and
refactor related code.
Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang
<s7133700@gmail.com>
Refactor metric gauges instantiation to use lambda expressions instead
of ImmutableValue.
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Log a warning for each topic that failed to be created as a result of an
automatic creation. This makes the underlying cause more visible so
users can take action.
Previously, at the default log level, you could only see logs that the
broker was attempting to autocreate topics. If the creation failed, then
it was logged at debug.
Signed-off-by: Robert Young <robertyoungnz@gmail.com>
Reviewers: Luke Chen <showuon@gmail.com>, Kuan-Po Tseng <brandboat@gmail.com>
Add the `controller.quorum.auto.join.enable` configuration. When it is
enabled and KIP-853 is supported, follower controllers that are
observers (their replica id + directory id are not in the voter set) will:
- Automatically remove voter set entries which match their replica id
but not directory id by sending the `RemoveVoterRPC` to the leader.
- Automatically add themselves as a voter when their replica id is not
present in the voter set by sending the `AddVoterRPC` to the leader.
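The two rules above amount to a small decision function. The sketch below is a simplified model, not the KafkaRaftClient implementation: the voter set is represented as a hypothetical map from replica id to directory id, and `Action` and `nextAction` are names invented for the example.

```java
import java.util.Map;

final class AutoJoinSketch {
    enum Action { NONE, REMOVE_VOTER, ADD_VOTER }

    // voters maps replica id -> directory id (a simplification of the real voter set).
    static Action nextAction(Map<Integer, String> voters, int replicaId, String directoryId) {
        String votedDirectory = voters.get(replicaId);
        if (votedDirectory == null) {
            // Replica id is not present at all: ask the leader to add this replica.
            return Action.ADD_VOTER;
        }
        if (!votedDirectory.equals(directoryId)) {
            // Same replica id but a different directory id: remove the stale entry first.
            return Action.REMOVE_VOTER;
        }
        // Replica id + directory id are already in the voter set: not an observer.
        return Action.NONE;
    }

    public static void main(String[] args) {
        Map<Integer, String> voters = Map.of(1, "dirA", 2, "dirB");
        System.out.println(nextAction(voters, 3, "dirC")); // ADD_VOTER
        System.out.println(nextAction(voters, 2, "dirX")); // REMOVE_VOTER
        System.out.println(nextAction(voters, 1, "dirA")); // NONE
    }
}
```

In this model a replica with a stale directory id first removes the old entry; on a later pass its replica id is absent, so it then adds itself.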
Reviewers: José Armando García Sancio <jsancio@apache.org>, Chia-Ping
Tsai <chia7712@gmail.com>
### Summary
Adds comprehensive test coverage for the StorageTool format command
feature validation, including tests for valid feature overrides, invalid
feature detection, and multiple feature specifications. Also adds debug
output to help with troubleshooting format operations.
### Changes Made
#### Test Coverage Improvements
- **`testFormatWithReleaseVersionAndFeatureOverride()`**: Tests that
feature overrides work correctly when specified with `--feature` flag
- **`testFormatWithInvalidFeatureThrowsError()`**: Tests error handling
for unsupported features
- **`testFormatWithMultipleFeatures()`**: Tests multiple feature
specifications in a single format command
#### Debug Output Enhancement
- **Formatter.java**: Added debug output to print bootstrap metadata
during format operations
- Helps with troubleshooting format issues by showing the complete
bootstrap metadata being written
- Improves visibility into what features and configurations are being
applied
#### Test Updates
- **FormatterTest.java**: Updated existing tests to account for the new
debug output
### Related
- KIP-1022: [Formatting and Updating Features](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1022%3A+Formatting+and+Updating+Features)
Reviewers: Kevin Wu <kevin.wu2412@gmail.com>, Justine Olshan
<jolshan@confluent.io>
Dependencies such as api-ldap-client-api and mina-core used by the
current version of apacheds have several critical CVEs: CVE-2018-1337,
CVE-2024-52046 and CVE-2019-0231.
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This PR does the following:
1. Use the correct JSON format in the tests.
2. Convert `PartitionReassignmentState` and `VerifyAssignmentResult`
into records.
Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang
<s7133700@gmail.com>, PoAn Yang <payang@apache.org>
Use Java 24 for the spotbugs checks, now that Spotbugs works on Java
24.
Added some more warning exclusions for warnings that are new to 4.9.4.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
In the current implementation, some delayed share fetch operations get
trapped in the delayed share fetch purgatory when partition
leadership changes during share consumption. This is because there is no
check in the code to make sure the current broker is still the leader of
the partitions corresponding to the share partitions. So, when
leadership changes, the share partitions cannot be acquired, because
they have already been fenced, and tryComplete returns false. The
operation does get completed when its timer expires, but it is too
late by then, and the operation gets stuck in the watchers list waiting
to be purged when estimated operations increase to more than
1000. This PR resolves this by adding the required check so that if
partition leadership changes, the delayed share fetches waiting on
it get completed immediately.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield
<aschofield@confluent.io>
The previous URL http://lambda-architecture.net/ seems to now be controlled by spammers
Co-authored-by: Shashank <hsshashank.grad@gmail.com>
Reviewers: Mickael Maison <mickael.maison@gmail.com>
Improve the RLMM docs:
1. Distinguish RLMM configs from other tiered storage configs. All RLMM
configs need to start with a specific prefix, but the original
documentation does not mention this.
2. Add a description of the additional client configs, which are
required when configuring authentication information. Their absence can
confuse users, for example: Aiven-Open/tiered-storage-for-apache-kafka#681
Reviewers: Luke Chen <showuon@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
*What*
https://issues.apache.org/jira/browse/KAFKA-19572
- If a `ShareConsumer` constructor fails due to any exception, we
call `close()` in the catch block.
- If uninitialized members are accessed during `close()`, it
throws an NPE. Currently there are no null checks, so `close()`
attempts to use these fields even when they were never initialized.
- To avoid this, this PR adds null checks in `close()` before
accessing fields that could be null.
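A minimal sketch of the failure mode and the guard, using hypothetical stand-in types (`GuardedCloseDemo`, `Resource`) rather than the real `ShareConsumer` fields: the constructor fails halfway, calls `close()`, and the null checks keep the original exception from being replaced by an NPE.

```java
final class GuardedCloseDemo implements AutoCloseable {
    // Hypothetical member type standing in for the consumer's internal components.
    static final class Resource implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    private Resource first;
    private Resource second;

    GuardedCloseDemo(boolean failMidway) {
        try {
            first = new Resource();
            if (failMidway) {
                throw new IllegalStateException("simulated constructor failure");
            }
            second = new Resource();
        } catch (RuntimeException e) {
            close(); // `second` is still null here
            throw e;
        }
    }

    @Override
    public void close() {
        // Null checks: only close what was actually initialized.
        if (second != null) second.close();
        if (first != null) first.close();
    }

    // Returns "ok" or the simple name of the exception thrown by the constructor.
    static String tryConstruct(boolean failMidway) {
        try (GuardedCloseDemo demo = new GuardedCloseDemo(failMidway)) {
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryConstruct(false)); // ok
        System.out.println(tryConstruct(true));  // IllegalStateException
    }
}
```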
Reviewers: Apoorv Mittal <amittal@confluent.io>, Lianet Magrans
<lmagrans@confluent.io>
Document deprecation of PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG
in `upgrade.html`, which was missed in #20317
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
Fix the max delivery check on acquisition lock timeout and write state
RPC failure.
When the acquisition lock has already timed out and a write state RPC
failure occurs, we need to check whether records need to be archived.
However, with this fix we do not persist that information, even though
it is relevant because some records may be archived or have their
delivery count bumped. The information will be persisted eventually.
The persister call has already failed, so issuing another persister
call in response to that failure is not correct. Instead, let the data
be persisted by future persister calls.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit
<adixit@confluent.io>
Now that Kafka supports Java 17, this PR makes some changes in the
`trogdor` module. The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()
Some minor cleanups around use of enhanced switch blocks and conversion
of classes to record classes.
Reviewers: Ken Huang <s7133700@gmail.com>, Vincent Jiang
<vpotucek@me.com>, Chia-Ping Tsai <chia7712@gmail.com>
Minor PR to move the persister call outside of the lock. The lock is not
required while making the persister call.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit <adixit@confluent.io>
The PR removes unnecessary updates for find next fetch offset. While the
state is in transition and not yet completed, the respective offsets
should not be considered for acquisition anyway. The find next fetch
offset is updated once the transition is completed.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit
<adixit@confluent.io>
Fixes a typo in ProducerConfig: Renames
`PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG` →
`PARTITIONER_ADAPTIVE_PARTITIONING_ENABLE_CONFIG`
The old key is retained for backward compatibility.
See: [KIP-1175: Fix the typo `PARTITIONER_ADPATIVE_PARTITIONING_ENABLE`
in ProducerConfig](https://cwiki.apache.org/confluence/x/KYogFQ)
Reviewers: Yung <yungyung7654321@gmail.com>, TengYao Chi
<frankvicky@apache.org>, Ken Huang <s7133700@gmail.com>, Nick Guo
<lansg0504@gmail.com>, Ranuga Disansa <go2ranuga@gmail.com>
The default value of `num.recovery.threads.per.data.dir` is now 2
according to KIP-1030. We should update config files which are still
setting 1.
Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Reviewers: Luke Chen <showuon@gmail.com>
Implements a timeout mechanism (using maxPollTimeMs) that waits for
missing source topics to be created before failing, instead of
immediately throwing exceptions in the new Streams protocol.
Additionally, throw a TopologyException when a partition count mismatch
is detected.
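One way to picture the wait-before-failing behavior is a small deadline tracker. This is only a sketch under assumptions: `MissingTopicsDeadline`, `Result` and `check` are hypothetical names, time is passed in explicitly, and the actual Streams protocol code may track the timeout differently.

```java
import java.util.Set;

final class MissingTopicsDeadline {
    enum Result { READY, WAITING, TIMED_OUT }

    private final long timeoutMs;
    private long firstMissingAtMs = -1L; // -1 means no topics are currently known to be missing

    MissingTopicsDeadline(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called on every poll with the required and currently existing topics.
    Result check(Set<String> required, Set<String> existing, long nowMs) {
        if (existing.containsAll(required)) {
            firstMissingAtMs = -1L; // reset so a later disappearance starts a fresh wait
            return Result.READY;
        }
        if (firstMissingAtMs < 0) {
            firstMissingAtMs = nowMs; // start the clock on the first miss
        }
        return nowMs - firstMissingAtMs >= timeoutMs ? Result.TIMED_OUT : Result.WAITING;
    }

    public static void main(String[] args) {
        MissingTopicsDeadline deadline = new MissingTopicsDeadline(100);
        System.out.println(deadline.check(Set.of("input"), Set.of(), 0));        // WAITING
        System.out.println(deadline.check(Set.of("input"), Set.of(), 100));      // TIMED_OUT
        System.out.println(deadline.check(Set.of("input"), Set.of("input"), 150)); // READY
    }
}
```

A caller would fail with an exception only on `TIMED_OUT`, and keep polling on `WAITING`.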
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Alieh Saeedi
<asaeedi@confluent.io>, Matthias J. Sax <matthias@confluent.io>
The link for the heading Errant Record Reporter is missing the # symbol,
which is causing it to redirect to a 404 Not Found page.
Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This commit updates CI to test against Java 24 instead of Java 23 which
is EOL.
Because Spotbugs has not released version 4.9.4 yet, we can't run
Spotbugs on Java 24. Instead, we run Spotbugs, and the rest of the
compile and validate build step, on Java 17 for now.
Once 4.9.4 has been released, we will switch to Java 24 for this.
Exclude spotbugs from the run-tests gradle action. Spotbugs is already
being run once in the build by "compile and validate", there is no
reason to run it again as part of executing tests.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Now that Kafka supports Java 17, this PR makes some changes in the tools
module. The changes in this PR are limited to only some files. Future
PRs will follow. The changes mostly include:
- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()
Some minor changes to use the enhanced switch.
Sub modules targeted: tools/src/test
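For readers unfamiliar with the enhanced switch mentioned above, a minimal before/after sketch follows. The `Level` enum and both methods are hypothetical and unrelated to the tools tests; they only contrast the two forms.

```java
final class EnhancedSwitchDemo {
    enum Level { LOW, MEDIUM, HIGH }

    // Before: statement form with explicit returns and a default branch.
    static int oldStyle(Level level) {
        switch (level) {
            case LOW:
                return 1;
            case MEDIUM:
                return 2;
            default:
                return 3;
        }
    }

    // After: switch expression with arrow labels; no fall-through, and the
    // compiler checks that every enum constant is covered.
    static int newStyle(Level level) {
        return switch (level) {
            case LOW -> 1;
            case MEDIUM -> 2;
            case HIGH -> 3;
        };
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            System.out.println(level + " " + oldStyle(level) + " " + newStyle(level));
        }
    }
}
```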
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The compiler warning is due to a missing import. This patch imports ApiException to fix it.
Reviewers: TengYao Chi <frankvicky@apache.org>, Yung
<yungyung7654321@gmail.com>
When using a connector that requires a schema, such as JDBC connectors,
with JSON messages, the current JSONConverter necessitates including the
schema within every message. To address this, we are introducing a new
parameter, schema.content, which allows you to provide the schema
externally. This approach not only reduces the size of the messages but
also facilitates the use of more complex schemas.
KIP :
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1054%3A+Support+external+schemas+in+JSONConverter
Reviewers: Mickael Maison <mickael.maison@gmail.com>, TengYao Chi <frankvicky@apache.org>, Edoardo Comar <ECOMAR@uk.ibm.com>