Jira: [KAFKA-19676](https://issues.apache.org/jira/browse/KAFKA-19676)
No subclass of EpochState throws an IOException when closing, so
catching it is unnecessary. We can override close to remove the
IOException declaration.
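A minimal sketch of the idea, assuming `EpochState` extends `Closeable`:

```java
import java.io.Closeable;

// Sketch: overriding close() in the subinterface narrows the throws clause,
// so callers no longer need to catch IOException.
interface EpochState extends Closeable {
    @Override
    void close(); // drops the "throws IOException" declared by Closeable
}
```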
Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu
<tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
That config has a `Validator` so we already automatically print the
valid values in the generated docs:
https://kafka.apache.org/documentation/#streamsconfigs_upgrade.from
That will be one less place to upgrade each time we make a new release.
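For illustration, a config defined with a `Validator` along these lines (the
name and values here are hypothetical) gets its valid values rendered into the
generated docs automatically:

```java
import org.apache.kafka.common.config.ConfigDef;

// Hypothetical definition: ValidString.in(...) both validates the value and
// feeds the "Valid Values" column of the generated documentation.
ConfigDef configDef = new ConfigDef()
    .define("upgrade.from", ConfigDef.Type.STRING, null,
            ConfigDef.ValidString.in("3.8", "3.9", "4.0"),
            ConfigDef.Importance.LOW,
            "The version being upgraded from.");
```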
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
* The `ShareCoordinatorShard` maintains the record offset
information for `SharePartitionKey`s in the
`ShareCoordinatorOffsetsManager` class.
* Replay of `ShareSnapshot`s in the shards is reflected in the offsets
manager, including records created due to delete state.
* However, if the share partition delete is due to a topic delete, no
record will ever be written for the same `SharePartitionKey` after the
delete tombstone (as the topic id will not repeat).
As a result, the offsets manager will always consider the deleted share
partition's offset as the last redundant one.
* The fix is to make the offsets manager aware of the tombstone records
and remove them from the redundant offset calculation, as sketched below.
* Unit tests have been updated accordingly.
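A simplified, self-contained sketch of the idea (not the actual
`ShareCoordinatorOffsetsManager` code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Track the last record offset per share partition, and drop keys on
// tombstones so a deleted partition can no longer pin the minimum
// ("last redundant") offset.
class OffsetsManagerSketch {
    private final Map<String, Long> lastOffsets = new HashMap<>();

    void onSnapshotRecord(String sharePartitionKey, long offset, boolean isTombstone) {
        if (isTombstone) {
            lastOffsets.remove(sharePartitionKey); // the fix: forget deleted partitions
        } else {
            lastOffsets.put(sharePartitionKey, offset);
        }
    }

    // Everything strictly below the minimum tracked offset is redundant.
    Optional<Long> lastRedundantOffset() {
        return lastOffsets.values().stream().min(Long::compare);
    }
}
```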
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal
<apoorvmittal10@gmail.com>
In the original algorithm, standby tasks are assigned to a process that
previously owned the task only if that process is “load-balanced”,
meaning it has fewer tasks than members or is the least loaded process.
This is a strong requirement, and it often causes standby tasks not to
be assigned to the process that previously owned them.
Furthermore, the condition “is the least loaded process” is hard to
evaluate efficiently in this context.
We propose to instead use the same “below-quota” condition as in active
task assignment.
We compute a quota for active and standby tasks by defining numOfTasks
= numberOfActiveTasks + numberOfStandbyTasks and defining the quota as
numOfTasks/numberOfMembers, rounded up. Whenever a member becomes “full”
(its assigned number of tasks equals the quota), we deduct its tasks
from numOfTasks, decrement numberOfMembers, and recompute the quota.
This means the quota may be reduced by one during the assignment
process, which avoids uneven assignments.
A standby task can be assigned to a process that previously owned it
whenever the process has fewer than numOfMembersOfProcess*quota tasks.
This condition again prioritizes standby stickiness, and it can be
evaluated in constant time.
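An illustrative sketch of the quota computation (the method is ours, not the
assignor's actual code):

```java
// Ceiling division: quota = ceil(numOfTasks / numberOfMembers).
static int quota(int numOfTasks, int numberOfMembers) {
    return (numOfTasks + numberOfMembers - 1) / numberOfMembers;
}

// Example: 7 tasks over 3 members gives quota(7, 3) = 3. Once a member holds
// 3 tasks, we deduct them and recompute: quota(7 - 3, 3 - 1) = 2, so the
// remaining members receive at most 2 tasks each (3 + 2 + 2 = 7).
```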
In our worst-performing benchmark, this improves the runtime by 2.5x on
top of the previous optimizations, and by 5x in the more important
incremental assignment case.
Reviewers: Bill Bejeck <bbejeck@apache.org>
The
[distutils](https://docs.python.org/3.13/whatsnew/3.12.html#distutils)
package was removed in Python 3.12.
Change `distutils` usage to `shutil`.
Reviewers: Mickael Maison <mimaison@apache.org>
---------
Signed-off-by: PoAn Yang <payang@apache.org>
Querying the oldest-open-iterator metric can result in a
NoSuchElementException when the last open iterator gets removed, due to
a race condition between the query and the metric update.
To avoid this race condition, this PR caches the metric result instead
of accessing the list of open iterators directly. We don't need to clear
this cache, because the entire metric is removed when the last iterator
gets removed.
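A simplified sketch of the caching idea (our own class, not the actual
Streams metrics code):

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// The gauge reads a cached value that is updated when iterators are added or
// removed, instead of peeking into the deque, which may be concurrently
// emptied between an isEmpty() check and the read.
class OldestIteratorGauge {
    private final ConcurrentLinkedDeque<Long> openIteratorStartTimes = new ConcurrentLinkedDeque<>();
    private volatile Long cachedOldestStartTime; // cached metric result

    void iteratorOpened(long startTime) {
        openIteratorStartTimes.add(startTime);
        cachedOldestStartTime = openIteratorStartTimes.peekFirst();
    }

    void iteratorClosed(long startTime) {
        openIteratorStartTimes.remove(startTime);
        cachedOldestStartTime = openIteratorStartTimes.peekFirst();
    }

    // Never throws, even if the last iterator was just removed.
    Long value() {
        return cachedOldestStartTime;
    }
}
```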
Reviewers: Matthias J. Sax <matthias@confluent.io>
Similar to what was done for
AsyncKafkaConsumerTest::testFailConstructor
[here](https://github.com/apache/kafka/pull/20491).
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>
Add an explanation of `null` for DeleteAclsRequest#ResourceNameFilter.
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
The method rollbackOrProcessStateUpdates in SharePartition receives two
separate lists: updatedStates (InFlightState) and stateBatches
(PersisterStateBatch). This PR introduces a new subclass called
`PersisterBatch` which encompasses both objects.
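Roughly, with the two classes named above (the constructor shape is assumed):

```java
// Sketch: a single object pairs the in-flight state update with the
// persister state batch, instead of passing two parallel lists.
record PersisterBatch(InFlightState updatedState, PersisterStateBatch stateBatch) { }
```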
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
This is actually fixing a difference between the old and the new
assignor. Given the assignment ordering, the legacy assignor has a
built-in preference for range-style assignments, that is, assigning
C1: 0_0, 1_0
C2: 0_1, 1_1
instead of
C1: 0_0, 0_1
C2: 1_0, 1_1
We add tests to both assignors to check for this behavior, and improve
the new assignor by enforcing corresponding orderings.
Reviewers: Bill Bejeck <bill@confluent.io>
Integration tests for Streams Admin-related APIs.
Previous PR: https://github.com/apache/kafka/pull/20244
This one adds:
- Integration test for Admin#listStreamsGroupOffsets API
- Integration test for Admin#deleteStreamsGroupOffsets API
- Integration test for Admin#alterStreamsGroupOffsets API
Reviewers: Alieh Saeedi <asaeedi@confluent.io>, Lucas Brutschy
<lucasbru@apache.org>
In the `RaftUtil.addVoterResponse` and `RaftUtil.removeVoterResponse`
methods, when the input `errorMessage` is `null`, the returned string is
not actually null but `NONE`.
This introduces an inconsistency: semantically, `null` should represent
“no error message,” while `NONE` looks like a real string value and may
confuse clients.
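A self-contained illustration of the assumed shape (not the actual `RaftUtil`
code):

```java
// Before (assumed): a null errorMessage was replaced with a placeholder,
// so the response carried the string "NONE" rather than null.
static String errorMessageBefore(String errorMessage) {
    return errorMessage == null ? "NONE" : errorMessage;
}

// After: null passes through unchanged, meaning "no error message".
static String errorMessageAfter(String errorMessage) {
    return errorMessage;
}
```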
Reviewers: Alyssa Huang <ahuang@confluent.io>, José Armando García
Sancio <jsancio@apache.org>, Anton Agestam <anton.agestam@aiven.io>,
Chia-Ping Tsai <chia7712@gmail.com>
The original name is confusing and could cause engineers to mistake
`batchSize` for some other unit, like a number of records.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
If there's a failure in the kafka consumer constructor, we attempt to
close it:
2329def2ff/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java (L540)
In that case, some components may not have been created yet, so we
should add null checks to avoid noisy logs about NPEs.
These noisy logs have been reported with the console share consumer in a
similar scenario, so this task is to review and apply a similar fix for
the async consumer if needed.
The fix is to check whether handlers/invokers are null before trying to
close them, similar to what was done in
https://github.com/apache/kafka/pull/20290
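A minimal sketch of the pattern, using `Utils.closeQuietly` (which tolerates
null); the component names here are illustrative:

```java
import org.apache.kafka.common.utils.Utils;

// Components that were never constructed (because the constructor failed
// part-way) are null; closeQuietly skips them instead of logging an NPE.
Utils.closeQuietly(applicationEventHandler, "application event handler");
Utils.closeQuietly(backgroundEventReaper, "background event reaper");
```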
Reviewers: TengYao Chi <frankvicky@apache.org>, Lianet Magrans
<lmagrans@confluent.io>
The original test timed out when using the new protocol, because it used
`ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG` as the expected exception's
timeout, which is 300s. Also, the tests for the new and old protocols
used the same group ID, so the failure could be hidden.
What this PR does:
1. Set the timeout to 5 seconds so the exception can be captured within 10s.
2. Use a new appId for the new protocol.
Reviewers: Lucas Brutschy <lucasbru@apache.org>
As the PR title suggests, this PR is an attempt to perform some cleanups
in the server module. The changes are mostly around the use of the
record type for some classes, changes to use enhanced switch, etc.
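The flavor of these cleanups, with illustrative types of our own:

```java
// A record replaces a hand-written immutable class with equals/hashCode/toString.
record Endpoint(String host, int port) { }

// An enhanced switch replaces a fall-through case/break chain.
static String stateName(int code) {
    return switch (code) {
        case 0 -> "OFFLINE";
        case 1 -> "ONLINE";
        default -> "UNKNOWN";
    };
}
```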
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
# Description
`kafka-acls.sh` doesn't print a message about duplicate authorization.
# Changes
Now the CLI searches for existing AclBindings, prints duplicate
bindings, and removes them from the list to be added.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This is a follow-up to #19449, which does the following:
1. Adds documentation explaining that `schema.content` only works for
sink connectors when `schemas.enable` is set to true.
2. Handles the case where jsonValue contains the `schema` and
`payload` fields, in which case we should use the corresponding values.
Reviewers: Priyanka K U <priyanka.ku@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
**This upgrade includes:**
- Dependency configurations are now realized only when necessary, which
helps improve configuration performance and memory usage.
- The configuration cache improves build time by caching the result of
the configuration phase and reusing it for subsequent builds. This
feature can significantly improve build performance.
Reference: [Gradle 8.14.3 Release
Notes](https://docs.gradle.org/8.14.3/release-notes.html#build-authoring-improvements)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR removes two System.out.println(...) statements from
StreamsGraphTest. These outputs were left over from debugging and are
not needed in the test logic.
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This PR aims at cleaning up the `raft` module further by getting rid of
some extra code which can be replaced by `record`
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR aims at cleaning up the `connect:runtime` module further by
getting rid of some extra code which can be replaced by records, along
with the relevant changes.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
With "merge.repartition.topic" optimization enabled, Kafka Streams tries
to push repartition topics upstream, to be able to merge multiple
repartition topics from different downstream branches together.
However, it is not safe to push a repartition topic if the parent node
is value-changing: because of potentially changing data types, the
topology might become invalid, and fail with a serde error at runtime.
The optimization itself works correctly; however, processValues() is not
correctly declared as value-changing, which can lead to invalid
topologies.
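A hypothetical topology that shows the hazard (topic name, processor, and
serdes are ours): the value type changes in `processValues()`, so a
repartition topic pushed above it would be written with the wrong serde.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.processor.api.FixedKeyProcessor;
import org.apache.kafka.streams.processor.api.FixedKeyProcessorContext;
import org.apache.kafka.streams.processor.api.FixedKeyRecord;

class ParseValue implements FixedKeyProcessor<String, String, Integer> {
    private FixedKeyProcessorContext<String, Integer> context;

    @Override
    public void init(final FixedKeyProcessorContext<String, Integer> context) {
        this.context = context;
    }

    @Override
    public void process(final FixedKeyRecord<String, String> record) {
        // value-changing step: String -> Integer
        context.forward(record.withValue(Integer.parseInt(record.value())));
    }
}

StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream("input")
    .processValues(ParseValue::new) // must be treated as value-changing
    .groupByKey(Grouped.with(Serdes.String(), Serdes.Integer())) // repartitions here
    .count();
```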
Reviewers: Bill Bejeck <bill@confluent.io>, Lucas Brutschy
<lbrutschy@confluent.io>
This PR makes three main changes:
- Disallowing null values for most LIST-type configurations makes sense,
since users cannot explicitly set a configuration to null in a
properties file. Therefore, only configurations with a default value of
null should be allowed to accept null.
- Disallowing duplicate values is reasonable, as there are currently no
known configurations in Kafka that require specifying the same value
multiple times. Allowing duplicates is both rare in practice and
potentially confusing to users.
- Disallowing empty lists, even though many configurations currently
accept them. In practice, setting an empty list for several of these
configurations can lead to server startup failures or unexpected
behavior. Therefore, enforcing non-empty lists helps prevent
misconfiguration and improves system robustness.
These changes may introduce some backward incompatibility, but this
trade-off is justified by the significant improvements in safety,
consistency, and overall user experience.
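A hedged sketch of such a validator (our own code, not the one added to
Kafka), covering the three rules above:

```java
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigException;
import java.util.HashSet;
import java.util.List;

// Reject null (unless the default is null, handled by the caller), reject
// empty lists, and reject duplicate entries.
ConfigDef.Validator nonEmptyDistinctList = (name, value) -> {
    if (value == null) {
        throw new ConfigException(name, null, "List value must not be null");
    }
    List<?> list = (List<?>) value;
    if (list.isEmpty()) {
        throw new ConfigException(name, value, "List value must not be empty");
    }
    if (new HashSet<>(list).size() != list.size()) {
        throw new ConfigException(name, value, "List must not contain duplicates");
    }
};
```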
Additionally, we introduce two minor adjustments:
- Reclassify some STRING-type configurations as LIST-type, particularly
those using comma-separated values to represent multiple entries. This
change reflects the actual semantics used in Kafka.
- Update the default values for some configurations to better align with
other configs.
These changes will not introduce any compatibility issues.
Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
Should be `headers.separator=<headers.separator>` instead of
`headers.separator=<line.separator>`
Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Extend the consumer.close Javadoc to describe the error-handling
behaviour.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>,
TengYao Chi <frankvicky@apache.org>
This reverts commit d86ba7f54a.
Reverting since we are planning to change how KIP-966 is implemented. We
should revert this RPC until we have more clarity on how this KIP will
be executed.
Reviewers: José Armando García Sancio <jsancio@apache.org>
There’s a difference in the two consumers’ `pollForFetches()` methods in
this case: `ClassicKafkaConsumer` doesn't block waiting for data in the
fetch buffer, but `AsyncKafkaConsumer` does.
In `ClassicKafkaConsumer.pollForFetches()`, after enqueuing the `FETCH`
request, the consumer makes a call to `ConsumerNetworkClient.poll()`. In
most cases `poll()` returns almost immediately because it successfully
sent the `FETCH` request. So even when the `pollTimeout` value is, e.g.
3000, the call to `ConsumerNetworkClient.poll()` doesn't block that long
waiting for a response.
After sending out a `FETCH` request, `AsyncKafkaConsumer` then calls
`FetchBuffer.awaitNotEmpty()` and proceeds to block there for the full
length of the timeout. In some cases, the response to the `FETCH` comes
back with no results, which doesn't unblock
`FetchBuffer.awaitNotEmpty()`. So because the application thread is
still waiting for data in the buffer, it remains blocked, preventing any
more `FETCH` requests from being sent, causing the long pauses in the
console consumer.
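A simplified, self-contained model of the async path's wait (not the actual
consumer code): the application thread parks on a "buffer is non-empty"
condition, and an empty `FETCH` response never signals it.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Nothing wakes the thread to send the next FETCH request until either data
// arrives or the full poll timeout elapses.
class FetchBufferModel {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition nonEmpty = lock.newCondition();

    boolean awaitNotEmpty(long timeoutMs) throws InterruptedException {
        lock.lock();
        try {
            return nonEmpty.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            lock.unlock();
        }
    }
}
```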
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Andrew Schofield
<aschofield@confluent.io>
1. Optimize the `equals()`, `hashCode()`, and `toString()` methods in
`OffsetAndMetadata` (rough shape sketched below).
2. Add unit and integration tests for these modifications.
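Roughly the shape of the optimized methods (simplified; `OffsetAndMetadata`'s
fields are the offset, the optional leader epoch, and the metadata string):

```java
@Override
public boolean equals(final Object o) {
    if (this == o) return true;
    if (!(o instanceof OffsetAndMetadata other)) return false;
    // compare the cheap primitive first, then the object fields
    return offset == other.offset
        && Objects.equals(leaderEpoch, other.leaderEpoch)
        && Objects.equals(metadata, other.metadata);
}

@Override
public int hashCode() {
    return Objects.hash(offset, leaderEpoch, metadata);
}
```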
Reviewers: TengYao Chi <kitingiao@gmail.com>, Sean Quah
<squah@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
The `kafka-share-groups.sh` tool checks whether a topic already has a
start offset in the share group when resetting offsets. This is not
necessary. By removing the check, it is possible to set a start offset
for a topic which is not yet, but will be, subscribed in the future,
thus initialising the consumption point.
There is still a small piece of outstanding work to do with resetting
the offset for a non-existent group which should also create the group.
A subsequent PR will be used to address that.
Reviewers: Jimmy Wang <48462172+JimmyWang6@users.noreply.github.com>,
Lan Ding <isDing_L@163.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
Ensure that metrics are retrieved and displayed (when requested) before
ShareConsumer.close() is called. This is important because metrics are
technically supposed to be removed on ShareConsumer.close(), which means
retrieving them after close() would yield an empty map.
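The required ordering, sketched (variable and helper names are ours):

```java
// Read the metrics while the consumer is still open; close() removes them.
Map<MetricName, ? extends Metric> metrics = shareConsumer.metrics();
printRequestedMetrics(metrics); // hypothetical helper that renders the metrics
shareConsumer.close();          // metrics() after this point would be empty
```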
Related to https://github.com/apache/kafka/pull/20267.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
#20390 replaced the `--producer.config` option for the verifiable
producer and the `--consumer.config` option for the verifiable consumer
with `--command-config`. However, for e2e tests targeting older broker
versions, the original options should still be used.
Fix the following tests: `consumer_protocol_migration_test.py`,
`compatibility_test_new_broker_test.py`, and `upgrade_test.py`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
Refactor help and version handling in command-line tools by replacing
duplicate code with `CommandLineUtils#maybePrintHelpOrVersion`.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
*What*
- Currently in `ShareConsumerImpl`, we do not reset the
`background-event-queue-size` metric to 0 after draining the events from
the queue.
- This PR fixes that by using `BackgroundEventHandler::drainEvents`,
similar to `AsyncKafkaConsumer` (see the sketch after this list).
- Added a unit test to verify the metric is reset to 0 after draining
the events.
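A simplified model of the fix (our own names, not the `ShareConsumerImpl`
internals):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Draining and resetting in one place keeps the queue-size metric from
// staying stale at its old value after the queue is emptied.
class BackgroundEventDrainer {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger queueSizeMetric = new AtomicInteger(); // stands in for the real metric

    List<Object> drainEvents() {
        final List<Object> events = new ArrayList<>();
        queue.drainTo(events);
        queueSizeMetric.set(0); // reset background-event-queue-size after draining
        return events;
    }
}
```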
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>
This change adds:
- Integration test for `Admin#describeStreamsGroups` API
- Integration test for `Admin#deleteStreamsGroup` API
Reviewers: Alieh Saeedi <asaeedi@confluent.io>, Lucas Brutschy
<lucasbru@apache.org>
---------
Co-authored-by: Lucas Brutschy <lbrutschy@gmail.com>