Commit Graph

16356 Commits

Author SHA1 Message Date
Ken Huang da6a562f6d
KAFKA-17834: Improvements to Dockerfile (#17554)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-22 17:04:42 +02:00
Ken Huang a0640f9517
KAFKA-18351: Remove top-level version field from docker-compose.yml files (#18322)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Sylwester Lachiewicz <slachiewicz@apache.org>, Dávid Szigecsán
2025-09-22 16:57:30 +02:00
Ken Huang 01fccd3513
KAFKA-15186 AppInfo metrics don't contain the client-id (#20493)
All Kafka component register AppInfo metrics to track the application
start time or commit-id etc. These metrics are useful for monitoring and
debugging. However, the AppInfo doesn't provide client-id, which is an
important information for custom metrics reporter.

The AppInfoParser class registers a JMX MBean with the provided
client-id, but when it adds metrics to the Metrics registry, the
client-id is not included. This KIP aims to add the client-id as a tag.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-21 16:28:03 +08:00
Ken Huang 07b786e5bf
KAFKA-19681 Improve MetadataShell tool by skipping missing children and removing zkMigrationState (#20504)
The current `metadata-shell` find command throws an exception due to
child node `zkMigrationState`.
This interrupts the output and makes the CLI less usable.

Additionally, `zkMigrationState` is no longer used in Kafka 4.x, but it
is still defined under image/features, which causes misleading error
messages.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-21 12:35:46 +08:00
Chang-Chi Hsu 5919762009
MINOR: Remove exitMessage.set() call in TopicBasedRemoteLogMetadataManagerTest (#20563)
- **Reasons:** In this case, the `exit(int statusCode)` method invokes
`exit(statusCode, null)`, which means
the `message` argument is always `null` in this code path. As a result,
assigning
`exitMessage` has no effect and can be safely removed.

- **Changes:** Remove a redundant field assignment.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-20 18:04:10 +08:00
Now c49ab6b4ae
MINOR: Optimize map lookup efficiency with getOrDefault (#20229)
Optimized `getRemainingRecords()` method by replacing inefficient
`containsKey() + get()` pattern with `getOrDefault()` to reduce map
lookups from 2 to 1 per partition.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-20 11:52:29 +08:00
Jhen-Yung Hsu 57e9f98e15
KAFKA-19644 Enhance the documentation for producer headers and integration tests (#20524)
- Improve the docs for Record Headers.
- Add integration tests to verify that the order of headers in a record
is preserved when producing and consuming.
- Add unit tests for RecordHeaders.java.

Reviewers: Ken Huang <s7133700@gmail.com>, Hong-Yi Chen
 <apalan60@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-20 11:46:19 +08:00
Lianet Magrans 848e3d0092
KAFKA-19722: Adding missing metric assigned-partitions for new consumer (#20557)
Adding the missing metric to track the number of partitions assigned.
This metric should be registered whenever the consumer is using a
groupId, and should track the number of partitions from the subscription
state, regardless of the subscription type (manual or automatic).

This PR registers the missing metric as part of the
ConsumerRebalanceMetricsManager setup. This manager is created if there
is a group ID, and reused by the consumer membershipMgr and the
streamsMemberhipMgr, so we ensure that the metric is registered for the
new consumer and streams.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, TengYao Chi
 <frankvicky@apache.org>
2025-09-19 12:42:43 -04:00
Lan Ding cfa0b416ef
MINOR: Remove metrics attribute from StreamsGroup (#20559)
The `metrics` attribute in `StreamsGroup` is not used anymore. This
patch removes it.

Reviewers: Ken Huang <s7133700@gmail.com>, Lucas Brutschy
 <lbrutschy@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-19 16:32:41 +08:00
Sean Quah d067c6c040
KAFKA-19716: Clear out coordinator snapshots periodically while loading (#20547)
When nested Timeline collections are created and discarded while loading
a coordinator partition, references to them accumulate in the current
snapshot. Allow the GC to reclaim them by starting a new snapshot and
discarding previous snapshots every 16,384 records.

Small intervals degrade loading times for non-transactional offset
commit workloads while large intervals degrade loading times for
transactional workloads. A default of 16,384 was chosen as a compromise.

Also add a benchmark for group coordinator loading.

Reviewers: David Jacot <djacot@confluent.io>
2025-09-19 09:44:07 +02:00
Ryan Dielhenn b72db2b2c7
MINOR: Delete temporary directories after using them in RaftManagerTest Updated (#20550)
Follow-up to [#11193](https://github.com/apache/kafka/pull/11193). This
change adds cleanup of the temporary log and metadata directories
created by RaftManagerTest so they are removed after each test run.
Without this cleanup, the directories remain until the entire test suite
completes, leaving extra files in the system temporary directory.

Testing:
- Ran `./gradlew core:test --tests kafka.raft.RaftManagerTest` and
confirmed all tests pass.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
2025-09-19 10:22:05 +08:00
Andrew Schofield 5ed4a48829
MINOR: Tighten up argument descriptions for console CLI tools (#20554)
Small improvements to the argument descriptions in the usage messages
for the console producer/consumer tools.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-19 10:10:12 +08:00
Sean Quah dbd2b527d0
MINOR: Fix format in CoordinatorLoaderImplTest (#20548)
Fix indentation in `CoordinatorLoaderImplTest` to be consistent with the
rest of the code in the package.

Reviewers: TengYao Chi <kitingiao@gmail.com>, David Jacot <djacot@confluent.io>
2025-09-18 15:52:03 +02:00
David Jacot 8c8e93c4a1
MINOR: Remove metrics attribute from ConsumerGroup (#20542)
The `metrics` attribute in `ConsumerGroup` is not used anymore. This
patch removes it.

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai
 <chia7712@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Dongnuo Lyu
 <dlyu@confluent.io>
2025-09-18 11:10:35 +02:00
David Jacot d6fdbfcf15
MINOR: Fix typos in CoordinatorRecordTypeGenerator (#20549)
This patch fixes a few typos in CoordinatorRecordTypeGenerator.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi
 <frankvicky@apache.org>, Sean Quah <squah@confluent.io>
2025-09-18 16:22:35 +08:00
Jinhe Zhang 04b4a8f571
KAFKA-19705: Enable streams rebalance protocol in IQv2 integration test (#20541)
Update IQv2 Integration tests for streams group protocol

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-09-18 09:41:52 +02:00
Shivsundar R e647bdcee5
MINOR : Fix parantheses in console_consumer.py and console_share_consumer.py (#20552)
*What*
We were missing a parantheses when we invoked a method
`supports_formatter_property()`. This would mean we would get the object
not call the function.
PR fixes this by including parantheses and invoking the actual function.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2025-09-18 11:53:12 +05:30
Lan Ding 9a32a71e76
KAFKA-19699 improve the documentation of `RecordsToDelete` (#20527)
document the behavior of "-1" (HIGH_WATERMARK)

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-17 23:49:42 +08:00
Shivsundar R 3bc50f937c
KAFKA-19623: Implement KIP-1147 for console producer/consumer/share-consumer. (#20479)
*What*
https://issues.apache.org/jira/browse/KAFKA-19623

- The PR implements KIP-1147

(https://cwiki.apache.org/confluence/display/KAFKA/KIP-1147%3A+Improve+consistency+of+command-line+arguments)
for the console tools i.e. `ConsoleProducer`, `ConsoleConsumer` and
`ConsoleShareConsumer`.

- Currently the previous names for the options are still usable but
there will be warning message stating those are deprecated and will be
removed in a future version.
- I have added unit tests and also manually verified using the console
tools that things are working as expected.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Jhen-Yung Hsu
 <jhenyunghsu@gmail.com>, Jimmy Wang
 <48462172+JimmyWang6@users.noreply.github.com>
2025-09-17 15:28:20 +01:00
David Jacot bbbc0cf793
MINOR: Fix format in CoordinatorLoaderImpl (#20538)
The format of the code in `CoordinatorLoaderImpl` in inconsistent with
the rest of the code in the package. This small PR fixes it.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Sean
 Quah <squah@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-17 17:13:28 +08:00
Jinhe Zhang 8ba41a2d0d
MINOR: Expose internal topic creation errors to the user (#20325)
This PR introduces an ExpiringErrorCache that temporarily stores topic
creation errors, allowing the system to provide detailed failure reasons
in subsequent heartbeat responses.

Key Designs:

Time-based expiration: Errors are cached with a TTL based on the
streams group heartbeat interval (2x heartbeat interval). This ensures
errors remain available for at least one retry cycle while preventing
unbounded growth.    2. Priority queue for efficient expiry: Uses a
min-heap to track entries by expiration time, enabling efficient cleanup
of expired entries during cache operations.    3. Capacity enforcement:
Limits cache size to prevent memory issues under high error rates. When
capacity is exceeded, oldest entries are evicted first.    4. Reference
equality checks: Uses eq for object identity comparison when cleaning up
stale entries, avoiding expensive value comparisons while correctly
handling entry updates.

Reviewers: Lucas Brutschy <lucasbru@apache.org>
2025-09-16 20:52:39 +02:00
Shashank b043ca2074
KAFKA-19683: Remove dead tests and modify tests in TaskManagerTest [1/N] (#20501)
This is the first part of cleaning up of the tests in `TaskManagerTest`
- Removed dead tests
- Added new tests as suggested earlier

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-09-16 20:46:20 +02:00
Lianet Magrans 9f657abf3a
MINOR: Improve consumer rebalance callbacks docs (#20528)
Clarify rebalance callbacks behaviour (got some questions for
onPartitionsAssigned, docs where indeed confusing about the partitions
received in params).   Reviewed all rebalance callbacks with it.

Reviewers: Bill Bejeck<bbejeck@apache.org>
2025-09-16 11:12:19 -04:00
Lucas Brutschy 2c347380b7
KAFKA-19694: Trigger StreamsRebalanceListener in Consumer.close (#20511)
In the consumer, we invoke the consumer rebalance onPartitionRevoked or
onPartitionLost callbacks, when the consumer closes. The point is that
the application may want to commit, or wipe the state if we are closing
unsuccessfully.

In the StreamsRebalanceListener, we did not implement this behavior,
which means when closing the consumer we may lose some progress, and in
the worst case also miss that we have to wipe our local state state
since we got fenced.

In this PR we implement StreamsRebalanceListenerInvoker, very similarly
to ConsumerRebalanceListenerInvoker and invoke it in Consumer.close.

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Matthias J. Sax
 <matthias@confluent.io>, TengYao Chi <frankvicky@apache.org>,
 Uladzislau Blok <123193120+UladzislauBlok@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-16 16:32:47 +02:00
Lan Ding daa7aae0c1
KAFKA-19604 Document controller.quorum.auto.join.enable config in upgrade.html (#20409)
Document controller.quorum.auto.join.enable config in upgrade.html

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-16 16:34:14 +08:00
Jhen-Yung Hsu dddb619177
MINOR: Move RaftManager interface to raft module (#20366)
- Move the `RaftManager` interface to raft module, and remove the
`register` and `leaderAndEpoch` methods since they are already part of
the RaftClient APIs.
- Rename RaftManager.scala to KafkaRaftManager.scala.

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-16 16:19:42 +08:00
Mickael Maison 3cbb2a0aaf
MINOR: Small cleanups in clients (#20530)
- Fix non-constant calls to logging
- Fix assertEquals order
- Fix javadoc

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-09-16 03:56:11 +08:00
Lianet Magrans caeca090b8
MINOR: Improve producer docs and add tests around timeout behaviour on missing topic/partition (#20533)
Clarify timeout errors received on send if the case is topic not in
metadata vs partition not in metadata.  Add integration tests showcases
the difference  Follow-up from 4.1 fix for misleading timeout error
message (https://issues.apache.org/jira/browse/KAFKA-8862)

Reviewers: TengYao Chi <frankvicky@apache.org>, Kuan-Po Tseng
 <brandboat@gmail.com>
2025-09-15 13:28:27 -04:00
Alex 3fcc0c2877
MINOR: Fix an off-by-one issue in ValuesTest (#20520)
This test case ensures that the parser can convert ISO8601 correctly.
However, when the local time falls on a different day than the UTC time,
there will be an off-by-one issue.

I changed the test to convert the local time and then compare it with
the expected local time. This should fix the off-by-one issue.

[Reference
link](https://github.com/apache/kafka/pull/18611#discussion_r2318146619)

Reviewers: Andrew Schofield <aschofield@confluent.io>

---------

Signed-off-by: Alex <wenhsuan.alexyu@gmail.com>
2025-09-15 18:26:47 +01:00
Lucas Brutschy 8628d74c49
KAFKA-19661 [6/N]: Use heaps also on the process-level (#20523)
In the current solution, we only use a heap to select the right process,
but resort to linear search for selecting a member within a process.
This means use cases where a lot of threads run within the same process
can yield slow assignment. The number of threads in a process shouldn’t
scale arbitrarily (our assumed case for benchmarking of 50 threads in a
single process seems quite extreme already), however, we can optimize
for this case to reduce the runtime further.

Other assignment algorithms assign directly on the member-level, but we
cannot do this in Kafka Streams, since we cannot assign tasks to
processes that already own the task. Defining a heap directly on members
would mean that we may have to skip through 10s of member before finding
one that does not belong to a process that does not yet own the member.

Instead, we can define a separate heap for each process, which keeps the
members of the process by load. We can only keep the heap as long as we
are only changing the load of the top-most member (which we usually do).
This means we keep track of a lot of heaps, but since heaps are backed
by arrays in Java, this should not result in extreme memory
inefficiencies.

In our worst-performing benchmark, this improves the runtime by ~2x on
top of the optimization above.

Also piggybacked are some minor optimizations / clean-ups:   -
initialize HashMaps and ArrayLists with the right capacity   - fix some
comments   - improve logging output

Note that this is a pure performance change, so there are no changes to
the unit tests.

Reviewers: Bill Bejeck<bbejeck@apache.org>
2025-09-15 17:19:53 +02:00
Hong-Yi Chen 749c2d91d5
KAFKA-19609 Move TransactionLogTest to transaction-coordinator module (#20460)
This PR migrates the `TransactionLogTest` from Scala to Java for better
consistency with the rest of the test suite and to simplify future
maintenance.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-15 11:25:54 +08:00
Yunchi Pang e1b7699975
MINOR: Bump versions in CI (#20497)
**Summary**

This PR bumps several GitHub Actions and dependencies used in CI
workflows to their latest stable versions. This ensures our CI
environment remains consistent, secure, and aligned with upstream
improvements.

**Changes**

- requests: 2.32.3 → 2.32.4
- actions/checkout: v4 → v5
- actions/setup-python: v5 → v6
- actions/setup-java: v4 → v5
- actions/download-artifact: v4 → v5
- actions/labeler: v5 → v6

related: https://github.com/apache/kafka/pull/19940/files#r2328391161

Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
 <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-15 10:35:42 +08:00
keemsisi 2fd54837f0
MINOR: Update on fixing tag description missing in javadoc (#20380)
* Added tag description to @throws in method javadoc
* Added explicit throws IndexOffsetOverflowException to method signature

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-15 10:13:49 +08:00
NICOLAS GUYOMAR a9e529236f
MINOR: increase Config change throwable log info to error (#14380)
The ApiError.fromThrowable(t) is going to return a generic
Errors.UNKNOWN_SERVER_ERROR back to the calling client (CLI for
instance) (eg if the broker has an authZ issue with ZK)  and such
UnknownServerException should have a matching ERROR level log in the
broker logs IHMO to make it easier to troubleshoot

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-15 10:04:06 +08:00
Mickael Maison 374bc469c7
MINOR: Cleanups in ops docs (#20532)
- Fix typo in `process.role`
- Fix formatting of quorum description commands

Reviewers: Lan Ding <isDing_L@163.com>, Ken Huang <s7133700@gmail.com>,
TengYao Chi <frankvicky@apache.org>
2025-09-14 20:25:12 +08:00
Chang-Chi Hsu 962f4ada75
KAFKA-19203 Replace `ApiError#exception` by `Error#exception` for KafkaAdminClient (#19623)
This pull request addresses KAFKA-19203 by replacing
`ApiError#exception` with `Error#exception` in `KafkaAdminClient`. The
previous use of `ApiError#exception` was redundant, as we only need the
exception without the additional wrapping of `ApiError`.

## Changes

- Replaced some usages of `ApiError#exception` with `Error#exception` in
`KafkaAdminClient`.
- Simplified exception handling logic to reduce unnecessary layers.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-14 07:20:01 +08:00
Logan Zhu 026710cbb4
MINOR: Update ClusterTestExtensions Javadoc example (#20525)
The previous Javadoc example used the deprecated ClusterType.   It is
now updated to use `types = {Type.KRAFT, Type.CO_KRAFT}`

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-13 06:01:00 +08:00
Chang-Chi Hsu af2a8db3c6
KAFKA-18105 Fix flaky PlaintextAdminIntegrationTest#testElectPreferredLeaders (#20068)
## Changes
This PR improves the stability of the
PlaintextAdminIntegrationTest.testElectPreferredLeaders test by
introducing short Thread.sleep( ) delays before invoking:

- changePreferredLeader( )

- waitForBrokersOutOfIsr( )

## Reasons

- Metadata propagation for partition2 :
Kafka requires time to propagate the updated leader metadata across all
brokers. Without waiting,  metadataCache may return outdated leader
information for partition2.

- Eviction of broker1 from the ISR :
To simulate a scenario where broker1 is no longer eligible as leader,
the test relies on broker1 being removed from the ISR (e.g., due to
intentional shutdown). This eviction is not instantaneous and requires a
brief delay before Kafka reflects the change.

Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi
 <kitingiao@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang
 <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-13 05:44:17 +08:00
Maros Orsak 54b88f6721
MINOR: Refactor on FeaturesPublisher and ScramPublisher (#20522)
This PR is a follow-up from https://github.com/apache/kafka/pull/20468.
Basically makes two things:
1. Moves the variable to the catch block as it is used only there.
2. Refactor FeaturesPublisher to handle exceptions the same as
ScramPublisher or other publishers :)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

---------

Signed-off-by: see-quick <maros.orsak159@gmail.com>
2025-09-13 05:24:40 +08:00
Lucas Brutschy 20268330d5
MINOR: Deflake and improve SmokeTestDriverIntegrationTest (#20509)
This improves the SmokeTestDriverIntegrationTest in three ways:

1) If a SmokeTestClient fails (enters a terminal ERROR state), the
SmokeTestDriverIntegrationTest currently times out, because it keeps
waiting for state NOT_RUNNING. This makes debugging quite difficult.
This minor  change makes sure to just fail the test immediately, if a
SmokeTestClient enters the ERROR state.

2) If a test times out or fails prematurely, because a SmokeTestClient
crashed, the SmokeTestClients aren't shut down correctly, which will
affect the following test runs. Therefore, I am adding clean-up logic
that running SmokeTestClients in `@AfterAll`.

3) Finally, I found that the processingThread variation of this thread
triggers a subtle race condition. Since this features is currently not
actively developed, I disabled those variations and created a ticket to
reactivate the test.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai
 <chia7712@gmail.com>, Bill Bejeck <bill@confluent.io>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-12 10:31:19 +02:00
Shashank dd824a2e74
KAFKA-19666: Remove old restoration codepath from RestoreIntegrationTest [4/N] (#20498)
Clean up `RestoreIntegrationTest.java`

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-09-11 16:06:25 +02:00
jimmy 8a79ea2e5b
KAFKA-19676 EpochState should override close to avoid throwing IOException (#20508)
Jira: [KAFKA-19676](https://issues.apache.org/jira/browse/KAFKA-19676)
All subclasses of EpochState do not throw an IOException when closing,
so catching it is unnecessary. We could override close to remove the
IOException declaration.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu
 <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-09-11 18:21:35 +08:00
Mickael Maison 865beb6ede
MINOR: Remove explicit version list from upgrade.from docs (#20518)
That config has a `Validator` so we already automatically print the
valid values in the generated docs:
https://kafka.apache.org/documentation/#streamsconfigs_upgrade.from

That will be one less place to upgrade each time we make a new release.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-10 20:47:11 +02:00
Sushant Mahajan ff5025a21c
KAFKA-19695: Fix bug in redundant offset calculation. (#20516)
* The `ShareCoordinatorShard` maintains the the record offset
information for `SharePartitionKey`s in the
`ShareCoordinatorOffsetsManager` class.
* Replay of `ShareSnapshot`s in the shards are reflected in the offsets
manager including records created due to delete state.
* However, if the share partition delete is due to topic delete, no
record will ever be written for the same  `SharePartitionKey` post the
delete tombstone (as topic id will not repeat).
As a result the offset manager will always consider the deleted share
partition's offset as the last redundant one.
* The fix is to make the offset manager aware of the tombstone records
and remove them from the redundant offset calculation.
* Unit tests have been updated for the same.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal
 <apoorvmittal10@gmail.com>
2025-09-10 11:38:34 -05:00
Lucas Brutschy 351203873d
KAFKA-19661 [5/N]: Use below-quota as a condition for standby task assignment (#20458)
In the original algorithm, standby tasks are assigned to a  process that
previously owned the task only if it is  “load-balanced”, meaning the
process has fewer tasks that  members, or it is the least loaded
process. This is strong  requirement, and will cause standby tasks to
often not get  assigned to process that previously owned it.
Furthermore,  the condition “is the least loaded process” is hard to
evaluate efficiently in this context.

We propose to instead use the same “below-quota” condition  as in active
task assignment.

We compute a quota for active and standby tasks, by definiing numOfTasks
= numberOfActiveTasks+numberOfStandbyTasks and  defining the quota as
numOfTasks/numberOfMembers rounded up.  Whenever a member becomes “full”
(its assigned number of tasks  is equal to numOfTasks) we deduct its
tasks from numOfTasks and  decrement numberOfMembers and recompute the
quota, which means  that the quota may be reduced by one during the
assignment  process, to avoid uneven assignments.

A standby task can be assigned to a process that previously  owned it,
whenever the process has fewer than  numOfMembersOfProcess*quota.

This condition will, again, prioritize standby stickyness,  and can be
evaluated in constant time.

In our worst-performing benchmark, this improves the runtime  by 2.5x on
top of the previous optimizations, but 5x on the  more important
incremental assignment case.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2025-09-10 17:55:51 +02:00
Shashank 709c5fab22
KAFKA-19666: Remove old restoration codepath from EosIntegrationTest [5/N] (#20499)
clean up `EosIntegrationTest.java`

Reviewers: Lucas Brutschy <lucasbru@apache.org>
2025-09-10 17:10:46 +02:00
Maros Orsak a244565ed2
KAFKA-18708: Move ScramPublisher to metadata module (#20468)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-09-10 16:50:08 +02:00
Mickael Maison 32b8e326da
MINOR: Add 4.1.0 to streams system tests (#20480)
This PR updates all the streams system tests to include 4.1.0.

Reviewers: Lucas Brutschy <lucasbru@apache.org>
2025-09-10 16:23:55 +02:00
Mickael Maison 1ea221c5e9
MINOR: Add 4.1.0 to core system tests (#20477)
This PR updates all the core system tests to include 4.1.0.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-09-10 10:15:33 +02:00
PoAn Yang 675552a724
KAFKA-19490: Remove usages of distutils in docker scripts (#20178)
The
[distutils](https://docs.python.org/3.13/whatsnew/3.12.html#distutils)
package is removed from Python 3.12.

Change `distutils` usage to `shutil`.

Reviewers: Mickael Maison <mimaison@apache.org>

---------

Signed-off-by: PoAn Yang <payang@apache.org>
2025-09-10 13:02:03 +08:00