kafka

Commit Graph

Author	SHA1	Message	Date
Alyssa Huang	c28f46459a	KAFKA-18345; Prevent livelocked elections (#19658) At the retry limit of binaryExponentialElectionBackoffMs, it becomes statistically likely that the exponential backoff returns electionBackoffMaxMs. This is an issue because multiple replicas can get stuck starting elections at the same cadence. This change fixes that by adding a random jitter to the max election backoff. Reviewers: José Armando García Sancio <jsancio@apache.org>, TaiJuWu <tjwu1217@gmail.com>, Yung <yungyung7654321@gmail.com>	2025-05-12 16:23:18 -04:00
Jonah Hooper	13fa4537f5	KAFKA-18905; Disable idempotent producer to remove test flakiness (#19644) As a result of KAFKA-18905, the reassign test will often have test failures that are unrelated to the actual reassignment of partitions. This failure is mentioned in KAFKA-9199. Quote from KAFKA-9199: "This issue popped up in the reassignment system test. It ultimately caused the test to fail because the producer was stuck retrying the duplicate batch repeatedly until ultimately giving up." Disabling the idempotent producer circumvents this issue and allows the reassignment system tests to succeed reliably. The reassignment test still checks that produced batches were not lost. Reviewers: José Armando García Sancio <jsancio@apache.org>	2025-05-12 15:41:02 -04:00
Andrew Schofield	7b8633e36f	MINOR: Add deprecation of listConsumerGroups to upgrade.html (#19684) As part of KIP-1043, `Admin.listConsumerGroups()` and variants have been deprecated. This is because there are now 4 types of group and listing has been consolidated under `Admin.listGroups()`. This PR adds the deprecation information to the upgrade documentation. Reviewers: Lianet Magrans <lmagrans@confluent.io>	2025-05-12 18:08:11 +01:00
ChickenchickenLove	62bec20aef	KAFKA-19242: Fix commit bugs caused by race condition during rebalancing. (#19631) ### Motivation While investigating “events skipped in group rebalancing” ([spring‑projects/spring‑kafka#3703](https://github.com/spring-projects/spring-kafka/issues/3703)) I discovered a race condition between - the main poll/commit thread, and - the consumer‑coordinator heartbeat thread. If the main thread enters `ConsumerCoordinator.sendOffsetCommitRequest()` while the heartbeat thread is finishing a rebalance (`SyncGroupResponseHandler.handle()`), the group state transitions in the following order: `COMPLETING_REBALANCE → (race window) → STABLE`. Because we read the state twice without a lock: 1. `generationIfStable()` returns `null` (state still `COMPLETING_REBALANCE`), 2. the heartbeat thread flips the state to `STABLE`, 3. the main thread re‑checks with `rebalanceInProgress()` and wrongly decides that a rebalance is still active, 4. a spurious `CommitFailedException` is returned even though the commit could succeed. For more details, please refer to the sequence diagram: https://github.com/user-attachments/assets/90f19af5-5e2d-4566-aece-ef764df2d89c ### Impact - The exception is semantically wrong: the consumer is in a stable group, but reports failure. - Frameworks and applications that rely on the semantics of `CommitFailedException` and `RetryableCommitException` (for example `Spring Kafka`) take the wrong code path, which can ultimately skip the events and break “at‑most‑once” guarantees. ### Fix We enlarge the synchronized block in `ConsumerCoordinator.sendOffsetCommitRequest()` so that the consumer group state is examined atomically with respect to the heartbeat thread. ### Jira https://issues.apache.org/jira/browse/KAFKA-19242 https://github.com/spring-projects/spring-kafka/issues/3703 Signed-off-by: chickenchickenlove <ojt90902@naver.com> Reviewers: David Jacot <david.jacot@gmail.com>	2025-05-12 17:01:29 +02:00
Sean Quah	eb3714f022	KAFKA-19160;KAFKA-19164; Improve performance of fetching stable offsets (#19497) When fetching stable offsets in the group coordinator, we iterate over all requested partitions. For each partition, we iterate over the group's ongoing transactions to check if there is a pending transactional offset commit for that partition. This can get slow when there are a large number of partitions and a large number of pending transactions. Instead, we maintain a list of pending transactions per partition to speed up lookups. Reviewers: Shaan, Dongnuo Lyu <dlyu@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>	2025-05-12 00:32:17 -07:00
Matthias J. Sax	b66729e231	MINOR: fit HTML markup (#19676) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-05-11 16:20:25 -07:00
Ming-Yen Chung	57ae6d6706	KAFKA-18695 Remove quorum=kraft and kip932 from all integration tests (#19633) Currently, the quorum uses kraft by default, so there's no need to specify it explicitly. kip932 and isShareGroupTest are no longer used after #19542. Reviewers: PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-12 01:28:30 +08:00
Kuan-Po Tseng	54fd1361e5	KAFKA-19264 Remove fallback for thread pool sizes in RemoteLogManagerConfig (#19673) The fallback mechanism for `remote.log.manager.copier.thread.pool.size` and `remote.log.manager.expiration.thread.pool.size` defaulting to `remote.log.manager.thread.pool.size` was introduced in KIP-950. This approach was abandoned in KIP-1030, where default values were changed from -1 to 10, and a configuration validator enforcing a minimum value of 1 was added. As a result, this commit removes the fallback mechanism from `RemoteLogManagerConfig.java` to align with the new defaults and validation. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-05-11 23:48:45 +08:00
Yunchi Pang	f588fa0643	MINOR: Move TxnTransitMetadata to transaction-coordinator (#19662) Migrates the `TxnTransitMetadata` class from Scala to Java, moving it to the `transaction-coordinator` module. Reviewers: PoAn Yang <payang@apache.org>, Nick Guo <lansg0504@gmail.com>, Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-11 22:36:47 +08:00
Nick Guo	707a44a6cb	KAFKA-19068 Eliminate the duplicate type check in creating ControlRecord (#19346) Jira: https://issues.apache.org/jira/browse/KAFKA-19068 `RecordsIterator#decodeControlRecord` does the type check, and then the `ControlRecord` constructor does it again. We should add a static method to `ControlRecord` that creates a `ControlRecord` with the type check, and then make the `ControlRecord` constructor private to ensure all instances are created by the static method. Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-11 00:07:00 +08:00
PoAn Yang	61cb33f347	KAFKA-19109 Don't print null in kafka-metadata-quorum describe status (#19543) If the directory id is `Uuid.ZERO_UUID`, the command doesn't print the result. Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-10 23:48:05 +08:00
xijiu	3696c49788	KAFKA-19220 Add tests to ensure the internal configs don't return by public APIs by default (#19650) Add tests to check whether the results returned by the `createTopics` and `describeConfigs` APIs contain internal configurations. Reviewers: PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-10 23:13:58 +08:00
Hong-Yi Chen	a7eae28a67	MINOR: Replaced internal KafkaConfig field in TransactionLogConfig (#19482) Retaining a reference to `AbstractConfig` introduced coupling and potential inconsistencies with dynamic config updates. This change simplifies `TransactionLogConfig` into a POJO by removing the internal `AbstractConfig` field, and aligns with feedback from #19439. Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi <frankvicky@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-10 23:06:18 +08:00
Stanislav Kozlovski	0bc8d0c962	MINOR: Add documentation about KIP-405 remote reads serving just one partition per FetchRequest (#19336) [As discussed in the mailing list](https://lists.apache.org/thread/m03mpkm93737kk6d1nd6fbv9wdgsrhv9), the broker only fetches remote data for ONE partition in a given FetchRequest. In other words, if a consumer sends a FetchRequest requesting 50 topic-partitions, and each partition's requested offset is not stored locally, the broker will fetch and respond with just one partition's worth of data from the remote store, and the rest will be empty. Given that our defaults are 50 MiB for the total fetch response and 1 MiB per partition, this can limit throughput. This patch documents the behavior in 3 configs: `fetch.max.bytes`, `max.partition.fetch.bytes` and `remote.fetch.max.wait.ms`. Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>	2025-05-10 16:48:55 +05:30
Alyssa Huang	042be5b9ac	MINOR: Fix some Request toString methods (#19655) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2025-05-09 23:42:34 -07:00
Chia-Ping Tsai	064afe2c65	MINOR: add xijiu from asf.yaml in order to resend invitation (#19663) @xijiu's invitation timed out, so we have to remove the name and then re-add it in a new commit. See https://issues.apache.org/jira/browse/INFRA-26796 Reviewers: PoAn Yang <payang@apache.org>, xijiu <422766572@qq.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>	2025-05-10 05:42:54 +08:00
Matthias J. Sax	0b81d6c780	MINOR: avoid double brace initialization (#19667) Reviewers: Bill Bejeck <bill@confluent.io>	2025-05-09 11:52:01 -07:00
Shivsundar R	58c08441d1	KAFKA-19229: Ignore background errors while closing share consumers. (Fix flaky test) (#19647) - A couple of newly added tests were found to be flaky in `AuthorizerIntegrationTest.scala`. - `testShareGroupDescribeWithGroupDescribeAndTopicDescribeAcl` and `testShareGroupDescribeWithoutGroupDescribeAcl`. These tests pass locally, so the failure could not be replicated. - But logs from Develocity indicated that the test fails when the following condition occurs: when the background error event arrives after the consumer has unsubscribed, these events are processed in the `handleCompletedAcknowledgements` method and the exception from the event is thrown, preventing `close()` from completing. - We need to handle this race condition, where we might get the background event after unsubscribe and before processing the callbacks. - This PR fixes this by ignoring the exceptions in the background queue when the `handleCompletedAcknowledgements` method is called during `close()`. This ensures `close()` completes successfully. - A unit test that mimics the race condition has been added as well. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-05-09 11:20:09 +01:00
Andrew Schofield	70c0aca4b7	KAFKA-17897: Deprecate Admin.listConsumerGroups [2/N] (#19508) Admin.listConsumerGroups() was able to use the early versions of ListGroups RPC with the version used dependent upon the filters the user specified. Admin.listGroups(ListGroupsOptions.forConsumerGroups()) inadvertently required ListGroups v5 because it always set a types filter. This patch handles the UnsupportedVersionException and winds back the complexity of the request unless the user has specified filters which demand a higher version. It also adds ListGroupsOptions.forShareGroups() and forStreamsGroups(). The usability of Admin.listGroups() is much improved as a result. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang <payang@apache.org>	2025-05-09 08:38:16 +01:00
ShihYuan Lin	1ccaddaa70	KAFKA-19209: Clarify index.interval.bytes impact on offset and time index (#19657) Update docs to note index.interval.bytes sets entry frequency for offset index and, conditionally, time index. Improve clarity and readability of index.interval.bytes description. Reviewers: Luke Chen <showuon@gmail.com>	2025-05-09 09:48:55 +08:00
Manoj	b5c468fd7c	KAFKA-18115; Fix for loading big files while performing load tests (#18391) When performing perf tests, we can specify a payload using the "--payloadFile" flag. This file is utilized during the load/performance testing process. Currently, the entire file is loaded into a String and split using the delimiter. If the file is large, this may result in a NegativeArraySizeException. This change moves the file loading logic to Scanner, which doesn't have this issue. Reviewers: José Armando García Sancio <jsancio@apache.org>, Ken Huang <s7133700@gmail.com>, Zhe Guang <zheguang.zhao@alumni.brown.edu>	2025-05-08 17:08:36 -04:00
Chia-Ping Tsai	99ecd5ca08	MINOR: remove xijiu from asf.yaml in order to resend invitation (#19660) @xijiu's invitation timed out, so we have to remove the name and then re-add it in a new commit. See https://issues.apache.org/jira/browse/INFRA-26796 Reviewers: Chih-Yuan Chien <joshua2519@gmail.com>, Kuan-Po Tseng <brandboat@gmail.com>, PoAn Yang <payang@apache.org>, yunchi <yunchipang@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Hong-Yi Chen <apalan60@gmail.com>, Bolin Lin <linbolin1230@gmail.com>, Shih-Yuan Lin <shmily7829@gmail.com>, Mirai1129 <minecraftmiku831@gmail.com>	2025-05-09 02:07:14 +08:00
Uladzislau Blok	0076b65f99	KAFKA-19182 Move SchedulerTest to server module (#19608) This PR moves SchedulerTest to the server module and rewrites it in Java. Please also check the updated import control config! Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-09 00:02:38 +08:00
PoAn Yang	9e785cee8d	KAFKA-19087 Move TransactionState to transaction-coordinator module (#19568) Move TransactionState to the transaction-coordinator module and rewrite it in Java. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-05-08 23:51:51 +08:00
David Jacot	98e535b524	MINOR: Simplify OffsetFetchResponse (#19642) While working on https://github.com/apache/kafka/pull/19515, I came to the conclusion that the OffsetFetchResponse is quite messy and overall too complicated. This patch rationalizes the constructors. OffsetFetchResponse has a single constructor accepting the OffsetFetchResponseData. A builder is introduced to handle the down conversion. This will also simplify adding the topic ids. All the changes are mechanical, replacing data structures with others. Reviewers: Lianet Magrans <lmagrans@confluent.io>	2025-05-08 14:57:45 +02:00
Apoorv Mittal	2dd6126b5d	KAFKA-18855 Slice API for MemoryRecords (#19581) The PR adds a `slice` API in `Records.java` and a further implementation in `MemoryRecords`. With the addition of ShareFetch and its support for reading from TieredStorage, where ShareFetch might acquire a subset of fetched batches and TieredStorage emits MemoryRecords, a slice API is needed for MemoryRecords as well to limit the bytes transferred (if a subset of batches is acquired). MemoryRecords are sliced using the `duplicate` and `slice` APIs of ByteBuffer, which are backed by the original buffer itself; hence no copy is created, and only the position, limit and offset are changed according to the new position and length. Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-08 14:02:25 +08:00
Calvin Liu	3094ce2c20	KAFKA-19212: Correct the unclean leader election metric calculation (#19590) The current ElectionWasClean checks if the new leader is in the previous ISR. However, there is a corner case in the partition reassignment. The partition reassignment can change the partition replicas. If the new preferred leader (the first one in the new replicas) is the last one to join ISR, this preferred leader will be elected in the same partition change. For example: In the previous state, the partition is Leader: 0, Replicas (2,1,0), ISR (1,0), Adding(2), removing(0). Then replica 2 joins the ISR. The new partition would be like: Leader: 2, Replicas (2,1), ISR(1,2). The new leader 2 is not in the previous ISR (1,0) but it is still a clean election. Reviewers: Jun Rao <junrao@gmail.com>	2025-05-07 13:26:53 -07:00
Lianet Magrans	67b46fec15	MINOR: introduce structure to keep member assignment with topic Ids (#19645) - Add a new data structure to wrap the member assignment (containing topic Ids, names and partitions), to easily access the data as needed. This will be used in a following PR to integrate assignment with topic IDs into the subscription state. - Improve logging on the client assignment/reconciliation path. No changes in logic. Reviewers: TengYao Chi <frankvicky@apache.org>, Andrew Schofield <aschofield@confluent.io>	2025-05-07 13:57:56 -04:00
Kirk True	d3707fc815	KAFKA-19214: Clean up use of Optionals in RequestManagers.entries() (#19609) Change: `public List<Optional<? extends RequestManager>> entries();` to: `public List<RequestManager> entries();` and clean up the callers. Reviewers: TengYao Chi <kitingiao@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-07 17:18:12 +01:00
Kevin Wu	6cb6aa2030	MINOR; Add `--standalone --ignore-formatted` formatter test (#19643) This PR adds an additional test case to `FormatterTest` that checks that formatting with `--standalone` and then formatting again with `--standalone --ignore-formatted` is indeed a no-op. Reviewers: José Armando García Sancio <jsancio@apache.org>	2025-05-07 10:41:18 -04:00
Chirag Wadhwa	f3a4a1b185	KAFKA-19241: Updated tests in ShareFetchAcknowledgeRequestTest to reuse the socket for subsequent requests (#19640) Currently in the tests in ShareFetchAcknowledgeRequestTest, subsequent share fetch / share acknowledge requests create a new socket every time, even when the requests are sent by the same member. In reality, a single share consumer client will reuse the same socket for all the share-related requests in its lifetime. This PR changes the behaviour in the tests to align with reality and reuse the same socket for all requests by the same share group member. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>	2025-05-07 14:00:41 +01:00
yunchi	d034268312	MINOR: Remove ConstantBrokerOrActiveKController (#19654) `ConstantBrokerOrActiveKController` was introduced in #14399 to provide a mechanism for selecting the least loaded broker or the active controller when using `bootstrap.controllers`. Usage was removed in #18002, after `alterConfigs` was deprecated in Kafka 2.4.0. Reviewers: PoAn Yang <payang@apache.org>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Ken Huang <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-07 20:23:29 +08:00
Abhinav Dixit	33abb655eb	KAFKA-19215: Handle share partition fetch lock cleanly using tokens (#19598) ### About Added code to handle share partition fetch lock cleanly in `DelayedShareFetch` to avoid a member incorrectly releasing a share partition's fetch lock. ### Testing The code has been tested with the help of unit tests and integration tests. Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>	2025-05-07 11:13:53 +01:00
Lucas Brutschy	3f465fc1b6	KAFKA-19202: Enable KIP-1071 in streams_standby_replica_test.py (#19625) New system test for KIP-1071. Standby replicas need to be enabled via `kafka-configs.sh`. Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2025-05-07 09:43:11 +02:00
Lan Ding	e1da318722	MINOR: add boundary IT for delivery count (#19649) See https://github.com/apache/kafka/pull/19430#pullrequestreview-2809619176 Add boundary IT for delivery count. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>	2025-05-06 22:05:02 +01:00
Kevin Wu	7953092108	MINOR: support ipv6 in ducker-ak (#19537) Reviewers: Colin P. McCabe <cmccabe@apache.org>, Kirk True <kirk@kirktrue.pro>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Ian McDonald <ian_mcdonald@rocketmail.com>	2025-05-06 13:55:18 -07:00
José Armando García Sancio	2df14b1190	MINOR; Log message for unexpected buffer allocation (#19596) Log a message when reading a batch that is larger than the currently allocated batch. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, PoAn Yang <payang@apache.org>	2025-05-06 12:01:49 -04:00
Andrew Schofield	7d027a4d83	KAFKA-19218: Add missing leader epoch to share group state summary response (#19602) When the persister is responding to a read share-group state summary request, it has no way of including the leader epoch in its response, even though it has the information to hand. This means that the leader epoch information is not initialised in the admin client operation to list share group offsets, and this then means that the information cannot be displayed in kafka-share-groups.sh. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Sushant Mahajan <smahajan@confluent.io>	2025-05-06 14:53:12 +01:00
Dmitry Werner	0810650da1	MINOR: Small cleanups in clients tests (#19634) - Removed unused fields and methods in clients tests - Fixed IDEA code inspection warnings Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang <payang@apache.org>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi <frankvicky@apache.org>	2025-05-06 20:19:21 +08:00
PoAn Yang	424e7251d6	KAFKA-19207 Move ForwardingManagerMetrics and ForwardingManagerMetricsTest to server module (#19574) 1. Move `ForwardingManagerMetrics` and `ForwardingManagerMetricsTest` to server module. 2. Rewrite them in Java. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-05-06 20:03:17 +08:00
yunchi	4e77466f6a	KAFKA-19170 Move MetricsDuringTopicCreationDeletionTest to client-integration-tests module (#19528) Rewrite `MetricsDuringTopicCreationDeletionTest` to use the `ClusterTest` infra and move it to the clients-integration-tests module. Reviewers: PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-06 19:57:16 +08:00
Alieh Saeedi	54b3b3debc	MINOR: Convert streams group options to consumer group options in Admin APIs (#19583) This PR fixes the issue introduced in #19120. The input `StreamsGroup` options must not be ignored; they must be converted to `ConsumerGroup` options. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>	2025-05-06 13:26:56 +02:00
Andrew Schofield	d2bd68d50c	MINOR: Improve output for delete-offset of kafka-consumer-groups.sh (#19610) The output from the delete-offsets option of kafka-consumer-groups.sh can be improved. For example, the column widths are excessive, which looks untidy, and the output messages can be improved. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>	2025-05-06 12:20:36 +01:00
Apoorv Mittal	ac9520b922	KAFKA-19227: Piggybacked share fetch acknowledgements performance issue (#19612) The PR fixes the issue when ShareAcknowledgements are piggybacked on ShareFetch. The current default configuration in clients sets `batch size` and `max fetch records` from the `max.poll.records` config (default 500), which means all records in a single poll will be fetched and acknowledged. Also, the default limit for in-flight records in a partition is 200, which means previously fetched records have to be acknowledged before another batch can be fetched from the share partition. The piggybacked share fetch-acknowledgement calls from KafkaApis are async, and the responses are combined later. If the respective share fetch starts waiting in purgatory because the in-flight records limit is exhausted, then when startOffset is moved as part of the acknowledgement, a trigger should try to complete any pending share fetch requests in purgatory. Otherwise, the share fetch requests wait in purgatory until the timeout even though records are available, which degrades share fetch performance. The regular fetch has a single criterion for landing requests in purgatory, which is the min bytes criterion; hence any produce to the respective topic partition triggers a check of pending fetch requests. But share fetch can wait in purgatory for multiple reasons: 1) min bytes, 2) in-flight records exhaustion, 3) share partition fetch lock competition. The trigger already happens for 1, and the current PR fixes 2. We will investigate further whether any handling is required for 3. Reviewers: Abhinav Dixit <adixit@confluent.io>, Andrew Schofield <aschofield@confluent.io>	2025-05-06 09:58:25 +01:00
Luke Chen	9823d6781c	MINOR: exclude error_prone_annotations lib from caffeine dependency (#19638) In https://github.com/apache/kafka/pull/16578, we tried to exclude both `checker-qual` and `error_prone_annotations`, but when excluding `error_prone_annotations`, the compilation failed. So in the end, we only excluded `checker-qual` and shipped `error_prone_annotations.jar` to users. In Kafka v4.0.0, thanks to the JDK 8 removal, we upgraded caffeine to the latest v3.1.8, instead of v2.x.x, and now we can successfully pass the compilation after excluding `error_prone_annotations` from `caffeine`. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ken Huang <s7133700@gmail.com>	2025-05-06 08:42:52 +08:00
Matthias J. Sax	c8005a543e	MINOR: improve AdjustStreamThreadCountTest (#19617) The test is failing once in a while, but there is not enough information in the logs to determine the root cause. This adds more information and fixes a thread resource leak. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>	2025-05-05 13:23:50 -07:00
Abhinav Dixit	caf4a6cc5f	KAFKA-19216: Eliminate flakiness in kafka.server.share.SharePartitionTest (#19639) ### About 11 of the test cases in `SharePartitionTest` have failed at least once in the past 28 days. https://develocity.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe%2FLondon&tests.container=kafka.server.share.SharePartitionTest Observing the flakiness, it seems to be caused by the usage of `SystemTimer` in various acquisition lock timeout related tests. I have replaced the usage of `SystemTimer` with `MockTimer` and also improved the `MockTimer` API with regard to removing the timer task entries that have already been cancelled. This has also reduced the time taken to run `SharePartitionTest` from ~6 sec to ~1.5 sec. ### Testing The testing has been done with the help of the existing unit tests in Apache Kafka. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-05-05 20:04:22 +01:00
Abhinav Dixit	81c3a285a4	KAFKA-19133: Support fetching for multiple remote fetch topic partitions in a single share fetch request (#19592) ### About This PR removes the limitation in remote storage fetch for share groups of only performing remote fetch for a single topic partition in a share fetch request. With this PR, share groups can now fetch multiple remote storage topic partitions in a single share fetch request. ### Testing I have followed the [AK documentation](https://kafka.apache.org/documentation/#tiered_storage_config_ex) to test my code locally (by adopting `LocalTieredStorage.java`) and verified with the help of logs that remote storage fetch is happening for multiple topic partitions in a single share fetch request. I also verified it with the help of unit tests. Reviewers: Jun Rao <junrao@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>	2025-05-05 19:42:02 +01:00
Hong-Yi Chen	c4dc78746e	KAFKA-18537 Fix flaky RemoteIndexCacheTest#testCleanerThreadShutdown (#19628) Add a wait for cleaner thread shutdown in `testCleanerThreadShutdown` to eliminate flakiness. After calling `cache.close()`, the test now uses `TestUtils.waitForCondition` to poll until the background “remote-log-index-cleaner” thread has fully exited before asserting that no cleaner threads remain. This ensures the asynchronous shutdown always completes before the final assertions. Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-05-06 01:05:34 +08:00
TaiJuWu	19530738c4	KAFKA-19240 Move MetadataVersionIntegrationTest to clients-integration-tests module (#19641) This PR does the following: 1. Moves MetadataVersionIntegrationTest to the clients-integration-tests module. 2. Rewrites it from Scala to Java. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-05-06 00:12:57 +08:00
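
The jitter idea in c28f46459a (KAFKA-18345) can be illustrated with a small Java sketch. It is not Kafka's implementation: the class `JitteredBackoff`, the method `electionBackoffMs`, and the parameters `baseMs`, `maxMs` and `maxJitterMs` are hypothetical. The sketch draws a binary exponential backoff and, when the draw reaches the cap, adds a random jitter on top of the cap, so replicas that all hit the limit stop retrying in lockstep.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class JitteredBackoff {

    /**
     * Hypothetical sketch of a binary exponential election backoff with jitter at the cap.
     * Not Kafka's code; names and parameters are illustrative assumptions.
     */
    static long electionBackoffMs(int retries, long baseMs, long maxMs, long maxJitterMs, Random random) {
        // The number of backoff slots doubles with each retry; the shift is bounded to avoid overflow.
        int shift = Math.min(Math.max(retries, 1), 30);
        long slots = 1L << shift;
        // Pick a uniformly random slot in [0, slots - 1] and scale it by the base backoff.
        long candidate = baseMs * (long) (random.nextDouble() * slots);
        if (candidate < maxMs) {
            return candidate;
        }
        // At the cap, add a random jitter in [0, maxJitterMs] so replicas do not all
        // wait exactly maxMs and restart elections at the same cadence.
        long jitter = (long) (random.nextDouble() * (maxJitterMs + 1));
        return maxMs + jitter;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        Set<Long> cappedValues = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            cappedValues.add(electionBackoffMs(20, 50, 1_000, 200, random));
        }
        // Without jitter, almost every draw at retry 20 would be exactly 1000 ms.
        System.out.println("distinct backoffs at retry 20: " + cappedValues.size());
    }
}
```

Adding the jitter only at the cap keeps the early retries unchanged while breaking the synchronization that occurs once most draws saturate.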
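
The lookup change in eb3714f022 (KAFKA-19160/KAFKA-19164), keeping pending transactions per partition instead of scanning every ongoing transaction for each requested partition, can be sketched as a two-way index. This is a minimal sketch under stated assumptions: `PendingTransactionIndex` and its methods are hypothetical, partitions are modeled as plain strings such as "orders-0", and producer IDs stand in for transactions; the group coordinator's real data structures differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PendingTransactionIndex {
    // Hypothetical: partition -> producer IDs with a pending transactional offset commit on it.
    private final Map<String, Set<Long>> producersByPartition = new HashMap<>();
    // Reverse index: producer ID -> partitions it has pending commits for, so completion needs no full scan.
    private final Map<Long, Set<String>> partitionsByProducer = new HashMap<>();

    void addPendingCommit(long producerId, String partition) {
        producersByPartition.computeIfAbsent(partition, k -> new HashSet<>()).add(producerId);
        partitionsByProducer.computeIfAbsent(producerId, k -> new HashSet<>()).add(partition);
    }

    /** Called when the producer's transaction commits or aborts. */
    void completeTransaction(long producerId) {
        Set<String> partitions = partitionsByProducer.remove(producerId);
        if (partitions == null) {
            return;
        }
        for (String partition : partitions) {
            Set<Long> producers = producersByPartition.get(partition);
            producers.remove(producerId);
            if (producers.isEmpty()) {
                producersByPartition.remove(partition);
            }
        }
    }

    /** Constant-time check per requested partition, instead of iterating over all transactions. */
    boolean hasPendingCommit(String partition) {
        return producersByPartition.containsKey(partition);
    }

    Set<Long> pendingProducers(String partition) {
        return Collections.unmodifiableSet(producersByPartition.getOrDefault(partition, Collections.emptySet()));
    }

    public static void main(String[] args) {
        PendingTransactionIndex index = new PendingTransactionIndex();
        index.addPendingCommit(1L, "orders-0");
        index.addPendingCommit(2L, "orders-0");
        index.addPendingCommit(2L, "orders-1");
        index.completeTransaction(2L);
        System.out.println(index.hasPendingCommit("orders-0")); // true (producer 1 is still pending)
        System.out.println(index.hasPendingCommit("orders-1")); // false
    }
}
```

The reverse map costs extra memory but lets a commit or abort clean up only the partitions that transaction touched.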
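
The approach described in b5c468fd7c (KAFKA-18115), streaming the payload file through `java.util.Scanner` with the delimiter as the token separator instead of reading it into one String and splitting it, might look roughly like this. The class `PayloadFileReader` and its methods are hypothetical and do not reproduce the actual perf-tool code; dropping empty payloads between consecutive delimiters is also an assumption of this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class PayloadFileReader {

    /** Splits the input on a literal delimiter without first reading it into a single String. */
    static List<byte[]> readPayloads(Reader reader, String delimiter) {
        List<byte[]> payloads = new ArrayList<>();
        // Scanner pulls from the Reader in chunks, so no single String or array
        // has to hold the whole file.
        try (Scanner scanner = new Scanner(reader)) {
            // Treat the delimiter literally, not as a regular expression.
            scanner.useDelimiter(Pattern.quote(delimiter));
            while (scanner.hasNext()) {
                String payload = scanner.next();
                // Scanner may return empty tokens between consecutive delimiters; skip them.
                if (!payload.isEmpty()) {
                    payloads.add(payload.getBytes(StandardCharsets.UTF_8));
                }
            }
        }
        return payloads;
    }

    static List<byte[]> readPayloads(Path file, String delimiter) throws IOException {
        return readPayloads(Files.newBufferedReader(file, StandardCharsets.UTF_8), delimiter);
    }

    public static void main(String[] args) {
        List<byte[]> payloads = readPayloads(new StringReader("a\nbb\n\nccc\n"), "\n");
        System.out.println(payloads.size()); // 3
    }
}
```

The resulting list still holds every payload in memory; what the streaming read avoids is the single oversized String that a whole-file read requires.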
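
The zero-copy slicing that 2dd6126b5d (KAFKA-18855) describes for MemoryRecords relies on standard `java.nio.ByteBuffer` behavior, shown here in a small sketch. `BufferSlicing.sliceView` is a hypothetical helper, not the `MemoryRecords` slice signature; it only demonstrates that `duplicate()` followed by `slice()` yields a view that shares the original buffer's memory.

```java
import java.nio.ByteBuffer;

public class BufferSlicing {

    /**
     * Hypothetical helper: returns a zero-copy view of {@code size} bytes starting
     * {@code offset} bytes past the buffer's current position. The source buffer's
     * position and limit are left untouched.
     */
    static ByteBuffer sliceView(ByteBuffer buffer, int offset, int size) {
        if (offset < 0 || size < 0 || offset > buffer.remaining() - size) {
            throw new IllegalArgumentException("slice out of bounds: offset=" + offset + ", size=" + size);
        }
        // duplicate() shares the backing memory but has its own position and limit.
        ByteBuffer view = buffer.duplicate();
        view.position(buffer.position() + offset);
        view.limit(buffer.position() + offset + size);
        // slice() starts a new buffer at the current position; no bytes are copied.
        return view.slice();
    }

    public static void main(String[] args) {
        ByteBuffer records = ByteBuffer.wrap(new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        ByteBuffer slice = sliceView(records, 2, 3);
        records.put(2, (byte) 42); // a write through the original buffer...
        System.out.println(slice.get(0)); // ...is visible in the slice: prints 42
    }
}
```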

15748 Commits (all branches)