kafka

Commit Graph

Author	SHA1	Message	Date
Jinhe Zhang	8ba41a2d0d	MINOR: Expose internal topic creation errors to the user (#20325 ) This PR introduces an ExpiringErrorCache that temporarily stores topic creation errors, allowing the system to provide detailed failure reasons in subsequent heartbeat responses. Key Designs: 1. Time-based expiration: Errors are cached with a TTL based on the streams group heartbeat interval (2x heartbeat interval). This ensures errors remain available for at least one retry cycle while preventing unbounded growth. 2. Priority queue for efficient expiry: Uses a min-heap to track entries by expiration time, enabling efficient cleanup of expired entries during cache operations. 3. Capacity enforcement: Limits cache size to prevent memory issues under high error rates. When capacity is exceeded, oldest entries are evicted first. 4. Reference equality checks: Uses eq for object identity comparison when cleaning up stale entries, avoiding expensive value comparisons while correctly handling entry updates. Reviewers: Lucas Brutschy <lucasbru@apache.org>	2025-09-16 20:52:39 +02:00
Jhen-Yung Hsu	dddb619177	MINOR: Move RaftManager interface to raft module (#20366 ) - Move the `RaftManager` interface to raft module, and remove the `register` and `leaderAndEpoch` methods since they are already part of the RaftClient APIs. - Rename RaftManager.scala to KafkaRaftManager.scala. Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-09-16 16:19:42 +08:00
Lianet Magrans	caeca090b8	MINOR: Improve producer docs and add tests around timeout behaviour on missing topic/partition (#20533 ) Clarify the timeout errors received on send when the topic is not in metadata vs. when the partition is not in metadata. Add integration tests showcasing the difference. Follow-up from the 4.1 fix for the misleading timeout error message (https://issues.apache.org/jira/browse/KAFKA-8862) Reviewers: TengYao Chi <frankvicky@apache.org>, Kuan-Po Tseng <brandboat@gmail.com>	2025-09-15 13:28:27 -04:00
Hong-Yi Chen	749c2d91d5	KAFKA-19609 Move TransactionLogTest to transaction-coordinator module (#20460 ) This PR migrates the `TransactionLogTest` from Scala to Java for better consistency with the rest of the test suite and to simplify future maintenance. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-09-15 11:25:54 +08:00
NICOLAS GUYOMAR	a9e529236f	MINOR: increase Config change throwable log info to error (#14380 ) ApiError.fromThrowable(t) returns a generic Errors.UNKNOWN_SERVER_ERROR to the calling client (the CLI, for instance), e.g. if the broker has an authZ issue with ZK. IMHO such an UnknownServerException should have a matching ERROR-level entry in the broker logs to make it easier to troubleshoot. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-09-15 10:04:06 +08:00
Chang-Chi Hsu	af2a8db3c6	KAFKA-18105 Fix flaky PlaintextAdminIntegrationTest#testElectPreferredLeaders (#20068 ) ## Changes This PR improves the stability of the PlaintextAdminIntegrationTest.testElectPreferredLeaders test by introducing short Thread.sleep( ) delays before invoking: - changePreferredLeader( ) - waitForBrokersOutOfIsr( ) ## Reasons - Metadata propagation for partition2 : Kafka requires time to propagate the updated leader metadata across all brokers. Without waiting, metadataCache may return outdated leader information for partition2. - Eviction of broker1 from the ISR : To simulate a scenario where broker1 is no longer eligible as leader, the test relies on broker1 being removed from the ISR (e.g., due to intentional shutdown). This eviction is not instantaneous and requires a brief delay before Kafka reflects the change. Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi <kitingiao@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-09-13 05:44:17 +08:00
Maros Orsak	54b88f6721	MINOR: Refactor on FeaturesPublisher and ScramPublisher (#20522 ) This PR is a follow-up from https://github.com/apache/kafka/pull/20468. Basically makes two things: 1. Moves the variable to the catch block as it is used only there. 2. Refactor FeaturesPublisher to handle exceptions the same as ScramPublisher or other publishers :) Reviewers: Chia-Ping Tsai <chia7712@gmail.com> --------- Signed-off-by: see-quick <maros.orsak159@gmail.com>	2025-09-13 05:24:40 +08:00
Maros Orsak	a244565ed2	KAFKA-18708: Move ScramPublisher to metadata module (#20468 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2025-09-10 16:50:08 +02:00
lucliu1108	0bc2c6e699	MINOR: Move topic creation before consumer creation in testListGroups integration test (#20496 ) This PR moves the topic creation before consumer creations in `PlaintextAdminIntegrationTest.testListGroups`, to avoid potential errors if consumer creates topic due to metadata update. See discussion https://github.com/apache/kafka/pull/20244#discussion_r2325557949 Reviewers: @chia7712, bbejeck@apache.org	2025-09-09 18:05:45 -04:00
Chirag Wadhwa	d5e624e918	KAFKA-19693: Added PersisterBatch record in Share Partition which includes updatedState and stateBatch (#20507 ) The method rollbackOrProcessStateUpdates in SharePartition received 2 separate lists of updatedStates (InFlightState) and stateBatches (PersisterStateBatch). This PR introduces a new subclass called `PersisterBatch` which encompasses both these objects. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>	2025-09-09 11:21:42 +01:00
lucliu1108	f6f5b4cb27	KAFKA-19565: Integration test for Streams-related Admin APIs [2/N] (#20266 ) Integration tests for Stream Admin related API Previous PR: https://github.com/apache/kafka/pull/20244 This one adds: - Integration test for Admin#listStreamsGroupOffsets API - Integration test for Admin#deleteStreamsGroupOffsets API - Integration test for Admin#alterStreamsGroupOffsets API Reviewers: Alieh Saeedi <asaeedi@confluent.io>, Lucas Brutschy <lucasbru@apache.org>	2025-09-09 10:30:39 +02:00
Sanskar Jhajharia	3c7f99ad31	MINOR: Cleanup Server Module (#20180 ) As the PR title suggests, this PR is an attempt to perform some cleanups in the server module. The changes are mostly around the use of Record type for some classes, changes to use enhanced switch, etc. Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-09-08 07:01:09 +08:00
Ken Huang	0a12eaa80e	KAFKA-19112 Unifying LIST-Type Configuration Validation and Default Values (#20334 ) We add the three main changes in this PR - Disallowing null values for most LIST-type configurations makes sense, since users cannot explicitly set a configuration to null in a properties file. Therefore, only configurations with a default value of null should be allowed to accept null. - Disallowing duplicate values is reasonable, as there are currently no known configurations in Kafka that require specifying the same value multiple times. Allowing duplicates is both rare in practice and potentially confusing to users. - Disallowing empty list, even though many configurations currently accept them. In practice, setting an empty list for several of these configurations can lead to server startup failures or unexpected behavior. Therefore, enforcing non-empty lists helps prevent misconfiguration and improves system robustness. These changes may introduce some backward incompatibility, but this trade-off is justified by the significant improvements in safety, consistency, and overall user experience. Additionally, we introduce two minor adjustments: - Reclassify some STRING-type configurations as LIST-type, particularly those using comma-separated values to represent multiple entries. This change reflects the actual semantics used in Kafka. - Update the default values for some configurations to better align with other configs. These changes will not introduce any compatibility issues. Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-09-06 01:25:55 +08:00
Jonah Hooper	29ce96151c	MINOR; Revert "KAFKA-18681: Created GetReplicaLogInfo RPCs (#19664 )" (#20371 ) This reverts commit `d86ba7f54a`. Reverting since we are planning to change how KIP-966 is implemented. We should revert this RPC until we have more clarity on how this KIP will be executed. Reviewers: José Armando García Sancio <jsancio@apache.org>	2025-09-05 11:31:50 -04:00
lucliu1108	a81f08d368	KAFKA-19550: Integration test for Streams-related Admin APIs [1/N] (#20244 ) This change adds: - Integration test for `Admin#describeStreamsGroups` API - Integration test for `Admin#deleteStreamsGroup` API Reviewers: Alieh Saeedi <asaeedi@confluent.io>, Lucas Brutschy <lucasbru@apache.org> --------- Co-authored-by: Lucas Brutschy <lbrutschy@gmail.com>	2025-09-04 15:09:21 +02:00
PoAn Yang	ea5b5fec32	KAFKA-19432 Add an ERROR log message if broker.heartbeat.interval.ms is too large (#20046 ) * Log an error message if `broker.heartbeat.interval.ms * 2` is larger than `broker.session.timeout.ms`. * Add test case `testLogBrokerHeartbeatIntervalMsShouldBeLowerThanHalfOfBrokerSessionTimeoutMs`. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-09-04 03:40:21 +08:00
Matthias J. Sax	342a8e6773	MINOR: suppress build warning (#20424 ) Suppress build warning. Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-09-01 11:12:11 -07:00
Lan Ding	4f2114a49e	KAFKA-19645 add a lower bound to num.replica.fetchers (#20414 ) Add a lower bound to num.replica.fetchers. Reviewers: PoAn Yang <payang@apache.org>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>, jimmy <wangzhiwang611@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-08-31 11:12:57 +08:00
Kuan-Po Tseng	26fea78ae1	MINOR: Remove default config in creating internal stream topic (#20421 ) Cleanup default configs in AutoTopicCreationManager#createStreamsInternalTopics. The streams protocol would like to be consistent with the kafka streams using the classic protocol - which would create the internal topics using CreateTopic and therefore use the controller config. Reviewers: Lucas Brutschy <lucasbru@apache.org>	2025-08-29 15:23:53 +02:00
Apoorv Mittal	7eeb5c8344	MINOR: Removing incorrect multi threaded state transition tests (#20436 ) These tests were written while finalizing approach for making inflight state class thread safe but later approach changed and the lock is now always required by SharePartition to change inflight state. Hence these tests are incorrect and do not add any value. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-08-29 07:45:07 +01:00
Apoorv Mittal	6956417a3e	MINOR: Updated name from messages to records for consistency in share partition (#20416 ) Minor PR to update name of maxInFlightMessages to maxInFlightRecords to maintain consistency in share partition related classes. Reviewers: Andrew Schofield <aschofield@confluent.io>	2025-08-28 13:52:04 +01:00
Apoorv Mittal	c5d0ddd6f7	MINOR: Refactored gap window names in share partition (#20411 ) As per the suggestion by @adixitconfluent and @chirag-wadhwa5, [here](https://github.com/apache/kafka/pull/20395#discussion_r2300810004), I have refactored the code with variable and method names. Reviewers: Andrew Schofield <aschofield@confluent.io>, Chirag Wadhwa <cwadhwa@confluent.io>	2025-08-27 10:06:43 +01:00
Chang-Chi Hsu	c797f85de4	KAFKA-19642 Replace dynamicPerBrokerConfigs with dynamicDefaultConfigs (#20405 ) - Changes: Replace misused dynamicPerBrokerConfigs with dynamicDefaultConfigs - Reasons: KRaft servers don't handle the cluster-level configs when starting, per: https://github.com/apache/kafka/pull/18949/files#r2296809389 Reviewers: Jun Rao <junrao@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com> --------- Co-authored-by: PoAn Yang <payang@apache.org>	2025-08-27 14:35:31 +08:00
Abhijeet Kumar	8d93d1096c	KAFKA-17108: Add EarliestPendingUpload offset spec in ListOffsets API (#16584 ) This is the first part of the implementation of [KIP-1023](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset) The purpose of this pull request is for the broker to start returning the correct offset when it receives a -6 as a timestamp in a ListOffsets API request. Added unit tests for the new timestamp. Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2025-08-27 08:34:31 +05:30
Ken Huang	a9b2a6d9b6	MINOR: Optimize the entriesWithoutErrorsPerPartition when errorResults is empty (#20410 ) If `errorResults` is empty, there’s no need to create a new `entriesPerPartition` map. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-08-27 03:11:35 +08:00
Apoorv Mittal	49ee1fb4f9	KAFKA-19632: Handle overlap batch on partition re-assignment (#20395 ) The PR fixes the batch alignment issue when partitions are re-assigned. During the initial read of state, the batches can be broken arbitrarily. Say the start offset is 10 and the cache contains a [15-18] batch during initialization. When a fetch happens at offset 10 and the fetched batch contains 10 records, i.e. [10-19], then correct batches will be created if maxFetchRecords is greater than 10. But if maxFetchRecords is less than 10, then the last offset of the batch is determined, which will be 19. Hence the acquire method will incorrectly create a batch of [10-19] while [15-18] already exists. The check below is required to resolve the issue: ``` if (isInitialReadGapOffsetWindowActive() && lastAcquiredOffset > lastOffset) { lastAcquiredOffset = lastOffset; } ``` While testing other cases, further issues were found in updating the gap offset, acquiring records prior to the share partition's end offset, and determining the next fetch offset with compacted topics. All these issues arise mainly during the initial read window after partition re-assignment. Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>, Chirag Wadhwa <cwadhwa@confluent.io>	2025-08-26 13:25:57 +01:00
Abhijeet Kumar	614bc3a19d	KAFKA-17344: Add empty replica FollowerFetch tests (#16884 ) Add Unit Tests for an empty follower fetch for various Leader states. \| TieredStorage Enabled \| Leader Log Start Offset \| Leader Local Log Start Offset \| Leader Log End Offset \| Remarks \| \|-----------------------\|-------------------------\|--------------------------------\|-----------------------\|---------------------------------------\| \| N \| 0 \| - \| 200 \| - \| \| N \| 10 \| - \| 200 \| - \| \| Y \| 0 \| 200 \| 200 \| No segments deleted locally \| \| Y \| 0 \| 200 \| 100 \| Segments uploaded and deleted locally \| \| Y \| 0 \| 200 \| 200 \| All segments deleted locally \| \| Y \| 10 \| 10 \| 200 \| No segments deleted locally \| \| Y \| 10 \| 100 \| 200 \| Segments uploaded and deleted locally \| \| Y \| 10 \| 200 \| 200 \| All segments deleted locally \| Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2025-08-26 14:19:11 +05:30
PoAn Yang	5bbc421a13	MINOR: update TransactionLog#readTxnRecordValue to initialize TransactionMetadata with non-empty topic partitions (#20370 ) This is followup PR for https://github.com/apache/kafka/pull/19699. * Update TransactionLog#readTxnRecordValue to initialize TransactionMetadata with non-empty topic partitions * Update `TxnTransitMetadata` comment, because it's not immutable. Reviewers: TengYao Chi <kitingiao@gmail.com>, Justine Olshan <jolshan@confluent.io>, Kuan-Po Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-08-26 10:36:45 +08:00
Chang-Chi Hsu	c01723340f	MINOR: Migrate EligibleLeaderReplicasIntegrationTest to use new test infra (#20199 ) Changes: Use ClusterTest to rewrite EligibleLeaderReplicasIntegrationTest. Validation: Run the test 50 times locally with consistent success. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-08-24 01:35:20 +08:00
Chirag Wadhwa	def5f16c33	KAFKA-19630: Reordered OR operands in archiveRecords method for SharePartition (#20391 ) As per the current implementation in archiveRecords, when the LSO is updated and there are multiple record batches before the new LSO, only the first one gets archived. This is because of the following lines of code: `isAnyOffsetArchived = isAnyOffsetArchived || archivePerOffsetBatchRecords(inFlightBatch, startOffset, endOffset - 1, initialState);` `isAnyBatchArchived = isAnyBatchArchived || archiveCompleteBatch(inFlightBatch, initialState);` The first record / batch makes `isAnyOffsetArchived` / `isAnyBatchArchived` true, after which the expression short-circuits and the methods `archivePerOffsetBatchRecords` / `archiveCompleteBatch` are not called again. This PR changes the order of the operands so that the short-circuit does not prevent archiving all the required batches. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>	2025-08-22 09:23:12 +01:00
Jhen-Yung Hsu	eeb6a0d981	KAFKA-19618 the `record-size` and `throughput`arguments don't work in TestRaftServer (#20379 ) The `record-size` and `throughput` arguments don’t work in `TestRaftServer`. The `recordsPerSec` and `recordSize` values are always hard-coded. - Fix `recordsPerSec` and `recordSize` values hard-coded issue - Add "Required" description to command-line options to make it clear to users. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-08-22 01:43:52 +08:00
Yunchi Pang	4a5562c341	KAFKA-19306 Migrate LogCompactionTester to tools module (#19905 ) jira: [KAFKA-19306](https://issues.apache.org/jira/browse/KAFKA-19306) log ``` Producing 1000000 messages..to topics log-cleaner-test-849894102467800668-0 Logging produce requests to /tmp/kafka-log-cleaner-produced-6049271649847384547.txt Sleeping for 20seconds... Consuming messages... Logging consumed messages to /tmp/kafka-log-cleaner-consumed-7065252868189829937.txt 1000000 rows of data produced, 120176 rows of data consumed (88.0% reduction). De-duplicating and validating output files... Validated 90057 values, 0 mismatches. Data verification is completed ``` result ``` ================================================================================ SESSION REPORT (ALL TESTS) ducktape version: 0.12.0 session_id: 2025-07-10--001 run time: 1 minute 2.051 seconds tests run: 1 passed: 1 flaky: 0 failed: 0 ignored: 0 ================================================================================ test_id: kafkatest.tests.tools.log_compaction_test.LogCompactionTest.test_log_compaction.metadata_quorum=ISOLATED_KRAFT status: PASS run time: 1 minute 1.809 seconds ``` Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-08-18 02:49:06 +08:00
Ming-Yen Chung	cae9848160	KAFKA-19447 Replace PartitionState with PartitionRegistration in makeFollower/makeLeader (#20335 ) Follow-up to [KAFKA-18486](https://issues.apache.org/jira/browse/KAFKA-18486) * Replace PartitionState with PartitionRegistration in makeFollower/makeLeader * Remove PartitionState.java since it is no longer referenced Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-08-17 18:14:09 +08:00
Hong-Yi Chen	bf0e6ba700	KAFKA-19384 The passing of BrokerRegistrationRequestTest is a false positive (#20338 ) Fixes a false positive in `BrokerRegistrationRequestTest` caused by `isMigratingZkBroker`, and migrates the test from Scala to Java. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-08-17 01:19:10 +08:00
PoAn Yang	990cb5c06c	KAFKA-18884 Move TransactionMetadata to transaction-coordinator module (#19699 ) 1. Move TransactionMetadata to transaction-coordinator module. 2. Rewrite TransactionMetadata in Java. 3. The `topicPartitions` field uses `HashSet` instead of `Set`, because it's mutable field. 4. In Scala, when calling `prepare` methods, they can use current value as default input in `prepareTransitionTo`. However, in Java, it doesn't support function default input value. To avoid a lot of duplicated code or assign value to wrong field, we add a private class `TransitionData`. It can get current `TransactionMetadata` value as default value and `prepare` methods just need to assign updated value. Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits <alivshits@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>	2025-08-16 02:10:52 +08:00
Jhen-Yung Hsu	55260e9835	KAFKA-19042: Move AdminClientWithPoliciesIntegrationTest to clients-integration-tests module (#20339 ) This PR does the following: - Rewrite to new test infra. - Rewrite to java. - Move to clients-integration-tests. - Add `ensureConsistentMetadata` method to `ClusterInstance`, similar to `ensureConsistentKRaftMetadata` in the old infra, and refactors related code. Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang <s7133700@gmail.com>	2025-08-15 17:44:47 +08:00
Robert Young	3067f15caf	KAFKA-19596: Improve visibility when topic auto-creation fails (#20340 ) Log a warning for each topic that failed to be created as a result of an automatic creation. This makes the underlying cause more visible so users can take action. Previously, at the default log level, you could only see logs that the broker was attempting to autocreate topics. If the creation failed, then it was logged at debug. Signed-off-by: Robert Young <robertyoungnz@gmail.com> Reviewers: Luke Chen <showuon@gmail.com>, Kuan-Po Tseng <brandboat@gmail.com>	2025-08-14 10:47:12 +08:00
Kevin Wu	92d8cb562a	KAFKA-19078 Automatic controller addition to cluster metadata partition (#19589 ) Add the `controller.quorum.auto.join.enable` configuration. When enabled with KIP-853 supported, follower controllers who are observers (their replica id + directory id are not in the voter set) will: - Automatically remove voter set entries which match their replica id but not directory id by sending the `RemoveVoterRPC` to the leader. - Automatically add themselves as a voter when their replica id is not present in the voter set by sending the `AddVoterRPC` to the leader. Reviewers: José Armando García Sancio [jsancio@apache.org](mailto:jsancio@apache.org), Chia-Ping Tsai [chia7712@gmail.com](mailto:chia7712@gmail.com)	2025-08-13 23:20:18 +08:00
Sanskar Jhajharia	dbf3808f53	MINOR: Add test coverage for StorageTool format command feature validation (#20303 ) ### Summary Adds comprehensive test coverage for the StorageTool format command feature validation, including tests for valid feature overrides, invalid feature detection, and multiple feature specifications. Also adds debug output to help with troubleshooting format operations. ### Changes Made #### Test Coverage Improvements - `testFormatWithReleaseVersionAndFeatureOverride()`: Tests that feature overrides work correctly when specified with `--feature` flag - `testFormatWithInvalidFeatureThrowsError()`: Tests error handling for unsupported features - `testFormatWithMultipleFeatures()`: Tests multiple feature specifications in a single format command #### Debug Output Enhancement - Formatter.java: Added debug output to print bootstrap metadata during format operations - Helps with troubleshooting format issues by showing the complete bootstrap metadata being written - Improves visibility into what features and configurations are being applied #### Test Updates - FormatterTest.java: Updated existing tests to account for the new debug output ### Related - KIP-1022: [Formatting and Updating Features ](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1022%3A+Formatting+and+Updating+Features) Reviewers: Kevin Wu <kevin.wu2412@gmail.com>, Justine Olshan <jolshan@confluent.io>	2025-08-12 12:51:39 -07:00
Chirag Wadhwa	43a25043dd	KAFKA-19567: Added the check for underlying partition being the leader in delayedShareFetch tryComplete method (#20280 ) In the current implementation, some delayed share fetch operations get trapped in the delayed share fetch purgatory when partition leadership changes during share consumption. This is because there is no check in the code to make sure the current broker is still the partition leader corresponding to the share partitions. So, when leadership changes, the share partitions cannot be acquired, because they have already been fenced, and tryComplete returns false. Although the operation does get completed when its timer expires, it is too late by then, and the operation gets stuck in the watchers list waiting to be purged when estimated operations increase to more than 1000. This PR resolves this by adding the required check so that if partition leadership changes, the delayed share fetches waiting on it get completed instantaneously. Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>	2025-08-10 10:14:58 +01:00
Apoorv Mittal	dc96e29499	KAFKA-19476: Correcting max delivery on write state failure and lock timeout (#20310 ) Fixing max delivery check on acquisition lock timeout and write state RPC failure. When acquisition lock is already timed out and write state RPC failure occurs then we need to check if records need to be archived. However with the fix we do not persist the information, which is relevant as some records may be archived or delivery count is bumped. The information will be persisted eventually. The persister call has failed already hence issuing another persister call due to a failed persister call earlier is not correct. Rather let the data persist in future persister calls. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit <adixit@confluent.io>	2025-08-07 19:22:00 +01:00
Apoorv Mittal	ddab943b0b	KAFKA-18265: Move persister call outside of the lock (3/N) (#20316 ) Minor PR to move persister call outside of the lock. The lock is not required while making the persister call. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit <adixit@confluent.io>	2025-08-07 13:11:49 +01:00
Apoorv Mittal	f12a9d8413	KAFKA-19464: Remove unnecessary update for find next fetch offset (#20315 ) The PR removes unnecessary updates for find next fetch offset. When the state is in transition and not yet completed then anyways respective offsets should not be considered for acquisition. The find next fetch offset is updated finally when transition is completed. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit <adixit@confluent.io>	2025-08-07 13:11:07 +01:00
Lan Ding	71442bf42f	MINOR: cleanup in QuotaFactory (#20312 ) cleanup in QuotaFactory. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-08-07 00:05:45 +08:00
Federico Valeri	cd9dde11de	MINOR: Improve skip-record-metadata description (#20291 ) This flag also skips control records, so the description needs to be updated. --------- Signed-off-by: Federico Valeri <fedevaleri@gmail.com> Reviewers: Luke Chen <showuon@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Vincent Potucek	2025-08-05 08:50:50 +08:00
Chang-Chi Hsu	888861d803	MINOR: Replace boundPort with brokerBoundPort (#20297 ) ## Changes: - Replaced all references to boundPort with brokerBoundPort. ## Reasons - boundPort and brokerBoundPort share the same definition and behavior. Reviewers: TaiJuWu <tjwu1217@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2025-08-04 16:44:12 +08:00
Apoorv Mittal	05d71ad1a8	KAFKA-19476: Concurrent execution fixes for lock timeout and lso movement (#20286 ) The PR fixes the following: 1. If a share partition arrives at a state which should be treated as the final state of that batch/offset (for example, LSO movement which causes the offset/batch to be ARCHIVED permanently), the result of pending write state RPCs for that offset/batch overrides the ARCHIVED state. Hence track such updates and apply them when the transition is completed. 2. If an acquisition lock timeout occurs while an offset/batch is undergoing transition, followed by a write state RPC failure, then the respective batch/offset can land in a scenario where the offset stays in ACQUIRED state with no acquisition lock timeout task. 3. If a timer task is cancelled, but due to concurrent execution of the timer task and acknowledgement, there can be a scenario where the timer task is processed after cancellation. Hence it can mark the offset/batch re-available despite it already being acknowledged. Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>	2025-08-01 23:20:25 +01:00
Andrew Schofield	b909544e99	MINOR: Improve consistency of acknowledge type terminology (#20282 ) The code had a mixture of "acknowledgement type" and "acknowledge type". The latter is preferred. Reviewers: TengYao Chi <frankvicky@apache.org>, Lan Ding <isDing_L@163.com>	2025-08-01 21:17:22 +01:00
Now	dda1b5a4e8	MINOR: Fix duplicate 'to' in ExactlyOnceMessageProcessor javadoc (#20228 ) Fixed a simple typo in javadoc comment where "to to" appeared instead of "to". _No functional changes_ Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2025-07-30 23:59:49 +08:00
Kevin Wu	1bcaa19c46	KAFKA-19489; Extra validation when formatting a node (#20136 ) This PR adds a check to the storage tool's format command which throws a TerseFailure when the controller.quorum.voters config is defined and the node is formatted with the --standalone flag or the --initial-controllers flag. Without this check, it is possible to have two voter sets. For example, in a three node setup, the two nodes that formatted with --no-initial-controllers could form quorum with each other since they have the static voter set, and the --standalone node would ignore the config and read the voter set of itself from its log, forming its own quorum of 1. Reviewers: José Armando García Sancio <jsancio@apache.org>, TaiJuWu <tjwu1217@gmail.com>, Alyssa Huang <ahuang@confluent.io>	2025-07-30 10:58:08 -04:00
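
Note on def5f16c33 (KAFKA-19630): the short-circuit behaviour described in that commit message can be reproduced in isolation. The following is a minimal sketch, not the SharePartition code: the class `ShortCircuitArchiveSketch`, the helper `archive`, and the batch labels are hypothetical stand-ins chosen for illustration. It only shows why putting the accumulated flag on the left of `||` skips the archiving call after the first success, and why swapping the operands avoids that.

```java
import java.util.ArrayList;
import java.util.List;

class ShortCircuitArchiveSketch {

    // Hypothetical stand-in for an archiving method: records the batch it was
    // asked to archive and reports that something was archived.
    static boolean archive(String batch, List<String> archived) {
        archived.add(batch);
        return true;
    }

    // Flag on the left: once the flag is true, || short-circuits and
    // archive(...) is never evaluated for the remaining batches.
    static List<String> flagFirst(List<String> batches) {
        List<String> archived = new ArrayList<>();
        boolean isAnyBatchArchived = false;
        for (String batch : batches) {
            isAnyBatchArchived = isAnyBatchArchived || archive(batch, archived);
        }
        return archived;
    }

    // Call on the left: archive(...) is evaluated for every batch, and the
    // flag still ends up true if any call returned true.
    static List<String> callFirst(List<String> batches) {
        List<String> archived = new ArrayList<>();
        boolean isAnyBatchArchived = false;
        for (String batch : batches) {
            isAnyBatchArchived = archive(batch, archived) || isAnyBatchArchived;
        }
        return archived;
    }

    public static void main(String[] args) {
        List<String> batches = List.of("0-4", "5-9", "10-14");
        System.out.println(flagFirst(batches)); // [0-4]
        System.out.println(callFirst(batches)); // [0-4, 5-9, 10-14]
    }
}
```

Using the non-short-circuiting `|` operator would also evaluate both operands, but reordering the `||` operands keeps the intent visible at the call site.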

6013 Commits