## Summary
- Fix a potential race condition in
LogSegment#readMaxTimestampAndOffsetSoFar(), which may result in
non-monotonic offsets and cause replication to stop.
- See https://issues.apache.org/jira/browse/KAFKA-19407 for the details
of how it happens.
Reviewers: Vincent PÉRICART <mauhiz@gmail.com>, Jun Rao
<junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Log segment closure results in right-sizing the segment on disk along
with the associated index files.
This is especially important for TimeIndexes, where a failure to right-size
may eventually cause log roll failures, leading to under-replication
and log cleaner failures.
This change uses `Utils.closeAll` which propagates exceptions, resulting
in an "unclean" shutdown. That would then cause the broker to attempt to
recover the log segment and the index on next startup, thereby avoiding
the failures described above.
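A minimal sketch of the closing pattern, with placeholder Closeable arguments standing in for the segment's log and index resources (`Utils.closeAll` is the real Kafka utility; the surrounding method is illustrative):
```java
import java.io.Closeable;
import java.io.IOException;

import org.apache.kafka.common.utils.Utils;

// Illustrative only: close the segment's resources together. Utils.closeAll
// attempts to close every argument and rethrows the first failure, so an index
// that fails to right-size surfaces as an IOException instead of being swallowed.
final class SegmentCloseSketch {
    static void closeSegment(Closeable log, Closeable offsetIndex,
                             Closeable timeIndex, Closeable txnIndex) throws IOException {
        Utils.closeAll(log, offsetIndex, timeIndex, txnIndex);
    }
}
```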
Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Jun Rao
<junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This change rejects non-zero sequence numbers when there is an empty
producerIDState with TV2. The scenario is surfaced as a retriable
OutOfOrderSequence error; a rough sketch of the new check follows the
summary below.
For Transactions V2 with empty state:
- ✅ Only sequence 0 is allowed for new producers or after state cleanup (new validation added)
- ❌ Any non-zero sequence is rejected with our specific error message
- ❌ Epoch bumps still require sequence 0 (existing validation remains)
For Transactions V1 with empty state:
- ✅ ANY sequence number is allowed (0, 5, 100, etc.)
- ❌ Epoch bumps still require sequence 0 (existing validation)
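A rough, self-contained sketch of the new check, under stated assumptions (the helper and its parameters are illustrative, not the actual broker code; only `OutOfOrderSequenceException` is the real Kafka class):
```java
import org.apache.kafka.common.errors.OutOfOrderSequenceException;

// Illustrative sketch: with transactions V2 and no existing producer state,
// only sequence 0 is accepted; V1 keeps accepting any sequence here.
final class EmptyStateSequenceCheck {
    static void validate(boolean transactionV2Enabled, boolean producerStateIsEmpty, int firstSequence) {
        if (transactionV2Enabled && producerStateIsEmpty && firstSequence != 0) {
            // Retriable: the producer can retry after resetting to sequence 0.
            throw new OutOfOrderSequenceException("Invalid sequence " + firstSequence
                + " for a producer with no state; expected sequence 0");
        }
    }
}
```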
Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
<alivshits@confluent.io>
With transactions V2, the replica manager checks whether an incoming
produce request writes to a partition belonging to a transaction.
ReplicaManager figures this out by checking the producer epoch stored in
the partition log. However, the current code does not reject the produce
request if its producer epoch is lower than the stored producer epoch.
It is an optimization to reject such requests earlier instead of sending
an AddPartitionsToTxn request and getting rejected in the response.
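A hedged sketch of the early rejection (names are illustrative; only `Errors.INVALID_PRODUCER_EPOCH` is the real protocol error):
```java
import org.apache.kafka.common.protocol.Errors;

// Illustrative: compare the request's producer epoch against the epoch already
// recorded in the partition log and fail fast on a stale epoch, instead of
// issuing AddPartitionsToTxn and being rejected in that response.
final class ProducerEpochCheck {
    static Errors validate(short requestProducerEpoch, short storedProducerEpoch) {
        if (requestProducerEpoch < storedProducerEpoch)
            return Errors.INVALID_PRODUCER_EPOCH;
        return Errors.NONE;
    }
}
```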
Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
<alivshits@confluent.io>
The main issue was that we forgot to set
`TopicConfig.SEGMENT_BYTES_CONFIG` to at least `1024 * 1024`, which
caused problems in tests with small segment sizes.
To address this, we introduced a new internal config:
`LogConfig.INTERNAL_SEGMENT_BYTES_CONFIG`, allowing us to set smaller
segment bytes specifically for testing purposes.
We also updated the logic so that if a user configures the topic-level
segment bytes without explicitly setting the internal config, the
internal value will no longer be returned to the user.
In addition, we removed
`MetadataLogConfig#METADATA_LOG_SEGMENT_MIN_BYTES_CONFIG` and added
three new internal configurations:
- `INTERNAL_MAX_BATCH_SIZE_IN_BYTES_CONFIG`
- `INTERNAL_MAX_FETCH_SIZE_IN_BYTES_CONFIG`
- `INTERNAL_DELETE_DELAY_MILLIS_CONFIG`
Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
This PR rewrites `RemoteTopicCrudTest` in Java using the `@ClusterTest`
framework and moves it to the `storage` module.
**Note:** Two test cases have not yet been migrated:
- `testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic`
-
`testClusterWithoutTieredStorageStartsSuccessfullyIfTopicWithTieringDisabled`
These tests rely on modifying broker configs during the test lifecycle,
which `ClusterTest` currently does not support. They will be migrated in
a follow-up PR after
[#16808](https://github.com/apache/kafka/pull/16808) is merged, which
introduces support for config updates in `ClusterTest`.
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
Replaces the UNIFIED_LOG_UNKNOWN_OFFSET constant in LogOffsetMetadata
with UnifiedLog.UNKNOWN_OFFSET.
Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, YuChia Ma <minecraftmiku831@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
In FETCH requests and TXN_OFFSET_COMMIT requests, on current trunk we
run into a race condition inside UnifiedLog, causing a
`NoSuchElementException` in
`UnifiedLog.fetchLastStableOffsetMetadata(UnifiedLog.java:651)`.
The cause is that the code performs an `isPresent` check on a volatile
Optional and then accesses it with `get`, leaving the door open to a
race condition when the Optional changes between `isPresent` and `get`.
This change takes a copy of the volatile variable first.
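A minimal sketch of the fix pattern, assuming an illustrative field and method (not the exact UnifiedLog code):
```java
import java.util.Optional;

// Copy the volatile Optional into a local before checking and dereferencing it,
// so a concurrent update cannot land between isPresent() and get().
final class LastStableOffsetSketch {
    private volatile Optional<Long> firstUnstableOffset = Optional.empty();

    long lastStableOffset(long highWatermark) {
        Optional<Long> unstable = firstUnstableOffset; // single volatile read
        return unstable.isPresent() ? unstable.get() : highWatermark;
    }
}
```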
Follow up https://github.com/apache/kafka/pull/19460/files#r2062664349
Reviewers: Ismael Juma <ismael@juma.me.uk>, PoAn Yang
<payang@apache.org>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Add a warning message when using log.cleaner.enable to remind users that
this configuration is deprecated. Also, add a warning message for
log.cleaner.threads=0 because in version 5.0, the value must be greater
than zero.
Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, TengYao Chi <frankvicky@apache.org>, TaiJuWu
<tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
We updated the validation rule for cleanup.policy in remote storage
mode.
If remote log storage is enabled, only cleanup.policy=delete is allowed.
Any other value (e.g. compact, compact,delete) will now result in a
config validation error.
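A hedged sketch of the tightened rule (the helper shape is illustrative; the constant and exception are real Kafka classes):
```java
import java.util.List;

import org.apache.kafka.common.config.TopicConfig;
import org.apache.kafka.common.errors.InvalidConfigurationException;

// Illustrative: with remote log storage enabled, any policy other than exactly
// "delete" fails validation.
final class CleanupPolicyValidationSketch {
    static void validate(boolean remoteLogStorageEnabled, List<String> cleanupPolicy) {
        if (remoteLogStorageEnabled
                && !List.of(TopicConfig.CLEANUP_POLICY_DELETE).equals(cleanupPolicy)) {
            throw new InvalidConfigurationException(
                "When remote log storage is enabled, cleanup.policy must be 'delete', got: " + cleanupPolicy);
        }
    }
}
```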
Reviewers: Luke Chen <showuon@gmail.com>, Ken Huang
<s7133700@gmail.com>, PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
The fallback mechanism for `remote.log.manager.copier.thread.pool.size`
and `remote.log.manager.expiration.thread.pool.size` defaulting to
`remote.log.manager.thread.pool.size` was introduced in KIP-950. This
approach was abandoned in KIP-1030, where default values were changed
from -1 to 10, and a configuration validator enforcing a minimum value
of 1 was added. As a result, this commit removes the fallback mechanism
from `RemoteLogManagerConfig.java` to align with the new defaults and
validation.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
[As discussed in the mailing
list](https://lists.apache.org/thread/m03mpkm93737kk6d1nd6fbv9wdgsrhv9),
the broker only fetches remote data for ONE partition in a given
FetchRequest. In other words, if a consumer sends a FetchRequest
requesting 50 topic-partitions, and each partition's requested offset is
not stored locally - the broker will fetch and respond with just one
partition's worth of data from the remote store, and the rest will be
empty.
Given that our defaults are 50 MiB for the total fetch response and
1 MiB per partition, this can limit throughput. This patch documents the
behavior in three configs: `fetch.max.bytes`, `max.partition.fetch.bytes`,
and `remote.fetch.max.wait.ms`.
Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash
<kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>
Add a wait for cleaner thread shutdown in `testCleanerThreadShutdown` to
eliminate flakiness. After calling `cache.close()`, the test now uses
`TestUtils.waitForCondition` to poll until the background
“remote-log-index-cleaner” thread has fully exited before asserting that
no cleaner threads remain. This ensures the asynchronous shutdown always
completes before the final assertions.
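A sketch of the added wait, under the assumption that the test polls live thread names (the helper is illustrative; `TestUtils.waitForCondition` is the real test utility):
```java
import org.apache.kafka.test.TestUtils;

// Illustrative: after cache.close(), block until no live thread still carries
// the cleaner's name before running the final assertions.
final class CleanerShutdownWaitSketch {
    static void awaitCleanerShutdown() throws InterruptedException {
        TestUtils.waitForCondition(
            () -> Thread.getAllStackTraces().keySet().stream()
                    .noneMatch(t -> t.isAlive() && t.getName().contains("remote-log-index-cleaner")),
            "Timed out waiting for the remote-log-index-cleaner thread to exit");
    }
}
```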
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
The remote storage reader thread pool uses the same count for both the
maximum and the core pool size. If users adjust the pool size to a value
larger than the original, it throws `IllegalArgumentException`. Updated
both values to fix the issue.
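A minimal, self-contained illustration of the failure mode and the fix, using made-up sizes (not the actual remote storage reader pool code):
```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ReaderPoolResizeSketch {
    // Growing only the core size is rejected once it exceeds the maximum size,
    // so both values are updated, maximum first.
    static void grow(ThreadPoolExecutor pool, int newSize) {
        pool.setMaximumPoolSize(newSize);
        pool.setCorePoolSize(newSize);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            8, 8, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        // pool.setCorePoolSize(16);  // would throw IllegalArgumentException (core > max)
        grow(pool, 16);
        pool.shutdown();
    }
}
```
Shrinking would need the opposite order (core size first) to keep core ≤ max at every step.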
---------
Signed-off-by: PoAn Yang <payang@apache.org>
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
1. Remove `RemoteLogManager#startup` and
`RemoteLogManager#onEndpointCreated`
2. Move endpoint creation to `BrokerServer`
3. Move `RemoteLogMetadataManager#configure` and
`RemoteLogStorageManager#configure` to RemoteLogManager constructor
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ken Huang
<s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>
For segments that are uploaded to remote storage, RemoteIndexCache caches
the fetched offset, timestamp, and transaction index entries on the first
invocation to remote; subsequent invocations are served from the local
cache.
The remote indexes that are cached locally get removed in two cases:
1. The remote segments are deleted due to a breach of the retention
size/time or the start offset.
2. The number of cached indexes exceeds the remote-log-index-cache size
limit of 1 GB (default).
There are two layers of locks used in the RemoteIndexCache: a first-layer
lock on the RemoteIndexCache and a second-layer lock on the
RemoteIndexCache#Entry.
**Issue**
1. The first layer of locking coordinates the remote-log reader and deleter
threads. To ensure that the reader and deleter threads do not block each
other, we only take `lock.readLock()` when accessing/deleting the cached
index entries.
2. The issue happens when both the reader and deleter threads take the
readLock, and the deleter thread then marks the index as
`markedForCleanup`. The reader thread, which holds the `indexEntry`,
gets an IllegalStateException when accessing it.
3. This is a concurrency issue, where we mark the entry as
`markedForCleanup` before removing it from the cache. See the
RemoteIndexCache#remove and RemoteIndexCache#removeAll methods.
4. When an entry gets evicted from the cache due to a breach of the 1 GB
maxSize, the cache removes that entry before calling the evictionListener,
and all the operations are performed atomically by the caffeine cache.
**Solution**
1. When the deleter thread marks an Entry for deletion, we rename the
underlying index files with a ".deleted" suffix and add a job to the
remote-log-index-cleaner thread, which performs the actual cleanup
(a sketch of the rename follows the list below). Previously, the indexes
were not accessible once marked for deletion. Now, we allow access to
those renamed files (from an entry that is about to be removed and is
still held by a reader thread) until the relevant files are removed from
disk.
2. Similar to local-log index/segment deletion, once the files get
renamed with the ".deleted" suffix, the actual deletion happens after
the `file.delete.delay.ms` delay of 1 minute. The renamed index files
get deleted after 30 seconds.
3. During this time, if the same index entry gets fetched again from
remote, it does not conflict with the deleted entry because the file
names are different.
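Illustrative sketch only (the helper is a placeholder, not the RemoteIndexCache code): mark an index file for deletion by renaming it with the ".deleted" suffix, leaving the renamed file readable until the cleaner thread removes it from disk.
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MarkForCleanupSketch {
    // Rename foo.index -> foo.index.deleted; readers holding the entry can keep
    // using the renamed file until the cleaner thread deletes it.
    static Path markForCleanup(Path indexFile) throws IOException {
        Path renamed = indexFile.resolveSibling(indexFile.getFileName() + ".deleted");
        return Files.move(indexFile, renamed);
    }
}
```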
Reviewers: Satish Duggana <satishd@apache.org>
Pretty much a straightforward move of these classes. I just updated
`RemoteLogManagerTest` to not use `KafkaConfig`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
We need to add a SuppressWarnings annotation because `log.cleaner.enable`
is marked deprecated.
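A minimal, self-contained illustration of the pattern (the constant holder below is a stand-in, not the exact Kafka class that defines the property):
```java
import java.util.Properties;

/** Stand-in for the class that now marks the property deprecated. */
class LegacyCleanerConfig {
    @Deprecated
    static final String LOG_CLEANER_ENABLE_PROP = "log.cleaner.enable";
}

final class CleanerConfigSuppressionSketch {
    // Referencing the deprecated constant from another class triggers a
    // deprecation warning, which the annotation suppresses.
    @SuppressWarnings("deprecation")
    static Properties withCleanerDisabled(Properties props) {
        props.put(LegacyCleanerConfig.LOG_CLEANER_ENABLE_PROP, "false");
        return props;
    }
}
```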
Reviewers: PoAn Yang <payang@apache.org>, Kuan-Po Tseng
<brandboat@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
JIRA: KAFKA-13610 This patch deprecates the `log.cleaner.enable`
configuration. It's part of
[KIP-1148](https://cwiki.apache.org/confluence/x/XAyWF).
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang
<payang@apache.org>, Ken Huang <s7133700@gmail.com>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>
Move LogCleaner and related classes to storage module and rewrite in
Java.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Jun Rao <junrao@gmail.com>
jira: https://issues.apache.org/jira/browse/KAFKA-19076
The message is used when the function encounters an error, so the error
message should be created lazily.
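A short sketch of the lazy-message pattern under stated assumptions (the helper names are illustrative):
```java
import java.util.function.Supplier;

final class LazyErrorMessageSketch {
    // Build the error string only on the failure path.
    static void check(boolean condition, Supplier<String> message) {
        if (!condition)
            throw new IllegalStateException(message.get());
    }

    static void example(long offset, long logStartOffset) {
        // The concatenation inside the lambda runs only if the check fails.
        check(offset >= logStartOffset,
            () -> "Offset " + offset + " is below the log start offset " + logStartOffset);
    }
}
```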
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Given that we now support Java 17 on our brokers, this PR replaces the
use of the following in the storage module (a short before/after sketch
follows the list):
- Collections.singletonList() and Collections.emptyList() with List.of()
- Collections.singletonMap() and Collections.emptyMap() with Map.of()
- Collections.singleton() and Collections.emptySet() with Set.of()
- Arrays.asList() with List.of()
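The replacements in a before/after flavour (values are placeholders); note that the `List.of`/`Map.of`/`Set.of` factories return immutable collections and reject null elements, unlike `Arrays.asList`:
```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CollectionFactorySketch {
    static final List<String> TOPICS       = List.of("topic-a");    // was Collections.singletonList("topic-a")
    static final Map<String, Long> OFFSETS = Map.of("topic-a", 0L); // was Collections.singletonMap("topic-a", 0L)
    static final Set<Integer> BROKERS      = Set.of();              // was Collections.emptySet()
    static final List<Integer> IDS         = List.of(1, 2, 3);      // was Arrays.asList(1, 2, 3)
}
```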
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
There are two changes in this PR.
1. Move `BrokerCompressionTest` from core to storage
2. Rewrite `BrokerCompressionTest` from Scala to Java
Reviewers: TengYao Chi <kitingiao@gmail.com>, PoAn Yang
<payang@apache.org>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
The ZooKeeper mode was removed in 4.0, so the test cases no longer need
to specify the quorum. The following variables and functions can be replaced:
- TestWithParameterizedQuorumAndGroupProtocolNames
- getTestQuorumAndGroupProtocolParametersClassicGroupProtocolOnly
- getTestQuorumAndGroupProtocolParametersConsumerGroupProtocolOnly
- getTestQuorumAndGroupProtocolParametersAll
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Move LogCleanerManager and related classes to storage module and rewrite
in Java.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Jun Rao
<junrao@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
Reverts commits 2723dbf3a0 and 269e8892ad.
Instead of reopening the transaction index, it cancels the
RemoteFetchTask without interrupting it, which avoids closing the
TransactionIndex channel.
This lets the remote fetch complete its execution while its results are
ignored. Given that this is considered a rare case, we can live with
this. If it becomes a performance issue, it could be optimized.
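A sketch of the pattern (the task variable is illustrative): cancelling without `mayInterruptIfRunning` lets an already-running remote read finish with its TransactionIndex channel intact, while its result is simply never consumed.
```java
import java.util.concurrent.Future;

final class RemoteFetchCancellationSketch {
    static void cancelWithoutInterrupting(Future<?> remoteFetchTask) {
        // false = do not interrupt a running task; a task that already started
        // runs to completion and the caller ignores its result.
        remoteFetchTask.cancel(false);
    }
}
```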
Reviewers: Jun Rao <junrao@gmail.com>
The `lastOffset` includes the entire batch header, so we should check `baseOffset` instead.
To optimize this, we need to update the search logic. The previous
approach simply checked whether each batch's `lastOffset()` was greater
than or equal to the target offset. Once it found the first batch that
met this condition, it returned that batch immediately.
Now that we are using `baseOffset()`, we need to handle a special case:
if the `targetOffset` falls between the `lastOffset` of the previous
batch and the `baseOffset` of the matching batch, we should select the
matching batch. The updated logic is structured as follows:
1. First, if `baseOffset` exactly equals `targetOffset`, return immediately.
2. If we find the first batch with `baseOffset` greater than `targetOffset`:
   - If the previous batch contains the target, return the previous batch.
   - If there is no previous batch, or the previous batch does not contain
   the target, return the current batch.
3. After iterating through all batches, check if the last batch contains
the target offset.
This code path is not thread-safe, so we need to guard against
`EOFException`. To avoid this exception, I am still using an early
return. In this scenario, `lastOffset` is still used within the loop,
but it is executed at most once per search. In most cases, this results
in an optimization.
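A simplified, self-contained sketch of the search described above (structure and names are illustrative, not the exact Kafka implementation):
```java
import java.util.Optional;

import org.apache.kafka.common.record.RecordBatch;

final class BatchSearchSketch {
    static Optional<RecordBatch> searchForOffset(Iterable<? extends RecordBatch> batches, long targetOffset) {
        RecordBatch previous = null;
        for (RecordBatch batch : batches) {
            if (batch.baseOffset() == targetOffset)
                return Optional.of(batch);            // exact match, no lastOffset() read needed
            if (batch.baseOffset() > targetOffset) {
                // lastOffset() is read at most once, and only for the previous batch
                if (previous != null && previous.lastOffset() >= targetOffset)
                    return Optional.of(previous);     // target lies inside the previous batch
                return Optional.of(batch);            // target lies in the gap before this batch
            }
            previous = batch;
        }
        // The target may still be inside the last batch
        if (previous != null && previous.lastOffset() >= targetOffset)
            return Optional.of(previous);
        return Optional.empty();
    }
}
```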
Test: Verifying Memory Usage Improvement
To evaluate whether this optimization helps, I followed the steps below
to monitor memory usage:
1. Start a Standalone Kafka Server
```sh
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties
bin/kafka-server-start.sh config/server.properties
```
2. Use Performance Console Tools to Produce and Consume Records
**Produce Records:**
```sh
./kafka-producer-perf-test.sh \
--topic test-topic \
--num-records 1000000000 \
--record-size 100 \
--throughput -1 \
--producer-props bootstrap.servers=localhost:9092
```
**Consume Records:**
```sh
./bin/kafka-consumer-perf-test.sh \
--topic test-topic \
--messages 1000000000 \
--bootstrap-server localhost:9092
```
It can be observed that memory usage has significantly decreased.
trunk: (memory usage screenshot)
this PR: (memory usage screenshot)
Reviewers: Kirk True <kirk@kirktrue.pro>, TengYao Chi
<kitingiao@gmail.com>, David Arthur <mumrah@gmail.com>, Jun Rao
<junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Mark the following tests as flaky:
* StickyAssignorTest > testLargeAssignmentAndGroupWithUniformSubscription
* DeleteSegmentsByRetentionTimeTest
* QuorumControllerTest > testUncleanShutdownBrokerElrEnabled
Reviewers: Andrew Schofield <aschofield@confluent.io>
The PR fixes the behaviour when the fetched records are larger than the
`fetch.max.bytes` config.
The `hardMaxBytesLimit` flag is used in ReplicaManager, where it decides
whether to fetch a single record or not. The file records get sliced
based on the bytes requested. However, if `hardMaxBytesLimit` is false,
then at least one record is fetched and the bytes are adjusted
accordingly in `localLog`.
Reviewers: Jun Rao <junrao@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
Recently, we found a regression that could have been detected by static
analysis, since a local variable wasn't being passed to a method during
a refactoring, and was left unused. It was fixed in
[7a749b5](7a749b589f),
but almost slipped into 4.0. Unused variables are typically detected by
IDEs, but this is insufficient to prevent these kinds of bugs. This
change enables unused local variable detection in checkstyle for Kafka.
A few notes on the usage:
- There are two situations in which people actually want to have a local
variable but not use it. First, there are `for (Type ignored:
collection)` loops which have to loop `collection.length` number of
times, but that do not use `ignored` in the loop body. These are
typically still easier to read than a classical `for` loop. Second, some
IDEs detect it if a return value of a function such as `File.delete` is
not being used. In this case, people sometimes store the result in an
unused local variable to make ignoring the return value explicit and to
avoid the squiggly lines.
- In Java 22, unused local variables can be replaced with a single
underscore `_`. This is supported by checkstyle. In pre-22 versions,
IntelliJ allows such variables to be named `ignored` to suppress the
unused local variable warning. This pattern is often (but not
consistently) used in the Kafka codebase. This is, however, not
supported by checkstyle.
Since we cannot switch to Java 22 yet, and we want automated detection
using checkstyle, we have to resort to annotating the unused local
variables with `@SuppressWarnings("UnusedLocalVariable")`. We have to
apply this in 11 cases across the Kafka codebase. While not pretty, I'd
argue it's worth it to prevent bugs like the one fixed in
[7a749b5](7a749b589f).
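The two situations above, sketched with the suppression the build now requires (these call sites are illustrative, not the 11 real ones):
```java
import java.io.File;
import java.util.List;

final class UnusedLocalVariableSketch {
    static int countEntries(List<String> entries) {
        int count = 0;
        // The loop body never reads the element, only the iteration count matters.
        for (@SuppressWarnings("UnusedLocalVariable") String ignored : entries) {
            count++;
        }
        return count;
    }

    static void deleteQuietly(File file) {
        // Storing the result makes ignoring File.delete()'s return value explicit.
        @SuppressWarnings("UnusedLocalVariable")
        boolean deleted = file.delete();
    }
}
```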
Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur
<mumrah@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Bruno
Cadonna <cadonna@apache.org>, Kirk True <ktrue@confluent.io>
A cached TransactionIndex may get closed if interrupted, causing subsequent calls to always fail with ClosedChannelException and forcing the process to be restarted. To avoid this issue, TransactionIndex exposes a new method to validate the state of the channel, and the index is reopened if it is closed.
Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Nikhil Ramakrishnan <ramakrishnan.nikhil@gmail.com>
The purpose of this PR is to remove the `@InterfaceStability.Evolving` annotation from classes that were created over a year ago.
Reviewers: Jun Rao <junrao@gmail.com>