Commit Graph

265 Commits

Author SHA1 Message Date
Luke Chen 1b11fef5bb
KAFKA-17205: Allow topic config validation in controller level in KRaft mode (#16693)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Christo Lolov <lolovc@amazon.com>
2024-07-30 17:07:09 +01:00
PoAn Yang a6b9407607
MINOR: add comment for correctness issue to LeaderEpochFileCache (#16660)
Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-24 23:00:41 +08:00
Chung, Ming-Yen 253b36113d
KAFKA-17179 Remove integration tag in class level when using ClusterTestExtensions (#16656)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-24 19:29:34 +08:00
TaiJuWu 4fa1c21940
KAFKA-17104 InvalidMessageCrcRecordsPerSec is not updated in validating LegacyRecord (#16558)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-24 19:25:11 +08:00
Kamal Chandraprakash 539f466ccb
KAFKA-17168 Remove the logPrefix to print the thread name (#16657)
Reviewers: Kuan-Po (Cooper) Tseng <brandboat@gmail.com>, Satish Duggana <satishd@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-24 16:54:02 +08:00
Ken Huang a012af5fb4
KAFKA-17149 Move ProducerStateManagerTest to storage module (#16645)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-23 20:08:08 +08:00
PoAn Yang 44a44b753f
KAFKA-17166 Use NoOpScheduler to rewrite LogManagerTest#testLogRecoveryMetrics (#16641)
Reviewers: Okada Haruki <ocadaruma@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-22 22:05:50 +08:00
Kuan-Po Tseng 7a8f89e0a0
KAFKA-17168 The logPrefix of RemoteLogReader/RemoteStorageThreadPool is not propagated correctly (#16642)
Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-21 22:19:15 +08:00
Kuan-Po Tseng e9a8c3c455
KAFKA-17153 KafkaMetricsGroup#newGauge should accept functional interface instead of `com.yammer.metrics.core.Gague` (#16618)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-21 18:17:04 +08:00
PoAn Yang cf9d517d71
KAFKA-17142 Fix deadlock caused by LogManagerTest#testLogRecoveryMetrics (#16614)
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-18 18:40:53 +08:00
Kuan-Po Tseng 94f5a4f63e
KAFKA-17135 Add unit test for `ProducerStateManager#readSnapshot` and `ProducerStateManager#writeSnapshot` (#16603)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-18 18:23:29 +08:00
Christo Lolov f369771bf2
KAFKA-16851: Add remote.log.disable.policy (#16132)
Add a remote.log.disable.policy on a topic-level only as part of KIP-950

Reviewers: Kamal Chandraprakash <kchandraprakash@uber.com>, Luke Chen <showuon@gmail.com>, Murali Basani <muralidhar.basani@aiven.io>
2024-07-10 11:18:48 +08:00
Kuan-Po Tseng d45596a2a1
MINOR: Move related getters to RemoteLogManagerConfig (#16538)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-08 01:00:42 +08:00
Colin Patrick McCabe ebaa108967
KAFKA-16968: Introduce 3.8-IV0, 3.9-IV0, 3.9-IV1
Create 3 new metadata versions:

- 3.8-IV0, for the upcoming 3.8 release.
- 3.9-IV0, to add support for KIP-1005.
- 3.9-IV1, as the new release vehicle for KIP-966.

Create ListOffsetRequest v9, which will be used in 3.9-IV0 to support KIP-1005. v9 is currently an unstable API version.

Reviewers: Jun Rao <junrao@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-06-27 14:03:03 -07:00
Murali Basani 87f8147ed0
KAFKA-16855 : Part 1 - New fields tieredEpoch and tieredState (#16257)
Add field tieredEpoch to RemoteLogSegmentMetadata
Update relevant tests
Add two fields tieredEpoch and tieredState to TopicRecord.json

Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>
2024-06-25 15:00:12 +01:00
Wang Xiaoqing 64702bcf6f
KAFKA-16862: Refactor ConsumerTaskTest to be deterministic and avoid tight loops (#16303)
Reviewers: Greg Harris <greg.harris@aiven.io>
2024-06-20 12:35:14 -07:00
Kamal Chandraprakash 4fe08f3b29
KAFKA-16976 Update the current/dynamic config inside RemoteLogManagerConfig (#16394)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-20 23:33:35 +08:00
Kamal Chandraprakash 8abeaf3cb4
KAFKA-15265: Reapply dynamic remote configs after broker restart (#16353)
The below remote log configs can be configured dynamically:
1. remote.log.manager.copy.max.bytes.per.second
2. remote.log.manager.fetch.max.bytes.per.second and
3. remote.log.index.file.cache.total.size.bytes

If those values are configured dynamically, then during the broker restart, it ensures the dynamic values are loaded instead of the static values from the config.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
2024-06-18 09:39:35 +05:30
PoAn Yang a9d71d1312
KAFKA-16898 move TimeIndexTest and TransactionIndexTest to storage module (#16341)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-17 09:11:17 +08:00
gongxuanzhang 4e846038a6
KAFKA-10787 Apply spotless to `metadata` and `server` and `storage` module (#16297)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-16 05:28:50 +08:00
Omnia Ibrahim e99da2446c
KAFKA-15853: Move KafkaConfig.configDef out of core (#16116)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 17:26:00 +02:00
dujian0068 133f2b0f31
KAFKA-16879 SystemTime should use singleton mode (#16266)
Reviewers: Greg Harris <gharris1727@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 08:49:19 +08:00
gongxuanzhang 596b945072
KAFKA-16643 Add ModifierOrder checkstyle rule (#15890)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 15:39:32 +08:00
Ken Huang 05b1380ecb
KAFKA-16897 Move OffsetIndexTest and OffsetMapTest to storage module (#16244)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 06:24:23 +08:00
Kamal Chandraprakash f3dbd7ed08
KAFKA-16904: Metric to measure the latency of remote read requests (#16209)
Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>
2024-06-11 21:07:12 +05:30
Kamal Chandraprakash f359908fcd
KAFKA-15776: Support added to update remote.fetch.max.wait.ms dynamically (#16203)
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
2024-06-10 20:42:12 +05:30
Chia Chuan Yu e5b8712993
KAFKA-16885 Renamed the enableRemoteStorageSystem to isRemoteStorageSystemEnabled (#16256)
Reviewers: Kamal Chandraprakash <kchandraprakash@uber.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-10 02:14:15 +08:00
Kirk True d6cd83e2fb
KAFKA-16200: Enforce that RequestManager implementations respect user-provided timeout (#16031)
Improve consistency and correctness for user-provided timeouts at the Consumer network request layer, per the Java client Consumer timeouts design (https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts). While the changes introduced in KAFKA-15974 enforce timeouts at the Consumer's event layer, this change enforces timeouts at the network request layer.

The changes mostly fit into the following areas:

1. Create shared code and idioms so timeout handling logic is consistent across current and future RequestManager implementations
2. Use deadlineMs instead of expirationMs, expirationTimeoutMs, retryExpirationTimeMs, timeoutMs, etc.
3. Update "preemptive pruning" to remove expired requests that have had at least one attempt

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2024-06-07 09:53:27 +02:00
Murali Basani a41f7a4e13
KAFKA-16884 Refactor RemoteLogManagerConfig with AbstractConfig (#16199)
Reviewers: Greg Harris <gharris1727@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-07 00:06:25 +08:00
Kamal Chandraprakash 0ed104c3dc
MINOR: Cleanup the storage module unit tests (#16202)
- Use SystemTime instead of MockTime when time is not mocked
- Use static assertions to reduce the line length
- Fold the lines if it exceeds the limit
- rename tp0 to tpId0 when it refers to TopicIdPartition

Reviewers: Kuan-Po (Cooper) Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-06 23:56:08 +08:00
Okada Haruki 3835515fea
KAFKA-16541 Fix potential leader-epoch checkpoint file corruption (#15993)
A patch for KAFKA-15046 got rid of fsync on LeaderEpochFileCache#truncateFromStart/End for performance reason, but it turned out this could cause corrupted leader-epoch checkpoint file on ungraceful OS shutdown, i.e. OS shuts down in the middle when kernel is writing dirty pages back to the device.

To address this problem, this PR makes below changes: (1) Revert LeaderEpochCheckpoint#write to always fsync
(2) truncateFromStart/End now call LeaderEpochCheckpoint#write asynchronously on scheduler thread
(3) UnifiedLog#maybeCreateLeaderEpochCache now loads epoch entries from checkpoint file only when current cache is absent

Reviewers: Jun Rao <junrao@gmail.com>
2024-06-06 15:10:13 +09:00
Kamal Chandraprakash 02c794dfd3
KAFKA-15776: Introduce remote.fetch.max.timeout.ms to configure DelayedRemoteFetch timeout (#14778)
KIP-1018, part1, Introduce remote.fetch.max.timeout.ms to configure DelayedRemoteFetch timeout

Reviewers: Luke Chen <showuon@gmail.com>
2024-06-05 14:42:23 +08:00
Kamal Chandraprakash cda2df5feb
KAFKA-16882 Migrate RemoteLogSegmentLifecycleTest to ClusterInstance infra (#16180)
- Removed the RemoteLogSegmentLifecycleManager
- Removed the TopicBasedRemoteLogMetadataManagerWrapper, RemoteLogMetadataCacheWrapper, TopicBasedRemoteLogMetadataManagerHarness and TopicBasedRemoteLogMetadataManagerWrapperWithHarness

Reviewers: Kuan-Po (Cooper) Tseng <brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-05 03:11:30 +08:00
Murali Basani 64c50a274b
KAFKA-16880 Update equals and hashcode methods for two attributes (#16173)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-04 14:05:00 +08:00
Anatoly Popov 8a882a77a4
KAFKA-16105: Reset read offsets when seeking to beginning in TBRLMM (#15165)
Reviewers: Greg Harris <greg.harris@aiven.io>, Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2024-06-03 13:46:40 -07:00
Murali Basani 734d309a29
KAFKA-16852 Adding two thread pools kafka-16852 (#16154)
Reviewers: Christo Lolov <lolovc@amazon.com>, Chia-Ping Tasi <chia7712@gmail.com>
2024-06-03 09:52:54 +01:00
Kuan-Po (Cooper) Tseng 0c9c1d405d
MINOR: Fix missing wait topic finished in TopicBasedRemoteLogMetadataManagerRestartTest (#16171)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-03 03:59:56 +08:00
Kuan-Po (Cooper) Tseng b05f82d444
KAFKA-16785 Migrate TopicBasedRemoteLogMetadataManagerRestartTest to new test infra (#16170)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-02 22:18:53 +08:00
Kuan-Po (Cooper) Tseng 3d125a2322
MINOR: Add more unit tests to LogSegments (#16085)
add more unit tests to LogSegments and do some small refactor in LogSegments.java

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-31 16:07:38 +08:00
Abhijeet Kumar bb7db87f98
KAFKA-15265: Add Remote Log Manager quota manager (#15625)
Added the implementation of the quota manager that will be used to throttle copy and fetch requests from the remote storage. Reference KIP-956

Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kchandraprakash@uber.com>, Jun Rao <junrao@gmail.com>
2024-05-30 09:06:49 -07:00
Calvin Liu c8af740bd4
Improve producer ID expiration performance (#16075)
Skip using stream when expiring the producer ID. This can improve the performance significantly when the count is high.
Before

Benchmark                                        (numProducerIds)  Mode  Cnt      Score       Error  Units
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3    101.253 ±    28.031  us/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3   2297.219 ±  1690.486  us/op
ProducerStateManagerBench.testDeleteExpiringIds           1000000  avgt    3  30688.865 ± 16348.768  us/op
After

Benchmark                                        (numProducerIds)  Mode  Cnt     Score     Error  Units
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3    39.122 ±   1.151  us/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3   464.363 ±  98.857  us/op
ProducerStateManagerBench.testDeleteExpiringIds           1000000  avgt    3  5731.169 ± 674.380  us/op
Also, made a change to the JMH testing which excludes the producer ID populating from the testing.

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-05-29 16:49:55 -07:00
Luke Chen 897cab2a61
KAFKA-16399: Add JBOD support in tiered storage (#15690)
After JBOD is supported in KRaft, we should also enable JBOD support in tiered storage. Unit tests and Integration tests are also added.

Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Igor Soarez <soarez@apple.com>, Mickael Maison <mickael.maison@gmail.com>
2024-05-29 15:30:18 +08:00
Kamal Chandraprakash 524ad1e14b
KAFKA-16452: Don't throw OOORE when converting the offset to metadata (#15825)
Don't throw OFFSET_OUT_OF_RANGE error when converting the offset to metadata, and next time the leader should increment the high watermark by itself after receiving fetch requests from followers. This can happen when checkpoint files are missing and being elected as a leader. 

Reviewers: Luke Chen <showuon@gmail.com>, Jun Rao <junrao@apache.org>
2024-05-27 17:44:23 +08:00
Mickael Maison e4e1116156
MINOR: Move Throttler to storage module (#16023)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-22 18:47:31 +02:00
PoAn Yang e93aae0664
KAFKA-16783: Migrate RemoteLogMetadataManagerTest to new test infra (#15983)
1. Replace TopicBasedRemoteLogMetadataManagerWrapperWithHarness with RemoteLogMetadataManagerTestUtils#builder in RemoteLogMetadataManagerTest.
2. Use ClusterTestExtention for RemoteLogMetadataManagerTest.

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>
2024-05-22 11:28:01 +08:00
Mickael Maison affe8da54c
KAFKA-7632: Support Compression Levels (KIP-390) (#15516)
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
2024-05-21 17:58:49 +02:00
PoAn Yang 9fe3932e5c
KAFKA-16784 Migrate TopicBasedRemoteLogMetadataManagerMultipleSubscriptionsTest to use ClusterTestExtensions (#15992)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-21 22:42:43 +08:00
Chia-Ping Tsai 2c51594607
MINOR: rewrite TopicBasedRemoteLogMetadataManagerTest by ClusterTestExtensions (#15917)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Luke Chen <showuon@gmail.com>
2024-05-16 21:26:08 +08:00
Gaurav Narula a1c2c68db1
KAFKA-16712 Fix race in TopicBasedRemoteLogMetadataManagerMultipleSubscriptionsTest (#15962)
TopicBasedRemoteLogMetadataManagerMultipleSubscriptionsTest has a race when it sets RemoteLogMetadataTopicPartitioner using the setter.

This change fixes the race condition by passing the RemoteLogMetadataTopicPartitioner instance in a Function<Integer, RemoteLogMetaedataTopicPartitioner> which is used in configure() in TopicBasedRemoteLogMetadataManager.

It also improves the waitingFor condition by spying on RemotePartitionMetadataStore and awaiting on Phasers to ensure ConsumerManager makes progress before performing assertions.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-16 14:56:06 +08:00
Gaurav Narula eb5559a40e
KAFKA-16686 Wait for given offset in TopicBasedRemoteLogMetadataManagerTest (#15885)
Some tests in TopicBasedRemoteLogMetadataManagerTest flake because waitUntilConsumerCatchesUp may break early before consumer manager has caught up with all the events.

This PR adds an expected offsets for leader/follower metadataOffset partitions and ensures we wait for the offset to be at least equal to the argument to avoid flakyness.

Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-15 12:59:38 +08:00
Kamal Chandraprakash 576facfdf2
KAFKA-16696 Removed the in-memory implementation of RSM and RLMM (#15911)
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-13 19:26:49 +08:00
Linu Shibu aeca384641
KAFKA-16356: Remove class-name dispatch in RemoteLogMetadataSerde (#15620)
Reviewers: Greg Harris <greg.harris@aiven.io>, Luke Chen <showuon@gmail.com>, Igor Soarez <soarez@apple.com>, The-Gamer-01 <19974361760@163.com>
2024-05-06 16:49:35 -07:00
Chia Chuan Yu 55a00be4e9
MINOR: Replaced Utils.join() with JDK API. (#15823)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-06 15:13:01 +08:00
Gaurav Narula 025f9816f1
MINOR: fix javadoc warnings (#15527)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-26 08:31:52 +08:00
PoAn Yang a38185280c
KAFKA-16424: remove truncated logs after alter dir (#15616)
If there are some logs to be deleted during the log dir movement, we'll send for a scheduler to do the deletion later.
However, when the log dir movement completed, the future log is renamed, the async log deletion will fail with no file existed error.

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, SoontaekLim <soontaek.lim@neya.kr>, Johnny Hsu <johnnyhsu@fb.com>
2024-04-24 17:51:29 +08:00
Omnia Ibrahim cfe5ab5cf2
KAFKA-15853 Move quota configs into server-common package (#15774)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-24 13:05:18 +08:00
Cheng-Kai, Zhang b6e70e9a54
MINOR: Add test for PartitionMetadataFile (#15714)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-24 13:01:35 +08:00
Kamal Chandraprakash 18572f5f8f
MINOR: Reduce the time taken to execute the TieredStorage tests. (#15780)
Reduce the time taken to execute the TieredStorage tests

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-23 10:21:46 +08:00
Omnia Ibrahim ecb2dd4cdc
KAFKA-15853 Move KafkaConfig log properties and docs out of core (#15569)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Nikolay <nizhikov@apache.org>, Federico Valeri <fvaleri@redhat.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-20 04:14:23 +08:00
Josep Prat 8f2fca7bd8
MINOR: Use Parametrized types correctly in RemoteLogMetadataSerde (#13824)
RemoteLogMetadataSerde references RemoteLogMetadataTransform in a Raw
form. Given that the class is parametrized we should make use of it.

Signed-off-by: Josep Prat <josep.prat@aiven.io>

Reviewers:  Matthew de Detrich <matthew.dedetrich@aiven.io>, Mickael Maison <mickael.maison@gmail.com>
2024-04-19 09:04:27 +02:00
Omnia Ibrahim 8c0458861c
KAFKA-15853 Move KafkaConfig Replication properties and docs out of … (#15575)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-16 15:28:35 +08:00
Mickael Maison 3617dda9a5
MINOR: Various cleanups in storage (#15711)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-15 13:30:41 +02:00
Alok Thatikunta c034cf2953
MINOR: Fix incorrect Java equals comparison of Uuid by reference (#15707)
Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-13 20:55:48 +08:00
Omnia Ibrahim 61baa7ac6b
KAFKA-15853 Move transactions configs out of core (#15670)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-13 00:29:51 +08:00
Chia-Ping Tsai 9a6760f130
KAFKA-16310 ListOffsets doesn't report the offset with maxTimestamp a… (#15621)
We do iterate the records to find the offsetOfMaxTimestamp instead of returning the cached one when handling ListOffsetsRequest.MAX_TIMESTAMP, since it is hard to align all paths to get correct offsetOfMaxTimestamp. The known paths are shown below.

1. convertAndAssignOffsetsNonCompressed -> we CAN get correct offsetOfMaxTimestamp when validating all records
2. assignOffsetsNonCompressed -> ditto
3. validateMessagesAndAssignOffsetsCompressed -> ditto
4. validateMessagesAndAssignOffsetsCompressed#buildRecordsAndAssignOffsets -> ditto
5. appendAsFollow#append#analyzeAndValidateRecords -> we CAN'T get correct offsetOfMaxTimestamp as iterating all records is expensive when fetching records from leader
6. LogSegment#recover -> ditto

Reviewers: Jun Rao <junrao@gmail.com>
2024-04-10 11:36:07 +08:00
Erik van Oosten 8e61f04228
MINOR: Fix usage of none in javadoc (#15674)
- Use `Empty` instead of 'none' when referring to `Optional` values.
- `Headers.lastHeader` returns `null` when no header is found.
- Fix minor spelling mistakes.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-08 08:43:05 +08:00
Kamal Chandraprakash 2f733ac583
KAFKA-16161: Avoid empty remote metadata snapshot file in partition dir (#15636)
Avoid empty remote metadata snapshot file in partition dir

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>
2024-04-02 10:07:54 +08:00
Johnny Hsu bf3f088c94
KAFKA-16341 fix the LogValidator for non-compressed type (#15476)
- Fix the verifying logic. If it's LOG_APPEND_TIME, we choose the offset of the first record. Else, we choose the record with the maxTimeStamp.
- rename the shallowOffsetOfMaxTimestamp to offsetOfMaxTimestamp

Reviewers: Jun Rao <junrao@gmail.com>, Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-19 23:00:30 +08:00
Luke Chen 834efa6606
KAFKA-16342 fix getOffsetByMaxTimestamp for compressed records (#15474)
Fix getOffsetByMaxTimestamp for compressed records.

This PR adds:

1) For inPlaceAssignment case, compute the correct offset for maxTimestamp when traversing the batch records, and set to ValidationResult in the end, instead of setting to last offset always.

2) For not inPlaceAssignment, set the offsetOfMaxTimestamp for the log create time, like non-compressed, and inPlaceAssignment cases, instead of setting to last offset always.

3) Add tests to verify the fix.

Reviewers: Jun Rao <junrao@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-15 06:09:45 +08:00
Johnny Hsu 3fcaa9ccc0
MINOR: remove the copy constructor of LogSegment (#15488)
In the LogSegment, the copy constructor is only used in LogLoaderTest

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-10 03:06:41 +08:00
John Yu 554fa57af8
KAFKA-16209 : fetchSnapshot might return null if topic is created before v2.8 (#15444)
Change the function with a better way to deal with the NULL pointer exception.

Reviewers: Luke Chen <showuon@gmail.com>
2024-03-06 09:00:58 +08:00
Nikolay eea369af94
KAFKA-14588 Log cleaner configuration move to CleanerConfig (#15387)
In order to move ConfigCommand to tools we must move all it's dependencies which includes KafkaConfig and other core classes to java. This PR moves log cleaner configuration to CleanerConfig class of storage module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-05 18:11:56 +08:00
John Yu 1bb9a85174
MINOR: Remove the space between two words (#15439)
Remove the space between two words

Reviewers: Luke Chen <showuon@gmail.com>
2024-02-29 08:14:35 +08:00
Satish Duggana fc8b644e56
MINOR Removed unused CommittedOffsetsFile class. (#15209)
`CommittedOffsetsFile` can be introduced when it is required for enhancing TBRLMM to consume from a specific offset when snapshots are implemented.

Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>
2024-02-12 17:35:01 +05:30
Jorge Esteban Quilcate Otoya b25c96a915
KAFKA-16229: Fix slow expired producer id deletion (#15324)
Expiration of ProducerIds is implemented with a slow removal of map keys:
        producers.keySet().removeAll(keys);
Unnecessarily going through all producer ids and then throw all expired keys to be removed.
This leads to exponential time on worst case when most/all keys need to be removed:

Benchmark                                        (numProducerIds)  Mode  Cnt           Score            Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3        9164.043 ±      10647.877  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3      341561.093 ±      20283.211  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3    44957983.550 ±    9389011.290  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  5683374164.167 ± 1446242131.466  ns/op
A simple fix is to use map#remove(key) instead, leading to a more linear growth:

Benchmark                                        (numProducerIds)  Mode  Cnt        Score         Error  Units
ProducerStateManagerBench.testDeleteExpiringIds               100  avgt    3     5779.056 ±     651.389  ns/op
ProducerStateManagerBench.testDeleteExpiringIds              1000  avgt    3    61430.530 ±   21875.644  ns/op
ProducerStateManagerBench.testDeleteExpiringIds             10000  avgt    3   643887.031 ±  600475.302  ns/op
ProducerStateManagerBench.testDeleteExpiringIds            100000  avgt    3  7741689.539 ± 3218317.079  ns/op
Flamegraph of the CPU usage at dealing with expiration when producers ids ~1Million:

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-02-09 17:17:17 -08:00
Divij Vaidya 65424ab484
MINOR: New year code cleanup - include final keyword (#15072)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Sagar Rao <sagarmeansocean@gmail.com>
2024-01-11 17:53:35 +01:00
Christo Lolov d4f3bf93d3
KAFKA-16014: Implement RemoteLogSizeBytes (#15050)
This pull request aims to implement RemoteLogSizeBytes from KIP-963.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>,  Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
2023-12-22 15:00:44 +08:00
Christo Lolov 1a97de2fe6
KAFKA-16002: Implement RemoteCopyLagSegments, RemoteDeleteLagBytes and RemoteDeleteLagSegments (#15005)
This pull request aims to implement RemoteCopyLagSegments, RemoteDeleteLagBytes and RemoteDeleteLagSegments from KIP-963.

Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-12-21 14:27:12 +08:00
Luke Chen 4e11de00a7
KAFKA-16014: Add RemoteLogMetadataCount metric (#15026)
Reviewers: Christo Lolov <lolovc@amazon.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>
2023-12-20 14:21:30 +05:30
Gantigmaa Selenge 7b21da9712
KAFKA-15158: Add metrics for RemoteDelete and BuildRemoteLogAuxState (#14375)
This PR implements part of KIP-963, specifically for adding new metrics.
The metrics added in this PR are:
    RemoteDeleteRequestsPerSec (emitted when expired log segments on remote storage being deleted)
    RemoteDeleteErrorsPerSec (emitted when failed to delete expired log segments on remote storage)
    BuildRemoteLogAuxStateRequestsPerSec (emitted when building remote log aux state for replica fetchers)
    BuildRemoteLogAuxStateErrorsPerSec (emitted when failed to build remote log aux state for replica fetchers)

Reviewers: Luke Chen <showuon@gmail.com>, Nikhil Ramakrishnan <ramakrishnan.nikhil@gmail.com>, Christo Lolov <lolovc@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-12-19 15:02:45 +08:00
Luke Chen c240993be2
KAFKA-16014: Add RemoteLogSizeComputationTime metric (#15021)
Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Christo Lolov <lolovc@amazon.com>
2023-12-18 21:39:43 +05:30
Christo Lolov a87e86e015
KAFKA-15883: Implement RemoteCopyLagBytes (#14832)
This pull request implements the first in the list of metrics in KIP-963: Additional metrics in Tiered Storage.

Since each partition of a topic will be serviced by its own RLMTask we need an aggregator object for a topic. The aggregator object in this pull request is BrokerTopicAggregatedMetric. Since the RemoteCopyLagBytes is a gauge I have introduced a new GaugeWrapper. The GaugeWrapper is used by the metrics collection system to interact with the BrokerTopicAggregatedMetric. The RemoteLogManager interacts with the BrokerTopicAggregatedMetric directly.

Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>
2023-12-14 09:21:37 +08:00
Okada Haruki d71d0639d9
KAFKA-15046: Get rid of unnecessary fsyncs inside UnifiedLog.lock to stabilize performance (#14242)
While any blocking operation under holding the UnifiedLog.lock could lead to serious performance (even availability) issues, currently there are several paths that calls fsync(2) inside the lock
In the meantime the lock is held, all subsequent produces against the partition may block
This easily causes all request-handlers to be busy on bad disk performance
Even worse, when a disk experiences tens of seconds of glitch (it's not rare in spinning drives), it makes the broker to unable to process any requests with unfenced from the cluster (i.e. "zombie" like status)
This PR gets rid of 4 cases of essentially-unnecessary fsync(2) calls performed under the lock:
(1) ProducerStateManager.takeSnapshot at UnifiedLog.roll
I moved fsync(2) call to the scheduler thread as part of existing "flush-log" job (before incrementing recovery point)
Since it's still ensured that the snapshot is flushed before incrementing recovery point, this change shouldn't cause any problem
(2) ProducerStateManager.removeAndMarkSnapshotForDeletion as part of log segment deletion
This method calls Utils.atomicMoveWithFallback with needFlushParentDir = true internally, which calls fsync.
I changed it to call Utils.atomicMoveWithFallback with needFlushParentDir = false (which is consistent behavior with index files deletion. index files deletion also doesn't flush parent dir)
This change shouldn't cause problems neither.
(3) LeaderEpochFileCache.truncateFromStart when incrementing log-start-offset
This path is called from deleteRecords on request-handler threads.
Here, we don't need fsync(2) either actually.
On unclean shutdown, few leader epochs might be remained in the file but it will be handled by LogLoader on start-up so not a problem
(4) LeaderEpochFileCache.truncateFromEnd as part of log truncation
Likewise, we don't need fsync(2) here, since any epochs which are untruncated on unclean shutdown will be handled on log loading procedure

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>
2023-11-29 09:43:44 -08:00
Kamal Chandraprakash 20b0bf063b
MINOR: Fix the flaky TBRLMM `testInternalTopicExists` test (#14840)
The internal topic creation is asynchronous so the test gets flaky. To fix the test flakiness and in this test I want to assert that doesTopicExist should return true when a topic exists, so created a dummy internal topic.

Reviewers: Luke Chen <showuon@gmail.com>, Jun Rao <jun@confluent.io>, Satish Duggana <satishd@apache.org>
2023-11-29 10:50:22 +08:00
Kamal Chandraprakash fade3d10ea
KAFKA-15047: Roll active segment when it breaches the retention policy (#14766)
Roll the active segment and offload it to remote storage once it breaches the retention time policy.

A segment is eligible for deletion once it gets uploaded to the remote storage. We have checks to allow only the passive segments to be uploaded, so the active segment never gets removed at all even if breaches the retention time. For low-throughput/stale topics, the active segment can hold the data beyond the configured retention time by the user.

Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>
2023-11-28 09:38:11 +05:30
Kamal Chandraprakash a42c846336
KAFKA-15241: Compute tiered copied offset by keeping the respective epochs in scope (#14787)
"findHighestRemoteOffset" does not take into account the leader-epoch end offset. This can cause log divergence between the local and remote log segments when there is unclean leader election.

To handle it correctly, the logic to find the highest remote offset can be updated to:
find-highest-remote-offset = min(end-offset-for-epoch-in-the-checkpoint, highest-remote-offset-for-epoch)

Discussion thread: https://github.com/apache/kafka/pull/14004#discussion_r1266864272

Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>
2023-11-27 09:10:46 +05:30
Jason Gustafson e905ef1edf
MINOR: Small LogValidator clean ups (#14697)
This patch contains a few small clean-ups in LogValidator and associated classes:

1. Set shallowOffsetOfMaxTimestamp consistently as the last offset in the
   batch for v2 compressed and non-compressed data.
2. Rename `RecordConversionStats` to `RecordValidationStats` since one of its
   fields `temporaryMemoryBytes` does not depend on conversion.
3. Rename `batchIndex` in `recordIndex` in loops over the records in each batch
   inside `LogValidator`.

Reviewers: Qichao Chu <5326144+ex172000@users.noreply.github.com>, Jun Rao <junrao@gmail.com>
2023-11-20 10:40:45 -08:00
Kamal Chandraprakash feee616f73
MINOR: Query before creating the internal remote log metadata topic (#14755)
When a node starts (or) restarts, then we send a CREATE_TOPICS request to the controller to create the internal __remote_log_metadata topic.

Topic creation event is costly and handled by the controller. During re-balance, the controller can have pending requests in its queue and can lead to CREATE_TOPICS timeout. Instead of firing the CREATE_TOPICS request when a node restarts, send a METADATA request (topic describe) which is handled by the least loaded node before sending a request to create the topic.

Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>
2023-11-20 14:50:11 +05:30
David Mao c6ea0a84ab
KAFKA-15780: Wait for consistent KRaft metadata when creating or deleting topics (#14695)
TestUtils.createTopicWithAdmin calls waitForAllPartitionsMetadata which waits for partition(s) to be present in each brokers' metadata cache. This is a sufficient check in ZK mode because the controller sends an LISR request before sending an UpdateMetadataRequest which means that the partition in the ReplicaManager will be updated before the metadata cache.

In KRaft mode, the metadata cache is updated first, so the check may return before partitions and other metadata listeners are fully initialized.

Testing:
Insert a Thread.sleep(100) in BrokerMetadataPublisher.onMetadataUpdate after

      // Publish the new metadata image to the metadata cache.
      metadataCache.setImage(newImage)
and run EdgeCaseRequestTest.testProduceRequestWithNullClientId and the test will fail locally nearly deterministically. After the change(s), the test no longer fails.

Reviewers: Justine Olshan <jolshan@confluent.io>
2023-11-06 17:07:56 -08:00
gongzhongqiang d682b15eeb
KAFKA-15769: Fix logging with exception trace (#14683)
Reviewers: Divij Vaidya <diviv@amazon.com>, hudeqi <1217150961@qq.com>
2023-11-06 11:02:05 +01:00
Alok Thatikunta eca8502990
KAFKA-14484: [1/N] Move PartitionMetadataFile to storage module (#14607)
This PR moves PartitionMetadataFile to the storage module.

Existing unit tests in UnifiedLogTest like testLogFlushesPartitionMetadataOnAppend should suffice.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
2023-11-01 09:40:45 -07:00
Kamal Chandraprakash 57fd8f4c36
KAFKA-15632: Drop the invalid remote log metadata events (#14576)
Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2023-10-31 15:21:33 +05:30
Calvin Liu 8f8ad6db38
KAFKA-15582: Move the clean shutdown file to the storage package (#14603)
A follow-up change to move the clean shutdown file to the storage package.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
2023-10-30 16:27:40 -07:00
Jotaniya Jeel 4612fe42af
KAFKA-15481: Fix concurrency bug in RemoteIndexCache (#14483)
RemoteIndexCache has a concurrency bug which leads to IOException while fetching data from remote tier.

The bug could be reproduced as per the following order of events:-

Thread 1 (cache thread): invalidates the entry, removalListener is invoked async, so the files have not been renamed to "deleted" suffix yet.
Thread 2: (fetch thread): tries to find entry in cache, doesn't find it because it has been removed by 1, fetches the entry from S3, writes it to existing file (using replace existing)
Thread 1: async removalListener is invoked, acquires a lock on old entry (which has been removed from cache), it renames the file to "deleted" and starts deleting it
Thread 2: Tries to create in-memory/mmapped index, but doesn't find the file and hence, creates a new file of size 2GB in AbstractIndex constructor. JVM returns an error as it won't allow creation of 2GB random access file.

This commit fixes the bug by using EvictionListener instead of RemovalListener to perform the eviction atomically with the file rename. It handles the manual removal (not handled by EvictionListener) by using computeIfAbsent() and enforcing atomic cache removal & file rename.

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Arpit Goyal 
<goyal.arpit.91@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-10-23 14:50:46 +02:00
Justine Olshan e8c8969330
KAFKA-15626: Replace verification guard object with an specific type (#14568)
I've added a new class with an incrementing atomic long to represent the verification guard. Upon creation of verification guard, we will increment this value and assign it to the guard.

The expected behavior is the same as the object guard, but with better debuggability with the string value and type safety (I found a type safety issue in the current code when implementing this)

Reviewers: Ismael Juma <ismael@juma.me.uk>, Artem Livshits <alivshits@confluent.io>
2023-10-20 14:26:20 -07:00
Arpit Goyal dc6a53e196
MINOR: Rename lock variable of the entry class (#14569)
The RemoteIndexCache has a variable lock and the child class also have a variable lock in the same class file. Renaming lock of the entry(child class) to avoid confusion.

Reviewers: Luke Chen <showuon@gmail.com>, hudeqi <1217150961@qq.com>
2023-10-18 18:20:55 +08:00
Ismael Juma 1073d434ec
KAFKA-14481: Move LogSegment/LogSegments to storage module (#14529)
A few notes:
* Delete a few methods from `UnifiedLog` that were simply invoking the related method in `LogFileUtils`
* Fix `CoreUtils.swallow` to use the passed in `logging`
* Fix `LogCleanerParameterizedIntegrationTest` to close `log` before reopening
* Minor tweaks in `LogSegment` for readability
 
For broader context on this change, please check:

* KAFKA-14470: Move log layer to storage module

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-10-16 06:37:30 -07:00
hudeqi b0b8693c72
KAFKA-15536: Dynamically resize remoteIndexCache (#14511)
Dynamically resize remoteIndexCache

Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-10-16 15:24:36 +08:00
Matthias J. Sax dc0f0db864
MINOR: fix typo (#14542)
Reviewers: Bruno Cadonna <bruno@confluent.io>
2023-10-13 09:28:00 -07:00
Arpit Goyal 99ce2e081c
KAFKA-15169: Added TestCase in RemoteIndexCache (#14482)
est Cases Covered

    1. Index Files already exist on disk but not in Cache i.e. RemoteIndexCache should not call remoteStorageManager to fetch it instead cache it from the local index file present.
    2. RSM returns CorruptedIndex File i.e. RemoteIndexCache should throw CorruptedIndexException instead of successfull execution.
    3. Deleted Suffix Indexes file already present on disk i.e. If cleaner thread is slow , then there is a chance of deleted index files present on the disk while in parallel same index Entry is invalidated. To understand more refer https://issues.apache.org/jira/browse/KAFKA-15169

Reviewers: Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>
2023-10-11 10:58:17 +08:00