Commit Graph

276 Commits

Author SHA1 Message Date
PoAn Yang b4be178599
KAFKA-17393: Remove log.message.format.version/message.format.version (KIP-724) (#18267)
Based on [KIP-724](https://cwiki.apache.org/confluence/display/KAFKA/KIP-724%3A+Drop+support+for+message+formats+v0+and+v1), the `log.message.format.version` and `message.format.version` can be removed in 4.0.

These configs effectively a no-op with inter-broker protocol version 3.0 or higher
since Apache Kafka 3.0, so the impact should be minimal.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2024-12-21 15:35:15 -08:00
Ismael Juma fe56fc98fa
KAFKA-18269: Remove deprecated protocol APIs support (KIP-896, KIP-724) (#18218)
Included in this change:
1. Remove deprecated protocol api versions from json files.
3. Remove fields that are no longer used from json files (affects ListOffsets, OffsetCommit, DescribeConfigs).
4. Remove record down-conversion support from KafkaApis.
5. No longer return `Errors.UNSUPPORTED_COMPRESSION_TYPE` on the fetch path[1].
6. Deprecate `TopicConfig. MESSAGE_DOWNCONVERSION_ENABLE_CONFIG` and made the relevant
configs (`message.downconversion.enable` and `log.message.downcoversion.enable`) no-ops since
down-conversion is no longer supported. It was an oversight not to deprecate this via KIP-724.
7. Fix `shouldRetainsBufferReference` to handle null request schemas for a given version.
8. Simplify producer logic since it only supports the v2 record format now.
9. Fix tests so they don't exercise protocol api versions that have been removed.
10. Add upgrade note.

Testing:
1. System tests have a lot of failures, but those tests fail for trunk too and I didn't see any issues specific to this change - it's hard to be sure given the number of failing tests, but let's not block on that given the other testing that has been done (see below).
3. Java producers and consumers with version 0.9-0.10.1 don't have api versions support and hence they fail in an ungraceful manner: the broker disconnects and the clients reconnect until the relevant timeout is triggered.
4. Same thing seems to happen for the console producer 0.10.2 although it's unclear why since api versions should be supported. I will look into this separately, it's unlikely to be related to this PR.
5. Console consumer 0.10.2 fails with the expected error and a reasonable message[2].
6. Console producer and consumer 0.11.0 works fine, newer versions should naturally also work fine.
7. kcat 1.5.0 (based on librdkafka 1.1.0) produce and consume fail with a reasonable message[3][4].
8. kcat 1.6.0-1.7.0 (based on librdkafka 1.5.0 and 1.7.0 respectively) consume fails with a reasonable message[5].
9. kcat 1.6.0-1.7.0 produce works fine.
10. kcat 1.7.1  (based on librdkafka 1.8.2) works fine for consumer and produce.
11. confluent-go-client (librdkafka based) 1.8.2 works fine for consumer and produce.
12. I will test more clients, but I don't think we need to block the PR on that.

Note that this also completes part of KIP-724: produce v2 and lower as well as fetch v3 and lower are no longer supported.

Future PRs will remove conditional code that is no longer needed (some of that has been done in KafkaApis,
but only what was required due to the schema changes). We can probably do that in master only as it does
not change behavior.

Note that I did not touch `ignorable` fields even though some of them could have been
changed. The reasoning is that this could result in incompatible changes for clients
that use new protocol versions without setting such fields _if_ we don't manually
validate their presence. I will file a JIRA ticket to look into this carefully for each
case (i.e. if we do validate their presence for the appropriate versions, we can
set them to ignorable=false in the json file).

[1] We would return this error if a fetch < v10 was used and the compression topic config was set
to zstd, but we would not do the same for the case where zstd was compressed at the producer
level (the most common case). Since there is no efficient way to do the check for the common
case, I made it consistent for both by having no checks.
[2] ```org.apache.kafka.common.errors.UnsupportedVersionException: The broker is too new to support JOIN_GROUP version 1```
[3]```METADATA|rdkafka#producer-1| [thrd:main]: localhost:9092/bootstrap: Metadata request failed: connected: Local: Required feature not supported by broker (0ms): Permanent```
[4]```METADATA|rdkafka#consumer-1| [thrd:main]: localhost:9092/bootstrap: Metadata request failed: connected: Local: Required feature not supported by broker (0ms): Permanent```
[5] `ERROR: Topic test-topic [0] error: Failed to query logical offset END: Local: Required feature not supported by broker`

Reviewers: David Arthur <mumrah@gmail.com>
2024-12-20 19:52:00 -08:00
Ken Huang e8863c9ee2
KAFKA-18180: Move OffsetResultHolder to storage module (#18100)
Reviewers: Christo Lolov <lolovc@amazon.com>
2024-12-20 11:52:34 +00:00
Mickael Maison 3fafa096b1
KAFKA-18207: Serde for handling transaction records (#18136)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2024-12-19 21:39:09 +01:00
Kamal Chandraprakash 139e5b15a1
KAFKA-17928: Make remote log manager thread-pool configs dynamic (#17859)
- Disallow configuring -1 for copier and expiration thread pools dynamically

Co-authored-by: Peter Lee <peterxcli@gmail.com>

Reviewers: Peter Lee <peterxcli@gmail.com>, Satish Duggana <satishd@apache.org>
2024-12-14 13:14:05 +05:30
TengYao Chi b37b89c668
KAFKA-9366 Upgrade log4j to log4j2 (#17373)
This pull request replaces Log4j with Log4j2 across the entire project, including dependencies, configurations, and code. The notable changes are listed below:

1. Introduce Log4j2 Instead of Log4j
2. Change Configuration File Format from Properties to YAML
3. Adds warnings to notify users if they are still using Log4j properties, encouraging them to transition to Log4j2 configurations

Co-authored-by: Lee Dongjin <dongjin@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 01:14:31 +08:00
Jason Taylor 3b1bd3812e
KAFKA-16368: Update remote.log.manager.* default thread pool values for KIP-1030 (#18137)
Reviewers: Divij Vaidya <diviv@amazon.com>
2024-12-12 23:51:26 +01:00
Mickael Maison 7591868aea
KAFKA-18179: Move AsyncOffsetReadFutureHolder to storage module (#18095)
Reviewers: Christo Lolov <lolovc@amazon.com>
2024-12-11 09:56:47 +00:00
Mickael Maison 4630628701
KAFKA-18144 Move the storage exceptions out of the core module (#18021)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-07 22:43:00 +08:00
Mickael Maison e255433374
KAFKA-18162 Move LocalLogTest to storage module (#18057)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-07 10:19:56 +08:00
Mickael Maison 417bd22a06
KAFKA-18163: Move VerificationGuardTest to storage module (#18058)
Reviewers: Justine Olshan <jolshan@confluent.io>
2024-12-06 10:18:07 -08:00
Ken Huang 2b43c49f51
KAFKA-18050 Upgrade the checkstyle version to 10.20.2 (#17999)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-05 10:59:18 +08:00
Kamal Chandraprakash 65fb274d29
KAFKA-17998: Fix the flaky OffloadAndTxnConsumeFromLeaderTest (#17959)
Reviewers: Satish Duggana <satishd@apache.org>
2024-12-02 16:59:16 +05:30
HYUNSANG HAN (한현상, Travis) 700bdd5fee
KAFKA-17997 Remove deprecated config log.message.timestamp.difference.max.ms (#17928)
Reviewers: Divij Vaidya <diviv@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-11-29 03:15:46 +08:00
Ken Huang c32a49549d
MINOR: Remove duplicate valid value in document (#17947)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-27 21:07:36 +08:00
Manikumar Reddy 3268435fd6
KAFKA-18013: Add AutoOffsetResetStrategy internal class (#17858)
- Deprecates OffsetResetStrategy enum
- Adds new internal class AutoOffsetResetStrategy
- Replaces all OffsetResetStrategy enum usages with AutoOffsetResetStrategy
- Deprecate old/Add new constructors to MockConsumer

 Reviewers: Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2024-11-25 19:11:12 +05:30
David Arthur 48ff6a6b53
MINOR Fix a few test names (#17788)
Remove or update custom display names to make sure we actually include the test method as the first part of the display name.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Bill Bejeck <bill@confluent.io>
2024-11-13 13:28:38 -05:00
Ken Huang 6bc7be70d7
KAFKA-17922 add helper to ClusterInstance to create client component (#17666)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-13 09:39:15 +08:00
David Arthur edab667a9a
MINOR Quarantine some flaky tests (#17779)
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
2024-11-12 19:34:44 -05:00
Kamal Chandraprakash 5914013219
KAFKA-17980: Introduce `isReady` API in RemoteLogMetadataManager (#17737)
- The isReady API in RemoteLogMetadataManager (RLMM) is used to denote whether the partition metadata is ready for remote storage operations. The plugin implementors can use this API to denote the partition status while bootstrapping the RLMM.
- Using this API, we are gracefully starting the remote log components. The segment copy, delete, and other operations that hits remote storage will be invoked once the metadata is ready for a given partition.
- See KIP-1105 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1105%3A+Make+remote+log+manager+thread-pool+configs+dynamic) for more details.

Reviewers: Federico Valeri <fvaleri@redhat.com>, Satish Duggana <satishd@apache.org>
2024-11-12 22:47:48 +05:30
Kirk True 42ea29c421
KAFKA-17925 Convert Kafka Client integration tests to use KRaft (#17670)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-10 10:38:01 +08:00
Kamal Chandraprakash b9976437e1
KAFKA-16780: Txn consumer exerts pressure on remote storage when collecting aborted txns (#17659)
- KIP-1058 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1058:+Txn+consumer+exerts+pressure+on+remote+storage+when+collecting+aborted+transactions)
- Unit and Integration tests added.

Reviewers: Divij Vaidya <diviv@amazon.com>
2024-11-08 14:49:09 +05:30
Linsiyuan9 af53758746
KAFKA-17814 Use `final` declaration to replace the suppression `this-escape` (#17613)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-03 15:00:02 +08:00
kevin-wu24 568b9e8a6c
KAFKA-17803: LogSegment#read should return the base offset of the batch that contains startOffset rather than startOffset (#17528)
Reviewers: Jose Sancio <jsancio@gmail.com>, Jun Rao <junrao@gmail.com>
2024-11-01 09:32:00 -07:00
Mickael Maison 6e88c10ed5
KAFKA-14483 Move LocalLog to storage module (#17587)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-28 20:41:46 +08:00
Federico Valeri 363bf3cab4
MINOR: Fix the valid values generated doc of the RLM thread pools (#17575)
This patch fixes the valid values generated doc of remote.log.manager.copier.thread.pool.size and remote.log.manager.expiration.thread.pool.size.

Signed-off-by: Federico Valeri <fedevaleri@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2024-10-24 09:58:39 +08:00
Federico Valeri 84ab3b9a5c
KAFKA-17031: Make RLM thread pool configurations public and fix default handling (#17499)
According to KIP-950, remote.log.manager.thread.pool.size should be marked as deprecated and replaced by two new configurations: remote.log.manager.copier.thread.pool.size and remote.log.manager.expiration.thread.pool.size. Fix default handling so that -1 works as expected.

Reviewers: Luke Chen <showuon@gmail.com>, Gaurav Narula <gaurav_narula2@apple.com>, Satish Duggana <satishd@apache.org>, Colin P. McCabe <cmccabe@apache.org>
2024-10-21 10:39:11 -07:00
Eric Chang 6b28e81ba1
KAKFA-17173 move quota config params from KafkaConfig to QuotaConfig (#17505)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-19 18:01:06 +08:00
Anshul Goyal 5cf112dc39
KAFKA-17766: Fixing deadlock in TopicBasedRemoteLogMetadataManager (#17492)
KAFKA-17766: Issue Details: Inside TopicBasedRemoteLogMetadataManager::close, one thread(t1) is calling join on initializationThread thread after taking writeLock on "lock" object => t1 will wait for initializationThread to complete. Internally initializationThread is also using writeLock on "lock" object. This can cause deadlock in below situation

initializationThread is started
close has been invoked as part of a separate thread. But this thread is not yet scheduled by OS.
At line 430, initializationThread is preempted and OS has started running close thread. close takes writeLock and invoked join on initializationThread.
Now OS schedules initializationThread again and at line 433 this thread also tries to take writeLock. But since writeLock is already held by close thread => both are waiting on each other to complete. initializationThread will wait on close to release the writeLock, while close thread will wait for completion of initializationThread

Fix Details: We can avoid taking lock inside close() method as there no operations with any side effects. closing instance variable is of type AtomicBoolean => no race condition when updating it to true.

Co-authored-by: Anshul Goyal <anshul.goyal@broadcom.com>
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2024-10-18 17:43:47 +05:30
Linsiyuan9 76a1af984b
KAFKA-17746 Replace JavaConverters with CollectionConverters (#17451)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-14 17:13:20 +08:00
Chengyan a880c846ae
KAFKA-17722 Fix "this-escape" compiler warnings (BrokerTopicMetrics) for JDK 23 (#17439)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-13 00:04:28 +08:00
Gaurav Narula b03fe66cfe
KAFKA-17759 Remove Utils.mkSet (#17460)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-11 21:20:43 +08:00
Mickael Maison dcd3bbe592
KAFKA-17749: Fix Throttler metrics name (#17430)
Reviewers: Josep Prat <josep.prat@aiven.io>
2024-10-10 17:25:59 +02:00
TengYao Chi 9b62c861fa
KAFKA-17739 Clean up build.gradle to adopt the minimum Java version as 11 (#17426)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-10 14:22:38 +08:00
bboyleonp666 8be808ea4a
KAFKA-17285 Consider using `Utils.closeQuietly` to replace `CoreUtils.swallow` when handling Closeable objects (#16843)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-03 10:45:01 +08:00
Chia-Ping Tsai 979740b49d
KAFKA-17589 Move JUnit extensions to test-common module (#17318)
This patch completely removes the compile-time dependency on core for both test and main sources by introducing two new modules.

1) `test-common` include all the common test implementation code (including dependency on :core for BrokerServer, ControllerServer, etc)
2) `test-common:api` new sub-module that just includes interfaces including our junit extension

Reviewers: David Arthur <mumrah@gmail.com>
2024-10-03 10:28:37 +08:00
Apoorv Mittal 05366d2fa7
KAFKA-17626: Move common fetch related classes from storage to server-common (#17289)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2024-09-26 20:31:28 -04:00
xijiu 3e7080f295
KAFKA-17512 Move LogSegmentTest to storage module (#17174)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-25 01:11:31 +08:00
Mickael Maison f1c011a8b5
KAFKA-14482 Move LogLoader to storage module (#17042)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-17 00:37:49 +08:00
Kamal Chandraprakash 344d8a60af
KAFKA-15859 Make RemoteListOffsets call an async operation (#16602)
This is the part-2 of the KIP-1075

To find the offset for a given timestamp, ListOffsets API is used by the client. When the topic is enabled with remote storage, then we have to fetch the remote indexes such as offset-index and time-index to serve the query. Also, the ListOffsets request can contain the query for multiple topics/partitions.

The time taken to read the indexes from remote storage is non-deterministic and the query is handled by the request-handler threads. If there are multiple LIST_OFFSETS queries and most of the request-handler threads are busy in reading the data from remote storage, then the other high-priority requests such as FETCH and PRODUCE might starve and be queued. This can lead to higher latency in producing/consuming messages.

In this patch, we have introduced a delayed operation for remote list-offsets call. If the timestamp need to be searched in the remote-storage, then the request-handler threads will pass-on the request to the remote-log-reader threads. And, the request gets handled in asynchronous fashion.

Covered the patch with unit and integration tests.

Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-09-16 07:25:06 +08:00
xijiu 7a321f29a2
KAFKA-17513 Move LogSegmentsTest to storage module (#17173)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-15 00:26:07 +08:00
PoAn Yang d95e384146
KAFKA-17508: Adding some guard for fallback deletion logic (#17154)
In KAFKA-16424, we added a fallback logic to delete the logs, but the file has no parent. It'd be better we have some guard from it.

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>
2024-09-13 19:45:30 +08:00
Dmitry Werner 5fd7ce2ace
KAFKA-17414 Move RequestLocal to server-common module (#16986)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-04 16:12:20 +08:00
Mickael Maison 839431e591
KAFKA-17468 Move kafka/log/remote/quota classes to storage module (#17074)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-04 02:18:47 +08:00
Kamal Chandraprakash 88b9ff30ad
KAFKA-15859 Introduce remote.list.offsets.request.timeout.ms dynamic config (#17045)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-03 23:06:01 +08:00
Omnia Ibrahim f59d829381
KAFKA-15853 Move TransactionLogConfig and TransactionStateManagerConfig getters out of KafkaConfig (#16665)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-03 18:24:12 +08:00
Colin Patrick McCabe ca0cc355f6
KAFKA-12670: Support configuring unclean leader election in KRaft (#16866)
Previously in KRaft mode, we could request an unclean leader election for a specific topic using
the electLeaders API. This PR adds an additional way to trigger unclean leader election when in
KRaft mode via the static controller configuration and various dynamic configurations.

In order to support all possible configuration methods, we have to do a multi-step configuration
lookup process:

1. check the dynamic topic configuration for the topic.
2. check the dynamic node configuration.
3. check the dynamic cluster configuration.
4. check the controller's static configuration.

Fortunately, we already have the logic to do this multi-step lookup in KafkaConfigSchema.java.
This PR reuses that logic. It also makes setting a configuration schema in
ConfigurationControlManager mandatory. Previously, it was optional for unit tests.

Of course, the dynamic configuration can change over time, or the active controller can change
to a different one with a different configuration. These changes can make unclean leader
elections possible for partitions that they were not previously possible for. In order to address
this, I added a periodic background task which scans leaderless partitions to check if they are
eligible for an unclean leader election.

Finally, this PR adds the UncleanLeaderElectionsPerSec metric.

Co-authored-by: Luke Chen showuon@gmail.com

Reviewers: Igor Soarez <soarez@apple.com>, Luke Chen <showuon@gmail.com>
2024-08-28 14:13:20 -07:00
Mickael Maison b9fe9f532f
KAFKA-16972: Move BrokerTopicStats to storage module (#17003)
Reviewers: Luke Chen <showuon@gmail.com>
2024-08-27 11:39:37 +02:00
Kuan-Po Tseng 11966a209a
KAFKA-17360 local log retention ms/bytes "-2" is not treated correctly (#16932)
1) When the local.retention.ms/bytes is set to -2, we didn't replace it with the server-side retention.ms/bytes config, so the -2 local retention won't take effect.
2) When setting retention.ms/bytes to -2, we can notice this log message:

```
Deleting segment LogSegment(baseOffset=10045, size=1037087, lastModifiedTime=1724040653922, largestRecordTimestamp=1724040653835) due to local log retention size -2 breach. Local log size after deletion will be 13435280. (kafka.log.UnifiedLog) [kafka-scheduler-6]
```
This is not helpful for users. We should replace -2 with real retention value when logging.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-25 19:46:15 +08:00
Mickael Maison e23172a48a
MINOR: Move OffsetCheckpointFile to storage module (#16917)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-20 16:29:24 +02:00