Commit Graph

4395 Commits

Author SHA1 Message Date
Dongnuo Lyu 7bdd1a015e
KAFKA-15647: Fix the different behavior in error handling between the old and new group coordinator (#14589)
In `KafkaApis.scala`, we build the API response differently if exceptions are thrown during the API execution. Since the new group coordinator only populates the response with error code instead of throwing an exception when an error occurs, there may be different behavior between the existing group coordinator and the new one.

This patch:
- Fixes the response building in `KafkaApis.scala` for the two APIs affected by such difference -- OffsetFetch and OffsetDelete.
- In `GroupCoordinatorService.java`, returns a response with error code instead of a failed future when the coordinator is not active.

Reviewers: David Jacot <djacot@confluent.io>
2023-10-31 03:11:52 -07:00
Calvin Liu 8f8ad6db38
KAFKA-15582: Move the clean shutdown file to the storage package (#14603)
A follow-up change to move the clean shutdown file to the storage package.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
2023-10-30 16:27:40 -07:00
Kirk True 2e2f32c050
KAFKA-15628: Refactor ConsumerRebalanceListener invocation for reuse (#14638)
Straightforward refactoring to extract an inner class and methods related to `ConsumerRebalanceListener` for reuse in the KIP-848 implementation of the consumer group protocol. Also using `Optional` to explicitly mark when a `ConsumerRebalanceListener` is in use or not, allowing us to make some (forthcoming) optimizations when there is no listener to invoke.

Reviewers: David Jacot <djacot@confluent.io>
2023-10-30 11:51:30 -07:00
Igor Soarez 9dbee599f1
MINOR: Rename log dir UUIDs (#14517)
After a late discussion in the voting thread for KIP-858 we
decided to improve the names for the designated reserved
log directory UUID values.

Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>,  Ziming Deng <dengziming1993@gmail.com>.
2023-10-30 19:10:57 +08:00
David Arthur 37715862d7 KAFKA-15704: Set missing ZkMigrationReady field on ControllerRegistrationRequest
This field was missed by the initial KIP-919 PR(s). The result is that migrations can't begin since
the controllers will never become ready. This patch fixes that as well as pulls over some fixes
from the 3.6 branch.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-10-27 14:16:24 -07:00
David Arthur 339d2556c6
KAFKA-15605: Fix topic deletion handling during ZK migration (#14545)
This patch adds reconciliation logic to migrating ZK brokers to deal with pending topic deletions as well as missed StopReplicas.

During the hybrid mode of the ZK migration, the KRaft controller is asynchronously sending UMR and LISR to the ZK brokers to propagate metadata. Since this process is essentially "best effort" it is possible for a broker to miss a StopReplicas. The new logic lets the ZK broker examine its local logs compared with the full set of replicas in a "Full" LISR. Any local logs which are not present in the set of replicas in the request are removed from ReplicaManager and marked as "stray".

To avoid inadvertent data loss with this new behavior, the brokers do not delete the "stray" partitions. They will rename the directories and log warning messages during log recovery. It will be up to the operator to manually delete the stray partitions. We can possibly enhance this in the future to clean up old stray logs.

This patch makes use of the previously unused Type field on LeaderAndIsrRequest. This was added as part of KIP-516 but never implemented. Since its introduction, an implicit 0 was sent in all LISR. The KRaft controller will now send a value of 2 to indicate a full LISR (as specified by the KIP). The presence of this value acts as a trigger for the ZK broker to perform the log reconciliation.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-10-26 18:13:52 -04:00
hudeqi b559942c17
KAFKA-15671: Fix flaky test RemoteIndexCacheTest.testClearCacheAndIndexFilesWhenResizeCache (#14622)
Reviewers: Divij Vaidya <diviv@amazon.com>

---------

Co-authored-by: Deqi Hu <deqi.hu@shopee.com>
2023-10-25 11:18:55 +02:00
dengziming 03ea24aa1d
MINOR: Fix flaky testFollowerCompleteDelayedFetchesOnReplication (#14616)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-10-25 09:39:29 +08:00
Nikolay e0121a38b1
MINOR: Deduplicating ConsumerGroupCommand print formating (#14610)
ConsumerGroupCommand contains code duplications for table row format.
This PR reduces code duplication and make it more clear and easy to understand.

Reviewers: Luke Chen <showuon@gmail.com>, hudeqi <1217150961@qq.com>
2023-10-24 15:16:32 +08:00
Jotaniya Jeel 4612fe42af
KAFKA-15481: Fix concurrency bug in RemoteIndexCache (#14483)
RemoteIndexCache has a concurrency bug which leads to IOException while fetching data from remote tier.

The bug could be reproduced as per the following order of events:-

Thread 1 (cache thread): invalidates the entry, removalListener is invoked async, so the files have not been renamed to "deleted" suffix yet.
Thread 2: (fetch thread): tries to find entry in cache, doesn't find it because it has been removed by 1, fetches the entry from S3, writes it to existing file (using replace existing)
Thread 1: async removalListener is invoked, acquires a lock on old entry (which has been removed from cache), it renames the file to "deleted" and starts deleting it
Thread 2: Tries to create in-memory/mmapped index, but doesn't find the file and hence, creates a new file of size 2GB in AbstractIndex constructor. JVM returns an error as it won't allow creation of 2GB random access file.

This commit fixes the bug by using EvictionListener instead of RemovalListener to perform the eviction atomically with the file rename. It handles the manual removal (not handled by EvictionListener) by using computeIfAbsent() and enforcing atomic cache removal & file rename.

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Arpit Goyal 
<goyal.arpit.91@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-10-23 14:50:46 +02:00
Gantigmaa Selenge 84a58d75bb
KAFKA-15566: Fix test FetchRequestTest.testLastFetchedEpochValidation for KRaft mode (#14563)
Fix test FetchRequestTest.testLastFetchedEpochValidation for KRaft mode
The test fails due to unexpected error (OFFSET_OUT_OF_RANGE) when enabled with KRaft mode.
The reason it takes longer to set the leader epoch in KRaft mode is because of the way the topic partitions are created differently than Zookeeper. In Zookeeper mode, we create the topic partitions directly with Zookeeper therefore seem to take less time to create the logs and set leader epoch on broker. In KRaft mode, we use Admin client to create topic partitions. Even though the test waits for topic partitions to get created and appear in metadata cache, it doesn’t seem to be sufficient time for leader epoch to get set on the brokers.

Reviewers: Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>
2023-10-23 11:05:57 +08:00
Justine Olshan e8c8969330
KAFKA-15626: Replace verification guard object with an specific type (#14568)
I've added a new class with an incrementing atomic long to represent the verification guard. Upon creation of verification guard, we will increment this value and assign it to the guard.

The expected behavior is the same as the object guard, but with better debuggability with the string value and type safety (I found a type safety issue in the current code when implementing this)

Reviewers: Ismael Juma <ismael@juma.me.uk>, Artem Livshits <alivshits@confluent.io>
2023-10-20 14:26:20 -07:00
hudeqi 21ebbe6b28
MINOR:Remove unused method parameter in ConsumerGroupCommand (#14585)
In ConsumerGroupCommand, there are two methods: getLogEndOffsets and getLogStartOffsets, the first parameter groupId is not used, so remove it.

Reviewers: Luke Chen <showuon@gmail.com>
2023-10-20 10:05:47 +08:00
Gantigmaa Selenge 486d5f6c64
KAFKA-15566: Fix flaky tests in FetchRequestTest.scala in KRaft mode (#14573)
Fixed some of the failing tests in FetchRequestTest.

testFetchWithPartitionsWithIdError and testCreateIncrementalFetchWithPartitionsInErrorV12 fail with the following error when enabled with KRaft mode. These tests only fail sometimes when running locally but consistently failed when running in the Jenkins Pipeline.

Tests will call the utility function TestUtils.waitUntilLeaderIsKnown after creating the topic partitions so that they wait for the logs to be created on the leader before sending fetch requests.

Enabled all tests except checkLastFetchedEpochValidation with KRaft mode.
Looking at the build history in Jenkins, all the other tests except these 2 tests and checkLastFetchedEpochValidation were passing when they were enabled with KRaft mode. Therefore enabled them with KRaft mode again but left checkLastFetchedEpochValidation to be investigated further.

Reviewers: Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>
2023-10-20 09:59:21 +08:00
Calvin Liu af747fbfed
KAFKA-15581: Introduce ELR (#14312)
This patch introduces preliminary changes for Eligible Leader Replicas (KIP-966)

* New MetadataVersion 16 (3.7-IV1)
* New record versions for PartitionRecord and PartitionChangeRecord
* New tagged fields on PartitionRecord and PartitionChangeRecord
* New static config "eligible.leader.replicas.enable" to gate the whole feature

Reviewers: Artem Livshits <alivshits@confluent.io>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-10-19 14:05:15 -04:00
Calvin Liu 14029e2ddd
KAFKA-15582: Identify clean shutdown broker (#14465)
The PR includes:

* Added a new class of CleanShutdownFile which helps write and read from a clean shutdown file.
* Updated the BrokerRegistration API.
* Client side handling for the broker epoch.
* Minimum work on the controller side.

Reviewers: Jun Rao <junrao@gmail.com>
2023-10-19 10:25:23 -07:00
Apoorv Mittal 36abc8dcea
KAFKA-15604: Telemetry API request and response schemas and classes (KIP-714) (#14554)
Initial PR for [KIP-714](https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability) - [KAFKA-15601](https://issues.apache.org/jira/browse/KAFKA-15601).

This PR defines json request and response schemas for the new Telemetry APIs and implements the corresponding java classes.

Reviewers: 
Andrew Schofield <andrew_schofield@uk.ibm.com>, Kirk True <ktrue@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Walker Carlson <wcarlson@apache.org>
2023-10-19 10:55:21 -05:00
Mickael Maison 8aee297669
MINOR: Various Java cleanups in core (#14561)
Reviewers: Josep Prat <josep.prat@aiven.io>
2023-10-18 11:49:25 +02:00
Matthias J. Sax 9b468fb278
MINOR: Do not end Javadoc comments with `**/` (#14540)
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bill@confluent.io>, Hao Li <hli@confluent.io>, Josep Prat <josep.prat@aiven.io>
2023-10-17 21:11:04 -07:00
Jeff Kim abee8f711c
KAFKA-14519; [1/N] Implement coordinator runtime metrics (#14417)
Implements the following metrics:

kafka.server:type=group-coordinator-metrics,name=num-partitions,state=loading
kafka.server:type=group-coordinator-metrics,name=num-partitions,state=active
kafka.server:type=group-coordinator-metrics,name=num-partitions,state=failed
kafka.server:type=group-coordinator-metrics,name=event-queue-size
kafka.server:type=group-coordinator-metrics,name=partition-load-time-max
kafka.server:type=group-coordinator-metrics,name=partition-load-time-avg
kafka.server:type=group-coordinator-metrics,name=thread-idle-ratio-min
kafka.server:type=group-coordinator-metrics,name=thread-idle-ratio-avg
The PR makes these metrics generic so that in the future the transaction coordinator runtime can implement the same metrics in a similar fashion.

Also, CoordinatorLoaderImpl#load will now return LoadSummary which encapsulates the start time, end time, number of records/bytes.

Co-authored-by: David Jacot <djacot@confluent.io>

Reviewers:  Ritika Reddy <rreddy@confluent.io>, Calvin Liu <caliu@confluent.io>, David Jacot <djacot@confluent.io>, Justine Olshan <jolshan@confluent.io>
2023-10-17 16:06:23 -07:00
Mickael Maison 9d04c7a045
MINOR: Various Scala cleanups in core (#14558)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-10-17 12:04:14 +02:00
Omnia G.H Ibrahim 9af1e74b5e
KAFKA-14596: Move TopicCommand to tools (#13201)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-10-17 11:40:15 +02:00
Ismael Juma 69e591db3a
MINOR: Rewrite/Move KafkaNetworkChannel to the `raft` module (#14559)
This is now possible since `InterBrokerSend` was moved from `core` to `server-common`.
Also rewrite/move `KafkaNetworkChannelTest`.

The scala version of `KafkaNetworkChannelTest` passed with the changes here (before I
deleted it).

Reviewers: Justine Olshan <jolshan@confluent.io>, José Armando García Sancio <jsancio@users.noreply.github.com>
2023-10-16 20:10:31 -07:00
dengziming 5c9db5e735
KAFKA-15390: Do not return fenced broker in FetchResponse.preferredReplica (#14272)
Do not return fenced brokers from metadataCache.getPartitionReplicaEndpoints, since that could lead to
them getting used as preferred read replicas.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-10-16 15:08:40 -07:00
Ismael Juma 1073d434ec
KAFKA-14481: Move LogSegment/LogSegments to storage module (#14529)
A few notes:
* Delete a few methods from `UnifiedLog` that were simply invoking the related method in `LogFileUtils`
* Fix `CoreUtils.swallow` to use the passed in `logging`
* Fix `LogCleanerParameterizedIntegrationTest` to close `log` before reopening
* Minor tweaks in `LogSegment` for readability
 
For broader context on this change, please check:

* KAFKA-14470: Move log layer to storage module

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-10-16 06:37:30 -07:00
hudeqi b0b8693c72
KAFKA-15536: Dynamically resize remoteIndexCache (#14511)
Dynamically resize remoteIndexCache

Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-10-16 15:24:36 +08:00
Ismael Juma 4cf86c5d2f
KAFKA-15492: Upgrade and enable spotbugs when building with Java 21 (#14533)
Spotbugs was temporarily disabled as part of KAFKA-15485 to support Kafka build with JDK 21. This PR upgrades the spotbugs version to 4.8.0 which adds support for JDK 21 and enables it's usage on build again.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-10-12 14:09:10 +02:00
Jeff Kim 7b5d640cc6
KAFKA-14987; Implement Group/Offset expiration in the new coordinator (#14467)
This patch implements the groups and offsets expiration in the new group coordinator.

Reviewers: Ritika Reddy <rreddy@confluent.io>, David Jacot <djacot@confluent.io>
2023-10-11 23:45:13 -07:00
dengziming 674285b86b
MINOR: Disable flaky kraft-test in FetchRequestTest (#14525)
We introduced a bunch of flaky tests in #14295 , which are normal when running locally but will always fail in CI, lets rollback them unless we find the cause before the end of today.

Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>
2023-10-11 15:22:10 -07:00
Calvin Liu d46781d4db
KAFKA-15221; Fix the race between fetch requests from a rebooted follower. (#14053)
A race can happen in the following sequence.
1. Stale Fetch triggers the ISR expansion.
2. The first time we check whether the replica is eligible. Catch up? Yes. broker epoch match? Yes (the metadata cache update has not happened)
3. Metadata cache update happens.
4. During the second time check the eligibility
    a. Catch up? Yes
    b. A new fetch request comes in. It cancels the replica caught-up and updates the broker epoch
    c. broker epoch match? Yes. New fetch epoch = new metadata cache epoch
5. Send an AlterPartition request with the new broker epoch.
----------------
The solution is to make sure that the 4.a) ,4.c) and 5) use the same replica state.

Reviewers: David Mao <47232755+splett2@users.noreply.github.com>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>
2023-10-11 09:44:36 -07:00
Gantigmaa Selenge 3c9031c624
KAFKA-15507: Make AdminClient throw non-retriable exception for a new call while closing (#14455)
AdminClient will throw IllegalStateException instead of TimeoutException if it receives new calls while closing down. This is more consistent with how Consumer and Producer clients handle new calls after closed down.

Reviewers: Luke Chen <showuon@gmail.com>, Kirk True <kirk@kirktrue.pro>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, vamossagar12 <sagarmeansocean@gmail.com>
2023-10-11 11:41:46 +08:00
Arpit Goyal 99ce2e081c
KAFKA-15169: Added TestCase in RemoteIndexCache (#14482)
est Cases Covered

    1. Index Files already exist on disk but not in Cache i.e. RemoteIndexCache should not call remoteStorageManager to fetch it instead cache it from the local index file present.
    2. RSM returns CorruptedIndex File i.e. RemoteIndexCache should throw CorruptedIndexException instead of successfull execution.
    3. Deleted Suffix Indexes file already present on disk i.e. If cleaner thread is slow , then there is a chance of deleted index files present on the disk while in parallel same index Entry is invalidated. To understand more refer https://issues.apache.org/jira/browse/KAFKA-15169

Reviewers: Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>
2023-10-11 10:58:17 +08:00
Aman Singh 6e164bb9ac
KAFKA-14927: Add validation to be config keys in ConfigCommand tool (#14514)
Added validation in ConfigCommand tool, only allow characters
'([a-z][A-Z][0-9][._-])*' for config keys.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2023-10-10 13:19:13 +05:30
mannoopj bf51a50a56
MINOR: KRaft support for Integration Tests (#14295)
Enable kraft mode for some producer/fetcher tests.
2023-10-09 16:07:22 +08:00
hudeqi 1c3eb4395a
KAFKA-14912:Add a dynamic config for remote index cache size (#14381)
Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Subhrodip Mohanta <hello@subho.xyz>
2023-10-08 13:24:09 +05:30
Igor Soarez 7e1c453af9
KAFKA-15356: Generate and persist directory IDs (#14291)
Reviewers: Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rdagostino@confluent.io>
2023-10-06 13:03:40 -04:00
David Arthur 3c054833ac
KAFKA-15552 Fix Producer ID ZK migration (#14506)
This patch fixes a problem where we migrate the current producer ID batch to KRaft instead of the next producer ID batch. Since KRaft stores the next batch in the log, we end up serving up a duplicate batch to the first caller of AllocateProducerIds once the KRaft controller has taken over.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-10-05 21:49:31 -07:00
David Mao 2c925e9f33
KAFKA-15526: Simplify the LogAppendInfo class (#14470)
The LogAppendInfo class is a bit bloated in terms of class fields. That's because it is used as an umbrella class for both leader log appends and follower log appends and needs to carry fields for both. This makes the constructor for the class a bit cludgy to use. It also ends up being a bit confusing when fields are important and when they aren't. I noticed there were a few fields that didn't seem necessary.

Below is a description of changes:

firstOffset is a LogOffsetMetadata but there are no readers of the field that use anything but the messageOffset field - simplified to a long.
LogAppendInfo.errorMessage is only set in one context - when calling LogAppendInfo.unknownLogAppendInfoWithAdditionalInfo. When we use this constructor, we pass up the original exception in LogAppendResult anyway, so the field is redundant with the LogAppendResult.exception field. This allows us to simplify the handling in KAFKA-15459: Convert coordinator retriable errors to a known producer response error #14378 since there are no custom error messages we just return whatever is in the exception message.
We only use targetCompressionType when constructing the LogValidator - just inline the call instead of including it in the LogAppendInfo.
offsetsMonotonic is only used when not assigning offsets to throw an exception - just throw the exception instead of setting a field to throw later.
shallowCount is only there to determine whether there are any messages in the append. Instead, we can just check validBytes which is incremented with a non-zero value every time we increment shallowCount.

Reviewers: Justine Olshan <jolshan@confluent.io>
2023-10-03 17:32:44 -07:00
Omnia G.H Ibrahim 7553d3f562
KAFKA-14593: Move LeaderElectionCommand to tools (#13204)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-10-03 11:59:56 +02:00
Nikolay 8f8dbad564
KAFKA-14595 ReassignPartitionsIntegrationTest rewritten in java (#14456)
This PR is part of #13247
It contains ReassignPartitionsIntegrationTest rewritten in java.
Goal of PR is reduce changes size in main PR.

Reviewers: Taras Ledkov  <tledkov@apache.org>, Justine Olshan <jolshan@confluent.io>
2023-10-02 13:22:17 -07:00
Justine Olshan b6c7855475
KAFKA-15449: Verify transactional offset commits (KIP-890 part 1) (#14370)
Previous commits left out TxnOffsetCommits which go through the group coordinator (not directly from the producer).

I've wired up the methods to include the transactional id and state partition to do the verification.

I've also updated UnifiedLog to verify on client and coordinator requests that are transactional.
I've not updated any sequence check logic since the sequence is always 0 on group coordinator initiated writes.

Added returned errors to Response files. Both InvalidPidMapping and InvalidTxnState will be returned and be fatal for the transactional OffsetCommit requests.

Reviewers:  David Jacot <david.jacot@gmail.com>,  Artem Livshits <alivshits@confluent.io>
2023-10-02 10:40:06 -07:00
David Jacot 6acf69d7a2
MINOR: Remove the client side assignor from the ConsumerGroupHeartbeat API (#14469)
As a first step, we plan to release a preview of the new consumer group rebalance protocol without the client side assignor. This patch removes all the related fields from the ConsumerGroupHeartbeat API for now. Removing fields is fine here because this API is not released yet and not exposed by default. We will add them back while bumping the version of the request when we release this part in the future.

Reviewers: Justine Olshan <jolshan@confluent.io>
2023-10-02 04:59:20 -07:00
iit2009060 13b119aa62
KAFKA-15511: Handle CorruptIndexException in RemoteIndexCache (#14459)
A bug in the RemoteIndexCache leads to a situation where the cache does not replace the corrupted index with a new index instance fetched from remote storage. This commit fixes the bug by adding correct handling for `CorruptIndexException`.

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Alexandre Dupriez <duprie@amazon.com>
2023-09-29 12:26:46 +02:00
Gantigmaa Selenge b0fd99106d
MINOR: Close UnifiedLog created in tests to avoid resource leak (#14453)
Reviewers: Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>
2023-09-29 12:00:01 +02:00
chern dedfed06f7
KAFKA-15510: Fix follower's lastFetchedEpoch when fetch response has … (#14457)
When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2023-09-28 14:14:42 +01:00
Luke Chen 86450bf9ac
KAFKA-15498: bump snappy-java version to 1.1.10.4 (#14434)
bump snappy-java version to 1.1.10.4, and add more tests to verify the compressed data can be correctly decompressed and read.

For LogCleanerParameterizedIntegrationTest, we increased the message size for snappy decompression since in the new version of snappy, the decompressed size is increasing compared with the previous version. But since the compression algorithm is not kafka's scope, all we need to do is to make sure the compressed data can be successfully decompressed and parsed/read.

Reviewers: Divij Vaidya <diviv@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Josep Prat <josep.prat@aiven.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-09-27 19:00:50 +08:00
Colin Patrick McCabe fcac880fd5
KAFKA-15466: Add KIP-919 support for some admin APIs (#14399)
Add support for --bootstrap-controller in the following command-line tools:
    - kafka-cluster.sh
    - kafka-configs.sh
    - kafka-features.sh
    - kafka-metadata-quorum.sh

To implement this, the following AdminClient APIs now support the new bootstrap.controllers
configuration:
    - Admin.alterConfigs
    - Admin.describeCluster
    - Admin.describeConfigs
    - Admin.describeFeatures
    - Admin.describeMetadataQuorum
    - Admin.incrementalAlterConfigs
    - Admin.updateFeatures

Command-line tool changes:
    - Add CommandLineUtils.initializeBootstrapProperties to handle parsing --bootstrap-controller
      in addition to --bootstrap-server.
    - Add --bootstrap-controller to ConfigCommand.scala, ClusterTool.java, FeatureCommand.java, and
      MetadataQuorumCommand.java.

KafkaAdminClient changes:
    - Add the AdminBootstrapAddresses class to handle extracting bootstrap.servers or
      bootstrap.controllers from the config map for KafkaAdminClient.
    - In AdminMetadataManager, store the new usingBootstrapControllers boolean. Generalize
      authException to encompass the concept of fatal exceptions in general. (For example, the
      fatal exception where we talked to the wrong node type.) Treat
      MismatchedEndpointTypeException and UnsupportedEndpointTypeException as fatal exceptions.
    - Extend NodeProvider to include information about whether bootstrap.controllers is supported.
    - Modify the APIs described above to support bootstrap.controllers.

Server-side changes:
    - Support DescribeConfigsRequest on kcontrollers.
    - Add KRaftMetadataCache to the kcontroller to simplify implemeting describeConfigs (and
      probably more APIs in the future). It's mainly a wrapper around MetadataImage, so there is
      essentially no extra resource consumption.
    - Split RuntimeLoggerManager out of ConfigAdminManager to handle the incrementalAlterConfigs
      support for BROKER_LOGGER. This is now supported on kcontrollers as well as brokers.
    - Fix bug in AuthHelper.computeDescribeClusterResponse that resulted in us always sending back
      BROKER as the endpoint type, even on the kcontroller.

Miscellaneous:
    - Fix a few places in exceptions and log messages where we wrote "broker" instead of "node".
      For example, an exception in NodeApiVersions.java, and a log message in NetworkClient.java.
    - Fix the slf4j log prefix used by KafkaRequestHandler logging so that request handlers on a
      controller don't look like they're on a broker.
    - Make the FinalizedVersionRange constructor public for the sake of a junit test.
    - Add unit and integration tests for the above.

Reviewers: David Arthur <mumrah@gmail.com>, Doguscan Namal <namal.doguscan@gmail.com>
2023-09-26 14:43:42 -07:00
David Jacot 08aa33127a
MINOR: Push logic to resolve the transaction coordinator into the AddPartitionsToTxnManager (#14402)
This patch refactors the ReplicaManager.appendRecords method and the AddPartitionsToTxnManager class in order to move the logic to identify the transaction coordinator based on the transaction id from the former to the latter. While working on KAFKA-14505, I found pretty annoying that we require to pass the transaction state partition to appendRecords because we have to do the same from the group coordinator. It seems preferable to delegate that job to the AddPartitionsToTxnManager.

Reviewers: Justine Olshan <jolshan@confluent.io>
2023-09-25 09:05:30 -07:00
Kamal Chandraprakash a15012078b
KAFKA-15479: Remote log segments should be considered once for retention breach (#14407)
When a remote log segment contains multiple epoch, then it gets considered for multiple times during breach by retention size/time/start-offset. This will affect the deletion by remote log retention size as it deletes the number of segments less than expected. This is a follow-up of KAFKA-15352

Reviewers: Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>, Satish Duggana <satishd@apache.org>
2023-09-25 17:41:53 +05:30
Ismael Juma 7ba6d7a0b4
MINOR: Update to Scala 2.13.12 (#14430)
It offers a quickfix action for certain errors, includes a number of bug fixes and it
introduces a new warning by default (https://github.com/scala/scala/pull/10462).

In addition to the scala version bump, we also fix the new compiler warnings and
bump the scalafmt version (the previous version failed with the new scala version).

Release notes: https://github.com/scala/scala/releases/tag/v2.13.12

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-09-24 06:05:12 -07:00