Commit Graph

3456 Commits

Author SHA1 Message Date
Kuan-Po Tseng 190df07ace
KAFKA-17265 Fix flaky MemoryRecordsBuilderTest#testBuffersDereferencedOnClose (#17092)
Reviewers: TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-09-05 19:47:16 +08:00
Luke Chen eb9cfb06c0
KAFKA-17412: add doc for `unclean.leader.election.enable` in KRaft (#17051)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-09-03 16:11:46 -07:00
TengYao Chi 2f9b236259
KAFKA-17294 Handle retriable errors when fetching offsets in new consumer (#16833)
The original behavior was implemented to maintain the behavior of the Classic consumer, where the ConsumerCoordinator would do the same when handling the OffsetFetchResponse. This behavior is being updated for the legacy coordinator as part of KAFKA-17279, to retry on all retriable errors.

We should review and update the CommitRequestManager to align with this, and retry on all retriable errors, which seems sensible when fetching offsets.

The corresponding PR for classic consumer is #16826

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-03 19:05:29 +08:00
ShivsundarR 743e185c8b
KAFKA-17450 Reduced the handlers for handling ShareAcknowledgeResponse (#17061)
Currently there are 4 handler functions present for handling ShareAcknowledge responses. ShareConsumeRequestManager had an interface and the respective handlers would implement it. Instead of having 4 different handlers for this, now using AcknowledgeRequestType, we could merge the code and have only 2 handler functions, one for ShareAcknowledge success and one for ShareAcknowledge failure, eliminating the need for the interface.

This PR also fixes a bug - We were not using the time at which the response was received while handling the ShareAcknowledge response, we were using an outdated time. Now the bug is fixed.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-09-03 11:31:49 +08:00
TengYao Chi 6a2789cf70
KAFKA-17293: New consumer HeartbeatRequestManager should rediscover disconnected coordinator (#16844)
Reviewers: Lianet Magrans <lianetmr@gmail.com>, TaiJuWu <tjwu1217@gmail.com>
2024-09-01 09:59:21 -04:00
TaiJuWu dc650fbd73
MINOR: add UT for consumer.poll (#17047)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-09-01 10:09:18 +08:00
PoAn Yang 4a2577b801
KAFKA-17395 Flaky test testMissingOffsetNoResetPolicy for new consumer (#17056)
In AsyncKafkaConsumer, FindCoordinatorRequest is sent by background thread. In MockClient#prepareResponseFrom, it only matches the response to a future request. If there is some race condition, FindCoordinatorResponse may not match to a FindCoordinatorRequest. It's better to put MockClient#prepareResponseFrom before the request to avoid flaky test.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-31 23:57:19 +08:00
Eric Chang a6062d0868
KAFKA-17137 Feat admin client it acl configs (#16648)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-31 12:29:39 +08:00
PoAn Yang 70dd577286
KAFKA-15909 Throw error when consumer configured with empty/whitespace-only group.id for LegacyKafkaConsumer (#16933)
Per KIP-289, the use of an empty value for group.id configuration was deprecated back in 2.2.0.

In 3.7, the AsyncKafkaConsumer implementation will throw an error (see KAFKA-14438).

This task is to update the LegacyKafkaConsumer implementation to throw an error in 4.0.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-30 23:24:36 +08:00
PoAn Yang 2b495945a2
KAFKA-17377: Consider using defaultApiTimeoutMs in AsyncKafkaConsumer#unsubscribe (#17030)
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
2024-08-29 20:24:25 -04:00
PoAn Yang 8db80d1f07
KAFKA-17064: New consumer assign should update assignment in background thread (#16673)
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
2024-08-29 23:01:36 +02:00
Ken Huang 28e2e8631f
KAFKA-17170: Add test to ensure new consumer acks reconciled assignment even if first HB with ack lost (#16694)
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
2024-08-29 13:26:40 +02:00
Kirk True dd7d7c3145
KAFKA-17335 Lack of default for URL encoding configuration for OAuth causes NPE (#16990)
AccessTokenRetrieverFactory uses the value of sasl.oauthbearer.header.urlencode provided by the user, or null if no value was provided for that configuration. When the HttpAccessTokenRetriever is created the JVM attempts to unbox the value into a boolean, a NullPointerException is thrown.

The fix is to explicitly check the Boolean, and if it's null, use Boolean.FALSE.

Reviewers: bachmanity1 <81428651+bachmanity1@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-28 23:11:41 +08:00
Lianet Magrans f9e30289d9
KAFKA-17403 Generate HB to leave on pollOnClose if needed (#16974)
Fix to ensure that the HB request to leave the group is generated when closing the HBRequestManager if the state is LEAVING. This is needed because we could end up closing the network thread without giving a chance to the HBManager to generate the request. This flow on consumer.close with short timeout:

1. app thread triggers releaseAssignmentAndLeaveGroup
2. background thread transitions to LEAVING
    2.1 the next run of the background thread should poll the HB manager and generate a request
3. app thread releaseAssignmentAndLeaveGroup times out and moves on to close the network thread (stops polling managers. Calls pollOnClose to gather the final requests and send them along with the unsent)

If 3 happens in the app thread before 2.1 happens in the background, the HB manager won't have a chance to generate the request to leave. This PR implements the pollOnClose to generate the final request if needed.


Reviewers: Kirk True <kirk@kirktrue.pro>, TaiJuWu <tjwu1217@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-28 09:04:25 +08:00
DL1231 006af8b939
KAFKA-17327; Add support of group in kafka-configs.sh (#16887)
The patch adds support of alter/describe configs for group in kafka-configs.sh.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2024-08-27 02:16:46 -07:00
Kuan-Po Tseng 5557720246
KAFKA-17038 KIP-919 supports for `alterPartitionReassignments` and `listPartitionReassignments` (#16644)
This is a follow-up after KIP-919, extending support for BOOTSTRAP_CONTROLLERS_CONFIG to both Admin#alterPartitionReassignments and Admin#listPartitionReassignments.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-27 17:12:16 +08:00
Andrew Schofield ccd24f2cf6
KAFKA-17341 Refactor consumer heartbeat request managers (#16963)
Refactor the heartbeat request managers for consumer groups and share groups. Almost all of the code can be shared which is definitely good.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-27 16:37:37 +08:00
xijiu a39037e55c
KAFKA-17399 Apply LambdaValidator to code base (#16980)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-26 21:53:49 +08:00
TengYao Chi d67c18b4ae
KAFKA-17331 Set correct version for EarliestLocalSpec and LatestTieredSpec (#16876)
Add the version check to client side when building ListOffsetRequest for the specific timestamp:
1) the version must be >=8 if timestamp=-4L (EARLIEST_LOCAL_TIMESTAMP)
2) the version must be >=9 if timestamp=-5L (LATEST_TIERED_TIMESTAMP)

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-25 17:39:28 +08:00
xijiu e750f44cf8
KAFKA-17409 Remove deprecated constructor of org.apache.kafka.clients.producer.RecordMetadata (#16979)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-25 06:02:42 +08:00
Andrew Schofield ffc865c432
KAFKA-17291: Add integration test for share group list and describe (#16920)
Add an integration test for share group list and describe admin operations.

Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-08-23 16:31:04 +05:30
xijiu 246b165456
KAFKA-17359 Add tests and enhance the docs of `Admin#describeConfigs` for the case of nonexistent resource (#16947)
- The different behavior of nonexistent resource. For example: nonexistent broker will cause timeout; nonexistent topic will produce UnknownTopicOrPartitionException; nonexistent group will return static/default configs; client_metrics will return empty configs
- The resources (topic and broker resource types are currently supported) this description is out-of-date
- Add some junit test

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-23 03:23:34 +08:00
Andrew Schofield 94f5039350
KAFKA-17378 Fixes for performance testing (#16942)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-22 06:53:21 +08:00
Sean Quah c207438823
KAFKA-17279: Handle retriable errors from offset fetches (#16826)
Handle retriable errors from offset fetches in ConsumerCoordinator.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, David Jacot <djacot@confluent.io>
2024-08-20 06:13:25 -07:00
ShivsundarR 932e84096a
KAFKA-17325: Updated result handling in ShareConsumeRequestManager::commitAsync(). (#16903)
Currently we were not updating the result count when we merged commitAsync() requests into one batch in ShareConsumeRequestManager, so this led to lesser acknowledgements sent to the application thread (ShareConsumerImpl) than expected.
Fix : Now if the acknowledge response came from a commitAsync, then we do not wait for other requests to complete, we always prepare a background event to be sent.

This PR also fixes a bug in ShareConsumeRequestManager, where during the final ShareAcknowledge sent during close(), we also pick up any piggybacked acknowledgements which were waiting to be sent along with ShareFetch.

 Reviewers:  Andrew Schofield <aschofield@confluent.io>,  Manikumar Reddy <manikumar.reddy@gmail.com>
2024-08-20 16:44:53 +05:30
David Schlosnagle 050edfaf00
KAFKA-14336: MetadataResponse#convertToNodeArray uses iteration (#12782)
Avoids stream allocation on hot code path in Admin#listOffsets

This patch avoids allocating the stream reference pipeline & spliterator for this case by explicitly allocating the pre-sized Node[] and using a for loop with int induction over the specified IDs List argument.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Kirk True <kirk@kirktrue.pro>, David Arthur <mumrah@gmail.com>
2024-08-19 19:46:51 -04:00
PoAn Yang 2f0ae82d4a
KAFKA-12989 MockClient should respect the request matcher passed to prepareUnsupportedVersionResponse (#16849)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-19 23:19:01 +08:00
lushilin 5f02ef952e
KAFKA-17340 correct the docs of allow.auto.create.topics (#16880)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-19 03:56:25 +08:00
xijiu 21dd5cd421
KAFKA-16818 Move event processing-related tests from ConsumerNetworkThreadTest to ApplicationEventProcessorTest (#16875)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-16 02:27:52 +08:00
Andrew Schofield 7031855570
KAFKA-17318 ConsumerRecord.deliveryCount and remove deprecations (#16872)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-16 00:11:08 +08:00
Ken Huang b767c65527
KAFKA-17326 The LIST_OFFSET request is removed from the "Api Keys" page (#16870)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-15 18:59:38 +08:00
DL1231 3a0efa2845
KAFKA-14510; Extend DescribeConfigs API to support group configs (#16859)
This patch extends the DescribeConfigs API to support group configs.

Reviewers: Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>
2024-08-14 06:37:57 -07:00
xijiu 75bcb9eb42
KAFKA-17239 add request-latency metrics for node in admin client (#16832)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-14 04:33:47 +08:00
Stanislav Knot 43a49f0b82
MINOR: typo in the lz4java exception handling (#16869)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-13 22:19:20 +08:00
PoAn Yang dc636ea1ff
KAFKA-17309 Fix flaky testCallFailWithUnsupportedVersionExceptionDoesNotHaveConcurrentModificationException (#16854)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-13 21:06:20 +08:00
Ken Huang 98d7618b94
KAFKA-17319 change ListOffsetsRequest latestVersionUnstable to false (#16865)
Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-13 20:44:20 +08:00
Chirag Wadhwa 5b47975bb9
KAFKA-16746: Implemented handleShareAcknowledgeRequest RPC including unit tests (#16792)
Implemented handleShareAcknowledge request RPC in KafkaApis.scala. This method is called whenever the client sends a Share Acknowledge request to the broker. The acknowledge logic is handles asynchronously and the results are handled appropriately.

Reviewers:  Apoorv Mittal <apoorvmittal10@gmail.com>, Jun Rao <junrao@gmail.com>
2024-08-12 07:41:59 -07:00
DL1231 bbdf79e1b4
KAFKA-14511; Extend AlterIncrementalConfigs API to support group config (#15067)
This patch add resources to store and handle consumer group's config.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, David Jacot <djacot@confluent.io>
2024-08-12 00:40:13 -07:00
Colin Patrick McCabe e1b2adea07
KAFKA-17190: AssignmentsManager gets stuck retrying on deleted topics (#16672)
In MetadataVersion 3.7-IV2 and above, the broker's AssignmentsManager sends an RPC to the
controller informing it about which directory we have chosen to place each new replica on.
Unfortunately, the code does not check to see if the topic still exists in the MetadataImage before
sending the RPC. It will also retry infinitely. Therefore, after a topic is created and deleted in
rapid succession, we can get stuck including the now-defunct replica in our subsequent
AssignReplicasToDirsRequests forever.

In order to prevent this problem, the AssignmentsManager should check if a topic still exists (and
is still present on the broker in question) before sending the RPC. In order to prevent log spam,
we should not log any error messages until several minutes have gone past without success.
Finally, rather than creating a new EventQueue event for each assignment request, we should simply
modify a shared data structure and schedule a deferred event to send the accumulated RPCs. This
will improve efficiency.

Reviewers: Igor Soarez <i@soarez.me>, Ron Dagostino <rndgstn@gmail.com>
2024-08-10 12:31:45 +01:00
Andrew Schofield 0a4a12fbc4
KAFKA-17225: Refactor consumer membership managers (#16751)
The initial drop of ShareMembershipManager contained a lot of code duplicated from MembershipManagerImpl. The plan was always to share as much code as possible between the membership managers for consumer groups and share groups. This issue refactors the membership managers so that almost all of the code is in common.

Reviewers:  Lianet Magrans <lianetmr@gmail.com>,  Chia Chuan Yu <yujuan476@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-08-09 16:18:07 +05:30
José Armando García Sancio 8ce514a52e
KAFKA-16534; Implemeent update voter sending (#16837)
This change implements the KRaft voter sending UpdateVoter request. The
UpdateVoter RPC is used to update a voter's listeners and supported
kraft versions. The UpdateVoter RPC is sent if the replicated voter set
(VotersRecord in the log) doesn't match the local voter's supported
kraft versions and controller listeners.

To not starve the Fetch request, the UpdateVoter request is sent at most
every 3 fetch timeouts. This is required to make sure that replication
is making progress and eventually the voter set in the replicated log
matches the local voter configuration.

This change also modifies the semantic for UpdateVoter. Now the
UpdateVoter response is sent right after the leader has created the new
voter set. This is required so that updating voter can transition from
sending UpdateVoter request to sending Fetch request. If the leader
waits for the VotersRecord control record to commit before sending the
UpdateVoter response, it may never send the UpdateVoter response. This
can happen if the leader needs that voter's Fetch request to commit the
control record.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-08-08 16:16:09 -07:00
Colin Patrick McCabe 6a44fb154d
KAFKA-16523; kafka-metadata-quorum: support add-controller and remove-controller (#16774)
This PR adds support for add-controller and remove-controller in the kafka-metadata-quorum.sh
command-line tool. It also fixes some minor server-side bugs that blocked the tool from working.

In kafka-metadata-quorum.sh, the implementation of remove-controller is fairly straightforward. It
just takes some command-line flags and uses them to invoke AdminClient. The add-controller
implementation is a bit more complex because we have to look at the new controller's configuration
file. The parsing logic for the advertised.listeners and listeners server configurations that we
need was previously implemented in the :core module. However, the gradle module where
kafka-metadata-quorum.sh lives, :tools, cannot depend on :core. Therefore, I moved listener parsing
into SocketServerConfigs.listenerListToEndPoints. This will be a small step forward in our efforts
to move Kafka configuration out of :core.

I also made some minor changes in kafka-metadata-quorum.sh and Kafka-storage-tool.sh to handle
--help without displaying a backtrace on the screen, and give slightly better error messages on
stderr. Also, in DynamicVoter.toString, we now enclose the host in brackets if it contains a colon
(as IPV6 addresses can).

This PR fixes our handling of clusterId in addRaftVoter and removeRaftVoter, in two ways. Firstly,
it marks clusterId as nullable in the AddRaftVoterRequest.json and RemoveRaftVoterRequest.json
schemas, as it was always intended to be. Secondly, it allows AdminClient to optionally send
clusterId, by using AddRaftVoterOptions and RemoveRaftVoterOptions. We now also remember to
properly set timeoutMs in AddRaftVoterRequest. This PR adds unit tests for
KafkaAdminClient#addRaftVoter and KafkaAdminClient#removeRaftVoter, to make sure they are sending
the right things.

Finally, I fixed some minor server-side bugs that were blocking the handling of these RPCs.
Firstly, ApiKeys.ADD_RAFT_VOTER and ApiKeys.REMOVE_RAFT_VOTER are now marked as forwardable so that
forwarding from the broker to the active controller works correctly. Secondly,
org.apache.kafka.raft.KafkaNetworkChannel has now been updated to enable API_VERSIONS_REQUEST and
API_VERSIONS_RESPONSE.

Co-authored-by: Murali Basani muralidhar.basani@aiven.io
Reviewers: José Armando García Sancio <jsancio@apache.org>, Alyssa Huang <ahuang@confluent.io>
2024-08-08 15:54:12 -07:00
bboyleonp666 e7317b37bf
KAFKA-17269 Fix ConcurrentModificationException caused by NioEchoServer.closeNewChannels (#16817)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-09 03:24:59 +08:00
PoAn Yang 130af38481
KAFKA-17223 Retrying the call after encoutering UnsupportedVersionException will cause ConcurrentModificationException (#16753)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-09 01:07:54 +08:00
Alyssa Huang 3066019efa
KAFKA-16521: Have Raft endpoints printed as name://host:port (#16830)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-08-08 09:22:23 -07:00
ShivsundarR 0cbc5e083a
KAFKA-17217: Batched acknowledge requests per node in ShareConsumeRequestManager (#16727)
In ShareConsumeRequestManager, currently every time we perform a commitSync/commitAsync/acknowledgeOnClose we create one AcknowledgeRequestState for each call. But this can be optimised further as we can batch up the acknowledgements to be sent to the same node before the next poll() is invoked.

This will ensure that between 2 polls, the acknowledgements are accumulated in one request per node and then sent during poll, resulting in lesser RPC calls.

To achieve this, we are storing a pair of acknowledge request states for every node, the first value denotes the requestState for commitAsync() and the second value denotes the requestState for commitSync() and acknowledgeOnClose(). All the acknowledgements to be sent to a particular node are stored in the corresponding acknowledgeRequestState based on whether it was synchronous or asynchronous.

Reviewers:  Andrew Schofield <aschofield@confluent.io>,  Manikumar Reddy <manikumar.reddy@gmail.com>
2024-08-08 20:43:37 +05:30
Luke Chen 164f899605
KAFKA-17236: Handle local log deletion when remote.log.copy.disabled=true (#16765)
Handle local log deletion when remote.log.copy.disabled=true based on the KIP-950.

When tiered storage is disabled or becomes read-only on a topic, the local retention configuration becomes irrelevant, and all data expiration follows the topic-wide retention configuration exclusively.

- added remoteLogEnabledAndRemoteCopyEnabled method to check if this topic enables tiered storage and remote log copy is enabled. We should adopt local.retention.ms/bytes when remote.storage.enable=true,remote.log.copy.disable=false.
- Changed to use retention.bytes/retention.ms when remote copy disabled.
- Added validation to ask users to set local.retention.ms == retention.ms and local.retention.bytes == retention.bytes
- Added tests

Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>
2024-08-08 17:07:40 +05:30
Ken Huang 786d2c9975
KAFKA-17276; replicaDirectoryId for Fetch and FetchSnapshot should be ignorable (#16819)
The replicaDirectoryId field for FetchRequest and FetchSnapshotRequest should be ignorable. This allows data objects with the directory id to be serialized to any version of the requests.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Chia-Ping Tsai <chia7712@apache.org>
2024-08-07 20:53:15 -04:00
Andrew Schofield b9099301fb
KAFKA-16716: Add admin list and describe share groups (#16827)
Adds the methods to the admin client for listing and describing share groups.
There are some unit tests, but integration tests will be in a follow-on PR.

Reviewers:  Manikumar Reddy <manikumar.reddy@gmail.com>
2024-08-08 00:46:25 +05:30
Tim Fox 837684a1b9
MINOR: Correct type in README.md from "boolean" to "bool" (#16706)
The actual type name in the JSON descriptor files is "bool" not "boolean"

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Andrew Schofield <andrew_schofield@live.com>
2024-08-07 17:34:53 +02:00
Mickael Maison 7c5d339d07
KAFKA-17227: Refactor compression code to only load codecs when used (#16782)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Josep Prat <josep.prat@aiven.io>
2024-08-06 11:01:21 +02:00
Philip Nee c18463ec2d
MINOR: Rethrow caught exception instead of wrapping into LoginException (#16769)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2024-08-06 09:48:07 +01:00
José Armando García Sancio e49c8df1f7 KAFKA-16533; Update voter handling
Add support for handling the update voter RPC. The update voter RPC is used to automatically update
the voters supported kraft versions and available endpoints as the operator upgrades and
reconfigures the KRaft controllers.

The add voter RPC is handled as follow:

1. Check that the leader has fenced the previous leader(s) by checking that the HWM is known;
   otherwise, return the REQUEST_TIMED_OUT error.

2. Check that the cluster supports kraft.version 1; otherwise, return the UNSUPPORTED_VERSION error.

3. Check that there are no uncommitted voter changes, otherwise return the REQUEST_TIMED_OUT error.

4. Check that the updated voter still supports the currently finalized kraft.version; otherwise
   return the INVALID_REQUEST error.

5. Check that the updated voter is still listening on the default listener.

6. Append the updated VotersRecord to the log. The KRaft internal listener will read this
   uncommitted record from the log and update the voter in the set of voters.

7. Wait for the VotersRecord to commit using the majority of the voters. Return a REQUEST_TIMED_OUT
   error if it doesn't commit in time.

8. Send the UpdateVoter successful response to the voter.

This change also implements the ability for the leader to update its own entry in the voter
set when it becomes leader for an epoch. This is done by updating the voter set and writing a
control batch as the first batch in a new leader epoch.

Finally, fix a bug in KafkaAdminClient's handling of removeRaftVoterResponse where we tried to cast
the response to the wrong type.

Reviewers: Alyssa Huang <ahuang@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
2024-08-05 11:32:21 -07:00
TengYao Chi 736bd3cd09
KAFKA-17201 SelectorTest.testInboundConnectionsCountInConnectionCreationMetric leaks sockets and threads (#16723)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-05 22:47:06 +08:00
brenden20 0b78d8459c
MINOR: Remove ConsumerTestBuilder.java (#16691)
The purpose of this PR is to remove ConsumerTestBuilder.java since it is no longer needed. The following PRs have eliminated the use of ConsumerTestBuilder:
#14930
#16140
#16200
#16312

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-08-05 11:05:53 +08:00
Kuan-Po Tseng 84add30ea5
KAFKA-16154: Broker returns offset for LATEST_TIERED_TIMESTAMP (#16783)
This pr support EarliestLocalSpec LatestTierSpec in GetOffsetShell, and add integration tests.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang <payang@apache.org>
2024-08-05 10:41:14 +08:00
Lianet Magrans 2a7bad8ca0
MINOR: Fix consumer log on fatal error & improve memberId logging (#16720)
Fix log on consumer fatal error, to show member ID only if present. If no member ID the log will clearly indicate that the member has no member ID (instead of showing empty as it used to)

Apply same fix consistently to all other log lines that include member ID.

Reviewers: Kirk True <kirk@kirktrue.pro>, PoAn Yang <payang@apache.org>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-04 02:18:14 +08:00
Luke Chen 9f7e8d478a
KAFKA-16855: remote log disable policy in KRaft (#16653)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Christo Lolov <lolovc@amazon.com>
2024-08-03 09:38:41 +01:00
PoAn Yang 6e324487fa
KAFKA-16480: ListOffsets change should have an associated API/IBP version update (#16781)
1. Use oldestAllowedVersion as 9 if using ListOffsetsRequest#EARLIEST_LOCAL_TIMESTAMP or ListOffsetsRequest#LATEST_TIERED_TIMESTAMP.
   2. Add test cases to ListOffsetsRequestTest#testListOffsetsRequestOldestVersion to make sure requireTieredStorageTimestamp return 9 as minVersion.
   3. Add EarliestLocalSpec and LatestTierSpec to OffsetSpec.
   4. Add more cases to KafkaAdminClient#getOffsetFromSpec.
   5. Add testListOffsetsEarliestLocalSpecMinVersion and testListOffsetsLatestTierSpecSpecMinVersion to KafkaAdminClientTest to make sure request builder has oldestAllowedVersion as 9.

Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>
2024-08-03 14:27:27 +08:00
Alyssa Huang bc4df734b5
KAFKA-16521; kafka-metadata-quorum describe command changes for KIP-853 (#16759)
describe --status now includes directory id and endpoint information for voter and observers.
describe --replication now includes directory id.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>
2024-08-01 15:28:57 -04:00
Apoorv Mittal 902fc33b27
KAFKA-17230: Fix for consumer node latency metrics (#16755)
Kafka Consumer client registers node/connection latency metrics in Selector.java but the values against the metric is never recorded. This seems to be an issue since inception. 

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>
2024-08-01 12:04:19 -07:00
Kondrat Bertalan 8d8c367066
KAFKA-17192 Fix MirrorMaker2 worker config does not pass config.provi… (#16678)
Reviewers: Chris Egerton <chrise@aiven.io>
2024-07-31 16:50:22 -04:00
Linu Shibu 9d634629f2
KAFKA-16899 MembershipManagerImpl: rebalanceTimeoutMs variable name changed to commitTimeoutDuringReconciliation (#16334)
As mentioned in the ticket, the config property name "rebalanceTimeoutMs" is misleading since the property only handles the client's commit portion of the process.It is used in MembershipManagerImpl as a means to limit the client's efforts in the case where it is repeatedly trying to commit but failing. Considering the same, the property name has been updated to "commitTimeoutDuringReconciliation" (suggested in ticket) in GroupRebalanceConfig and in all other classes where the property/variable is used

Reviewers: Kirk True <kirk@kirktrue.pro>, Chia-Ping Tsai <chia7712@gmail.com>
2024-08-01 00:51:46 +08:00
Philip Nee 51c1e1e147
KAFKA-17188: Ensure login and callback handler are closed upon encountering LoginException (#16724)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2024-07-31 08:51:54 +01:00
Nancy 9e06767ffa
KAFKA-13898 Updated docs for metrics.recording.level (#16402)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-31 04:29:33 +08:00
Chung, Ming-Yen 7c0a96d08d
KAFKA-17185 Declare Loggers as static to prevent multiple logger instances (#16680)
As discussed in #16657 (comment) , we should make logger as static to avoid creating multiple logger instances.
I use the regex private.*Logger.*LoggerFactory to search and check all the results if certain logs need to be static.

There are some exceptions that loggers don't need to be static:
1) The logger in the inner class. Since java8 doesn't support static field in the inner class.
        https://github.com/apache/kafka/blob/trunk/clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetchRequestManagerTest.java#L3676

2) Custom loggers for each instance (non-static + non-final). In this case, multiple logger instances is actually really needed.
        https://github.com/apache/kafka/blob/trunk/storage/src/test/java/org/apache/kafka/server/log/remote/storage/LocalTieredStorage.java#L166

3) The logger is initialized in constructor by LogContext. Many non-static but with final modifier loggers are in this category, that's why I use .*LoggerFactory to only check the loggers that are assigned initial value when declaration.
    
4) protected final Logger log = Logger.getLogger(getClass())
    This is for subclass can do logging with subclass name instead of superclass name.
    But in this case, if the log access modifier is private, the purpose cannot be achieved since subclass cannot access the log defined in superclass. So if access modifier is private, we can replace getClass() with <className>.class

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-31 02:37:36 +08:00
Kirk True d260b06180
KAFKA-17060 Rename LegacyKafkaConsumer to ClassicKafkaConsumer (#16683)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-29 20:56:23 +08:00
Alyssa Huang da8fe6355b
KAFKA-16915; LeaderChangeMessage supports directory id (#16668)
Extend LeaderChangeMessage schema to support version 1 of the message. The leader will continue to write version 0 of the schema. This is needed so that in the future the leader can write version 1 of the message and be guaranteed that all of the replicas in the cluster support version 1 of the schema.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-07-28 11:12:42 -04:00
José Armando García Sancio da32dcab2c
KAKFA-16537; Implement remove voter RPC (#16670)
Implement the RemoveVoter RPC. The general algorithm is as follow:

1. Check that the leader has fenced the previous leader(s) by checking that the HWM is known;
  otherwise return the REQUEST_TIMED_OUT error.
2. Check that the cluster supports kraft.version 1; otherwise return the UNSUPPORTED_VERSION error.
3. Check that there are no uncommitted voter changes; otherwise return the REQUEST_TIMED_OUT error.
4. Append the updated VotersRecord to the log. The KRaft internal listener will read this uncommitted
  record from the log and add the new voter to the set of voters.
5. Wait for the VotersRecord to commit using the majority of the new set of voters. Return a 
  REQUEST_TIMED_OUT error if it doesn't commit in time.
6. Send the RemoveVoter successful response to the client.
7. Resign the leadership if the leader is not in the new voter set

One thing to note is that now that KRaft supports both the remove voter and add voter RPC. Only one
change can be pending at once. This is achieved in the following ways. The AddVoter RPC checks if
there are pending AddVoter or RemoveVoter RPC. The RemoveVoter RPC checks if there are any
pending AddVoter or RemoveVoter RPC. Both RPCs check that there is no uncommitted VotersRecord.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-07-26 16:25:41 -07:00
PaulRMellor 738d8cc91e
MINOR: Update bootstrap.servers doc string (#16655)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2024-07-26 15:08:01 +02:00
brenden20 41beee508e
KAFKA-16558: Implemented HeartbeatRequestState toStringBase() and added a test for it (#16373)
Reviewers: Kirk True <kirk@kirktrue.pro>, Matthias J. Sax <matthias@confluent.io>
2024-07-25 13:27:17 -07:00
brenden20 cca0390822
KAFKA-15999 Migrate HeartbeatRequestManagerTest away from ConsumerTestBuilder (#16200)
In this PR I have completely migrated HeartbeatRequestManagerTest away from ConsumerTestBuilder. I have removed any instances of spy objects and used mocks wherever possible. 31/31 tests are passing. I also removed one test that existed in another test file.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-25 14:12:51 +08:00
Zhengke Zhou a38829e61b
KAFKA-16765: Close leaked accepted sockets in EchoServer, NioEchoServer, ServerShutdownTest (#16576)
Reviewers: Greg Harris <greg.harris@aiven.io>
2024-07-24 10:20:51 -07:00
TaiJuWu 4fa1c21940
KAFKA-17104 InvalidMessageCrcRecordsPerSec is not updated in validating LegacyRecord (#16558)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-24 19:25:11 +08:00
TaiJuWu f9f480ac7c
KAFKA-17163 revisit testSubscriptionOnInvalidTopic and testPollAuthenticationFailure (#16638)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-24 16:40:35 +08:00
José Armando García Sancio 9db5c2481f KAFKA-16535; Implement KRaft add voter handling
This change implements the AddVoter RPC. The high-level algorithm is as follow:

1. Check that the leader has fenced the previous leader(s) by checking that the HWM is known;
   otherwise, return the REQUEST_TIMED_OUT error.
2. Check that the cluster supports kraft.version 1; otherwise return the UNSUPPORTED_VERSION error.
3. Check that there are no uncommitted voter changes; otherwise return the REQUEST_TIMED_OUT error.
4. Check that the new voter's id is not part of the existing voter set, otherwise return the
   DUPLICATE_VOTER error.
5. Send an API_VERSIONS RPC to the first (default) listener to discover the supported kraft.version
   of the new voter.
6. Check that the new voter supports the current kraft.version, otherwise return the
   INVALID_REQUEST error.
7. Check that the new voter is caught up to the log end offset of the leader, otherwise return a
   REQUEST_TIMED_OUT error.
8. Append the updated VotersRecord to the log. The KRaft internal listener will read this
   uncommitted record from the log and add the new voter to the set of voters.
9.  Wait for the VotersRecord to commit using the majority of the new set of voters. Return a
    REQUEST_TIMED_OUT error if it doesn't commit in time.
10. Send the AddVoter successful response to the client.

The leader implements the above algorithm by tracking 3 events: the ADD_VOTER request is received,
the API_VERSIONS response is received and finally the HWM is updated.

The state of the ADD_VOTER operation is tracked by LeaderState using the AddVoterHandlerState. The
algorithm is implemented by the AddVoterHandler type.

This change also fixes a small issue introduced by the bootstrap checkpoint (0-0.checkpoint). The
internal partition listener (KRaftControlRecordStateMachine) and the external partition listener
(KafkaRaftClient.ListenerContext) were using "nextOffset = 0" as the initial state of the reading
cursor. This was causing the bootstrap checkpoint to keep getting reloaded until the leader wrote a
record to the log. Changing the initial offset (nextOffset) to -1 allows the listeners to
distinguish between the initial state (nextOffset == -1) and the bootstrap checkpoint was loaded
but the log segment is empty (nextOffset == 0).

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-07-22 13:07:38 -07:00
PoAn Yang defcbb51ee
KAFKA-17082 replace kafka.utils.LogCaptureAppender with org.apache.kafka.common.utils.LogCaptureAppender (#16601)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-21 18:22:05 +08:00
Andrew Schofield 28f6fbdd55
KAFKA-17144 Retract unimplemented ListGroups v6 added in error (#16635)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-21 15:03:23 +08:00
TaiJuWu f09ead1483
KAFKA-17132 Revisit testMissingOffsetNoResetPolicy for AsyncConsumer (#16587)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-19 18:46:04 +08:00
Lianet Magrans 931bb62a23
KAFKA-16984: Complete consumer leave on response to leave request (#16569)
Improvement to ensure that the consumer unsubscribe operation waits for a response to the leave group request before moving on to close the consumer. This makes it consistent with the behaviour of the legacy consumer.

This will avoid undesired interactions on close, that triggers a leave group, and shuts down the network thread when it completes (which before this PR could led to responses to disconnected clients).

Note that this PR does not change the transitions of the state machine on leave group, only the completion of the leave group future.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>
2024-07-19 10:08:40 +02:00
Chung, Ming-Yen 66655ab49a
KAFKA-17095 Fix the typo from "CreateableTopicConfig" to "CreatableTopicConfig" (#16623)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-19 11:09:08 +08:00
PoAn Yang 6acc220e03
KAFKA-15773 Group protocol configuration should be validated (#16543)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-18 18:31:36 +08:00
PoAn Yang c7c0de3889
KAFKA-17141 'DescribeTopicPartitions API is not supported' warning message confuses users (#16612)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-18 18:00:35 +08:00
Greg Harris c97421c100
KAFKA-17150: Use Utils.loadClass instead of Class.forName to resolve aliases correctly (#16608)
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewers: Chris Egerton <chrise@aiven.io>, Chia-Ping Tsai <chia7712@gmail.com>, Josep Prat <josep.prat@aiven.io>
2024-07-17 16:00:45 -07:00
PoAn Yang b015a83f6d
KAFKA-17017 AsyncKafkaConsumer#unsubscribe clean the assigned partitions (#16449)
Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-17 04:18:33 +08:00
Colin Patrick McCabe 4e66836ca0
MINOR: fix JavaDoc error in SupportedVersionRange.java (#16605)
Reviewers: José Armando García Sancio <jsancio@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-17 04:12:53 +08:00
Alyssa Huang d0f71486c2
MINOR: Improve the logging of IllegalStateException exceptions thrown from SslFactory (#16346)
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>
2024-07-16 11:01:53 -07:00
Colin Patrick McCabe 4d3e366bc2
KAFKA-16772: Introduce kraft.version to support KIP-853 (#16230)
Introduce the KRaftVersion enum to describe the current value of kraft.version. Change a bunch of places in the code that were using raw shorts over to using this new enum.

In BrokerServer.scala, fix a bug that could cause null pointer exceptions during shutdown if we tried to shut down before fully coming up.

Do not send finalized features that are finalized as level 0, since it is a no-op.

Reviewers: dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2024-07-16 09:31:10 -07:00
Andrew Schofield 1e16e16f64
KAFKA-16730: Initial version of share group consumer client code (#16461)
This is the initial version of the share group consumer client code. It implements the complete ShareConsumer interface.

There are unit tests, but not integration tests yet since those would depend upon complete broker code, which is not available at this point.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>,  Lianet Magrans <lianetmr@gmail.com>
2024-07-16 16:45:38 +05:30
Demonic 43235c2796
KAFKA-17133 add unit test to make sure `ConsumerRecords#recoreds` returns immutable object (#16588)
Reviewers: TingIāu "Ting" Kì <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-16 17:05:13 +08:00
Ken Huang 20ee83c462
KAFKA-17102 FetchRequest#forgottenTopics would return incorrect data (#16557)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-16 05:12:43 +08:00
xijiu f749af557a
MINOR: add some unit test for common Utils (#16549)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-15 21:18:51 +08:00
PoAn Yang 2cf1ef99d7
KAFKA-17129 Revisit FindCoordinatorResponse in KafkaConsumerTest (#16589)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-15 08:17:53 +08:00
PoAn Yang e104974b74
KAFKA-17110 Enable valid test case in KafkaConsumerTest for AsyncKafkaConsumer (#16566)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-14 09:27:41 +08:00
TaiJuWu 75075002b5
KAFKA-17106 enable testFetchProgressWithMissingPartitionPosition for AsyncConsumer (#16564)
Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-14 09:04:20 +08:00
Ken Huang ec9dabf86f
KAFKA-17092 Revisit `KafkaConsumerTest#testBeginningOffsetsTimeout` for AsyncConsumer (#16541)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-12 05:17:48 +08:00
Colin Patrick McCabe ede289db93
KAFKA-17011: Fix a bug preventing features from supporting v0 (#16421)
As part of KIP-584, brokers expose a range of supported versions for each feature. For example, metadata.version might be supported from 1 to 21. (Note that feature level ranges are always inclusive, so this would include both level 1 and 21.)

These supported ranges are supposed to be able to include 0. For example, it should be possible for a broker to support a kraft.version between 0 and 1. However, in older software versions, there is an assertion in org.apache.kafka.common.feature.SupportedVersionRange that prevents this. This causes problems when the older software attempts to deserialize an ApiVersionsResponse containing such a range.

In order to resolve this dilemma, this PR bumps the version of ApiVersionsRequest from 3 to 4. Clients which send v4 promise to be able to handle ranges including 0. Clients which send v3 will not be exposed to these ranges. The feature will show up as having a minimum version of 1 instead. This work is part
of KIP-1022.

Similarly, this PR also introduces a new version of BrokerRegistrationRequest, and specifies that the
older versions of that RPC cannot handle supported version ranges including 0. Therefore, 0 is translated to 1 in the older requests.

Reviewers: Jun Rao <junrao@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-07-10 16:10:25 -07:00
Omnia Ibrahim 25d775b742
KAFKA-15853 Refactor ShareGroupConfig with AbstractConfig (#16506)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-11 01:37:50 +08:00
Chung, Ming-Yen 9b02260ac6
KAFKA-17090 Add reminder to CreateTopicsResult#config for null values of type and documentation (#16539)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-11 01:32:50 +08:00
NICOLAS GUYOMAR 406d4c5ef9
MINOR: Log client idle disconnect events at DEBUG level (#16089)
We recently increased that log verbosity from INFO to DEBUG, but this is causing some confusion among users when an idle connection is disconnected, which is a normal activity. This makes it clear that the connection being disconnect has expired

Reviewers: Dimitar Dimitrov <ddimitrov@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-07-10 09:47:54 -07:00
Christo Lolov f369771bf2
KAFKA-16851: Add remote.log.disable.policy (#16132)
Add a remote.log.disable.policy on a topic-level only as part of KIP-950

Reviewers: Kamal Chandraprakash <kchandraprakash@uber.com>, Luke Chen <showuon@gmail.com>, Murali Basani <muralidhar.basani@aiven.io>
2024-07-10 11:18:48 +08:00
bachmanity1 cfc7bb90ae
KAFKA-16345: Optionally URL-encode clientID and clientSecret in authorization header (#15475)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Kirk True <kirk@kirktrue.pro>
2024-07-09 18:53:45 +02:00
Ken Huang 5608b5cc3a
KAFKA-16684: Remove caching in FetchResponse.responseData (#16532)
The response data should change accordingly to the input, however
with the current design, it will not change even if the input changes.
We remove this cache logic to avoid returning wrong data.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>, Igor Soarez <soarez@apple.com>
2024-07-09 15:46:22 +01:00
Kuan-Po Tseng a533e246e3
KAFKA-17081 Tweak GroupCoordinatorConfig: re-introduce local attributes and validation (#16524)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-08 01:15:18 +08:00
Arun Mathew 79244fe235
KAFKA-13403 Fix broker crash due to race condition when deleting topics (#11438)
Update the walkFileTree override implementation to handle parallel file deletion.
So as to prevent crashing of Kafka broker process when logs are deleted by
other threads due to retention expiry.

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Igor Soarez <soarez@apple.com>
2024-07-06 08:18:35 +01:00
Nancy b4d77e047b
KAFKA-4374 Update log message for clarity metadata response errors (#16508)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-05 15:15:22 +08:00
José Armando García Sancio 9f7afafefe
KAFKA-16529; Implement raft response handling (#16454)
This change implements response handling for the new version of Vote, Fetch, FetchSnapshot, BeginQuorumEpoch and EndQuorumEpoch. All of these responses were extended to include the leader's endpoint when the leader is known.

This change also includes sending the new version of the requests for Vote, Fetch, FetchSnapshot, BeginQuorumEpoch and EndQuorumEpoch. The two most notable changes are that:
1.  The leader is going to include all of its endpoints in the BeginQuorumEpoch and EndQuorumEpoch requests.
2. The replica is going to include the destination replica key for the Vote and BeginQuorumEpoch request.

QuorumState was extended so that the replica transitions to UnattachedState instead of FollowerState during startup, if the leader is known but the leader's endpoint is not known. This can happen if the known leader is not part of the voter set replicated by the replica. The expectation is that the replica will rediscover the leader from Fetch responses from the bootstrap servers or from the BeginQuorumEpoch request from the leader.

To make sure that replicas never forget the leader of a given epoch the unattached state was extended to allow an optional leader id for when the leader is known but the leader's endpoint is not known.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-07-03 16:52:35 -04:00
Chris Egerton 27220d146c
KAFKA-10816: Add health check endpoint for Kafka Connect (#16477)
Reviewers: Greg Harris <gharris1727@gmail.com>
2024-07-03 14:15:15 -04:00
TingIāu "Ting" Kì c97d4ce026
KAFKA-17032 NioEchoServer should generate meaningful id instead of incremental number (#16460)
Reviewers: Greg Harris <greg.harris@aiven.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-07-03 03:28:42 +08:00
Ken Huang 2bc5e01121
KAFKA-17051 ApiKeys#toHtml should exclude the APIs having unstable latest version (#16480)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-03 02:52:14 +08:00
abhi-ksolves 6897b06b03
KAFKA-3346 Rename Mode to ConnectionMode (#16403)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-07-03 02:46:04 +08:00
David Jacot 9a78122fb0
MINOR: Refactor GroupMetadataManager#consumerGroupHeartbeat and GroupMetadataManager#classicGroupJoinToConsumerGroup (#16371)
This patch is an attempt to simplifying GroupMetadataManager#consumerGroupHeartbeat and GroupMetadataManager#classicGroupJoinToConsumerGroup by sharing more of the common logic. It slightly change how static members are replaced too. Now, we generate the records to replace the member and then we update the member if needed.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-06-30 23:16:52 -07:00
NICOLAS GUYOMAR 3c4b1db93b
MINOR: Fix Trace log typo in RecordAccumulator.java (#16487) 2024-06-30 00:49:07 +08:00
brenden20 836f52b0ba
KAFKA-16000: Migrated MembershipManagerImplTest away from ConsumerTestBuilder (#16312)
Finishing migration of MembershipManagerImplTest away from ConsumerTestBuilder and removed all spy objects.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2024-06-28 12:13:27 -07:00
José Armando García Sancio 9be27e715a
MINOR; Fix incompatible change to the kafka config (#16464)
Prior to KIP-853, users were not allow to enumerate listeners specified in `controller.listener.names` in the `advertised.listeners`. This decision was made in 3.3 because the `controller.quorum.voters` property is in effect the list of advertised listeners for all of the controllers.

KIP-853 is moving away from `controller.quorum.voters` in favor of a dynamic set of voters. This means that the user needs to have a way of specifying the advertised listeners for controller.

This change allows the users to specify listener names in `controller.listener.names` in `advertised.listeners`. To make this change forwards compatible (use a valid configuration from 3.8 in 3.9), the controller's advertised listeners are going to get computed by looking up the endpoint in `advertised.listeners`. If it doesn't exist, the controller will look up the endpoint in the `listeners` configuration.

This change also includes a fix the to the BeginQuorumEpoch request where the default value for VoterId was 0 instead of -1.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-06-27 21:24:25 -04:00
Colin Patrick McCabe ebaa108967
KAFKA-16968: Introduce 3.8-IV0, 3.9-IV0, 3.9-IV1
Create 3 new metadata versions:

- 3.8-IV0, for the upcoming 3.8 release.
- 3.9-IV0, to add support for KIP-1005.
- 3.9-IV1, as the new release vehicle for KIP-966.

Create ListOffsetRequest v9, which will be used in 3.9-IV0 to support KIP-1005. v9 is currently an unstable API version.

Reviewers: Jun Rao <junrao@gmail.com>, Justine Olshan <jolshan@confluent.io>
2024-06-27 14:03:03 -07:00
Ken Huang 9b4f13efbc
KAFKA-15623 Remove junit 4 from stream module (#16447)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-27 15:11:32 +08:00
José Armando García Sancio adee6f0cc1
KAFKA-16527; Implement request handling for updated KRaft RPCs (#16235)
Implement request handling for the updated versions of the KRaft RPCs (Fetch, FetchSnapshot, Vote,
BeginQuorumEpoch and EndQuorumEpoch). This doesn't add support for KRaft replicas to send the new
version of the KRaft RPCs. That will be implemented in KAFKA-16529.

All of the RPCs responses were extended to include the leader's endpoint for the listener of the
channel used in the request. EpochState was extended to include the leader's endpoint information
but only the FollowerState and LeaderState know the leader id and its endpoint(s).

For the Fetch request, the replica directory id was added. The leader now tracks the follower's log
end offset using both the replica id and replica directory id.

For the FetchSnapshot request, the replica directory id was added. This is not used by the KRaft
leader and it is there for consistency with Fetch and for help debugging.

For the Vote request, the replica key for both the voter (destination) and the candidate (source)
were added. The voter key is checked for consistency. The candidate key is persisted when the vote
is granted.

For the BeginQuorumEpoch request, all of the leader's endpoints are included. This is needed so
that the voters can return the leader's endpoint for all of the supported listeners.

For the EndQuorumEpoch request, all of the leader's endpoints are included. This is needed so that
the voters can return the leader's endpoint for all of the supported listeners. The successor list
has been extended to include the directory id. Receiving voters can use the entire replica key when
searching their position in the successor list.

Updated the existing test in KafkaRaftClientTest and KafkaRaftClientSnapshotTest to execute using
both the old version and new version of the RPCs.

Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2024-06-25 13:45:15 -07:00
Chia Chuan Yu 5b0e96d785
KAFKA-17034 Tweak some descriptions in FeatureUpdate (#16448)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-26 03:05:34 +08:00
Andrew Schofield 63304fb6e5
KAFKA-17028: FindCoordinator v6 initial implementation (#16440)
KIP-932 introduces FindCoordinator v6 for finding share coordinators. The initial implementation:

Checks that share coordinators are only requested with v6 or above.
Share coordinator requests are authorized as cluster actions (this is for inter-broker use only)
Responds with COORDINATOR_NOT_AVAILABLE because share coordinators are not yet available.
When the share coordinator code is delivered, the request handling will be gated by configurations which enable share groups and the share coordinator specifically. If these are not enabled, COORDINATOR_NOT_AVAILABLE is the response.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-06-25 21:13:16 +05:30
Alieh Saeedi f4cbf71ea6
KAFKA-16965: Throw cause of TimeoutException (#16344)
Add the cause of TimeoutException for Producer send() errors.

Reviewers: Matthias J. Sax <matthias@confluent.io>,  Lianet Magrans <lianetmr@gmail.com>, Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-06-24 14:51:27 -07:00
TingIāu "Ting" Kì d78ff06476
KAFKA-16967 NioEchoServer fails to register connection and causes flaky failure. (#16384)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-24 21:39:40 +08:00
Ken Huang 8a109d87d1
KAFKA-17009 Add unit test to query nonexistent replica by describeReplicaLogDirs (#16423)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-23 16:25:57 +08:00
gongxuanzhang e294ea433c
KAFKA-17012 Enable CONSUMER protocol for some tests of KafkaConsumerTest (#16415)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-21 21:58:24 +08:00
gongxuanzhang 80f31224aa
KAFKA-10787 Apply spotless to `clients` module (#16393)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-20 17:43:25 +08:00
Chia Chuan Yu 4ff83dc733
KAFKA-16957 Enable KafkaConsumerTest#configurableObjectsShouldSeeGeneratedClientId to work with CLASSIC and CONSUMER (#16370)
Reviewers: Kirk True <kirk@kirktrue.pro>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-19 09:35:10 +08:00
Lianet Magrans 8199290500
MINOR: consumer log fixes (#16345)
Reviewers: Kirk True <kirk@kirktrue.pro>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-18 22:51:33 +08:00
dujian0068 823d6f7555
KAFKA-16958 add STRICT_STUBS to EndToEndLatencyTest, OffsetCommitCallbackInvokerTest, ProducerPerformanceTest, and TopologyTest (#16348)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-18 18:51:43 +08:00
Lianet Magrans 6c4e777079
KAFKA-16954: fix consumer close to release assignment in background (#16343)
This PR fixes consumer close to avoid updating the subscription state object in the app thread. Now the close simply triggers an UnsubscribeEvent that is handled in the background to trigger callbacks, clear assignment, and send leave heartbeat. Note that after triggering the event, the unsubscribe will continuously process background events until the event completes, to ensure that it allows for callbacks to run in the app thread.
The logic around what happens if the unsubscribe fails remain unchanged: close will log, keep the first exception and carry on.

It also removes the redundant LeaveOnClose event (it used to do the the exact same thing as the UnsubscribeEvent, both calling membershipMgr.leaveGroup).

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-06-17 21:27:33 +02:00
Dongnuo Lyu 21d60eabab
KAFKA-16673; Simplify `GroupMetadataManager#toTopicPartitions` by using `ConsumerProtocolSubscription` instead of `ConsumerPartitionAssignor.Subscription` (#16309)
In `GroupMetadataManager#toTopicPartitions`, we generate a list of `ConsumerGroupHeartbeatRequestData.TopicPartitions` from the input deserialized subscription. Currently the input subscription is `ConsumerPartitionAssignor.Subscription`, where the topic partitions are stored as (topic-partition) pairs, whereas in `ConsumerGroupHeartbeatRequestData.TopicPartitions`, we need the topic partitions to be stored as (topic-partition list) pairs.

`ConsumerProtocolSubscription` is an intermediate data structure in the deserialization where the topic partitions are stored as (topic-partition list) pairs. This pr uses `ConsumerProtocolSubscription` instead as the input subscription to make `toTopicPartitions` more efficient. 

Reviewers: David Jacot <djacot@confluent.io>
2024-06-17 02:47:52 -07:00
Chia Chuan Yu 768e90f667
KAFKA-16669 Remove extra collection copy when generating DescribeAclsResource (#15924)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-17 14:47:44 +08:00
Gaurav Narula 4a37c2e18f
KAFKA-16219 set SO_TIMEOUT in EchoServer (#16354)
We observed some runs of the test suite caused CI pipelines to stall.

A thread dump revealed that the test runner was blocked trying to read from a
socket, while attempting to close the socket [[0]]. It turns out this is
due to a bug in JDK which is very similar to JDK-8274524, but it affects
the else branch of `SSLSocketImpl::bruteForceCloseInput` [[1]] which wasn't
fixed in JDK-8274524.

Since the blocking happens in a native call, the test runner's timeouts have
no effect as the blocked test runner thread doesn't seem to respond to
interrupts.

As a mitigation in Kafka's test suite, this change adds `SO_TIMEOUT` of
30 seconds to all the TLS sockets handled by `EchoServer`. The timeout is
reasonably high for tests and a finite upper bound avoids infinite
blocking of the test suite.

[0]: https://issues.apache.org/jira/secure/attachment/13066427/timeout.log
[1]: 890adb6410/src/java.base/share/classes/sun/security/ssl/SSLSocketImpl.java (L808)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-17 14:11:37 +08:00
Andrew Schofield fecbfb8133
KAFKA-16950: Define Persister interfaces and RPCs (#16335)
Define the interfaces and RPCs for share-group persistence. (KIP-932). This PR is just RPCs and interfaces to allow building of the broker components which depend upon them. The implementation will follow in subsequent PRs.

Reviewers:  Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
2024-06-15 20:52:49 +05:30
gongxuanzhang 3a9d877686
MINOR: refactor BuiltInPartitioner to remove mockRandom from production code (#16277)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-15 12:18:42 +08:00
Kirk True 8f86b9c4ec
KAFKA-16637 AsyncKafkaConsumer removes offset fetch responses from cache too aggressively (#16310)
Allow the committed offsets fetch to run for as long as needed. This handles the case where a user invokes Consumer.poll() with a very small timeout (including zero).

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-15 08:48:53 +08:00
TingIāu "Ting" Kì 09bc5be63e
KAFKA-16946: Utils.getHost/getPort cannot parse SASL_PLAINTEXT://host:port (#16319)
In previous PR(#16048), I mistakenly excluded the underscore (_) from the set of valid characters for the protocol,
resulting in the inability to correctly parse the connection string for SASL_PLAINTEXT. This bug fix addresses the
issue and includes corresponding tests.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Luke Chen <showuon@gmail.com>
2024-06-14 13:07:11 -07:00
TingIāu "Ting" Kì 4e2f26bfc6
KAFKA-16917 DescribeTopicsResult should use mutable map in order to keep compatibility (#16250)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 23:48:35 +08:00
Omnia Ibrahim e99da2446c
KAFKA-15853: Move KafkaConfig.configDef out of core (#16116)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 17:26:00 +02:00
Lianet Magrans 46714dbaed
KAFKA-16933: New consumer unsubscribe close commit fixes (#16272)
Fixes for the leave group flow (unsubscribe/close):

Fix to send Heartbeat to leave group on close even if the callbacks fail
fix to ensure that if a member gets fenced while blocked on callbacks (ex. on unsubscribe), it will clear its epoch to not include it in commit requests
fix to avoid race on the subscription state object on unsubscribe, updating it only on the background thread when the callbacks to leave complete (success or failure).
Also improving logging in this area.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Philip Nee <pnee@confluent.io>
2024-06-14 13:03:58 +02:00
Kuan-Po (Cooper) Tseng 888a177603
KAFKA-12708 Rewrite org.apache.kafka.test.Microbenchmarks by JMH (#16231)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 16:47:34 +08:00
dujian0068 133f2b0f31
KAFKA-16879 SystemTime should use singleton mode (#16266)
Reviewers: Greg Harris <gharris1727@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 08:49:19 +08:00
brenden20 a0b716ec9f
KAFKA-16001: Migrated ConsumerNetworkThreadTest away from ConsumerTestBuilder (#16140)
Completely migrates ConsumerNetworkThreadTest away from ConsumerTestBuilder and removes all usages of spy objects and replaced with mocks. Removes testEnsureMetadataUpdateOnPoll() since it was doing integration testing. Also I adds new tests to get more complete test coverage of ConsumerNetworkThread.

Reviewers: Kirk True <kirk@kirktrue.pro>, Lianet Magrans <lianetmr@gmail.com>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2024-06-13 13:35:36 -07:00
gongxuanzhang 596b945072
KAFKA-16643 Add ModifierOrder checkstyle rule (#15890)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 15:39:32 +08:00
brenden20 e59c887bfd
KAFKA-16557 Implemented OffsetFetchRequestState toStringBase and added a test for it (#16291)
Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 15:30:05 +08:00
TingIāu "Ting" Kì dd6fcc650e
KAFKA-16901 Add unit tests for ConsumerRecords#records(String) (#16227)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 14:35:33 +08:00
Lianet Magrans fe98888960
MINOR: Improving log for outstanding requests on close and cleanup (#16304)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 14:31:16 +08:00
Gantigmaa Selenge 6d1f8f8727
MINOR: Clean up for KafkaAdminClientTest (#16285)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 09:42:39 +08:00
Ivan Yurchenko dd755b7ea9
KAFKA-8206: KIP-899: Allow client to rebootstrap (#13277)
This commit implements KIP-899: Allow producer and consumer clients to rebootstrap. It introduces the new setting `metadata.recovery.strategy`, applicable to all the types of clients.

Reviewers: Greg Harris <gharris1727@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2024-06-12 20:48:32 +01:00
Edoardo Comar 615e6e705c
KAFKA-16570 FenceProducers API returns "unexpected error" when succes… (#16229)
KAFKA-16570 FenceProducers API returns "unexpected error" when successful

* Client handling of ConcurrentTransactionsException as retriable
* Unit test
* Integration test

Reviewers: Chris Egerton <chrise@aiven.io>, Justine Olshan <jolshan@confluent.io>
2024-06-12 17:07:33 +01:00
Gantigmaa Selenge 9368ef81b5
KAFKA-16865: Add IncludeTopicAuthorizedOperations option for DescribeTopicPartitionsRequest (#16136)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Calvin Liu <caliu@confluent.io>, Andrew Schofield <andrew_schofield@live.com>, Apoorv Mittal <amittal@confluent.io>
2024-06-12 17:04:24 +02:00
David Jacot 638844f833
KAFKA-16770; [2/2] Coalesce records into bigger batches (#16215)
This patch is the continuation of https://github.com/apache/kafka/pull/15964. It introduces the records coalescing to the CoordinatorRuntime. It also introduces a new configuration `group.coordinator.append.linger.ms` which allows administrators to chose the linger time or disable it with zero. The new configuration defaults to 10ms.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-06-11 23:29:50 -07:00
Murali Basani 226ac5e8fc
KAFKA-16922 Adding unit tests for NewTopic (#16255)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-12 11:38:50 +08:00
Nikolay aecaf44475
KAFKA-16520: Support KIP-853 in DescribeQuorum (#16106)
Add support for KIP-953 KRaft Quorum reconfiguration in the DescribeQuorum request and response.
Also add support to AdminClient.describeQuorum, so that users will be able to find the current set of
quorum nodes, as well as their directories, via these RPCs.

Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Andrew Schofield <aschofield@confluent.io>
2024-06-11 10:01:35 -07:00
Gyeongwon, Do 1426e8e920
KAFKA-16764: New consumer should throw InvalidTopicException on poll when invalid topic in metadata. (#16043)
Propagate metadata error from background thread to application thread.
So, this fix ensures that metadata errors are thrown to the user on consumer.poll()

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Philip Nee <pnee@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
2024-06-10 18:30:29 +02:00
Apoorv Mittal 96036aee85
KAFKA-16916: Fixing error in completing future (#16249)
Fix to complete Future which was stuck due the exception.getCause() throws an error.

The fix in the #16217 unblocked blocking thread but exception in catch block from blocking get call was wrapped in ExecutionException which is not the case when moved to async workflow hence getCause is not needed.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-08 14:25:54 +08:00
Okada Haruki d13a693ea7
KAFKA-16916: Disable blocking test temporarily (#16248)
The test runs forever. We disable the test temporarily to unblock CI

Reviewers: Luke Chen <showuon@gmail.com>
2024-06-08 09:09:00 +08:00
NICOLAS GUYOMAR eb352828ed
MINOR: Remove AK 1.0.0 reference from NetworkClient.java (#16223)
Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-08 06:54:09 +08:00
Phuc-Hong-Tran 5cd6944eaa
KAFKA-16493: Avoid unneeded subscription regex check if metadata version unchanged (#15869)
This PR includes changes for AsyncKafkaConsumer to avoid evaluating the subscription regex on every poll if metadata hasn't changed. The metadataVersionSnapshot was introduced to identify whether metadata has changed or not, if yes then the current subscription regex will be evaluated.

This is the same mechanism used by the LegacyConsumer.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Matthias J. Sax <matthias@confluent.io>
2024-06-07 10:30:07 -07:00
Kirk True d6cd83e2fb
KAFKA-16200: Enforce that RequestManager implementations respect user-provided timeout (#16031)
Improve consistency and correctness for user-provided timeouts at the Consumer network request layer, per the Java client Consumer timeouts design (https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts). While the changes introduced in KAFKA-15974 enforce timeouts at the Consumer's event layer, this change enforces timeouts at the network request layer.

The changes mostly fit into the following areas:

1. Create shared code and idioms so timeout handling logic is consistent across current and future RequestManager implementations
2. Use deadlineMs instead of expirationMs, expirationTimeoutMs, retryExpirationTimeMs, timeoutMs, etc.
3. Update "preemptive pruning" to remove expired requests that have had at least one attempt

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2024-06-07 09:53:27 +02:00
brenden20 247132569e
KAFKA-16912 Migrate ConsumerNetworkThreadTest.testPollResultTimer() to NetworkClientDelegateTest (#16234)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-07 14:39:33 +08:00
Apoorv Mittal c01279b92a
KAFKA-16905: Fix blocking DescribeCluster call in AdminClient DescribeTopics (#16217)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, David Arthur <mumrah@gmail.com>
2024-06-06 18:11:43 -04:00
Mickael Maison 79ea7d6122
MINOR: Various cleanups in clients (#16193)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-06 20:28:39 +02:00
Lianet Magrans b74b182841
KAFKA-16786: Remove old assignment strategy usage in new consumer (#16214)
Remove usage of the partition.assignment.strategy config in the new consumer. This config is deprecated with the new consumer protocol, so the AsyncKafkaConsumer should not use or validate the property.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-06-06 09:45:36 +02:00
Kamal Chandraprakash 02c794dfd3
KAFKA-15776: Introduce remote.fetch.max.timeout.ms to configure DelayedRemoteFetch timeout (#14778)
KIP-1018, part1, Introduce remote.fetch.max.timeout.ms to configure DelayedRemoteFetch timeout

Reviewers: Luke Chen <showuon@gmail.com>
2024-06-05 14:42:23 +08:00
Colin P. McCabe 9ceed8f18f KAFKA-16535: Implement AddVoter, RemoveVoter, UpdateVoter RPCs
Implement the add voter, remove voter, and update voter RPCs for
KIP-853. This is just adding the RPC handling; the current
implementation in RaftManager just throws UnsupportedVersionException.

Reviewers: Andrew Schofield <aschofield@confluent.io>, José Armando García Sancio <jsancio@apache.org>
2024-06-04 14:07:48 -07:00
TingIāu "Ting" Kì 8b3c77c671
KAFKA-15305 The background thread should try to process the remaining task until the shutdown timer is expired. (#16156)
Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-05 04:21:20 +08:00
Chris Egerton 2b4779840c
MINOR: Fix return tag on Javadocs for consumer group-related Admin methods (#16197)
Reviewers: Greg Harris <greg.harris@aiven.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-04 15:04:34 -04:00
Edoardo Comar eea9ebf8a3
KAFKA-16047: Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers (#16151)
Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers, 
or options.timeoutMs if specified, as transaction timeout.

No transaction will be started with this timeout, but
ReplicaManager.appendRecords uses this value as its timeout.
Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append
to allow for replication to take place.

Co-Authored-By: Adrian Preston <prestona@uk.ibm.com>
2024-06-04 11:45:11 +01:00
Sanskar Jhajharia e4ff54073a
MINOR: Code Cleanup (Clients Module) (#16049)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-04 16:49:59 +08:00
Andrew Schofield c3a1bef429
KAFKA-16715: Create KafkaShareConsumer interfaces (#16134)
This PR introduces ShareConsumer and KafkaShareConsumer. It is focused entirely on the minimal additions required to introduce the external programming interfaces.

Reviewers: Apoorv Mittal <amittal@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-06-04 09:05:59 +05:30
GANESH SADANALA 04e6ef4750
KAFKA-15156 Update cipherInformation correctly in DefaultChannelMetadataRegistry (#16169)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-04 00:32:47 +08:00
Andrew Schofield 8f82f14a48
KAFKA-16713: Define initial set of RPCs for KIP-932 (#16022)
This PR defines the initial set of RPCs for KIP-932. The RPCs for the admin client and state management are not in this PR.

Reviewers: Apoorv Mittal <amittal@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-06-03 11:52:35 +05:30
TingIāu "Ting" Kì ca9f4aeda7
KAFKA-16639 Ensure HeartbeatRequestManager generates leave request regardless of in-flight heartbeats. (#16017)
Fix the bug where the heartbeat is not sent when a newly created consumer is immediately closed.

When there is a heartbeat request in flight and the consumer is then closed. In the current code, the HeartbeatRequestManager does not correctly send the closing heartbeat because a previous heartbeat request is still in flight. However, the closing heartbeat is only sent once, so in this situation, the broker will not know that the consumer has left the consumer group until the consumer's heartbeat times out.
This situation causes the broker to wait until the consumer's heartbeat times out before triggering a consumer group rebalance, which in turn affects message consumption.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-01 04:14:15 +08:00
TingIāu "Ting" Kì 0971924ebc
KAFKA-16824: Utils.getHost and Utils.getPort do not catch a lot of invalid host and ports. (#16048)
Modify regex of HOST_PORT_PATTERN to prevent malformed hosts and ports.

Reviewers: Luke Chen <showuon@gmail.com>
2024-05-31 16:50:27 +08:00
Lianet Magrans eb39031cd0
KAFKA-16766: offset fetch timeout exception in new consumer consistent with legacy (#16125)
* Timeout exception fetching offsets

* Tests
2024-05-31 10:33:20 +02:00
Alyssa Huang a8e99eb969
KAFKA-16833 Fixing PartitionInfo and Cluster equals and hashCode (#16062)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-31 05:00:42 +08:00
Ahmed Najiub 33a292e4dd
MINOR: Adds a test case to test that an exception is thrown in invalid ports (#16112)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-31 04:38:23 +08:00
Justine Olshan 5e3df22095
KAFKA-16308 [1/N]: Create FeatureVersion interface and add `--feature` flag and handling to StorageTool (#15685)
As part of KIP-1022, I have created an interface for all the new features to be used when parsing the command line arguments, doing validations, getting default versions, etc.

I've also added the --feature flag to the storage tool to show how it will be used.

Created a TestFeatureVersion to show an implementation of the interface (besides MetadataVersion which is unique) and added tests using this new test feature.

I will add the unstable config and tests in a followup.

Reviewers: David Mao <dmao@confluent.io>, David Jacot <djacot@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jun Rao <junrao@apache.org>
2024-05-29 16:36:06 -07:00
Mickael Maison 3f3f3ac155
MINOR: Delete KafkaSecurityConfigs class (#16113)
Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-30 05:55:24 +08:00
Eugene Mitskevich 862ea12cd7
MINOR: Fix rate metric spikes (#15889)
Rate reports value in the form of sumOrCount/monitoredWindowSize. It has a bug in monitoredWindowSize calculation, which leads to spikes in result values.

Reviewers: Jun Rao <junrao@gmail.com>
2024-05-29 13:14:37 -07:00
Frederik Rouleau 4eb60b5104
KAFKA-16507 Add KeyDeserializationException and ValueDeserializationException with record content (#15691)
Implements KIP-1036.

Add raw ConsumerRecord data to RecordDeserialisationException to make DLQ implementation easier.

Reviewers: Kirk True <ktrue@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2024-05-28 14:56:47 -07:00
Lianet Magrans 0143c72e50
KAFKA-16815: Handle FencedInstanceId in HB response (#16047)
Handle FencedInstanceIdException that a consumer may receive in the heartbeat response. This will be the case when a static consumer is removed from the group by and admin client, and another member joins with the same group.instance.id (allowed in). The first member will receive a FencedInstanceId on its next heartbeat. The expectation is that this should be handled as a fatal error.

There are no actual changes in logic with this PR, given that without being handled, the FencedInstanceId was being treated as an "unexpected error", which are all treated as fatal errors, so the outcome remains the same. But we're introducing this small change just for accuracy in the logic and the logs: FencedInstanceId is expected during heartbeat, a log line is shown describing the situation and why it happened (and it's treated as a fatal error, just like it was before this PR).

This PR also improves the test to ensure that the error propagated to the app thread matches the one received in the HB.

Reviewers: Andrew Schofield <aschofield@confluent.io>, David Jacot <djacot@confluent.io>
2024-05-24 05:19:43 -07:00
Kirk True a98c9be6b0
KAFKA-15974: Enforce that event processing respects user-provided timeout (#15640)
The intention of the CompletableApplicationEvent is for a Consumer to enqueue the event and then block, waiting for it to complete. The application thread will block up to the amount of the timeout. This change introduces a consistent manner in which events are expired out by checking their timeout values.

The CompletableEventReaper is a new class that tracks CompletableEvents that are enqueued. Both the application thread and the network I/O thread maintain their own reaper instances. The application thread will track any CompletableBackgroundEvents that it receives and the network I/O thread will do the same with any CompletableApplicationEvents it receives. The application and network I/O threads will check their tracked events, and if any are expired, the reaper will invoke each event's CompletableFuture.completeExceptionally() method with a TimeoutException.

On closing the AsyncKafkaConsumer, both threads will invoke their respective reapers to cancel any unprocessed events in their queues. In this case, the reaper will invoke each event's CompletableFuture.completeExceptionally() method with a CancellationException instead of a TimeoutException to differentiate the two cases.

The overall design for the expiration mechanism is captured on the Apache wiki and the original issue (KAFKA-15848) has more background on the cause.

Note: this change only handles the event expiration and does not cover the network request expiration. That is handled in a follow-up Jira (KAFKA-16200) that builds atop this change.

This change also includes some minor refactoring of the EventProcessor and its implementations. This allows the event processor logic to focus on processing individual events rather than also the handling of batches of events.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Philip Nee <pnee@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2024-05-22 18:11:10 +02:00
Mickael Maison affe8da54c
KAFKA-7632: Support Compression Levels (KIP-390) (#15516)
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
2024-05-21 17:58:49 +02:00
vamossagar12 476d323f5a
KAFKA-16197: Print Connect worker specific logs on poll timeout expiry (#15305)
Reviewers: Luke Chen <showuon@gmail.com>, Greg Harris <greg.harris@aiven.io>
2024-05-20 17:00:04 -07:00
Kuan-Po (Cooper) Tseng 92489995f3
KAFKA-16544 DescribeTopicsResult#allTopicIds and DescribeTopicsResult#allTopicNames should return null instead of throwing NPE (#15979)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-18 03:41:10 +08:00
Johnny Hsu 9896d07997
MINOR: Add debug enablement check when using log.debug (#15977)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-18 02:53:28 +08:00
Kirk True f9db4fa19c
KAFKA-16787: Remove TRACE level logging from AsyncKafkaConsumer hot path (#15981)
Removed unnecessarily verbose logging.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-05-17 08:57:23 +02:00
Andrew Schofield bb3ff0f84a
KAFKA-16759: Handle telemetry push response while terminating (#15957)
When client telemetry is configured in a cluster, Kafka producers and consumers push metrics to the brokers periodically. There is a special push of metrics that occurs when the client is terminating. A state machine in the client telemetry reporter controls its behaviour in different states.

Sometimes, when a client was terminating, it was attempting an invalid state transition from TERMINATING_PUSH_IN_PROGRESS to PUSH_NEEDED when it receives a response to a PushTelemetry RPC. This was essentially harmless because the state transition did not occur but it did cause unsightly log lines to be generated. This PR performs a check for the terminating states when receiving the response and simply remains in the current state.

I added a test to validate the state management in this case. Actually, the test passes before the code change in the PR, but with unsightly log lines.


Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>,  Apoorv Mittal <amittal@confluent.io>
2024-05-15 21:40:34 +05:30
Kirk True 3f8d11f047
KAFKA-16577: New consumer fails with stop within allotted timeout in consumer_test.py system test (#15784)
The KafkaAsyncConsumer would occasionally fail to stop when wakeup() was invoked. It turns out that there's a race condition between the thread that invokes wakeup() and the thread that is performing an action on the consumer. If the operation's Future is already completed by thread A when thread B invoke's completeExceptionally() inside wakeup(), the WakeupException will be ignored. We should use the return value from completeExceptionally() to determine if that call actually triggered completion of the Future. If that method returns false, that signals that the Future was already completed, and the exception we passed to completeExceptionally() was ignored. Therefore, we then need to return a new WakeupFuture instead of null so that the next call to setActiveTask() will throw the WakeupException.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-05-15 11:14:31 +02:00
flashmouse f0291ac74b
KAFKA-15170: Fix rack-aware assignment in AbstractStickyAssignor (#13965)
This PR fixes two kinds of bugs in the new(ish) rack-aware part of the sticky assignment algorithm:

First, when reassigning "owned partitions" to their previous owners, we now have to take the rack placement into account and might not immediately assign a previously-owned partition to its old consumer during this phase. There is a small chance this partition will be assigned to its previous owner during a later stage of the assignment, but if it's not then by definition it has been "revoked" and must be removed from the assignment during the adjustment phase of the CooperativeStickyAssignor according to the cooperative protocol. We need to make sure any partitions removed in this way end up in the "partitionsTransferringOwnership".

Second, the sticky algorithm works in part by keeping track of how many consumers are still "unfilled" when they are at the "minQuota", meaning we may need to assign one more partition to get to the expected number of consumers at the "maxQuota". During the rack-aware round-robin assignment phase, we were not properly clearing the set of unfilled & minQuota consumers once we reached the expected number of "maxQuota" consumers (since by definition that means no more minQuota consumers need to or can be given any more partitions since that would bump them up to maxQuota and exceed the expected count). This bug would result in the entire assignment being failed due to a correctness check at the end which verifies that the "unfilled members" set is empty before returning the assignment. An IllegalStateException would be thrown, failing the rebalancing and sending the group into an endless rebalancing loop until/unless it was lucky enough to produce a new assignment that didn't hit this bug

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2024-05-14 17:06:05 -07:00
Ivan Vaskevych 2958dcb919
KAFKA-13115; Update doSend doc about possible blocking (#11023)
Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-05-14 10:50:56 -04:00
José Armando García Sancio 440f5f6c09
MINOR; Validate at least one control record (#15912)
Validate that a control batch in the batch accumulator has at least one control record.

Reviewers: Jun Rao <junrao@apache.org>, Chia-Ping Tsai <chia7712@apache.org>
2024-05-14 10:02:29 -04:00
Mickael Maison 0587a9af3d
MINOR: Various cleanups in clients tests (#15877)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>,  Lianet Magrans <lianetmr@gmail.com>
2024-05-14 10:19:00 +02:00
Lianet Magrans e18f61ce46
KAFKA-16695: Improve expired poll logging (#15909)
Improve consumer log for expired poll timer, by showing how much time was the max.poll.interval.ms exceeded. This should be helpful in guiding the user to tune that config on the common case of long-running processing causing the consumer to leave the group. Inspired by other clients that log such information on the same situation.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias Sax <mjsax@apache.org>, 
Andrew Schofield <andrew_schofield@live.com>, Kirk True <kirk@kirktrue.pro>
2024-05-13 18:04:54 -07:00
Lianet Magrans ea485a7061
KAFKA-16665: Allow to initialize newly assigned partition's positions without allowing fetching while callback runs (#15856)
Fix to allow to initialize positions for newly assigned partitions, while the onPartitionsAssigned callback is running, even though the partitions remain non-fetchable until the callback completes.

Before this PR, we were not allowing initialization or fetching while the callback was running. The fix here only allows to initialize the newly assigned partition position, and keeps the existing logic for making sure that the partition remains non-fetchable until the callback completes.

The need for this fix came out in one of the connect system tests, that attempts to retrieve a newly assigned partition position with a call to consumer.position from within the onPartitionsAssigned callback (WorkerSinkTask). With this PR, we allow to make such calls (test added), which is the behaviour of the legacy consumer.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2024-05-07 10:40:00 +02:00
Andrew Schofield 4c4ae6e39c
KAFKA-16608 Honour interrupted thread state on KafkaConsumer.poll (#15803)
The contract of KafkaConsumer.poll(Duration) says that it throws InterruptException "if the calling thread is interrupted before or while this function is called". The new KafkaConsumer implementation was not doing this if the thread was interrupted before the poll was called, specifically with a very short timeout. If it ever waited for records, it did check the thread state. If it did not wait for records because of a short timeout, it did not.

Some of the log messages in the code erroneously mentioned timeouts, when they really meant interruption.

Also adds a test for this specific scenario.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-07 08:22:41 +08:00
Okada Haruki 5c96ad61d9
KAFKA-16393 read/write sequence of buffers correctly (#15571)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-06 19:11:04 +08:00
Chia Chuan Yu 55a00be4e9
MINOR: Replaced Utils.join() with JDK API. (#15823)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-06 15:13:01 +08:00
PoAn Yang 970ac07881
KAFKA-16659 KafkaConsumer#position() does not respect wakup when group protocol is CONSUMER (#15853)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-06 08:45:11 +08:00