Commit Graph

3638 Commits

Author SHA1 Message Date
Kaushik Raina d77f44414d
KAFKA-18780: Extend RetriableException related exceptions (#19020)
- Added a unit test to validate the exception hierarchy for all KIP-1050
transaction related exceptions.
- RetriableException is correctly extended by all child classes
- Included test for RetriableException exception with verification that
all exceptions extending `RetriableException` do not inadvertently
extend `RefreshRetriableException, preserving the intended behavior.

Reviewers: Kirk True <ktrue@confluent.io>, TaiJuWu <tjwu1217@gmail.com>, TengYao Chi <kitingiao@gmail.com>,  Ken Huang <s7133700@gmail.com>, Justine Olshan <jolshan@confluent.io>
2025-02-27 07:49:13 -08:00
ClarkChen 269e2d898b
KAFKA-18849 Add "strict min ISR" to the docs of "min.insync.replicas" (#19016)
KIP-966 adds strict min ISR rule, so this PR improves the docs of min.insync.replicas to include that change.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-27 16:05:24 +08:00
Nick Guo dd85938661
KAFKA-18850 Fix the docs of org.apache.kafka.automatic.config.providers (#19039)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@apache.org>
2025-02-27 15:36:27 +08:00
Dongnuo Lyu 36f19057e1
KAFKA-18813: ConsumerGroupHeartbeat API and ConsumerGroupDescribe API must check topic describe (#18989)
This patch filters out the topic describe unauthorized topics from the
ConsumerGroupHeartbeat and ConsumerGroupDescribe response.

In ConsumerGroupHeartbeat, 
- if the request has `subscribedTopicNames` set, we directly check the
authz in `KafkaApis` and return a topic auth failure in the response if
any of the topics is denied.
- Otherwise, we check the authz only if a regex refresh is triggered and
we do it based on the acl of the consumer that triggered the refresh. If
any of the topic is denied, we filter it out from the resolved
subscription.

In ConsumerGroupDescribe, we check the authz of the coordinator
response. If any of the topic in the group is denied, we remove the
described info and add a topic auth failure to the described group.
(similar to the group auth failure)

Reviewers: David Jacot <djacot@confluent.io>, Lianet Magrans
<lmagrans@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>,
Chia-Ping Tsai <chia7712@gmail.com>, TaiJuWu <tjwu1217@gmail.com>,
TengYao Chi <kitingiao@gmail.com>
2025-02-26 13:05:36 -05:00
José Armando García Sancio 4a8a0637e0
KAFKA-18723; Better handle invalid records during replication (#18852)
For the KRaft implementation there is a race between the network thread,
which read bytes in the log segments, and the KRaft driver thread, which
truncates the log and appends records to the log. This race can cause
the network thread to send corrupted records or inconsistent records.
The corrupted records case is handle by catching and logging the
CorruptRecordException. The inconsistent records case is handle by only
appending record batches who's partition leader epoch is less than or
equal to the fetching replica's epoch and the epoch didn't change
between the request and response.

For the ISR implementation there is also a race between the network
thread and the replica fetcher thread, which truncates the log and
appends records to the log. This race can cause the network thread send
corrupted records or inconsistent records. The replica fetcher thread
already handles the corrupted record case. The inconsistent records case
is handle by only appending record batches who's partition leader epoch
is less than or equal to the leader epoch in the FETCH request.

Reviewers: Jun Rao <junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai <chia7712@apache.org>
2025-02-25 20:09:19 -05:00
Shivsundar R fae2e53901
MINOR : Add missing error code in ConsumerHeartbeatRequestManagerTest. (#19024)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-02-25 15:35:20 -05:00
Shivsundar R 2880e04129
KAFKA-18779: Validate responses from broker in client for ShareFetch and ShareAcknowledge RPCs. (#18939)
- Currently if we received extraneous topic partitions in the response
or if the response was missing some partitions requested, we were
processing the response as it came and even populated the callback with
these partitions.

- These invalid responses should be parsed at the
`ShareConsumeRequestManager`.

- If the response missed any acknowledgements for partitions that were
requested, then we fail the request with `InvalidRecordStateException`
and populate the callbacks.

- For any extraneous partitions in the response, we log an error and
ignore them.

Some refactors are also done in this PR in ShareConsumeRequestManager to
make the code more readable.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-24 10:27:24 +00:00
Sushant Mahajan 3fc103b48b
KAFKA-18629: ShareGroupDeleteState admin client impl. (#18928)
* In this PR, we add various infra classes needed to support the
`deleteShareGroups` functionality via the `kafka-share-groups.sh`
script, as well as the implementation of `kafka-share-groups.sh --delete`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:21:10 +00:00
Sushant Mahajan 4f28973bd1
KAFKA-18827: Initialize share state, share coordinator impl. [1/N] (#18968)
In this PR, we have added the share coordinator and KafkaApis side impl
of the intialize share group state RPC.
ref:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka#KIP932:QueuesforKafka-InitializeShareGroupStateAPI

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:12:08 +00:00
xijiu 118818a7ca
KAFKA-18795 Remove `Records#downConvert` (#18897)
Since we no longer convert records to the old format for fetch requests, this code is no longer used in production.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-22 02:29:58 +08:00
Lianet Magrans c580874fc2
KAFKA-18813: [3/N] Client support for TopicAuthException in DescribeConsumerGroup path (#18996)
Reviewers: David Jacot <djacot@confluent.io>
2025-02-21 12:42:00 -05:00
Lianet Magrans c56c9faee2
KAFKA-18813: [2/N] Client support for TopicAuthException in HB path (#18986)
Reviewers: David Jacot <djacot@confluent.io>
2025-02-21 08:45:20 -05:00
TengYao Chi 709bfc506a
KAFKA-18641: AsyncKafkaConsumer could lose records with auto offset commit (#18737)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Jun Rao <jun@confluent.io>, Kirk True <ktrue@confluent.io>
2025-02-20 12:11:01 -05:00
Ken Huang eda8fc84ae
KAFKA-16918 TestUtils#assertFutureThrows should use future.get with timeout (#18891)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Luke Chen <showuon@gmail.com>, Parker Chang <45290853+Parkerhiphop@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-20 07:22:31 +08:00
Matthias J. Sax 538a60e1b3
MINOR: disallow rawtypes and fail build (#18877)
Cleanup code to avoid rawtype, and add suppressions where necessary.
Change the build to fail on rawtype warning.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-19 13:11:49 -08:00
Shivsundar R 3603c8fe35
KAFKA-18829: Added check before converting to IMPLICIT mode (#18964)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-19 17:34:28 +00:00
Ismael Juma 3dba3125e9
KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845)
3.3.0 was the first KRaft release that was deemed production-ready and also
when KIP-778 (KRaft to KRaft upgrades) landed. Given that, it's reasonable
for 4.x to only support upgrades from 3.3.0 or newer (the metadata version also
needs to be set to "3.3" or newer before upgrading).

Noteworthy changes:
1. `AlterPartition` no longer includes topic names, which makes it possible to
simplify `AlterParitionManager` logic.
2. Metadata versions older than `IBP_3_3_IV3` have been removed and
`IBP_3_3_IV3` is now the minimum version.
3. `MINIMUM_BOOTSTRAP_VERSION` has been removed.
4. Removed `isLeaderRecoverySupported`, `isNoOpsRecordSupported`,
`isKRaftSupported`, `isBrokerRegistrationChangeRecordSupported` and
`isInControlledShutdownStateSupported` - these are always `true` now.
Also removed related conditional code.
5. Removed default metadata version or metadata version fallbacks in
multiple places - we now fail-fast instead of potentially using an incorrect
metadata version.
6. Update `MetadataBatchLoader.resetToImage` to set `hasSeenRecord`
based on whether image is empty - this was a previously existing issue that
became more apparent after the changes in this PR.
7. Remove `ibp` parameter from `BootstrapDirectory`
8. A number of tests were not useful anymore and have been removed.

I will update the upgrade notes via a separate PR as there are a few things that
need changing and it would be easier to do so that way.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluen.io>, Ken Huang <s7133700@gmail.com>
2025-02-19 05:35:42 -08:00
ShivsundarR a6a588fbed
KAFKA-18198: Added check to prevent acknowledgements on initial ShareFetchRequest. (#18944)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-19 10:49:58 +00:00
TaiJuWu 4c8d96c0f0
KAFKA-18767: Add client side config check for shareConsumer (#18850)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-18 15:57:56 +00:00
Parker Chang ed366e6b89
MINOR: Align assertFutureThrows method signature with JUnit conventions (#18825)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-18 15:56:42 +00:00
Chirag Wadhwa 63229a768c
KAFKA-16718 [1/n]: Added DeleteShareGroupOffsets request and response schema (#18927)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-18 14:06:24 +00:00
Bruno Cadonna d6b6952d48
KAFKA-18736: Add Streams group heartbeat request manager (1/N) (#18870)
This commit adds the Streams group heartbeat request manager
to the async consumer. The Streams group heartbeat request
manager is responsible to send heartbeat requests and to
process their responses.

This commit implements:
- sending of full heartbeat request (independent of any state)
- processing successful response

Reviewers: Bill Bejeck <bill@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2025-02-18 13:45:01 +01:00
Kaushik Raina 35420eb11b
KAFKA-18684: Add base exception classes (#18871)
Introduced two new exception classes to the Kafka error handling framework:

ApplicationRecoverableException: This exception signals that the error is recoverable, but the producer needs to be restarted. It helps in scenarios where recovery actions (like re-balancing or restoring from checkpoints) are needed.

RefreshRetriableException: This exception occurs when metadata is outdated or invalid and needs to be refreshed before retrying the request. It helps handle retries that depend on updated metadata.

Both classes are abstract and in upcoming PRs they will be extended by relevant classes as mentioned in KIP-1050:Exception Table.

Reviewers: Justine Olshan <jolshan@confluent.io>, Sanskar Jhajharia <jhajharia.sanskar@gmail.com>
2025-02-17 12:11:51 -08:00
Ken Huang d1db3d8e14
KAFKA-18805: add synchronized block for Consumer Heartbeat close (#18920)
add synchronized block for Consumer Heartbeat close.

Reviewers: Luke Chen <showuon@gmail.com>
2025-02-17 14:38:20 +08:00
Ming-Yen Chung e828767062
KAFKA-18790 Fix testCustomQuotaCallback (#18906)
Frequently updating the trust store can cause unexpected termination of the AsyncConsumer background thread.

1. To resolve this issue, reuse the same AdminClient instead of recreating it.
2. Add error logging when fail to initialize resources for the consumer network thread.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-15 03:07:59 +08:00
Jimmy Wang 6a6b80215d
KAFKA-16717 [1/2]: Add AdminClient.alterShareGroupOffsets (#18819)
KAFKA-16720 aims to add the support for the AlterShareGroupOffsets AdminClient. Key Changes in the PR:

1. Added handing of alterShareGroupOffsets() in KafkaAdminClient and introduce AlterShareGroupOffsetRequest/AlterShareGroupOffsetResponse/AlterShareGroupOffsetsOptions classes.
2. Corresponding test in KafkaAdminClientTest.
3. Added ALTER_SHARE_GROUP_OFFSETS API (will finish it in next PR and the share coordinator pieces)

Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-15 02:35:46 +08:00
Calvin Liu 53c2b1604d
MINOR: TransactionManager logs the epoch bump less frequently. (#18895)
Reviwers: Justine Olshan <jolshan@confluen.io>
2025-02-14 08:37:23 -08:00
Apoorv Mittal e6b835f0b4
MINOR: Marking testVerifyFetchAndCloseImplicit flaky (#18893)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-14 04:57:06 +08:00
Kirk True 057460e807
KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#18795)
Reviewers: Jun Rao <jun@confluent.io>, Lianet Magrans <lmagrans@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
2025-02-13 13:53:56 -05:00
Andrew Schofield 952113e8e0
KAFKA-16720: Support multiple groups in DescribeShareGroupOffsets RPC (#18834)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2025-02-13 18:27:05 +00:00
Lianet Magrans 6eb6a5e578
KAFKA-18776: Fix flaky coordinator disconnect test & fix log level (#18866)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-13 12:11:45 -05:00
Lianet Magrans c465cf6b4b
KAFKA-17298: Update upgrade notes for 4.0 KIP-848 (#18756)
Reviewers: David Jacot <djacot@confluent.io>
2025-02-13 11:51:56 -05:00
ShivsundarR 0e40b80c86
KAFKA-18769: Improve leadership changes handling in ShareConsumeRequestManager. (#18851)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-12 15:54:01 +00:00
Sushant Mahajan 675a0889de
KAFKA-18764: Throttle on share state RPCs auth failure. (#18855)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-11 09:54:24 +00:00
Ismael Juma da21b536c4
MINOR: Java version and TLS documentation improvements (#18822)
Most of the changes are obvious clean-ups/fixes. A couple of noteworthy items:

1. Support for non LTS versions is clarified (we were incorrectly stating full support
for Java 23).
2. TLS version negotiation details are clarified.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-10 12:24:28 -08:00
Ken Huang 70adf746c4
KAFKA-18225 ClientQuotaCallback#updateClusterMetadata is unsupported by kraft (#18196)
This commit ensures that the ClientQuotaCallback#updateClusterMetadata method is executed in KRaft mode. This method is triggered whenever a topic or cluster metadata change occurs. However, in KRaft mode, the current implementation of the updateClusterMetadata API is inefficient due to the requirement of creating a full Cluster object. To address this, a follow-up issue (KAFKA-18239) has been created to explore more efficient mechanisms for providing cluster information to the ClientQuotaCallback without incurring the overhead of a full Cluster object creation.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 01:03:02 +08:00
PoAn Yang d0f4c2f844
KAFKA-18441: Remove flaky tag on KafkaAdminClientTest#testAdminClientApisAuthenticationFailure (#18847)
Signed-off-by: PoAn Yang <payang@apache.org>
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-10 16:36:27 +00:00
Andrew Schofield aa8c57665f
KAFKA-18618: Improve leader change handling of acknowledgements [1/N] (#18672)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, ShivsundarR <shr@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2025-02-06 14:32:55 +00:00
Sushant Mahajan 0bd1ff936f
KAFKA-18629: Add persister impl and tests for DeleteShareGroupState RPC. [2/N] (#18748)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-05 14:51:19 +00:00
Sanskar Jhajharia 7dbed2f6e8
[KAFKA-16720] AdminClient Support for ListShareGroupOffsets (2/2) (#18671)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Sushant Mahajan <smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-02-05 14:38:09 +00:00
TengYao Chi 66363160c5
KAFKA-18645: New consumer should align close timeout handling with classic consumer (#18702)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-05 09:08:51 -05:00
Ming-Yen Chung d830179375
KAFKA-18675 Add tests for valid and invalid broker addresses (#18781)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-05 17:01:51 +08:00
Sean Quah 42e7cbb67e
KAFKA-18690: Keep leader metadata for RE2J-assigned partitions (#18777)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-02-04 13:22:28 -05:00
Bruno Cadonna b998189b00
KAFKA-18538: Add Streams membership manager (#18551)
The Streams membership manager is used client-side in the
background thread of the async consumer. For each member
/consumer, it is responsible for:
* keeping the member state,
* keeping assignments for the member,
* reconciling the assignments of the member -- for example
when tasks need to be revoked before other tasks are assigned
* requesting invocations of assignment and revocation callbacks
by the stream thread.

The Streams membership manager is called by the background thread of
the async consumer, directly in its event loop and from the
 Streams group heartbeat request manager. The Streams membership
manager uses the Streams rebalance events processor to request
assignment/revocation callback in the stream thread.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Bill Bejeck <bill@confluent.io>
2025-02-04 17:32:26 +01:00
Luke Chen 612e1299e4
KAFKA-18230: Handle not controller or not leader error in admin client (#18165)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-04 16:51:24 +01:00
Ismael Juma 78aff4fede
KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0 (#18727)
Return produce v0-v2 as supported versions in `ApiVersionsResponse`, but disable support
for it everywhere else.

Since clients pick the highest supported version by both client and broker during version
negotiation, this solves the problem with minimal tech debt (even though it's not ideal that
`ApiVersionsResponse` becomes inconsistent with the actual protocol support).

Add one test for the socket server handling (in `ProcessorTest`) and one test for the
client behavior (in `ProduceRequestTest`). Adjust a couple of api versions tests to verify
the new behavior.

Finally, include a few clean-ups in `ApiKeys`, `Protocol`, `ProduceRequest`,
`ProduceRequestTest` and `BrokerApiVersionsCommandTest`.

Reference to related librdkafka issue:
https://github.com/confluentinc/librdkafka/issues/4956

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
2025-02-01 16:08:54 -08:00
Apoorv Mittal 484ba83f59
KAFKA-18683: Handle slicing of file records for updated start position (#18759)
The PR corrects the check which was introduced in #5332 where position is checked to be within boundaries of file. The check 
    position > currentSizeInBytes - start 
is incorrect, since the position is relative to start.

Reviewers: Jun Rao <junrao@gmail.com>
2025-01-31 15:43:51 -08:00
Lianet Magrans 7920fadbb5 Revert "KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)"
This reverts commit 6cf54c4dab.
2025-01-31 17:18:35 -05:00
Mickael Maison 71314739f9
KAFKA-15995: Initial API + make Producer/Consumer plugins Monitorable (#17511)
Reviewers: Greg Harris <gharris1727@gmail.com>, Luke Chen <showuon@gmail.com>
2025-01-31 10:40:10 +01:00
Luke Chen 15c5c075c1
MINOR: Clean up for sasl endpoints (#18519)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-31 09:27:04 +01:00