Commit Graph

3978 Commits

Author SHA1 Message Date
Xuan-Zhang Gong c527530e80
KAFKA-19042 Move ProducerCompressionTest, ProducerFailureHandlingTest, and ProducerIdExpirationTest to client-integration-tests module (#19319)
include three test case
- ProducerCompressionTest
- ProducerFailureHandlingTest
- ProducerIdExpirationTest

Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-04-15 16:34:47 +08:00
Azhar Ahmed 4cdd4b617c
KAFKA-19071: Fix doc for remote.storage.enable (#19345)
As of 3.9, Kafka allows disabling remote storage on a topic after it was
enabled. It allows subsequent enabling and disabling too.

However the documentation says otherwise and needs to be corrected.

Doc:
https://kafka.apache.org/39/documentation/#topicconfigs_remote.storage.enable

Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>
2025-04-14 11:08:49 +08:00
PoAn Yang 34a87d3477
KAFKA-19042 Move TransactionsWithMaxInFlightOneTest to client-integration-tests module (#19289)
Use Java to rewrite `TransactionsWithMaxInFlightOneTest` by new test
infra and move it to client-integration-tests module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-11 12:04:19 +08:00
Jhen-Yung Hsu 90e7b53799
MINOR: Remove unused `ApiVersions` variable from Sender and RecordAccumulator (#19399)
Remove unused `ApiVersions` variable from Sender and RecordAccumulator.

Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, Parker Chang <parkerhiphop027@gmail.com>,
Chia-Ping Tsai <chia7712@gmail.com>
2025-04-11 11:23:41 +08:00
Kaushik Raina b3ba7bc929
KAFKA-18782: Extend ApplicationRecoverableException related exceptions (#19354)
**Summary**
Extend ApplicationRecoverableException related exceptions

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan
 <jolshan@confluent.io>
2025-04-10 16:57:28 -07:00
Bruno Cadonna c11938c926
KAFKA-19124: Use consumer background event queue for Streams events (#19421)
In the first version of the integration of the stream thread with the
new Streams rebalance protocol, the consumer used a dedicated event
queue for Streams/specific background events to request the stream
thread to call the rebalance callbacks. That  led to an issue where the
consumer times out when unsubscribing.

This commit gets rid of the dedicated queue and incorporates the
Streams-specific background events into event queue used by the
consumer.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-04-10 21:06:06 +02:00
TengYao Chi b649b1ed5d
KAFKA-18935: Ensure brokers do not return null records in FetchResponse (#19167)
JIRA: KAFKA-18935  This patch ensures the broker will not return null
records in FetchResponse.   For more details, please refer to the
ticket.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai
 <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
2025-04-10 22:21:00 +08:00
Abhinav Dixit 699ae1b75b
KAFKA-16729: Support isolation level for share consumer (#19261)
This PR adds the share group dynamic config `share.isolation.level`.
Until now, share groups only supported `READ_UNCOMMITTED` isolation
level type. With this PR, we aim to support `READ_COMMITTED` isolation
type to share groups.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
2025-04-10 09:00:03 +01:00
Florian Hussonnois eeb1214ba8
KAFKA-18962: Fix onBatchRestored call in GlobalStateManagerImpl (#19188)
Call the StateRestoreListener#onBatchRestored with numRestored and not
the totalRestored when reprocessing state

See: https://issues.apache.org/jira/browse/KAFKA-18962

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias
Sax <mjsax@apache.org>
2025-04-09 13:17:38 -07:00
Bruno Cadonna 2a370ed721
KAFKA-19037: Integrate consumer-side code with Streams (#19377)
The consumer adaptations for the new Streams rebalance protocol need to
be integrated into the Streams code. This commit does the following:
- creates an async Kafka consumer
  - with a Streams heartbeat request manager
  - with a Streams membership manager
- integrates consumer code with the Streams membership manager and the
Streams heartbeat request manager
- processes the events from the consumer network thread (a.k.a.
background thread)
  that request the invocation of the "on tasks revoked", "on  tasks
assigned", and "on all tasks lost"
  callbacks
- executes the callbacks
- sends to the consumer network thread the events signalling the
execution of the callbacks
- adapts SmokeTestDriverIntegrationTest to use the new Streams rebalance
protocol

This commit misses some unit test coverage, but it also unblocks other
work on trunk regarding the new Streams rebalance protocol.  The missing
unit tests will be added soon.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-04-09 13:26:51 +02:00
Chirag Wadhwa 5148174196
KAFKA-16718-2/n: KafkaAdminClient and GroupCoordinator implementation for DeleteShareGroupOffsets RPC (#18976)
This PR contains the implementation of KafkaAdminClient and
GroupCoordinator for DeleteShareGroupOffsets RPC.

- Added `deleteShareGroupOffsets` to `KafkaAdminClient`
- Added implementation for `handleDeleteShareGroupOffsetsRequest` in
`KafkaApis.scala`
- Added `deleteShareGroupOffsets` to `GroupCoordinator` as well.
internally this makes use of `persister.deleteState` to persist the
changes in share coordinator

Reviewers: Andrew Schofield <aschofield@confluent.io>, Sushant Mahajan <smahajan@confluent.io>
2025-04-09 07:31:06 +01:00
lorcan 434b0d39ae
MINOR: use enum map for error counts map (#19314)
Java provides a specialised Map where Enums are the keys, which can
provide some performance improvements.

https://docs.oracle.com/javase/8/docs/api/java/util/EnumMap.html

I have updated the Java code where possible to use an EnumMap rather
than a HashMap and run the unit tests under the requests directory.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Lianet Magrans
<lmagrans@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-04-09 02:01:02 +08:00
Ken Huang 2f086d188f
KAFKA-18892: Add KIP-877 support for ClientQuotaCallback (#19068)
Allow ClientQuotaCallback to implement Monitorable and register metrics.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, TaiJuWu 
<tjwu1217@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>
2025-04-08 16:58:29 +02:00
Nick Guo fcf6da0a0d
KAFKA-19098 Remove `lastOffset` from PartitionResponse (#19398)
The `lastOffset` is not used actually, so it can be removed.

Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-04-08 00:06:02 +08:00
Shivsundar R 2d02f1d52d
KAFKA-19084: Port KAFKA-16224, KAFKA-16764 for ShareConsumers (#19369)
Currently for ShareConsumers, if we receive an
`UNKNOWN_TOPIC_OR_PARTITION` error code in the
`ShareAcknowledgeResponse`, then we retry sending the acknowledgements
until the timer expires.
We ideally do not want this when a topic/partition is deleted, hence
like the
`CommitRequestManager`(https://github.com/apache/kafka/pull/15581), we
will treat this error as fatal and not retry the acknowledgements.

PR also suppresses `InvalidTopicException` during `unsubscribe()` which
was also added in the
`AsyncKafkaConsumer`(https://github.com/apache/kafka/pull/16043). It was
later removed in the regular consumer
as they notified the background operations of metadata errors instead of
propagating them via `ErrorEvent`. `ShareConsumerImpl` however does
not require that change and it still propagates the metadata error back
to the application. So we would need to suppress this exception during
unsubscribe().

Reviewers: Andrew Schofield <aschofield@confluent.io>, Sushant Mahajan <smahajan@confluent.io>
2025-04-07 10:04:48 +01:00
Hong-Yi Chen 6dd2cc70c3
MINOR: Clean up comments and remove unused code in RecordVersion and CreateTopicsRequestTest (#19342)
## Summary

This PR updates the `RecordVersion` javadoc for clarity. It removes
outdated references to `message.format.version` mentioned in the [Kafka
4.0 upgrade
documentation](48f06981ee/40/upgrade.html (L135))
and aligns with feedback from a previous discussion in [#19325
](https://github.com/apache/kafka/pull/19325).

## Changes
- Cleaned up javadoc in `RecordVersion`
- Removed outdated or deprecated references

Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-04-07 07:47:06 +08:00
Thomas Gebert a65626b6a8
MINOR: Add functionalinterface to the producer callback (#19366)
The Callback interface is a perfect example of a place that can use the
functionalinterface in Java. Strictly for Java, this isn't "required"
since Java will automatically coerce, but for Clojure (and other JVM
languages I belive) to interop with Java lambdas it needs the
FunctionalInterface annotation.

Since FunctionalInterface doesn't add any overhead and provides
compiler-enforced documentation, I don't see any reason *not* to have
this. This has already been added into Kafka Streams here:
https://github.com/apache/kafka/pull/19234#pullrequestreview-2740742487

I am happy to add it to any other spots in that might be useful too.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-06 22:21:09 +08:00
Parker Chang 9f676dd7e2
MINOR: Clean up unreachable code in FetcherTest (#19376)
This is from [#16532's comment](https://github.com/apache/kafka/pull/16532/files#r2028985028):

The forEach loop in the assertion will never execute because
`nonResponseData` is empty.

This happens because the above assertion `emptyMap()` has a size of 0,
so there are no elements to iterate over.

Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-04-06 22:17:02 +08:00
TengYao Chi 74acbd200d
KAFKA-16758: Extend Consumer#close with an option to leave the group or not (#17614)
JIRA: [KAFKA-16758](https://issues.apache.org/jira/browse/KAFKA-16758)
This PR is aim to deliver

[KIP-1092](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=321719077),
please refer to KIP-1092 and KAFKA-16758 for further details.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Chia-Ping
Tsai <chia7712@gmail.com>, Kirk True <kirk@kirktrue.pro>
2025-04-05 22:02:45 -07:00
PoAn Yang 3d96b20630
KAFKA-19042 Move TransactionsExpirationTest to client-integration-tests module (#19288)
Use Java to rewrite `TransactionsExpirationTest` by new test infra and
move it to client-integration-tests module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-05 20:01:31 +08:00
TaiJuWu ebb62812d9
KAFKA-19074 Remove the cached responseData from ShareFetchResponse (#19357)
Jira: https://issues.apache.org/jira/browse/KAFKA-19074

Similar fix https://github.com/apache/kafka/pull/16532

2b8aff58b5
make it accept input to return "partial" data.
The content of output is based on the input but we cache the output ...
It will return same output even though we pass different input. That is
a potential bug.

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-04-05 19:56:59 +08:00
Andrew Schofield d4d9f11816
KAFKA-18761: [2/N] List share group offsets with state and auth (#19328)
This PR approaches completion of Admin.listShareGroupOffsets() and
kafka-share-groups.sh --describe --offsets.

Prior to this patch, kafka-share-groups.sh was only able to describe the
offsets for partitions which were assigned to active members. Now, the
Admin.listShareGroupOffsets() uses the persister's knowledge of the
share-partitions which have initialised state. Then, it uses this list
to obtain a complete set of offset information.

The PR also implements the topic-based authorisation checking. If
Admin.listShareGroupOffsets() is called with a list of topic-partitions
specified, the authz checking is performed on the supplied list,
returning errors for any topics to which the client is not authorised.
If Admin.listShareGroupOffsets() is called without a list of
topic-partitions specified, the list of topics is discovered from the
persister as described above, and then the response is filtered down to
only show the topics to which the client is authorised. This is
consistent with other similar RPCs in the Kafka protocol, such as
OffsetFetch.

Reviewers: David Arthur <mumrah@gmail.com>, Sushant Mahajan <smahajan@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>
2025-04-04 13:25:19 +01:00
Logan Zhu a4375045d6
KAFKA-19055 Cleanup the 0.10.x information from clients module (#19320)
Removes outdated references to Kafka 0.10.x in the clients module
documentation. Since the baseline version is now 2.1, any mentions of
versions earlier than this are unnecessary and have been removed or
updated accordingly.

Changes:
- Updated `ClusterResource`, `ClusterResourceListener`, and
`DescribeClusterResult` Javadoc to reflect the minimum supported broker
version as 2.1.
- Updated `TopicConfig` documentation: Removed references to consumers
older than 0.10.2.
- Removed references to 0.10.x and adjusted explanations to remain
relevant for newer versions.

Testing & Impact:
- This PR only modifies Javadoc comments—no functional code changes.
- No impact on existing functionality.

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-04-04 04:17:13 +08:00
Thomas Gebert db4e74b46e
MINOR: Add Functional Interface annotation to interfaces used by Lambdas (#19234)
Adds the FunctionalInterface annotation to relevant Kafka Streams
classes. While this is not strictly required for Java, it's still best
practice and also useful for better integration with other JVM
languages, for example Clojure, to allow using these interfaces as
lambdas.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-04-03 09:30:56 -07:00
Ritika Reddy eeffd8c475
KAFKA-19003: Add forceTerminateTransaction command to CLI tools (#19276)
This patch is part of KIP-939 [Support Participation in
2PC](https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC)

The kafka-transactions.sh tool will support a new command
--forceTerminateTransaction It has one required argument
--transactionalId that would take the transactional id for the
transaction to be terminated.

The command uses the existing Admin#fenceProducers method to forcefully
abort the transaction associated with the specified transactional ID.
Under the hood, it sends an InitProducerId request to the transaction
coordinator with the given transactional ID and keepPreparedTxn = false
by default. This is aligned with the functionality outlined in the KIP.

We will be creating a new public method in the Admin Client **public
TerminateTransactionResult forceTerminateTransaction(String
transactionalId)**, and re-use the existing fence producer method.

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
2025-04-02 11:51:26 -07:00
Andrew Schofield cee55dbdec
KAFKA-18794: Disable flaky tests pending investigation (#19340)
KafkaShareConsumerTest is proving very flaky. The behaviour of
MockClient does not appear to match the expectations of the test. This
PR disables the flaky tests to reduce build noise. When a proper
solution has been worked out, the tests can be re-enabled.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-04-01 16:52:06 +01:00
Shivsundar R e301508b53
MINOR: Add check in ShareConsumerImpl to send acknowledgements of control records when ShareFetch is empty. (#19295)
Currently if we received just a control record in the
`ShareFetchResponse`, then the currentFetch in `ShareConsumerImpl` would
not be updated as the record is ignored. But in the process, we lose the
acknowledgment for this control record which is a GAP.
PR fixes this by adding an additional map for control record
acknowledgements in `ShareFetchEvent`.
This updates both the ShareConsumerImpl and ShareConsumeRequestManager
to accommodate the additional map.
Added a unit test in `ShareConsumerImplTest` and
`ShareConsumeRequestManagerTest` to verify the changes.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-04-01 14:15:03 +01:00
Shivsundar R ed77397814
KAFKA-19062: Port changes from KAFKA-18645 to share-consumers (#19335)
Limits waiting when closing a share consumer to request.timeout.ms.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-04-01 13:08:19 +01:00
Apoorv Mittal 4aa81204ff
KAFKA-19018,KAFKA-19063: Implement maxRecords and acquisition lock timeout in share fetch request and response resp. (#19334)
PR add `MaxRecords` to share fetch request and also adds
`AcquisitionLockTimeout` to share fetch response. PR also removes
internal broker config of `max.fetch.records`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-04-01 12:23:06 +01:00
Ismael Juma b375bb099b
MINOR: Remove unused `ApiKeys.minRequiredInterBrokerMagic` (#19325)
Reviewers: David Jacot <david.jacot@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-31 10:41:05 -07:00
TengYao Chi 20546930ae
KAFKA-19042 Move ConsumerTopicCreationTest to client-integration-tests module (#19283)
This patch moves `ConsumerTopicCreationTest` to the
`client-integration-tests` and rewrite it as Java.
The patch also streamlines the test flow. 
In the Scala version, there is a producer that produces messages, but
this is not the main purpose of the `ConsumerTopicCreationTest`.

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-03-31 20:15:54 +08:00
Kuan-Po Tseng c095faa578
KAFKA-18945 Enhance the docs for Admin APIs (#19315)
Enhance the documentation for Admin#describeCluster and
Admin#describeConfigs to clarify their behavior when using
bootstrap.controllers and bootstrap.servers.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-31 13:49:04 +08:00
Nick Guo c771116b89
KAFKA-19005 improve the documentation of DescribeTopicsOptions#partitionSizeLimitPerResponse (#19268)
jira: https://issues.apache.org/jira/browse/KAFKA-19005

This PR includes following changes:
1. refine the format
2. highligh that it is supported by topic names

<img width="857" alt="999"
src="https://github.com/user-attachments/assets/6eec9e2f-b839-430c-b111-2be3a8538593"
/>

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-29 03:21:10 +08:00
Nick Guo 9292a22606
KAFKA-19049 Remove the `@ExtendWith(ClusterTestExtensions.class)` from code base (#19299)
jira: https://issues.apache.org/jira/browse/KAFKA-19049

[KAFKA-18617](https://issues.apache.org/jira/browse/KAFKA-18617)
introduced the mechanism to inject the cluster test at runtime, so the
integration tests don't need to use
`@ExtendWith(ClusterTestExtensions.class)` any more.

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-03-29 02:15:16 +08:00
Lucas Brutschy 2267902b40
MINOR: Mark streams RPCs as unstable (#19292)
Streams groups RPCs are not enabled by default, but they should also be
marked as unstable.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2025-03-27 14:22:01 +01:00
Sushant Mahajan eb88e78373
KAFKA-18827: Initialize share group state group coordinator impl. [3/N] (#19026)
* This PR adds impl for the initialize share groups call from the Group
Coordinator perspective.
* The initialize call on persister instance will be invoked by the
`GroupCoordinatorService`, based on the response of the
`GroupCoordinatorShard.shareGroupHeartbeat`. If there is new topic
subscription or member assignment change (topic paritions incremented),
the delta share partitions corresponding to the share group in question
are returned as an optional initialize request.
* The request is then sent to the share coordinator as an encapsulated
timer task because we want the heartbeat response to go asynchronously.
* Tests have been added for `GroupCoordinatorService` and
`GroupMetadataManager`. Existing tests have also been updated.
* A new formatter `ShareGroupStatePartitionMetadataFormatter` has been
added for debugging.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-03-26 19:40:23 +00:00
Vikas Singh 56d1dc1b6e
MINOR: Use readable interface to parse requests (#19163)
The generated request data type's constructors take Readable as an input. However, the parse method in the
AbstractRequest takes a ByteBuffer as input. So to create the corresponding request data objects, each individual
concrete Request classes wraps the ByteBuffer into a ByteBufferAccessor.

This is boilerplate code present in all the concrete request classes. This changes AbstractRequest's parse method so that subclasses can simply pass the `Readable` they get directly to request data classes.

The same change is made to the serialize method to maintain symmetry.

Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio
<jsancio@apache.org>, Artem Livshits <alivshits@confluent.io>,
Truc Nguyen <trnguyen@confluent.io>
2025-03-26 10:13:13 -04:00
Shivsundar R 91758cc99d
KAFKA-18899: Improve handling of timeouts for commitAsync() in ShareConsumer. (#19192)
Previously, the `ShareConsumer.commitAsync()` method retried sending
`ShareAcknowledge` requests indefinitely. Now it will instead use the
defaultApiTimeout config to expire the request so that it does not retry forever.

PR also fixes a bug in processing `commitSync() `requests, where we
need an additional check if the node is free.

Co-authored-by: Andrew Schofield <aschofield@confluent.io>
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-03-26 09:06:59 +00:00
ClarkChen 1547204baa
KAFKA-18914 Migrate ConsumerRebootstrapTest to use new test infra (#19154)
Migrate ConsumerRebootstrapTest to the new test infra and remove the old
Scala test.

The PR changed three things.
* Migrated `ConsumerRebootstrapTest` to new test infra and removed the
old Scala test.
* Updated the original test case to cover rebootstrap scenarios.
* Integrated `ConsumerRebootstrapTest` into `ClientRebootstrapTest` in
the `client-integration-tests` module.
* Removed the `RebootstrapTest.scala`.

Default `ConsumerRebootstrap` config:
> properties.put(CommonClientConfigs.METADATA_RECOVERY_STRATEGY_CONFIG,
"rebootstrap");

properties.put(CommonClientConfigs.METADATA_RECOVERY_REBOOTSTRAP_TRIGGER_MS_CONFIG,
"300000");

properties.put(CommonClientConfigs.SOCKET_CONNECTION_SETUP_TIMEOUT_MS_CONFIG,
"10000");

properties.put(CommonClientConfigs.SOCKET_CONNECTION_SETUP_TIMEOUT_MAX_MS_CONFIG,
"30000");
properties.put(CommonClientConfigs.RECONNECT_BACKOFF_MS_CONFIG, "50L");
properties.put(CommonClientConfigs.RECONNECT_BACKOFF_MAX_MS_CONFIG,
"1000L");

The test case for the consumer with enabled rebootstrap
![Screenshot 2025-03-22 at 9 48
13 PM](https://github.com/user-attachments/assets/8470549f-a24c-43fa-ae44-789cbf422a63)


The test case for the consumer with disabled rebootstrap
![Screenshot 2025-03-22 at 9 47
22 PM](https://github.com/user-attachments/assets/0a183464-6a74-449f-8e71-d641a6ea5bb1)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-26 01:53:42 +08:00
Bruno Cadonna 96196bb03b
KAFKA-18736: Add pollOnClose() and maximumTimeToWait() (#19233)
Adds pollOnClose() and maximumTimeToWait() to the Streams 
group heartbeat request manager.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-03-25 09:09:13 +01:00
Bruno Cadonna 266532f562
KAFKA-18736: Handle errors in the Streams group heartbeat request manager (#19230)
This commit adds error handling to the Streams heartbeat request
manager.

Errors can occur while sending a heartbeat request and when a response
with an error code that is not NONE is received.

Some errors are handled explicitly to recover from them or to log
specific messages. All the others are handled as fatal errors.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-03-24 21:26:14 +01:00
TaiJuWu a524fc64b4
MINOR: leverage preProcessParsedConfig within AbstractConfig (#19259)
In past, we have `AbstractConfig#preProcessParsedConfig` but did not use
its return value

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-03-24 01:19:20 +08:00
ClarkChen fef9aebb19
KAFKA-18276 Migrate ProducerRebootstrapTest to new test infra (#19046)
The PR changed three things.
* Migrated `ProducerRebootstrapTest` to new test infra and removed the
old Scala test.
* Updated the original test case to cover rebootstrap scenarios.
* Integrated `ProducerRebootstrapTest` into `ClientRebootstrapTest` in
the `client-integration-tests` module.

Default `ProducerRebootstrap` config:
> properties.put(CommonClientConfigs.METADATA_RECOVERY_STRATEGY_CONFIG,
"rebootstrap");

properties.put(CommonClientConfigs.METADATA_RECOVERY_REBOOTSTRAP_TRIGGER_MS_CONFIG,
"300000");

properties.put(CommonClientConfigs.SOCKET_CONNECTION_SETUP_TIMEOUT_MS_CONFIG,
"10000");

properties.put(CommonClientConfigs.SOCKET_CONNECTION_SETUP_TIMEOUT_MAX_MS_CONFIG,
"30000");
properties.put(CommonClientConfigs.RECONNECT_BACKOFF_MS_CONFIG, "50L");
properties.put(CommonClientConfigs.RECONNECT_BACKOFF_MAX_MS_CONFIG,
"1000L");
       
The test case for the producer with enabled rebootstrap
<img width="1549" alt="Screenshot 2025-03-17 at 10 46 03 PM"
src="https://github.com/user-attachments/assets/547840a6-d79d-4db4-98c0-9b05ed04cf60"
/>

The test case for the producer with disabled rebootstrap
<img width="1552" alt="Screenshot 2025-03-17 at 10 46 47 PM"
src="https://github.com/user-attachments/assets/2248e809-d9d5-4f3b-a24f-ba1aa0fef728"
/>

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-24 01:09:17 +08:00
Ken Huang 68ecb7720f
MINOR: add log4j2.yaml to clients-integration-tests module (#19252)
`clients-integration-tests` modules doesn't have the `log4j2.yaml` to
setting log, thus we should add.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-03-22 02:19:54 +08:00
TaiJuWu 79fe1305b6
KAFKA-18893: Add KIP-877 support to ReplicaSelector (#19064)
ReplicaSelector implementations can implement Monitorable to register their own metrics.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ken Huang <s7133700@gmail.com>
2025-03-21 15:39:50 +01:00
David Arthur 8fa3856473
MINOR Mar 19 flaky tests (#19248)
CoordinatorRequestManagerTest#testMarkCoordinatorUnknownLoggingAccuracy
has become flaky again. Last 30 days report shows a sudden re-occurrence


https://develocity.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=github,trunk,not:flaky,not:new&search.tasks=test&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.clients.consumer.internals.CoordinatorRequestManagerTest&tests.sortField=FLAKY#

Also mark QuorumControllerTest.testMinIsrUpdateWithElr as flaky.

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-03-21 09:26:13 -04:00
David Jacot 0c5e5c5d2d
KAFKA-18329; [2/3] Delete old group coordinator (KIP-848) (#19251)
This patch is the second of a series of patches to remove the old group
coordinator. With the release of Apache Kafka 4.0, the so-called new
group coordinator is the default and only option available now.

The patch removes `group.coordinator.new.enable` (internal config) and
all its usages (integration tests, unit tests, etc.). It also cleans up
`KafkaApis` to remove logic only used by the old group coordinator.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-21 05:54:41 -07:00
Ken Huang e21c46a504
MINOR: Move FileRecord JavaDoc to comment (#19257)
See: https://github.com/apache/kafka/pull/19214#discussion_r2005945059

Move explaination from Javadoc to comment.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-21 13:47:56 +08:00
Ken Huang 31e1a57c41
KAFKA-18989 Optimize FileRecord#searchForOffsetWithSize (#19214)
The `lastOffset` includes the entire batch header, so we should check `baseOffset` instead.

To optimize this, we need to update the search logic. The previous
approach simply checked whether each batch's `lastOffset()` was greater
than or equal to the target offset. Once it found the first batch that
met this condition, it returned that batch immediately.

Now that we are using `baseOffset()`, we need to handle a special case:
if the `targetOffset` falls between the `lastOffset` of the previous
batch and the `baseOffset` of the matching batch, we should select the
matching batch. The updated logic is structured as follows:

1. First, if baseOffset exactly equals targetOffset, return immediately.
2. If we find the first batch with baseOffset greater than targetOffset
    - Check if the previous batch contains the target
- If there's no previous batch, return the current batch or the previous
batch doesn't contain the target, return the current batch
5. After iterating through all batches, check if the last batch contains
the target offset.

This code path is not thread-safe, so we need to prevent `EOFException`.
To avoid this exception, I am still using an early return. In this
scenario, `lastOffset` is still used within the loop, but it should be
executed at most once within the loop.

Therefore, in the new implementation, `lastOffset` will be executed at
most once. In most cases, this results in an optimization.

Test: Verifying Memory Usage Improvement  
To evaluate whether this optimization helps, I followed the steps below
to monitor memory usage:

1. Start a Standalone Kafka Server  
```sh
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties
bin/kafka-server-start.sh config/server.properties
```  

2. Use Performance Console Tools to Produce and Consume Records  

**Produce Records:**  
```sh
./kafka-producer-perf-test.sh \
  --topic test-topic \
  --num-records 1000000000 \
  --record-size 100 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092
```  
**Consume Records:**  
```sh
./bin/kafka-consumer-perf-test.sh \
  --topic test-topic \
  --messages 1000000000 \
  --bootstrap-server localhost:9092
```  
It can be observed that memory usage has significantly decreased.
trunk:
![CleanShot 2025-03-16 at 11 53
31@2x](https://github.com/user-attachments/assets/eec26b1d-38ed-41c8-8c49-e5c68643761b)
this PR:
![CleanShot 2025-03-16 at 17 41
56@2x](https://github.com/user-attachments/assets/c8d4c234-18c2-4642-88ae-9f96cf54fccc)

Reviewers: Kirk True <kirk@kirktrue.pro>, TengYao Chi
<kitingiao@gmail.com>, David Arthur <mumrah@gmail.com>, Jun Rao
<junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-20 16:33:35 +08:00
Lan Ding e73719d962
KAFKA-18819 StreamsGroupHeartbeat API and StreamsGroupDescribe API check topic describe (#19183)
This patch filters out the topic describe unauthorized topics from the
StreamsGroupHeartbeat and StreamsGroupDescribe response.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-03-19 20:42:05 +01:00
PoAn Yang fcca4056fd
KAFKA-18975 Move clients-integration-test out of core module (#19217)
Move following tests from core to clients-integration-test module.

- ClientTelemetryTest
- DeleteTopicTest
- DescribeAuthorizedOperationsTest
- ConsumerIntegrationTest
- CustomQuotaCallbackTest
- RackAwareAutoTopicCreationTest

Move following tests from core to server module.

- BootstrapControllersIntegrationTest
- LogManagerIntegrationTest

Reviewers: Kirk True <kirk@kirktrue.pro>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-20 02:43:19 +08:00
Ritika Reddy 3a3159b01e
KAFKA-18953: [1/N] Add broker side handling for 2 PC (KIP-939) (#19193)
This patch adds logic to enable and handle two phase commit (2PC)
transactions following KIP-939.
The changes made are as follows:
1) Add a new broker config called
**transaction.two.phase.commit.enable** which is set to false by default
2) Add new flags **enableTwoPCFlag** and **keepPreparedTxn** to
handleInitProducerId
3) Return an error if keepPreparedTxn is set to true (for now)

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan
<jolshan@confluent.io>
2025-03-19 09:22:00 -07:00
Ken Huang b805877705
KAFKA-18969 Rewrite ShareConsumerTest#setup and move to clients-integration-tests module (#19202)
Move share consumer to clients-integration-tests module and use `@BeforeEach` to setup

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-18 14:47:38 +08:00
TengYao Chi a6a0ea56d8
KAFKA-17171 Add test cases for `STATIC_BROKER_CONFIG`in kraft mode (#18463)
Given that the `core` module will be separated into other small modules,
this test will not be added to the core module.
Instead, I added it to the `clients-integration-tests` module since it
focuses on the admin client test. The patch should include following test cases.

1. a topic-related static config is added to quorum controller. The
configs from topic creation should include it, but `describeConfigs`
does not.

2. a topic-related static config is added to quorum controller. The
configs from topic creation should include it, and `describeConfigs`
does if admin is using controller.bootstrap

3. a topic-related static config is added to broker. The configs from
topic creation should NOT include it, but `describeConfigs` does.

4. a topic-related static config is added to broker. The configs from
topic creation should NOT include it, and `describeConfigs` does not
also if admin is using controller.bootstrap

for another, the docs of `STATIC_BROKER_CONFIG` should remind the impact of "controller.properties" BTW, those test cases should leverage new test infra, since new test infra allow us to define configs to broker/controller individually.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-18 00:30:53 +08:00
Bruno Cadonna a7e40b7c5a
KAFKA-18736: Do not send fields if not needed (#19181)
The Streams heartbeat request has some fields that are always sent.
Those are:
- group ID
- member ID
- member epoch
- group instance ID (if static membership is used)

Then it has fields that are only sent when joining:
- topology and topology epoch
- rebalance timeout
- process ID
- endpoint
- client tags

Finally, the assignment is only sent if it changed compared to the last
sent request.

Reviewers: Bill Bejeck <bill@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>
2025-03-16 18:08:56 +01:00
Ken Huang 7bff678699
KAFKA-18859 honor the error message of UnregisterBrokerResponse (#19027)
Reviewers: Ismael Juma <ismael@juma.me.uk>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-16 03:06:01 +08:00
ClarkChen e05b0e68e4
KAFKA-18915 Rewrite AdminClientRebootstrapTest to cover the current scenario (#19187)
Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-16 02:35:41 +08:00
Kaushik Raina c32c167e04
KAFKA-18781: Extend RefreshRetriableException related exceptions (#19136)
- Extended derived exceptions described in
[KIP-1050](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=309496816#KIP1050:ConsistenterrorhandlingforTransactions-RefreshRetriableException)
to include the new RefreshRetriableException in base hierarchy.
- Added unit tests to validate the hierarchy of the derived exceptions
in relevant scenarios.

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-03-14 09:11:31 -07:00
Gerard Klijs-Nefkens b2a01b2754
MINOR: call the serialize method including headers from the MockProducer (#11144)
Currently when using serializers like the Cloud Event Serializer, we
need to do a work around so it doesn't throw an error. Using the method
taking the headers would prevent this. Since the default implementation
just calls the method without the headers, it's expected to be fully
backwards compatible.

Reviewers: Divij Vaidya <divijvaidya13@gmail.com>
2025-03-13 18:50:29 +01:00
Mickael Maison 759fbbba8b
KAFKA-14484: Move UnifiedLog to storage module (#19030)
Rewrite UnifiedLog in Java

Reviewers: Jun Rao <jun@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-13 10:49:55 +01:00
Mickael Maison 55d65cb3ba
MINOR: Cleanups in CoreUtils (#19175)
Delete unused methods in CoreUtils and switch to Utils.newInstance().

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-12 19:43:30 +01:00
David Arthur 0ebc3e83c5
MINOR Mar 12 Flaky tests (#19190)
Mark the following tests as flaky:

* StickyAssignorTest > testLargeAssignmentAndGroupWithUniformSubscription
* DeleteSegmentsByRetentionTimeTest
* QuorumControllerTest > testUncleanShutdownBrokerElrEnabled

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-03-12 13:47:35 -04:00
Abhinav Dixit c07c59ad24
KAFKA-18932: Removed usage of partition max bytes from share fetch requests (#19148)
This PR aims to remove the usage of partition max bytes from share fetch
requests. Partition Max Bytes is being defined by
`PartitionMaxBytesStrategy` which was added to the broker as part of PR
https://github.com/apache/kafka/pull/17870

Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>
2025-03-12 13:19:19 +00:00
David Arthur 701573366f
KAFKA-18933 Add client integration tests module (#19144)
Adds a new ":clients:integration-test" Gradle module. Relocates one
example test from ":core"

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-11 16:36:23 -04:00
David Arthur 903d70d764
MINOR Mark Tls13SelectorTest#testCloseOldestConnection as flaky (#19178)
This test has a flakiness around 7%. It caused two back-to-back failures
on trunk recently.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-11 16:35:38 -04:00
Lucas Brutschy 6551e87815
KAFKA-18925: Add streams groups support to Admin.listGroups (#19155)
Add support so that Admin.listGroups can represent
streams groups and their states.

Reviewers: Bill Bejeck <bill@confluent.io>
2025-03-11 15:48:07 +01:00
Bruno Cadonna 59e5890505
KAFKA-18736: Decide when a heartbeat should be sent (#19121)
This commit adds the conditions to decide when a Streams group heartbeat
should be sent.
A heartbeat should be sent when:
- the group coordinator is available
- the member is part of the Streams group or wants to join it
- the heartbeat interval expired or the member is leaving the group or
acknowledging the assginment

This commit does not implement:
- not sending fields that did not change
- handling errors

Reviewers: Zheguang Zhao <zheguang.zhao@alumni.brown.edu>, Lucas
Brutschy <lbrutschy@confluent.io>
2025-03-10 17:39:57 +01:00
PoAn Yang 19d8a414ef
KAFKA-15900, KAFKA-18310: fix flaky test testOutdatedCoordinatorAssignment and AbstractCoordinatorTest (#18945)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-03-10 11:50:35 -04:00
Lucas Brutschy fc2e3dfce9
MINOR: Disallow unused local variables (#18963)
Recently, we found a regression that could have been detected by static
analysis, since a local variable wasn't being passed to a method during
a refactoring, and was left unused. It was fixed in
[7a749b5](7a749b589f),
but almost slipped into 4.0. Unused variables are typically detected by
IDEs, but this is insufficient to prevent these kinds of bugs. This
change enables unused local variable detection in checkstyle for Kafka.

A few notes on the usage:
- There are two situations in which people actually want to have a local
variable but not use it. First, there are `for (Type ignored:
collection)` loops which have to loop `collection.length` number of
times, but that do not use `ignored` in the loop body. These are
typically still easier to read than a classical `for` loop. Second, some
IDEs detect it if a return value of a function such as `File.delete` is
not being used. In this case, people sometimes store the result in an
unused local variable to make ignoring the return value explicit and to
avoid the squiggly lines.
- In Java 22, unsued local variables can be omitted by using a single
underscore `_`. This is supported by checkstyle. In pre-22 versions,
IntelliJ allows such variables to be named `ignored` to suppress the
unused local variable warning. This pattern is often (but not
consistently) used in the Kafka codebase. This is, however, not
supported by checkstyle.

Since we cannot switch to Java 22, yet, and we want to use automated
detection using checkstyle, we have to resort to prefixing the unused
local variables with `@SuppressWarnings("UnusedLocalVariable")`. We have
to apply this in 11 cases across the Kafka codebase. While not being
pretty, I'd argue it's worth it to prevent bugs like the one fixed in
[7a749b5](7a749b589f).

Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur
<mumrah@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Bruno
Cadonna <cadonna@apache.org>, Kirk True <ktrue@confluent.io>
2025-03-10 09:37:35 +01:00
Cheryl Simmons 6940bef6e8
MINOR: Small fit and finish changes to Producer config doc strings (#19125)
- Adding a space, article and punctuation to the Producer config doc
strings for consistency and readability.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Justine Olshan <jolshan@confluent.io>
2025-03-07 11:07:35 -08:00
Lucas Brutschy 618ea2c1ca
KAFKA-18285: Add describeStreamsGroup to Admin API (#19116)
Adds `describeStreamsGroup` to Admin API.

This exposes the result of the `DESCRIBE_STREAMS_GROUP` RPC in the Admin
API.

Reviewers: Bill Bejeck <bill@confluent.io>
2025-03-07 15:56:07 +01:00
David Jacot 8cf2f9a61a
KAFKA-18046; High CPU usage when using Log4j2 (#19138)
This patch is a first step towards resolving KAFKA-18046. Apache Kafka
4.0 ships with log4j2 so the issue raised in the ticket causing high CPU
usage on the fetch path due to LoggerFactory.getLogger() being called on
the handling of all fetch responses is not good. Hence, I propose to fix
that one by caching the Logger used by the `CompletedFetch` class.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2025-03-07 00:03:32 -08:00
Matthias J. Sax d85946da19
MINOR: reduce per-batch logging to TRACE level (#19101)
Logging on a per-batch bases is very chatty, and should only be done at
TRACE level to avoid spamming DEBUG logs.

Reviewers: Justine Olshan <jolshan@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2025-03-06 11:06:26 -08:00
Andrew Schofield 1da30bdedf
KAFKA-18900: Experimental share consumer acknowledge mode config (#19113)
User testing of the `KafkaShareConsumer` interface has revealed some
areas which confuse people. One of these is that way that it decides
whether you want to use implicit or explicit acknowledgement of records
by observing which calls the application issues. We are taking the
opportunity to refine the interface before it is finalised.

This PR introduces an experimental configuration called
`internal.share.acknowledgement.mode` which can be used to make the
application declare which kind of acknowledgement it wishes to use. We
plan to try out the configuration, assess whether it has helped, and
then create a proper consumer configuration that makes this area better.
That would require a lot of change in the tests, which explains why this
initial PR only has a small number of tests.

Reviewers: David Arthur <mumrah@gmail.com>
2025-03-06 17:57:11 +00:00
Ismael Juma a738df4aaa
KAFKA-18648: Make `records` in `FetchResponse` nullable again (#19131)
As Jun raised in
https://github.com/apache/kafka/pull/18726#discussion_r1972525165,
we actually do have a few code paths where `records` remains `null`
in the FetchResponse with broker version 3.9 and older:

* Compression codec for topic is ZSTD and fetch version < 10:
https://github.com/apache/kafka/blob/3.9/core/src/main/scala/kafka/server/KafkaApis.scala#L835
* Down-conversion of zstandard-compressed:
https://github.com/apache/kafka/blob/3.9/core/src/main/scala/kafka/server/KafkaApis.scala#L884
* Generic uncaught exception through:
https://github.com/apache/kafka/blob/3.9/clients/src/main/java/org/apache/kafka/common/requests/FetchRequest.java#L365

To ensure 4.0 clients don't fail to deserialize fetch responses from
brokers with the affected versions, we make `records` nullable again.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
2025-03-06 09:12:36 -08:00
Alieh Saeedi 7a976c651e
KAFKA-18887: Implement Streams Admin APIs (#19120)
Implement Admin API extensions beyond list/describe group (delete group,
offset-related APIs).

* adds methods for describing and manipulating offsets, as described in
KIP-1071
* adds corresponding unit tests

These are doing the exact same thing as the corresponding consumer group
counter-parts.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-03-06 17:55:21 +01:00
Sushant Mahajan b89c819f63
MINOR: Added evolving annotation to DeleteShareGroupsResult. (#19133)
* Added `InterfaceStability.Evolving` annotation to`DeleteShareGroupsResult`.
* Fixed some java doc.

Co-authored-by: Andrew Schofield <aschofield@confluent.io>

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-03-06 16:17:37 +00:00
dengziming 50510bb19d
HOTFIX: Do not use highest version when version is valid (#19109)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-06 10:15:58 +08:00
David Arthur d86cb59790 Revert "KAFKA-18887: Implement Streams Admin APIs (#19049)"
This reverts commit 017692e86c.
2025-03-05 10:49:11 -05:00
Sushant Mahajan 485699a187
MINOR: Delete DeleteGroupsResult class. (#19057)
In this PR, we perform this refactor as the class is not needed since
there is no need to refer to child classes by common ref and the
duplicated code is minimal.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-03-05 14:38:18 +00:00
Alieh Saeedi 017692e86c
KAFKA-18887: Implement Streams Admin APIs (#19049)
Implement Admin API extensions beyond list/describe group (delete group, offset-related APIs).

* adds methods for describing and manipulating offsets, as described in KIP-1071
* adds corresponding unit tests

These are doing the exact same thing as the corresponding consumer group counter-parts.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-03-05 15:32:09 +01:00
S.Y. Wang 6ecf6817ad
KAFKA-18919 Clarify that KafkaPrincipalBuilder classes must also implement KafkaPrincipalSerde (#19104)
In KRaft mode, custom KafkaPrincipalBuilder instances must implement KafkaPrincipalSerde. This PR updates all related documentation to highlight this requirement.

Reviewers: Ken Huang <s7133700@gmail.com>, David Jacot <djacot@confluent.io>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-05 21:25:09 +08:00
Kuan-Po Tseng cbd72cc216
KAFKA-14121: AlterPartitionReassignments API should allow callers to specify the option of preserving the replication factor (#18983)
Reviewers: Christo Lolov <lolovc@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi <kitingiao@gmail.com>
2025-03-05 11:23:12 +00:00
dengziming 1bfa4cd17b
KAFKA-10864 Convert end txn marker schema to use auto-generated protocol (#9766)
1. Convert end txn marker schema to use auto-generated protocol`EndTxnMarker`
2. substitute `CURRENT_END_TXN_MARKER_VALUE_SIZE` with an`endTnxMarkerValueSize` method since the size is accumulated from `EndTxnMarker`.
3. add buffer to `EndTransactionMarker` to avoid twice compute from `serializeValue` and `endTnxMarkerValueSize`.
4. flexibleVersions is set to none.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-05 15:47:02 +08:00
co63oc e4ece37dbf
Fix typos in multiple files (#19086)
Fix typos in multiple files

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-03-04 16:05:51 +00:00
DL1231 a24fedfba0
KAFKA-18817:[1/N] ShareGroupHeartbeat and ShareGroupDescribe API must check topic describe (#19055)
1、Client support for TopicAuthException in DescribeShareGroup and HB
path
2、ShareConsumerImpl#sendAcknowledgementsAndLeaveGroup swallow
TopicAuthorizationException and GroupAuthorizationException

Reviewers: ShivsundarR <shr@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-03-03 09:49:37 +00:00
Bruno Cadonna 898dcd11ad
MINOR: Extract HeartbeatRequestState from heartbeat request managers (#19043)
The AbstractHeartbeatRequestManager and the
StreamsGroupHeartbeatRequestManager, both use the
HeartbeatRequestState to track the state of the heartbeat requests. Both
heartbeat request managers have an implementation of
HeartbeatRequestState as inner class.
To deduplicate code this commit extracts the HeartbeatRequestState so
that the same code can be used by both heartbeat request manager.

Reviewers: Kirk True <ktrue@confluent.io>, Chia-Ping Tsai
<chia7712@gmail.com>, Lucas Brutschy <lbrutschy@confluent.io>
2025-03-03 10:46:20 +01:00
Logan Zhu bf660fdeb6
KAFKA-18881 Document the ConsumerRecord as non-thread safe (#19056)
There are 3 issues (at least) about the multithreaded issue on ConsumerRecords. Hence, it would be better to document it completely. 

Reviewers: Kirk True <ktrue@confluent.io>, TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Xuan-Zhang Gong <gongxuanzhangmelt@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-03-03 13:03:36 +08:00
TengYao Chi e0c77140b2
KAFKA-17039 KIP-919 supports for unregisterBroker (#19063)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-03-01 23:55:35 +08:00
Xuan-Zhang Gong 45f932819e
KAFKA-18864:remove the Evolving tag from stable public interfaces (#19036)
The purpose of this PR is to remove the `@InterfaceStability.Evolving` from classes that were created over a year ago.

Reviewers: Jun Rao <junrao@gmail.com>
2025-02-28 13:24:24 -08:00
Kaushik Raina d77f44414d
KAFKA-18780: Extend RetriableException related exceptions (#19020)
- Added a unit test to validate the exception hierarchy for all KIP-1050
transaction related exceptions.
- RetriableException is correctly extended by all child classes
- Included test for RetriableException exception with verification that
all exceptions extending `RetriableException` do not inadvertently
extend `RefreshRetriableException, preserving the intended behavior.

Reviewers: Kirk True <ktrue@confluent.io>, TaiJuWu <tjwu1217@gmail.com>, TengYao Chi <kitingiao@gmail.com>,  Ken Huang <s7133700@gmail.com>, Justine Olshan <jolshan@confluent.io>
2025-02-27 07:49:13 -08:00
ClarkChen 269e2d898b
KAFKA-18849 Add "strict min ISR" to the docs of "min.insync.replicas" (#19016)
KIP-966 adds strict min ISR rule, so this PR improves the docs of min.insync.replicas to include that change.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-27 16:05:24 +08:00
Nick Guo dd85938661
KAFKA-18850 Fix the docs of org.apache.kafka.automatic.config.providers (#19039)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@apache.org>
2025-02-27 15:36:27 +08:00
Dongnuo Lyu 36f19057e1
KAFKA-18813: ConsumerGroupHeartbeat API and ConsumerGroupDescribe API must check topic describe (#18989)
This patch filters out the topic describe unauthorized topics from the
ConsumerGroupHeartbeat and ConsumerGroupDescribe response.

In ConsumerGroupHeartbeat, 
- if the request has `subscribedTopicNames` set, we directly check the
authz in `KafkaApis` and return a topic auth failure in the response if
any of the topics is denied.
- Otherwise, we check the authz only if a regex refresh is triggered and
we do it based on the acl of the consumer that triggered the refresh. If
any of the topic is denied, we filter it out from the resolved
subscription.

In ConsumerGroupDescribe, we check the authz of the coordinator
response. If any of the topic in the group is denied, we remove the
described info and add a topic auth failure to the described group.
(similar to the group auth failure)

Reviewers: David Jacot <djacot@confluent.io>, Lianet Magrans
<lmagrans@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>,
Chia-Ping Tsai <chia7712@gmail.com>, TaiJuWu <tjwu1217@gmail.com>,
TengYao Chi <kitingiao@gmail.com>
2025-02-26 13:05:36 -05:00
José Armando García Sancio 4a8a0637e0
KAFKA-18723; Better handle invalid records during replication (#18852)
For the KRaft implementation there is a race between the network thread,
which read bytes in the log segments, and the KRaft driver thread, which
truncates the log and appends records to the log. This race can cause
the network thread to send corrupted records or inconsistent records.
The corrupted records case is handle by catching and logging the
CorruptRecordException. The inconsistent records case is handle by only
appending record batches who's partition leader epoch is less than or
equal to the fetching replica's epoch and the epoch didn't change
between the request and response.

For the ISR implementation there is also a race between the network
thread and the replica fetcher thread, which truncates the log and
appends records to the log. This race can cause the network thread send
corrupted records or inconsistent records. The replica fetcher thread
already handles the corrupted record case. The inconsistent records case
is handle by only appending record batches who's partition leader epoch
is less than or equal to the leader epoch in the FETCH request.

Reviewers: Jun Rao <junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai <chia7712@apache.org>
2025-02-25 20:09:19 -05:00
Shivsundar R fae2e53901
MINOR : Add missing error code in ConsumerHeartbeatRequestManagerTest. (#19024)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-02-25 15:35:20 -05:00
Shivsundar R 2880e04129
KAFKA-18779: Validate responses from broker in client for ShareFetch and ShareAcknowledge RPCs. (#18939)
- Currently if we received extraneous topic partitions in the response
or if the response was missing some partitions requested, we were
processing the response as it came and even populated the callback with
these partitions.

- These invalid responses should be parsed at the
`ShareConsumeRequestManager`.

- If the response missed any acknowledgements for partitions that were
requested, then we fail the request with `InvalidRecordStateException`
and populate the callbacks.

- For any extraneous partitions in the response, we log an error and
ignore them.

Some refactors are also done in this PR in ShareConsumeRequestManager to
make the code more readable.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-24 10:27:24 +00:00
Sushant Mahajan 3fc103b48b
KAFKA-18629: ShareGroupDeleteState admin client impl. (#18928)
* In this PR, we add various infra classes needed to support the
`deleteShareGroups` functionality via the `kafka-share-groups.sh`
script, as well as the implementation of `kafka-share-groups.sh --delete`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:21:10 +00:00
Sushant Mahajan 4f28973bd1
KAFKA-18827: Initialize share state, share coordinator impl. [1/N] (#18968)
In this PR, we have added the share coordinator and KafkaApis side impl
of the intialize share group state RPC.
ref:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka#KIP932:QueuesforKafka-InitializeShareGroupStateAPI

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:12:08 +00:00
xijiu 118818a7ca
KAFKA-18795 Remove `Records#downConvert` (#18897)
Since we no longer convert records to the old format for fetch requests, this code is no longer used in production.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-22 02:29:58 +08:00
Lianet Magrans c580874fc2
KAFKA-18813: [3/N] Client support for TopicAuthException in DescribeConsumerGroup path (#18996)
Reviewers: David Jacot <djacot@confluent.io>
2025-02-21 12:42:00 -05:00
Lianet Magrans c56c9faee2
KAFKA-18813: [2/N] Client support for TopicAuthException in HB path (#18986)
Reviewers: David Jacot <djacot@confluent.io>
2025-02-21 08:45:20 -05:00
TengYao Chi 709bfc506a
KAFKA-18641: AsyncKafkaConsumer could lose records with auto offset commit (#18737)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Jun Rao <jun@confluent.io>, Kirk True <ktrue@confluent.io>
2025-02-20 12:11:01 -05:00
Ken Huang eda8fc84ae
KAFKA-16918 TestUtils#assertFutureThrows should use future.get with timeout (#18891)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Luke Chen <showuon@gmail.com>, Parker Chang <45290853+Parkerhiphop@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-20 07:22:31 +08:00
Matthias J. Sax 538a60e1b3
MINOR: disallow rawtypes and fail build (#18877)
Cleanup code to avoid rawtype, and add suppressions where necessary.
Change the build to fail on rawtype warning.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-19 13:11:49 -08:00
Shivsundar R 3603c8fe35
KAFKA-18829: Added check before converting to IMPLICIT mode (#18964)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-19 17:34:28 +00:00
Ismael Juma 3dba3125e9
KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845)
3.3.0 was the first KRaft release that was deemed production-ready and also
when KIP-778 (KRaft to KRaft upgrades) landed. Given that, it's reasonable
for 4.x to only support upgrades from 3.3.0 or newer (the metadata version also
needs to be set to "3.3" or newer before upgrading).

Noteworthy changes:
1. `AlterPartition` no longer includes topic names, which makes it possible to
simplify `AlterParitionManager` logic.
2. Metadata versions older than `IBP_3_3_IV3` have been removed and
`IBP_3_3_IV3` is now the minimum version.
3. `MINIMUM_BOOTSTRAP_VERSION` has been removed.
4. Removed `isLeaderRecoverySupported`, `isNoOpsRecordSupported`,
`isKRaftSupported`, `isBrokerRegistrationChangeRecordSupported` and
`isInControlledShutdownStateSupported` - these are always `true` now.
Also removed related conditional code.
5. Removed default metadata version or metadata version fallbacks in
multiple places - we now fail-fast instead of potentially using an incorrect
metadata version.
6. Update `MetadataBatchLoader.resetToImage` to set `hasSeenRecord`
based on whether image is empty - this was a previously existing issue that
became more apparent after the changes in this PR.
7. Remove `ibp` parameter from `BootstrapDirectory`
8. A number of tests were not useful anymore and have been removed.

I will update the upgrade notes via a separate PR as there are a few things that
need changing and it would be easier to do so that way.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluen.io>, Ken Huang <s7133700@gmail.com>
2025-02-19 05:35:42 -08:00
ShivsundarR a6a588fbed
KAFKA-18198: Added check to prevent acknowledgements on initial ShareFetchRequest. (#18944)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-19 10:49:58 +00:00
TaiJuWu 4c8d96c0f0
KAFKA-18767: Add client side config check for shareConsumer (#18850)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-18 15:57:56 +00:00
Parker Chang ed366e6b89
MINOR: Align assertFutureThrows method signature with JUnit conventions (#18825)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-18 15:56:42 +00:00
Chirag Wadhwa 63229a768c
KAFKA-16718 [1/n]: Added DeleteShareGroupOffsets request and response schema (#18927)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-18 14:06:24 +00:00
Bruno Cadonna d6b6952d48
KAFKA-18736: Add Streams group heartbeat request manager (1/N) (#18870)
This commit adds the Streams group heartbeat request manager
to the async consumer. The Streams group heartbeat request
manager is responsible to send heartbeat requests and to
process their responses.

This commit implements:
- sending of full heartbeat request (independent of any state)
- processing successful response

Reviewers: Bill Bejeck <bill@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2025-02-18 13:45:01 +01:00
Kaushik Raina 35420eb11b
KAFKA-18684: Add base exception classes (#18871)
Introduced two new exception classes to the Kafka error handling framework:

ApplicationRecoverableException: This exception signals that the error is recoverable, but the producer needs to be restarted. It helps in scenarios where recovery actions (like re-balancing or restoring from checkpoints) are needed.

RefreshRetriableException: This exception occurs when metadata is outdated or invalid and needs to be refreshed before retrying the request. It helps handle retries that depend on updated metadata.

Both classes are abstract and in upcoming PRs they will be extended by relevant classes as mentioned in KIP-1050:Exception Table.

Reviewers: Justine Olshan <jolshan@confluent.io>, Sanskar Jhajharia <jhajharia.sanskar@gmail.com>
2025-02-17 12:11:51 -08:00
Ken Huang d1db3d8e14
KAFKA-18805: add synchronized block for Consumer Heartbeat close (#18920)
add synchronized block for Consumer Heartbeat close.

Reviewers: Luke Chen <showuon@gmail.com>
2025-02-17 14:38:20 +08:00
Ming-Yen Chung e828767062
KAFKA-18790 Fix testCustomQuotaCallback (#18906)
Frequently updating the trust store can cause unexpected termination of the AsyncConsumer background thread.

1. To resolve this issue, reuse the same AdminClient instead of recreating it.
2. Add error logging when fail to initialize resources for the consumer network thread.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-15 03:07:59 +08:00
Jimmy Wang 6a6b80215d
KAFKA-16717 [1/2]: Add AdminClient.alterShareGroupOffsets (#18819)
KAFKA-16720 aims to add the support for the AlterShareGroupOffsets AdminClient. Key Changes in the PR:

1. Added handing of alterShareGroupOffsets() in KafkaAdminClient and introduce AlterShareGroupOffsetRequest/AlterShareGroupOffsetResponse/AlterShareGroupOffsetsOptions classes.
2. Corresponding test in KafkaAdminClientTest.
3. Added ALTER_SHARE_GROUP_OFFSETS API (will finish it in next PR and the share coordinator pieces)

Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-15 02:35:46 +08:00
Calvin Liu 53c2b1604d
MINOR: TransactionManager logs the epoch bump less frequently. (#18895)
Reviwers: Justine Olshan <jolshan@confluen.io>
2025-02-14 08:37:23 -08:00
Apoorv Mittal e6b835f0b4
MINOR: Marking testVerifyFetchAndCloseImplicit flaky (#18893)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-14 04:57:06 +08:00
Kirk True 057460e807
KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#18795)
Reviewers: Jun Rao <jun@confluent.io>, Lianet Magrans <lmagrans@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
2025-02-13 13:53:56 -05:00
Andrew Schofield 952113e8e0
KAFKA-16720: Support multiple groups in DescribeShareGroupOffsets RPC (#18834)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2025-02-13 18:27:05 +00:00
Lianet Magrans 6eb6a5e578
KAFKA-18776: Fix flaky coordinator disconnect test & fix log level (#18866)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-13 12:11:45 -05:00
Lianet Magrans c465cf6b4b
KAFKA-17298: Update upgrade notes for 4.0 KIP-848 (#18756)
Reviewers: David Jacot <djacot@confluent.io>
2025-02-13 11:51:56 -05:00
ShivsundarR 0e40b80c86
KAFKA-18769: Improve leadership changes handling in ShareConsumeRequestManager. (#18851)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-12 15:54:01 +00:00
Sushant Mahajan 675a0889de
KAFKA-18764: Throttle on share state RPCs auth failure. (#18855)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-11 09:54:24 +00:00
Ismael Juma da21b536c4
MINOR: Java version and TLS documentation improvements (#18822)
Most of the changes are obvious clean-ups/fixes. A couple of noteworthy items:

1. Support for non LTS versions is clarified (we were incorrectly stating full support
for Java 23).
2. TLS version negotiation details are clarified.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-10 12:24:28 -08:00
Ken Huang 70adf746c4
KAFKA-18225 ClientQuotaCallback#updateClusterMetadata is unsupported by kraft (#18196)
This commit ensures that the ClientQuotaCallback#updateClusterMetadata method is executed in KRaft mode. This method is triggered whenever a topic or cluster metadata change occurs. However, in KRaft mode, the current implementation of the updateClusterMetadata API is inefficient due to the requirement of creating a full Cluster object. To address this, a follow-up issue (KAFKA-18239) has been created to explore more efficient mechanisms for providing cluster information to the ClientQuotaCallback without incurring the overhead of a full Cluster object creation.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 01:03:02 +08:00
PoAn Yang d0f4c2f844
KAFKA-18441: Remove flaky tag on KafkaAdminClientTest#testAdminClientApisAuthenticationFailure (#18847)
Signed-off-by: PoAn Yang <payang@apache.org>
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-10 16:36:27 +00:00
Andrew Schofield aa8c57665f
KAFKA-18618: Improve leader change handling of acknowledgements [1/N] (#18672)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, ShivsundarR <shr@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2025-02-06 14:32:55 +00:00
Sushant Mahajan 0bd1ff936f
KAFKA-18629: Add persister impl and tests for DeleteShareGroupState RPC. [2/N] (#18748)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-05 14:51:19 +00:00
Sanskar Jhajharia 7dbed2f6e8
[KAFKA-16720] AdminClient Support for ListShareGroupOffsets (2/2) (#18671)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Sushant Mahajan <smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-02-05 14:38:09 +00:00
TengYao Chi 66363160c5
KAFKA-18645: New consumer should align close timeout handling with classic consumer (#18702)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-05 09:08:51 -05:00
Ming-Yen Chung d830179375
KAFKA-18675 Add tests for valid and invalid broker addresses (#18781)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-05 17:01:51 +08:00
Sean Quah 42e7cbb67e
KAFKA-18690: Keep leader metadata for RE2J-assigned partitions (#18777)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-02-04 13:22:28 -05:00
Bruno Cadonna b998189b00
KAFKA-18538: Add Streams membership manager (#18551)
The Streams membership manager is used client-side in the
background thread of the async consumer. For each member
/consumer, it is responsible for:
* keeping the member state,
* keeping assignments for the member,
* reconciling the assignments of the member -- for example
when tasks need to be revoked before other tasks are assigned
* requesting invocations of assignment and revocation callbacks
by the stream thread.

The Streams membership manager is called by the background thread of
the async consumer, directly in its event loop and from the
 Streams group heartbeat request manager. The Streams membership
manager uses the Streams rebalance events processor to request
assignment/revocation callback in the stream thread.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Bill Bejeck <bill@confluent.io>
2025-02-04 17:32:26 +01:00
Luke Chen 612e1299e4
KAFKA-18230: Handle not controller or not leader error in admin client (#18165)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-04 16:51:24 +01:00
Ismael Juma 78aff4fede
KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0 (#18727)
Return produce v0-v2 as supported versions in `ApiVersionsResponse`, but disable support
for it everywhere else.

Since clients pick the highest supported version by both client and broker during version
negotiation, this solves the problem with minimal tech debt (even though it's not ideal that
`ApiVersionsResponse` becomes inconsistent with the actual protocol support).

Add one test for the socket server handling (in `ProcessorTest`) and one test for the
client behavior (in `ProduceRequestTest`). Adjust a couple of api versions tests to verify
the new behavior.

Finally, include a few clean-ups in `ApiKeys`, `Protocol`, `ProduceRequest`,
`ProduceRequestTest` and `BrokerApiVersionsCommandTest`.

Reference to related librdkafka issue:
https://github.com/confluentinc/librdkafka/issues/4956

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
2025-02-01 16:08:54 -08:00
Apoorv Mittal 484ba83f59
KAFKA-18683: Handle slicing of file records for updated start position (#18759)
The PR corrects the check which was introduced in #5332 where position is checked to be within boundaries of file. The check 
    position > currentSizeInBytes - start 
is incorrect, since the position is relative to start.

Reviewers: Jun Rao <junrao@gmail.com>
2025-01-31 15:43:51 -08:00
Lianet Magrans 7920fadbb5 Revert "KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)"
This reverts commit 6cf54c4dab.
2025-01-31 17:18:35 -05:00
Mickael Maison 71314739f9
KAFKA-15995: Initial API + make Producer/Consumer plugins Monitorable (#17511)
Reviewers: Greg Harris <gharris1727@gmail.com>, Luke Chen <showuon@gmail.com>
2025-01-31 10:40:10 +01:00
Luke Chen 15c5c075c1
MINOR: Clean up for sasl endpoints (#18519)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-31 09:27:04 +01:00
Kirk True 6cf54c4dab
KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)
This change reduces fetch session cache evictions on the broker for AsyncKafkaConsumer by altering its logic to determine which partitions it includes in fetch requests.

Background
Consumer implementations fetch data from the cluster and temporarily buffer it in memory until the user next calls Consumer.poll(). When a fetch request is being generated, partitions that already have buffered data are not included in the fetch request.

The ClassicKafkaConsumer performs much of its fetch logic and network I/O in the application thread. On poll(), if there is any locally-buffered data, the ClassicKafkaConsumer does not fetch any new data and simply returns the buffered data to the user from poll().

On the other hand, the AsyncKafkaConsumer consumer splits its logic and network I/O between two threads, which results in a potential race condition during fetch. The AsyncKafkaConsumer also checks for buffered data on its application thread. If it finds there is none, it signals the background thread to create a fetch request. However, it's possible for the background thread to receive data from a previous fetch and buffer it before the fetch request logic starts. When that occurs, as the background thread creates a new fetch request, it skips any buffered data, which has the unintended result that those partitions get added to the fetch request's "to remove" set. This signals to the broker to remove those partitions from its internal cache.

This issue is technically possible in the ClassicKafkaConsumer too, since the heartbeat thread performs network I/O in addition to the application thread. However, because of the frequency at which the AsyncKafkaConsumer's background thread runs, it is ~100x more likely to happen.

Options
The core decision is: what should the background thread do if it is asked to create a fetch request and it discovers there's buffered data. There were multiple proposals to address this issue in the AsyncKafkaConsumer. Among them are:

The background thread should omit buffered partitions from the fetch request as before (this is the existing behavior)
The background thread should skip the fetch request generation entirely if there are any buffered partitions
The background thread should include buffered partitions in the fetch request, but use a small “max bytes” value
The background thread should skip fetching from the nodes that have buffered partitions
Option 4 won out. The change is localized to AbstractFetch where the basic idea is to skip fetch requests to a given node if that node is the leader for buffered data. By preventing a fetch request from being sent to that node, it won't have any "holes" where the buffered partitions should be.

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Jun Rao <junrao@gmail.com>
2025-01-30 13:12:11 -08:00
Ken Huang 4b29fd6383
KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18548)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>
2025-01-30 11:22:54 -05:00
Pramithas Dhakal aa27df9396
MINOR: KafkaProducerTest - Fix resource leakage and replace explicit invocation of close() method with try with resources (#18678)
Reviewers: Divij Vaidya <diviv@amazon.com>, Greg Harris <greg.harris@aiven.io>, Christo Lolov <lolovc@amazon.com>
2025-01-30 12:34:57 +01:00
PoAn Yang 0dfc4017b8
KAFKA-18441: Fix flaky KafkaAdminClientTest#testAdminClientApisAuthenticationFailure (#18735)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-30 08:01:20 +00:00
TengYao Chi 9dd73d43b0
KAFKA-18569: New consumer close may wait on unneeded FindCoordinator (#18590)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-29 14:15:56 -05:00
Calvin Liu a3b34c1315
KAFKA-18662: Return CONCURRENT_TRANSACTIONS on produce request in TV2 (#18733)
While testing, it was found that the not_enough_replicas error was super common and could be easily confused. Since we are already bumping the request, we can signify that the produce request may return this error and new clients can handle it 

(Note, the java client should be able to handle this already as a retriable error, but other client libraries may need to implement this change)

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-01-29 10:15:48 -08:00
Ismael Juma ca5d2cf76d
KAFKA-18646: Null records in fetch response breaks librdkafka (#18726)
Ensure we always return empty records (including cases where an error is returned).
We also remove `nullable` from `records` since it is effectively expected to be
non-null by a large percentage of clients in the wild.

This behavior regressed in fe56fc9 (KAFKA-18269). Empty records were
previously set via `FetchResponse.recordsOrFail(partitionData)` in the
now-removed `maybeConvertFetchedData` method.

Added an integration test that fails without this fix and also update many
tests to set `records` to `empty` instead of leaving them as `null`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2025-01-29 07:04:12 -08:00
TengYao Chi 97a228070e
KAFKA-18619: New consumer topic metadata events should set requireMetadata flag (#18668)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-01-29 08:36:05 -05:00
Andrew Schofield f960e20647
KAFKA-18488: Improve KafkaShareConsumerTest (#18728)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-01-29 09:47:21 +00:00
Ismael Juma e6d72c9e60
KAFKA-18648: Add back support for metadata version 0-3 (#18716)
During testing, we identified that kafka-python (and aiokafka) relies on metadata request v0 and
hence we need to add these back to comply with the premise of KIP-896 - i.e. it should not
break the clients listed within it.

I reverted the changes from #18218 related to the removal of metadata versions 0-3.

I will submit a separate PR to undeprecate these API versions on the relevant 3.x branches.

kafka-python (and aiokafka) work correctly (produce & consume) with this change on
top of the 4.0 branch.

Reviewers: David Arthur <mumrah@gmail.com>
2025-01-28 18:35:33 -08:00
David Arthur f18457f2b8
MINOR Mark a StickyAssignorTest as flaky (#18719)
Mark StickyAssignorTest#testLargeAssignmentAndGroupWithNonEqualSubscription as flaky. Used data from this
report https://github.com/apache/kafka/actions/runs/12982945953

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-28 10:34:05 -05:00
Sushant Mahajan f32932cc25
KAFKA-18629: Delete share group state impl [1/N] (#18712)
Reviewers: Christo Lolov <lolovc@amazon.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-28 11:43:01 +00:00
Chung, Ming-Yen 43af241b50
KAFKA-18639 Enable the @Flaky annotation for some flaky tests (#18701)
The following tests were previously reported as flaky but were only annotated with a comment in pull request #18558 due to module dependency limitations:

    testAdminClientApisAuthenticationFailure
    testOutdatedCoordinatorAssignment
    testThrottledProducerConsumer

With the introduction of the new test infrastructure #18602 , which allows all modules to use the @Flaky annotation, these tests should now be updated to include the @Flaky annotation.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-25 22:44:35 +08:00
David Arthur 8c0a0e07ce
KAFKA-17587 Refactor test infrastructure (#18602)
This patch reorganizes our test infrastructure into three Gradle modules:

":test-common:test-common-internal-api" is now a minimal dependency which exposes interfaces and annotations only. It has one project dependency on server-common to expose commonly used data classes (MetadataVersion, Feature, etc). Since this pulls in server-common, this module is Java 17+. It cannot be used by ":clients" or other Java 11 modules.

":test-common:test-common-util" includes the auto-quarantined JUnit extension. The @Flaky annotation has been moved here. Since this module has no project dependencies, we can add it to the Java 11 list so that ":clients" and others can utilize the @Flaky annotation

":test-common:test-common-runtime" now includes all of the test infrastructure code (TestKitNodes, etc). This module carries heavy dependencies (core, etc) and so it should not normally be included as a compile-time dependency.

In addition to this reorganization, this patch leverages JUnit SPI service discovery so that modules can utilize the integration test framework without depending on ":core". This will allow us to start moving integration tests out of core and into the appropriate sub-module. This is done by adding ":test-common:test-common-runtime" as a testRuntimeOnly dependency rather than as a testImplementation dependency. A trivial example was added to QuorumControllerTest to illustrate this.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-24 09:03:43 -05:00
Ken Huang 0c9df75295
KAFKA-18474: Remove zkBroker listener (#18477)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang <payang@apache.org>
2025-01-24 05:53:32 -08:00
Okada Haruki 17846fe743
KAFKA-16372 Fix producer doc discrepancy with the exception behavior (#15574)
Currently, Producer.send doc is inconsistent with actual exception behavior
      - TimeoutException: This won't be thrown from send on buffer-full or metadata-missing actually. Instead, it will returned as failed future.
       - AuthenticationException/AuthorizationException: These exceptions are also won't be thrown. Returned with failed future actually.
Fixed Callback javadoc and ProducerConfig doc as well.

Reviewers: Luke Chen <showuon@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-24 20:23:43 +08:00
Karsten Spang 400ecab518
KAFKA-13810: Document behavior of KafkaProducer.flush() w.r.t callbacks (#12042)
Reviewers: Luke Chen <showuon@gmail.com>, Andrew Eugene Choi <andrew.choi@uwaterloo.ca>
2025-01-23 17:20:30 +01:00
Andrew Schofield 8000d04dcb
KAFKA-18488: Additional protocol tests for share consumption (#18601)
Reviewers: ShivsundarR <shr@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2025-01-23 13:32:59 +00:00
Andrew Schofield 9da516b1a9
KAFKA-18392: Ensure client sets member ID for share group (#18649)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Lianet Magrans <lmagrans@confluent.io>
2025-01-22 08:57:40 +00:00
Bruno Cadonna 239708f52e
KAFKA-18518: Add processor to handle rebalance events (#18527)
This commit adds a processor named
StreamsRebalanceEventsProcessor that handles the rebalance
events sent from the background thread of the async
consumer to the stream thread when an task
assignment changes. It also adds the corresponding rebalance
events.

Additionally, this commit adds StreamsRebalanceData that
maintains the data that is exchanges for the Streams rebalance
protocol.

All of these are used by the Streams heartbeat request manager
and the Streams membership manager that will be added in a future
commit.

Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
2025-01-22 08:30:56 +01:00
David Jacot b368c38684
KAFKA-18302; Update CoordinatorRecord (#18512)
This patch does a few things:
1) Replace ApiMessageAndVersion by ApiMessage in CoordinatorRecord for the key
2) Leverage the fact that ApiMessage exposes the apiKey. Hence we don't need to specify the key anymore.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-21 18:11:26 +01:00
Artem Livshits 247c0f0ba5
KAFKA-15370: Support Participation in 2PC (KIP-939) (2/N) (#18316)
Update producer id request / response formats and transaction log value format. There is no functional change.

Reviewers: Justine Olshan <jolshan@confluent.io>, Calvin Liu <caliu@confluent.io>
2025-01-21 08:40:46 -08:00
Matthias J. Sax ba774a09f4
KAFKA-8862: Improve Producer error message for failed metadata update (#18587)
We should provide the same informative error message for both timeout
cases.

Reviewers: Kirk True <ktrue@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Ismael Juma <ismael@juma.me.uk>
2025-01-21 08:37:45 -08:00
Andrew Schofield 7cbfd22bde
MINOR: Improve javadoc for ListShareGroupOffsetsResult (#18650)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, PoAn Yang <payang@apache.org>, Apoorv Mittal <apoorvmittal10@gmail.com>
2025-01-21 13:56:40 +00:00
Ismael Juma 87b37a4065
KAFKA-14552: Assume a baseline of 3.0 for server protocol versions (#18497)
Kafka 4.0 will remove support for zk mode and will require conversion to kraft
before upgrading to 4.0. The minimum kraft version is 3.0 (aka 3.0-IV1).

This provides an opportunity to remove exclusively server side protocols versions
that only exist to allow direct upgrades from versions older than 3.0 or that are
used only by zk mode.

Since KRaft became production ready in 3.3, we should consider setting the
baseline to 3.3. But that requires more discussion and it can be done via a
separate change (KAFKA-18601).

Protocol changes:
* Remove RequestHeader v0 (only used by ControlledShutdown v0)
* Remove WriteTxnMarkers v0
* Remove all versions of ControlledShutdown, LeaderAndIsr, StopReplica, UpdateMetadata

In order to remove all versions safely, extend generator to support setting
"versions" to "none". In this case, we no longer generate the `*Data` classes,
but we still reserve the id for the relevant protocol api (so it doesn't get
accidentally used for something else). The protocol documentation is correct
after these changes.

We kept a simplified version of `LeaderAndIsr{Request|Response}` because
it's used by many tests that are still relevant in kraft mode. Once
KAFKA-18486 is done, it may be possible to remove it (I left a comment on
the ticket). Similarly, KAFKA-18487 may make it possible to remove
the introduced `StopReplicaPartitionState` (left a comment on that ticket too).

There are a number of places that were adjusted to include an
`ApiKeys.hasValidVersion` check.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-20 13:51:44 -08:00
PoAn Yang 7733323040
HOTFIX: ListShareGroupOffsetResult javadoc (#18642)
Signed-off-by: PoAn Yang <payang@apache.org>
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-20 15:29:11 +00:00
Sanskar Jhajharia bcbc72e29b
[KAFKA-16720] AdminClient Support for ListShareGroupOffsets (1/n) (#18571)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-20 07:47:14 +00:00
Alyssa Huang 4583b033f0
KAFKA-17642: PreVote response handling and ProspectiveState (#18240)
This PR implements the second part of KIP-996 and KAFKA-16164 (tasks KAFKA-16607, KAFKA-17642, KAFKA-17643, KAFKA-17675) which encompass the response handling of PreVotes, addition of new ProspectiveState, update to metrics, and addition of Raft simulation tests.

Voters now transition to ProspectiveState first before CandidateState to prevent unnecessary epoch bumps. Voters in ProspectiveState send PreVotes requests which are Vote requests with PreVote set to true.

Follower grants PreVotes if it has not yet fetched successfully from leader. Leader denies all PreVotes. Unattached, Prospective, Candidate, and Resigned will grant PreVotes if the requesting replica's log is at least as long as theirs. Granted PreVotes are not persisted like standard votes. It is possible for a voter to grant several PreVotes in the same epoch.

The only state which is allowed to transition directly to CandidateState is ProspectiveState. This happens on reception of majority of granted PreVotes or if at least one voter doesn't support PreVote requests.

Prospective will transition to Follower after election loss/timeout if it was already aware of last known leader and the leader's endpoint, or at any point if it discovers the leader.

Prospective will transition to Unattached after election loss/timeout if it does not know the leader endpoints.

After electionTimeout, Resigned now always transitions to Unattached and increases the epoch.

Prospective grants standard votes if it has not already granted a standard vote (no votedKey), has no leaderId, and the recipient's log is current enough

Candidate no longer backs off after election timeout. Candidate still backs off after election loss.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2025-01-17 09:38:03 -05:00
Bruno Cadonna 5c20aa187a
KAFKA-18546: Use mocks instead of a real DNS lookup to the outside (#18565)
Since the example.com DNS lookup changed the second time within one
year, we rewrote the unit tests for ClientUtils so that they do
not make a real DNS lookup to the outside but use mocks.

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lmagrans@confluent.io>
2025-01-16 16:18:44 +01:00
ShivsundarR bf760d4ebe
KAFKA-18558: Added check before adding previously subscribed partitions (#18562)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-16 13:17:48 +00:00
Mickael Maison 8262e2315d
MINOR: Cleanups in JaasUtils (#18522)
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 14:07:16 +01:00
Ken Huang 3c1f965c60
KAFKA-18521 Cleanup NodeApiVersions zkMigrationEnabled field (#18535)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 20:05:04 +08:00
Jason Taylor 11c10fe4da
KAFKA-16368: Update default linger.ms to 5ms for KIP-1030 (#18080)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Divij Vaidya <diviv@amazon.com>
2025-01-16 10:50:06 +01:00
Mickael Maison 833921ab9e
MINOR: Adjust logging in SerializedJwt (#18523)
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 09:58:13 +01:00
Sushant Mahajan 47f22faac3
MINOR: Added flaky references for a few tests. (#18558)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-15 19:24:52 +00:00
Kuan-Po Tseng d3b4c1bdf4
KAFKA-18401: Transaction version 2 does not support commit transaction without records (#18448)
Fix the issue where producer.commitTransaction under transaction version 2 throws error if no partition or offset is added to transaction. The solution is to avoid sending the endTxnRequest unless producer.send or producer.sendOffsetsToTransaction is triggered.

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-01-15 10:21:11 -08:00
PoAn Yang 85d2e90074
HOTFIX: ClientUtilsTest#testParseAndValidateAddressesWithReverseLookup (#18549)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Gaurav Narula <gaurav_narula2@apple.com>, TengYao Chi <kitingiao@gmail.com>
2025-01-15 16:09:03 +01:00
Mickael Maison 66b1f00c0e
KAFKA-18520: Remove ZooKeeper logic from JaasUtils (#18530)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 13:17:06 +01:00
Mickael Maison 6b8cc5d558
MINOR: Remove ZooKeeper mentions in Admin javadoc (#18531)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-15 10:33:30 +01:00
Ismael Juma f3a93551fa
Revert "KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050)" (#18544)
This reverts commit 70d6312a3a.

Reviewers: Luke Chen <showuon@gmail.com>
2025-01-15 16:16:47 +08:00
Pramithas Dhakal ea77352dfc
Rename the variable to reflect its purpose (#18525)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-14 18:00:27 +00:00
Sanskar Jhajharia e3e4c17959
Add DescribeShareGroupOffsets API [KIP-932] (#18500)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-14 14:33:39 +00:00
Istvan Toth d7e5d0a59b
KAFKA-18064: SASL mechanisms should throw exception on wrap/unwrap (#17901)
SASL mechanisms that do support neither integrity nor confidentality should throw exception on wrap/unwrap.

The current implementation does not implement wrap/unwrap correctly.
This may cause security issues, if the code using the mechanisms does
not check for QOP correctly.

Reviewers: Gaurav Narula <gaurav_narula2@apple.com>, Igor Soarez <i@soarez.me>
2025-01-14 11:30:01 +00:00
陳昱霖(Yu-Lin Chen) 4fcde4542b
KAFKA-18469;KAFKA-18036: AsyncConsumer should request metadata update if ListOffsetRequest encounters a retriable error (#18475)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-01-13 19:03:52 +01:00
Ken Huang 70d6312a3a
KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050)
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2025-01-13 15:29:14 +01:00
Xuan-Zhang Gong dbe27c9eb2
KAFKA-18467 enhance the docs of `NewTopic` - the first replica will be treated as the preferred leader (#18470)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-12 20:32:13 +08:00
Ismael Juma d4aee71e36
KAFKA-18465: Remove MetadataVersions older than 3.0-IV1 (#18468)
Apache Kafka 4.0 will only support KRaft and 3.0-IV1 is the minimum version supported by KRaft. So, we can assume that Apache Kafka 4.0 will only communicate with brokers that are 3.0-IV1 or newer.

Note that KRaft was only marked as production-ready in 3.3, so we could go further and set the baseline to 3.3. I think we should have that discussion, but it made sense to start with the non controversial parts.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <david.jacot@gmail.com>
2025-01-11 09:42:39 -08:00
Matthias J. Sax f54cfff1dc
MINOR: simplify producer TX abort error handling (#18486)
Reviewers: Justine Olshan <jolshan@confluent.io>, Jason Gustafson <jason@responsive.dev>
2025-01-10 17:54:40 -08:00
Matthias J. Sax 3b38b016c8
KAFKA-17825: Update docs for ByteBufferDeserializer changes in 3.6 release (#18466)
KIP-863 introduced a change to ByteBufferDeserializer which is not
properly documented, but should be called out because it could surface
bugs in application code which using ByteBufferDeserializer.

Reviewers:  Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-10 15:32:51 -08:00
PoAn Yang 2b7c039971
KAFKA-18440: Convert AuthorizationException to fatal error in AdminClient (#18435)
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-10 11:12:28 +01:00
Colt McNealy bb22eec478
KAFKA-17455: fix stuck producer when throttling or retrying (#17527)
A producer might get stuck after it was throttled. This PR unblocks the producer by polling again
after pollDelayMs in NetworkUtils#awaitReady().

Reviewers: Matthias J. Sax <matthias@confluent.io>, David Jacot <djacot@confluent.io>
2025-01-09 10:27:04 -08:00
Ismael Juma cf7029c026
KAFKA-13093: Log compaction should write new segments with record version v2 (KIP-724) (#18321)
Convert v0/v1 record batches to v2 during compaction even if said record batches would be
written with no change otherwise. A few important details:

1. V0 compressed record batch with multiple records is converted into single V2 record batch
2. V0 uncompressed records are converted into single record V2 record batches
3. V0 records are converted to V2 records with timestampType set to `CreateTime` and the
timestamp is `-1`.
4. The `KAFKA-4298` workaround is no longer needed since the conversion to V2 fixes
the issue too.
5. Removed a log warning applicable to consumers older than 0.10.1 - they are no longer
supported.
6. Added back the ability to append records with v0/v1 (for testing only).
7. The creation of the leader epoch cache is no longer optional since the record version
config is effectively always V2.

Add integration tests, these tests existed before #18267 - restored, modified and
extended them.

Reviewers: Jun Rao <jun@confluent.io>
2025-01-09 09:37:23 -08:00
xijiu fcd98da9ae
KAFKA-18445 Remove LazyDownConversionRecords and LazyDownConversionRecordsSend (#18445)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-10 00:22:56 +08:00
Ken Huang 64b8b4a632
MINOR: Remove ZooKeeper mentions in Sanitizer (#18420)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-09 14:33:43 +01:00
Andrew Schofield 3f9d2c2db0
KAFKA-18433: Add BatchSize to ShareFetch request (1/N) (#18439)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2025-01-08 15:29:43 +00:00
ShivsundarR 3c7ed3333d
KAFKA-18397: Added null check before sending background event from ShareConsumeRequestManager. (#18419)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-08 13:56:52 +00:00
Lianet Magrans 0721d21a57
KAFKA-18415: Fix for event queue metric and flaky test (#18416)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-08 14:31:10 +01:00
Peter Lee 08ef22d888
KAFKA-18173 Remove duplicate `assertFutureError` (#18296)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 20:24:35 +08:00
Manikumar Reddy 746ab4dc1e MINOR: Few cleanups 2025-01-08 16:01:40 +05:30
Guang 058f0a94c8
MINOR: Replace deprecated MemberDescription calls in test (#18425)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 17:32:21 +08:00
mingdaoy c40cc5740f
KAFKA-18408 tweak the 'tag' field for BrokerHeartbeatRequest.json, BrokerRegistrationChangeRecord.json and RegisterBrokerRecord.json (#18421)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 04:16:59 +08:00
TengYao Chi abeed20168
KAFKA-10790: Add deadlock detection to producer#flush (#17946)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>, TaiJuWu <tjwu1217@gmail.com>
2025-01-07 16:32:43 +00:00
David Jacot f974b3dcd4
MINOR: Fix typo in CommitRequestManager (#18407)
This patch fix a minor typo in CommitRequestManager.

Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-01-07 12:59:59 +01:00
Matthias J. Sax 738bd928f1
MINOR: cleanup JavaDocs for deprecation warnings (#18402)
Reviewers: Bill Bejeck <bbejeck@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-07 09:18:33 +00:00
Matthias J. Sax 3918f37af1
MINOR: Update Consumer and Producer JavaDocs for committing offsets (#18336)
The consumer/producer JavaDocs still contain instruction for naively
computing the offset to be committed.

This PR updates the JavaDocs with regard to the improvements of KIP-1094.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lmagrans@confluent.io>
2025-01-06 13:39:20 -08:00
Andrew Schofield 53a8b6189a
KAFKA-17539: Application metrics extension for share consumer (#18377)
Reviewers: Christo Lolov <lolovc@amazon.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
2025-01-06 15:06:38 +00:00
Ismael Juma 409a43eff7
MINOR: Collection/Option usage simplification via methods introduced in Java 9 & 11 (#18305)
Relevant methods:
1. `List.of`, `Set.of`, `Map.of` and similar (introduced in Java 9)
2. Optional: `isEmpty` (introduced in Java 11), `stream` (introduced in Java 9).

Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
2025-01-03 16:13:39 -08:00
Ismael Juma 73ab7ee4ea
MINOR: Use `Files.readString/writeString` and `String.repeat` to simplify code (#18372)
The 3 methods were introduced in Java 11.

Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-02 17:50:27 -08:00
Andrew Schofield 0344f8f5ae
KAFKA-18273: KIP-1099 verbose display share group options (#18259)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2025-01-02 09:12:52 +00:00
Ismael Juma 083e344b30
MINOR: Remove PureJavaCrc32C and reflective code for CRC32C (#18361)
This is no longer required since we dropped support for Java 8. Also update `NOTICE*` and
`spotbugs-exclude.xml` files.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2025-01-01 02:08:06 -08:00
Apoorv Mittal f88cf57e1c
KAFKA-12469: Deprecated and corrected topic metrics for consumer (KIP-1109) (#18232)
The PR implements the behaviour defined in KIP-1109. It corrects the consumer topic and topic-partition metrics, while deprecating the incorrect ones.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2024-12-31 17:35:47 +00:00
TengYao Chi 96527be90d
KAFKA-18243 Fix compatibility of Loggers class between log4j and log4j2 (#18185)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-30 01:50:52 +08:00
Ismael Juma 3654bc4513
KAFKA-18339: Fix parseRequestHeader error handling (#18340)
A minor refactoring just before merging #18295 introduced a regression and no test failed. Throw the correct exception and add test to verify it. Also refactor the code slightly to make that possible.

Thanks to Chia-Ping for catching the issue.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-29 09:31:14 -08:00
Dongnuo Lyu e47f6983df
MINOR: Replace enum name with state name when parsing `ConsumerGroupState` (#18315)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-29 23:11:55 +08:00
Ken Huang 106fd5f601
KAFKA-18354 Use log4j2 APIs to refactor LogCaptureAppender (#18338)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-29 22:50:53 +08:00
Peter Lee be4d1a6277
KAFKA-18135: ShareConsumer HB UnsupportedVersion msg mixed with Consumer HB (#18101)
Add specific error handling for unsupported version in share consumer and consumer

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2024-12-28 15:39:58 +00:00
PoAn Yang e67172ce6d
MINOR: remove zkBroker from StreamsGroupHeartbeatRequest and StreamsGroupDescribeRequest (#18319)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-28 16:58:29 +08:00
mingdaoy d95cb7d65e
MINOR: remove unused previousPartition from RoundRobinPartitioner.java (#18331)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-28 16:33:32 +08:00
Ismael Juma 875da35ec3
KAFKA-18339: Remove raw unversioned direct SASL protocol (KIP-896) (#18295)
Clients that support SASL but don't implement KIP-43 (eg Kafka producer/consumer 0.9.0.x) will
fail to connect after this change.

Added unit tests and also manually tested with the console producer 0.9.0.

While testing, I noticed that the logged message when a 0.9.0 Java client is used without sasl is
slightly misleading - fixed that too.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-12-27 10:23:25 -08:00
PoAn Yang e6d2421136
KAFKA-18295 Remove deprecated function Partitioner#onNewBatch (#18282)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-27 18:40:19 +08:00
Ismael Juma dfb178a1d8
KAFKA-18272: Deprecated protocol api usage should be logged at info level (#18313)
This makes it possible to enable request logs for deprecated protocol api versions without enabling it for the rest. Combined with the ability to enable/disable dynamically, it makes it a bit easier to collect the information about deprecated clients that is not available via metrics.

This isn't particularly useful in trunk/4.0 since there are no deprecated api versions in these versions, but it will be useful for older branches. I intend to backport to those branches and add a release note in the backport regarding the change in behavior.

I manually verified that:
1. If the request logger is configured at `INFO` level, only deprecated protocol api versions are logged and they are logged at `INFO` level.
2. If the request logger is configured at `DEBUG` level, all requests are logged but the log level is `INFO` for deprecated protocol api versions and `DEBUG` for the rest.
3. If the request logger is configured at `WARN` level (the default), no requests are logged.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-27 00:04:19 -08:00
xijiu 6139840e98
KAFKA-18348 Remove the deprecated MockConsumer#setException (#18317)
Reviewers: TengYao Chi <kitingiao@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-27 12:31:23 +08:00
Ismael Juma 2e3b4cb957
KAFKA-18352: Add back DeleteGroups v0, it incorrectly tagged as deprecated (#18324)
The Sarama version listed in the KIP only supports v0.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-26 16:53:50 -08:00
Chung, Ming-Yen 88adb94c81
KAFKA-18093 Remove deprecated DeleteTopicsResult#values (#18250)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-25 14:46:14 +08:00
Logan Zhu 4d6535e60d
KAFKA-18290 Remove deprecated methods of FeatureUpdate (#18246)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-24 14:53:32 +08:00
Nick Guo 1a3dce72fa
KAFKA-18289 Remove deprecated methods of DescribeTopicsResult (#18255)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-24 01:48:15 +08:00
Nick Guo 1cf514313e
KAFKA-18291 Remove deprecated methods of ListConsumerGroupOffsetsOptions (#18265)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-24 01:37:10 +08:00
PoAn Yang b4be178599
KAFKA-17393: Remove log.message.format.version/message.format.version (KIP-724) (#18267)
Based on [KIP-724](https://cwiki.apache.org/confluence/display/KAFKA/KIP-724%3A+Drop+support+for+message+formats+v0+and+v1), the `log.message.format.version` and `message.format.version` can be removed in 4.0.

These configs effectively a no-op with inter-broker protocol version 3.0 or higher
since Apache Kafka 3.0, so the impact should be minimal.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2024-12-21 15:35:15 -08:00
Justine Olshan 8bd3746e0c
KAFKA-17705: Add Transactions V2 system tests and mark as production ready (#18132)
Added transaction version 2 to some of the system tests. Also marking TV2 as production ready.

Also fixes the defaultVersion test. 

Reviewers: Jun Rao <jun@confluent.io>
2024-12-21 14:01:54 -08:00
Ismael Juma fe56fc98fa
KAFKA-18269: Remove deprecated protocol APIs support (KIP-896, KIP-724) (#18218)
Included in this change:
1. Remove deprecated protocol api versions from json files.
3. Remove fields that are no longer used from json files (affects ListOffsets, OffsetCommit, DescribeConfigs).
4. Remove record down-conversion support from KafkaApis.
5. No longer return `Errors.UNSUPPORTED_COMPRESSION_TYPE` on the fetch path[1].
6. Deprecate `TopicConfig. MESSAGE_DOWNCONVERSION_ENABLE_CONFIG` and made the relevant
configs (`message.downconversion.enable` and `log.message.downcoversion.enable`) no-ops since
down-conversion is no longer supported. It was an oversight not to deprecate this via KIP-724.
7. Fix `shouldRetainsBufferReference` to handle null request schemas for a given version.
8. Simplify producer logic since it only supports the v2 record format now.
9. Fix tests so they don't exercise protocol api versions that have been removed.
10. Add upgrade note.

Testing:
1. System tests have a lot of failures, but those tests fail for trunk too and I didn't see any issues specific to this change - it's hard to be sure given the number of failing tests, but let's not block on that given the other testing that has been done (see below).
3. Java producers and consumers with version 0.9-0.10.1 don't have api versions support and hence they fail in an ungraceful manner: the broker disconnects and the clients reconnect until the relevant timeout is triggered.
4. Same thing seems to happen for the console producer 0.10.2 although it's unclear why since api versions should be supported. I will look into this separately, it's unlikely to be related to this PR.
5. Console consumer 0.10.2 fails with the expected error and a reasonable message[2].
6. Console producer and consumer 0.11.0 works fine, newer versions should naturally also work fine.
7. kcat 1.5.0 (based on librdkafka 1.1.0) produce and consume fail with a reasonable message[3][4].
8. kcat 1.6.0-1.7.0 (based on librdkafka 1.5.0 and 1.7.0 respectively) consume fails with a reasonable message[5].
9. kcat 1.6.0-1.7.0 produce works fine.
10. kcat 1.7.1  (based on librdkafka 1.8.2) works fine for consumer and produce.
11. confluent-go-client (librdkafka based) 1.8.2 works fine for consumer and produce.
12. I will test more clients, but I don't think we need to block the PR on that.

Note that this also completes part of KIP-724: produce v2 and lower as well as fetch v3 and lower are no longer supported.

Future PRs will remove conditional code that is no longer needed (some of that has been done in KafkaApis,
but only what was required due to the schema changes). We can probably do that in master only as it does
not change behavior.

Note that I did not touch `ignorable` fields even though some of them could have been
changed. The reasoning is that this could result in incompatible changes for clients
that use new protocol versions without setting such fields _if_ we don't manually
validate their presence. I will file a JIRA ticket to look into this carefully for each
case (i.e. if we do validate their presence for the appropriate versions, we can
set them to ignorable=false in the json file).

[1] We would return this error if a fetch < v10 was used and the compression topic config was set
to zstd, but we would not do the same for the case where zstd was compressed at the producer
level (the most common case). Since there is no efficient way to do the check for the common
case, I made it consistent for both by having no checks.
[2] ```org.apache.kafka.common.errors.UnsupportedVersionException: The broker is too new to support JOIN_GROUP version 1```
[3]```METADATA|rdkafka#producer-1| [thrd:main]: localhost:9092/bootstrap: Metadata request failed: connected: Local: Required feature not supported by broker (0ms): Permanent```
[4]```METADATA|rdkafka#consumer-1| [thrd:main]: localhost:9092/bootstrap: Metadata request failed: connected: Local: Required feature not supported by broker (0ms): Permanent```
[5] `ERROR: Topic test-topic [0] error: Failed to query logical offset END: Local: Required feature not supported by broker`

Reviewers: David Arthur <mumrah@gmail.com>
2024-12-20 19:52:00 -08:00
Ismael Juma 288d4de834
KAFKA-18334: Produce v4-v6 should be undeprecated (#18288)
Librdkafka totally breaks if produce v3 is removed - it starts sending records with record format v0.
These api versions have to be undeprecated - KIP-896 has been updated.

Reviewers: David Arthur <mumrah@gmail.com>
2024-12-20 16:59:51 -08:00
Chirag Wadhwa 6e61bfee91
KAFKA-18312: Added entityType: topicName to SubscribedTopicNames in ShareGroupHeartbeatRequest.json (#18285)
This PR Adds entityType: topicName to SubscribedTopicNames in ShareGroupHeartbeatRequest.json as per the recent update to kip-932.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-12-20 21:31:47 +05:30
David Jacot d67379c310
KAFKA-18301; Make coordinator records first class citizen (#18261)
This patch is the first one in a series to improve how coordinator records are managed. It focuses on making coordinator records first class citizen in the generator.
* Introduce `coordinator-key` and `coordinator-value` in the schema;
* Introduce `apiKey` for those. This is done to avoid relying on the version to determine the type.
* It also allows the generator to enforce some rules: the key cannot use flexible versions, the key must have a single version `0`, there must be a key and a value for a given api key, etc.
* It generates an enum with all the coordinator record types. This is pretty handy in the code.

The patch also updates the group coordinators to use those.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2024-12-20 12:16:14 +01:00
PoAn Yang 753a003480
KAFKA-18262 Remove DefaultPartitioner and UniformStickyPartitioner (#18204)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-20 15:33:51 +08:00
ClarkChen bd27e34f2d
KAFKA-18292 Remove deprecated methods of UpdateFeaturesOptions (#18245)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-20 11:26:45 +08:00
David Arthur 64279d2e82
Mark flaky tests for Dec 18, 2024 (#18263)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2024-12-19 10:12:33 -05:00
Ismael Juma 22d1ba8265
KAFKA-18270: FindCoordinator v0 incorrectly tagged as deprecated (#18262)
It's now consistent with KIP-896.
2024-12-19 06:25:17 -08:00
Justine Olshan f7f3cffb48
KAFKA-18227: Ensure v2 partitions are not added to last transaction during upgrade (#18176)
We want to bump the epoch if we are upgrading to TV2. Given that we already have code in place for this, I thought we could piggyback on the completing transaction epoch bump logic. For just initializing producers, I moved the check to the end of InitTransaction. Note, we have do do this check after we initialize the producer ID to ensure we have updated ApiVersions correctly.

Reviewers: Calvin Liu <caliu@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
2024-12-18 16:15:08 -08:00
Lucas Brutschy 0055ef0a49
KAFKA-18283: Add StreamsGroupDescribe RPC definitions (#18230)
Adds a new RPC StreamsGroupDescribe that returns, given the group ID, all metadata related to the streams group, such as

 - The topology metadata of the group.
 - The topology epoch of the group.
 - The latest member metadata that each member provided through the StreamsGroupHeartbeat API.
 - The current target assignment generated by the assignor.
 - This just adds the JSON as defined in KIP-1071, together with some plumbing.

Reviewers: Bill Bejeck <bbejeck@gmail.com>
2024-12-18 19:38:01 +01:00
Nick Guo 21b7bb2265
KAFKA-18264 Remove NotLeaderForPartitionException (#18211)
Reviewers: Yung <yungyung7654321@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-19 00:56:13 +08:00
Chung, Ming-Yen 90ff97b51d
KAFKA-18094 Remove deprecated TopicListing(String, Boolean) (#18248)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-18 18:45:21 +08:00
Lucas Brutschy ec32c8a376
KAFKA-18282: Add StreamsGroupHeartbeat RPC definitions (#18227)
The StreamsGroupHeartbeat API is the new core API used by streams application to form a group. The API allows members to initialize a topology, advertise their state, and their owned tasks. The group coordinator uses it to assign/revoke tasks to/from members. This API is also used as a liveness check.

This change adds the JSON definition of the RPC, as defined in KIP-1071.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2024-12-18 11:43:44 +01:00
Xuan-Zhang Gong 346e5dc322
KAFKA-18293 Remove `org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler` and `org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler` (#18244)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-18 14:47:57 +08:00
Yaroslav Kutsela 337fb8cec1
MINOR, DOCS : Fixed old and added new javadocs to org.apache.kafka.common.utils.Utils (#18162)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2024-12-17 08:16:00 +00:00
Ismael Juma cd5a7ee6b2
KAFKA-18270: SaslHandshake v0 incorrectly tagged as deprecated (#18221)
It's now consistent with KIP-896.

Reviewers: David Arthur <mumrah@gmail.com>
2024-12-16 20:52:30 -08:00
TengYao Chi 4aee33d6ab
KAFKA-18259: Documentation for consumer auto.offset.reset contains invalid HTML (#18210)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2024-12-16 16:20:30 +00:00
TengYao Chi 0781b1bad3
KAFKA-15474 Disable flaky testWakeupAfterSyncGroupReceivedExternalCompletion (#18188)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-15 15:41:51 +08:00
Kuan-Po Tseng 0815d70592
KAFKA-18160 Interrupting or waking up onPartitionsAssigned in AsyncConsumer can cause the ConsumerRebalanceListenerCallbackCompletedEvent to be skipped (#18089)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-15 10:47:30 +08:00
Mickael Maison 57eb5fd7dc
KAFKA-14587: Move AclCommand to tools (#17880)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 20:05:46 +01:00
Kamal Chandraprakash 139e5b15a1
KAFKA-17928: Make remote log manager thread-pool configs dynamic (#17859)
- Disallow configuring -1 for copier and expiration thread pools dynamically

Co-authored-by: Peter Lee <peterxcli@gmail.com>

Reviewers: Peter Lee <peterxcli@gmail.com>, Satish Duggana <satishd@apache.org>
2024-12-14 13:14:05 +05:30
Alyssa Huang b73e31eb15
KAFKA-17641; Update Vote RPC with PreVote field (#17807)
Introduces v2 of Vote RPC and implements the handling of the new version of the RPC.

Many references to "candidate" in the Vote RPC are changed to the more generic "replica". Replicas sending Vote request with PreVote set to true are not candidate. They are instead prospective candidate that are attempting to become candidate.

Replicas receiving PreVote requests (vote request with PreVote=true) with an epoch equal to their own will _not_ transition to Unattached state. They will only grant the vote if they have not recently fetched from leader and the request's last epoch and offset are up-to-date with theirs.

If a replica receives a PreVote request with an epoch greater than their current epoch, they will transition to Unattached state (setting their epoch to the one from the pre-vote request) and then grant the vote if the request's last epoch and offset are up-to-date with theirs.

To avoid a possible ping-pong scenario. For example, there is 3 node quorum, leader node A disconnects from quorum, node B goes into prospective state first before node C, node B sends pre-vote request to node C still in follower state and receives back that node A is leader, node B transitions to follower while node C transitions to prospective after election timeout. If you repeat this interaction, it is possible for such replicas to transition from Follower to Prospective in perpetuity. This issue is resolved by having follower state nodes grant pre-vote requests only if they have successfully fetched from the leader at least once after becoming a follower.

This change introduces a new suite called KafkaRaftClientPreVoteTest, for additional KRaft protocol tests with respect to pre-vote.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-12-13 16:24:30 -05:00
TengYao Chi b37b89c668
KAFKA-9366 Upgrade log4j to log4j2 (#17373)
This pull request replaces Log4j with Log4j2 across the entire project, including dependencies, configurations, and code. The notable changes are listed below:

1. Introduce Log4j2 Instead of Log4j
2. Change Configuration File Format from Properties to YAML
3. Adds warnings to notify users if they are still using Log4j properties, encouraging them to transition to Log4j2 configurations

Co-authored-by: Lee Dongjin <dongjin@apache.org>

Reviewers: Luke Chen <showuon@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-14 01:14:31 +08:00
Sean Quah b94defa189
KAFKA-18199; Fix size calculation for nullable tagged structs (#18127)
When a struct field is tagged and nullable, it is serialized as
{ varint tag; varint dataLength; nullable data }, where
nullable is serialized as
{ varint isNotNull; if (isNotNull) struct s; }. The length field
includes the is-not-null varint.

This patch fixes a bug in serialization where the written value of
the length field and the value used to compute the size of the length
field differs by 1. In practice this has no impact unless the
serialized length of the struct is 127 bytes, since the varint encodings
of 127 and 128 have different lengths (0x7f vs 0x80 01).

Reviewers: David Jacot <djacot@confluent.io>
2024-12-13 04:31:53 -08:00
PoAn Yang 770d64d2cc
KAFKA-16143: New JMX metrics for AsyncKafkaConsumer (#17199)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2024-12-13 07:20:27 -05:00
Gantigmaa Selenge 747dc172e8
KIP-1073: Return fenced brokers in DescribeCluster response (#17524)
mplementation of KIP-1073: Return fenced brokers in DescribeCluster response.
Add new unit and integration tests for describeCluster.

Reviewers: Luke Chen <showuon@gmail.com>
2024-12-13 10:58:11 +08:00
Matthias J. Sax 6cdb8c352a
KAFKA-18015: add byDuration auto.offset.reset to Kafka Streams (#18115)
Part of KIP-1106.

Adds support for "by_duration" and "none" reset strategy
to the Kafka Streams runtime.

Reviewers: Bill Bejeck <bill@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2024-12-11 15:12:16 -08:00
Apoorv Mittal a1703e2cca
KAFKA-17040: Removing exception on further calls to terminated telemetry reporter (#18143)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2024-12-11 15:47:45 -05:00
Ken Huang 23de98cdc5
KAFKA-17554 disable testFutureCompletionOutsidePoll (#18138)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-12 00:54:36 +08:00
Kirk True d09e222846
KAFKA-18189: CoordinatorRequestManager log message can include incorrect coordinator disconnect time (#18109)
Fixed logic in markCoordinatorUnknown to ensure the warning log contains the correct number of milliseconds the client has been disconnected.

Reviewers: Christo Lolov <lolovc@amazon.com>
2024-12-11 16:22:51 +00:00
Kuan-Po Tseng d2ad418cfd
KAFKA-18156 VerifiableConsumer should ignore "--session-timeout" when using CONSUMER protocol (#18036)
Reviewers: TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-11 21:12:46 +08:00
Lianet Magrans dadc7cc477
update test configs (#18123)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>
2024-12-10 13:11:32 -05:00
ShivsundarR 7a31b9eae8
Add null check (#18119)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2024-12-10 17:38:25 +00:00
TengYao Chi f57fd2d9fd
MINOR: Logs warning message when user invoke producer#flush within callback (#18112)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2024-12-10 15:27:42 +00:00
PoAn Yang c8380ae779
KAFKA-17750: Extend kafka-consumer-groups command line tool to support new consumer group (part 2) (#18034)
* Add fields `groupEpoch` and `targetAssignmentEpoch` to `ConsumerGroupDescription.java`.
* Add fields `memberEpoch` and `upgraded` to `MemberDescription.java`.
* Add assertion to `PlaintextAdminIntegrationTest#testDescribeClassicGroups` to make sure member in classic group returns `upgraded` as `Optional.empty`.
* Add new case `testConsumerGroupWithMemberMigration` to `PlaintextAdminIntegrationTest` to make sure migration member has correct `upgraded` value. Add assertion for `groupEpoch`, `targetAssignmentEpoch`, `memberEpoch` as well.

Reviewers: David Jacot <djacot@confluent.io>

Signed-off-by: PoAn Yang <payang@apache.org>
2024-12-10 05:02:20 -08:00
ShivsundarR 77ac31b36a
KAFKA-18164: Clear existing acknowledgements on share session epoch reset. (#18063)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2024-12-09 21:03:49 +00:00
Ken Huang d76238a18f
KAFKA-17696 New consumer background operations unaware of metadata errors (#17440)
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2024-12-09 09:31:14 -05:00
yx9o 38e727fe4d
KAFKA-17864: add descriptions to fields in the agreement (#17681)
Improve descriptive information in Kafka protocol documentation.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>
2024-12-07 18:47:11 +00:00
Mickael Maison e255433374
KAFKA-18162 Move LocalLogTest to storage module (#18057)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-07 10:19:56 +08:00
TengYao Chi 9ee3247281
MINOR: Fix broken javadoc in NetworkClientTest (#18075)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-07 01:37:30 +08:00
Calvin Liu 755adf8a56
KAFKA-14563: RemoveClient-Side AddPartitionsToTxn Requests (#17698)
Removes the client side AddPartitionsToTxn/AddOffsetsToTxn calls so that the partition is implicitly added as part of KIP-890 part 2. 

This change also requires updating the valid state transitions. The client side can not know for certain if a partition has been added server side when the request times out (partial completion). Thus for TV2, the transition to PrepareAbort is now valid for Empty, CompleteCommit, and CompleteAbort. 

For readability, the V1 and V2 endTransaction methods have been separated. 

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>, Ritika Reddy <rreddy@confluent.io>
2024-12-06 09:00:04 -08:00
Andrew Schofield e7d986e48c
KAFKA-17550: DescribeGroups v6 exploitation (#17706)
This PR introduces the DescribeGroups v6 API as part of KIP-1043. This adds an error message for the described groups so that it is possible to get some context on the error. It also changes the behaviour for when the group ID cannot be found but returning error code GROUP_ID_NOT_FOUND rather than NONE.

Reviewers: David Jacot <djacot@confluent.io>
2024-12-05 23:12:24 -08:00
Lianet Magrans 36b48536f6
MINOR: Fix broken test (#18062)
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, TaiJuWu <tjwu1217@gmail.com>
2024-12-05 21:31:52 -05:00
ShivsundarR 8fde6dedea
KAFKA-18155 : Fix bug in response handler for ShareAcknowledge (#18029)
In the response handler for ShareAcknowledge, we are passing the clientResponse.receivedTimeMs() to the handler methods. But when there is a disconnect or when the response received is null, we should be passing the current time instead.

This bug was causing consumer to hang as it did not call the handler methods on disconnect, and further requests were blocked waiting for its completion.

Reviewers: Andrew Schofield <aschofield@confluent.io>,  Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-12-05 12:59:13 +05:30
Ken Huang 2b43c49f51
KAFKA-18050 Upgrade the checkstyle version to 10.20.2 (#17999)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-05 10:59:18 +08:00
Kirk True 4362ab7090
KAFKA-17947: Update currentLag(), pause(), and resume() to update SubscriptionState in background thread (#17699)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2024-12-04 21:31:44 -05:00
Lianet Magrans bd0ea70912
KAFKA-18096: Allow join with regex if no matching topics (#18024)
Reviewers: David Jacot <djacot@confluent.io>
2024-12-04 11:35:42 -05:00
Lianet Magrans f60382bf21
KAFKA-18127 Validate SubscriptionPattern used on v0 HB (#17989)
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-04 19:55:12 +08:00
PoAn Yang fe88232b07
KAFKA-17750 Extend kafka-consumer-groups command line tool to support new consumer group (part 1) (#17958)
1) Bump validVersions of ConsumerGroupDescribeRequest.json and ConsumerGroupDescribeResponse.json to "0-1".

2) Add MemberType field to ConsumerGroupDescribeResponse.json. Default value is -1 (unknown). 0 is for classic member and 1 is for consumer member.

3) When ConsumerGroupMember#useClassicProtocol is true, return MemberType field as 0. Otherwise, return 1.

Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-04 06:08:39 +08:00
Kuan-Po Tseng ac8b3dfbf0
KAFKA-18060 new coordinator does not handle TxnOffsetCommitRequest with empty member id when using CONSUMER group (#17914)
There are two issues in KAFKA-18060:

1) New coordinator can't handle the TxnOffsetCommitRequest with empty member id, and TxnOffsetCommitRequest v0-v2 do definitely has an empty member ID, causing ConsumerGroup#validateOffsetCommit to throw an UnknownMemberIdException. This prevents the old producer from calling sendOffsetsToTransaction. Note that TxnOffsetCommitRequest versions v0-v2 are included in KIP-896, so it seems the new coordinator should support that operations

2) The deprecated API Producer#sendOffsetsToTransaction does not use v0-v2 to send TxnOffsetCommitRequest with an empty member ID. Unfortunately, it has been released for a while. Therefore, the new coordinator needs to handle TxnOffsetCommitRequest with an empty member ID for all versions.

Taken from the two issues above, we need to handle empty member id in all API versions when new coordinator are dealing with TxnOffsetCommitRequest.

Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-04 02:55:19 +08:00
Ken Huang ff44f5e0a5
KAFKA-17554 Flaky testFutureCompletionOutsidePoll in ConsumerNetworkClientTest (#17217)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-03 08:58:56 +08:00
TengYao Chi 6fd951a9c0
KAFKA-17610 Drop alterConfigs (#18002)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-02 23:26:06 +08:00
Manikumar Reddy ae3c5dec99
KAFKA-18013: Add support for duration based offset reset strategy to Kafka Consumer (#17972)
Update AutoOffsetResetStrategy.java to support duration based reset strategy
Update OffsetFetcher related classes and unit tests

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2024-11-29 22:38:57 +05:30
Lianet Magrans 6237325fb1
KAFKA-15561 [5/N]: Integration tests for new subscribe API with Re2J pattern (#17964)
- integration tests for new subscribe api with RE2J pattern
- fix to ensure all topics are included in metadata requests when consumer is subscribed to RE2J pattern

Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2024-11-30 01:02:39 +08:00
Ken Huang 9d23f89e05
KAFKA-17338 ConsumerConfig should prevent using partition assignors with CONSUMER group protocol (#16899)
Reviewers: Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lmagrans@confluent.io>
2024-11-29 09:36:29 -05:00
Andrew Schofield e7bbcdb251
KAFKA-18090: Add ShareMemberDescription and Assignment (#17975)
Introduce ShareMemberDescription and ShareMemberAssignment as distinct classes for share groups. Although the correspondence with consumer groups is fairly close, the concepts are likely to diverge over time and separating these concepts now makes sense.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-11-29 10:20:01 +05:30
HYUNSANG HAN (한현상, Travis) 700bdd5fee
KAFKA-17997 Remove deprecated config log.message.timestamp.difference.max.ms (#17928)
Reviewers: Divij Vaidya <diviv@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-11-29 03:15:46 +08:00
Chia-Chuan Yu c446e799be
KAFKA-17010 Remove `DescribeLogDirsResponse#LogDirInfo`, `DescribeLogDirsResponse#ReplicaInfo`, and `DescribeLogDirsResult#all` (#17953)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-28 04:42:34 +08:00
Lianet Magrans a39c984d21
KAFKA-15561 [4/N]: MockConsumer support for SubscriptionPattern (#17962)
Reviewers: David Jacot <djacot@confluent.io>
2024-11-27 14:33:28 -05:00
David Jacot aae42ef656
KAFKA-17593; [9/N] Mark ConsumerGroupHeartbeat API v1 as stable (#17961)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2024-11-27 13:03:46 -05:00
Lianet Magrans 37b4d9b01d
KAFKA-15561 [3/N]: Client support for SubscriptionPattern in HB (#17951)
Reviewers: David Jacot <djacot@confluent.io>
2024-11-27 12:01:12 -05:00
Ken Huang c32a49549d
MINOR: Remove duplicate valid value in document (#17947)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-27 21:07:36 +08:00
ShivsundarR 866d66229d
KAFKA-18056: Fixed bug in handling commitAsync responses (#17909)
There was a bug in handling the ShareAcknowledgeResponse for commitAsync(). Currently after we receive a response, we send out a background event to the application thread to update the acknowledgement commit callbacks for EVERY TopicIdPartition.
The map that was sent was not cleared after sending the event. This meant we ended up sending responses for partitions that were already sent in the previous event. So there will be duplicate calls to the callback.

The PR fixes the bug and adds a unit test for the same.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-11-26 20:15:19 +05:30
Lianet Magrans 0b081fc310
KAFKA-15561 [2/N]: Background event and subscription state changes for RE2J pattern (#17918)
Reviewers: David Jacot <djacot@confluent.io>
2024-11-26 14:49:13 +01:00
Andrew Schofield 5480d54d18
KAFKA-17544: Add log message for early access use of KafkaShareConsumer (#17940)
When a KafkaShareConsumer is constructed in AK 4.0, a log message is written warning about the early access nature of the feature.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-11-26 10:15:43 +05:30
Ritika Reddy 4fc9e442c3
KAFKA-17898: Refine Epoch Bumping Logic (#17849)
With KAFKA-14562, we implemented epoch bump on both the client and the server. Mentioned below are the different epoch bump scenarios we have on hand after enabled tv2

Non-Transactional Producers
• Epoch bumping is always allowed.
• Different code paths are used to handle epoch bumping.

Transactional Producers

No Epoch Bump Allowed
• coordinatorSupportsBumpingEpoch = false when initPIDVersion < 3 or initPIDVersion = null.

Client-Triggered Epoch Bump Allowed
• coordinatorSupportsBumpingEpoch = true when initPIDVersion >= 3.
• TransactionVersion2Enabled = false when endTxnVersion < 5.

Only Server-Triggered Epoch Bump Allowed
• TransactionVersion2Enabled = true and endTxnVersion >= 5.

We want to refine the code and make it more structured to correctly handle epoch bumping in the above mentioned cases.

The changes made in this patch are:

Rename epochBumpRequired to epochBumpTriggerRequired to symbolize a manual epoch bump request from the client
Modify canEpochBump method according to the above mentioned scenarios

Reviewers: Artem Livshits <alivshits@confluent.io>, Calvin Liu <caliu@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-11-25 14:29:15 -08:00
Bill Bejeck 7f8a592ad1
KAFKA-17869: Adding tests to ensure KIP-1076 doesn't interfere producer metrics[2/3] (#17783)
Adding producer tests to ensure the KIP-1076 methods don't interfere with existing metrics
Reviewers: Matthias Sax <mjsax@apache.org>
2024-11-25 16:24:16 -05:00
Rajini Sivaram 0f33b16fdf
KAFKA-18085: Abort inflight requests on existing connections while rebootstrapping (#17939)
When disconnecting channels before rebootstrapping due to the rebootstrap conditions introduced in KIP-1102, we should ensure that inflight requests are aborted similar to other disconnections like request timeout in clients. With the earlier rebootstrapping from KIP-899, we only rebootstrapped when there were no connections, so no disconnections are required.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2024-11-25 17:58:11 +00:00
Andrew Schofield d17a149205
KAFKA-17956 Remove Admin.listShareGroups (#17912)
KIP-1043 introduced Admin.listGroups as the way to list all types of groups. As a result, Admin.listShareGroups has been removed. This PR is the final step of the removal.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-25 22:05:35 +08:00
ClarkChen 54843e6e1e
KAFKA-18077 Remove deprecated JmxReporter(String) (#17923)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-25 21:54:50 +08:00
Manikumar Reddy 3268435fd6
KAFKA-18013: Add AutoOffsetResetStrategy internal class (#17858)
- Deprecates OffsetResetStrategy enum
- Adds new internal class AutoOffsetResetStrategy
- Replaces all OffsetResetStrategy enum usages with AutoOffsetResetStrategy
- Deprecate old/Add new constructors to MockConsumer

 Reviewers: Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2024-11-25 19:11:12 +05:30