Commit Graph

15172 Commits

Author SHA1 Message Date
TaiJuWu 1c82b89b4c
KAFKA-18712 Move Endpoint to server module (#18803)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Mickael Maison <mickael.maison@gmail.com>, Christo Lolov <lolovc@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-25 14:02:51 +08:00
PoAn Yang 10873e4210
KAFKA-18281: Kafka is improperly validating non-advertised listeners for routable controller addresses (#18387)
When a cluster is configured with a dynamic controller quorum, KRaft replica's endpoint are computed using the advertised.listeners property and not the quorum.controller.voters property. This change in the configuration makes it difficult to keeping all previous node configurations compatible with the new endpoint discovery functionality.

The least intrusive solution is to rely on Kafka's reverse hostname lookup when the hostname is not specified. The effective advertised controller listener now remove '0.0.0.0' hostname if the endpoint came from the listener configuration and not the advertised.listener configuration.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Alyssa Huang <ahuang@confluent.io>
2025-02-24 21:51:28 -05:00
Nick Guo d23a61738a
KAFKA-17937 Cleanup AbstractFetcherThreadTest (#18900)
- Remove AbstractFetcherThreadWithIbp26Test as it tests unsupported IBP
- cleanup AbstractFetcherThreadTest to remove unreachable paths, variables, and code

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-25 07:45:47 +08:00
Calvin Liu 009bee75ab
KIP-966 part 1 release doc (#18898)
Add notes to explain how ELR and how to manage ELR.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-02-24 15:19:18 -08:00
David Arthur cb33e98dfc
KAFKA-18748 Run new tests separately in PRs (#18770)
Split the JUnit tests into "new", "flaky", and the remainder. 

On PR builds, "new" tests are anything that do not exist on trunk. They are run with zero tolerance for flakiness. 

On trunk builds, "new" tests are anything added in the last 7 days. They are run with some tolerance for flakiness.

Another change included here is that we will not update the test catalog if any test job fails on a trunk build. We have had difficulty determining if all the tests had or not (due to timeout or failures in upstream Gradle tasks). By requiring green ":test" jobs, we can be sure that the resulting catalog will be valid.

---

The purpose of this change is to discourage contributors from adding flaky tests, but give some leeway for trunk so we have successful builds.

The "quarantinedTest" Gradle target has been consolidated into the regular "test" target. There are now some
runtime properties to control what tests are run.

* kafka.test.catalog.file: path to test catalog
* kafka.test.run.new: include new tests. this selection depends on the age of the loaded test catalog
* kafka.test.run.flaky: include tests marked as `@Flaky` (replaces the `excludeTags 'flaky'` directive)
* kafka.test.verbose: include additional logging from new JUnit classes (enabled by default if re-running GitHub workflow with debug logging enabled)
* maxTestRetries: how many retries to allow via Develocity retry plugin (default 0)
* maxTestRetryFailures: how many failures to allow before stopping retries (default 0)


Thanks to Jun Rao for inspiring the idea.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
2025-02-24 17:08:15 -05:00
Calvin Liu 10da082184
MINOR: update truncation test (#18952)
Reduce the minISR to be 1 for the truncation test in order to skip the protection from KIP-966

Reviewers: David Jacot <djacot@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-25 04:32:29 +08:00
Apoorv Mittal 48a506b7b8
KAFKA-18522: Slice records for share fetch (#18804)
The PR handles slicing of fetched records based on acquire response for
share fetch. There could be additional bytes fetched from log but
acquired offsets can be a subset, typically with `max fetch records`
configuration. Rather sending additional bytes of fetched data to client
we should slice the file and wire only needed batches.

Note: If the acquired offsets are within a batch then we need to send
the entire batch within the file record. Hence rather checking for
individual batches, PR finds the first and last acquired offset, and
trims the file for all batches between (inclusive) these two offsets.

Reviewers: Christo Lolov <lolovc@amazon.com>, Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>
2025-02-24 09:55:24 -08:00
Ismael Juma 38c984307c
MINOR: Test showing MetadataLoader waits until metadata version is known (#19012)
Reviewers: David Arthur <mumrah@gmail.com>
2025-02-24 08:38:45 -08:00
Ismael Juma 48527a1e7f
MINOR: Clean-up imports, imports and unused parameter in upgrade_test.py (#19018)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-24 06:35:07 -08:00
Sebastien Viale 3ce5f23295
KAFKA-18023: Enforcing Explicit Naming for Kafka Streams Internal Topics (#18233)
Pull request to implement KIP-1111, aims to add a configuration that
prevents a Kafka Streams application from starting if any of its
internal topics have auto-generated names, thereby enforcing explicit
naming for all internal topics and enhancing the stability of the
application’s topology.

- Repartition Topics:

All repartition topics are created in the
KStreamImpl.createRepartitionedSource(...) static method. This method
either receives a name explicitly provided by the user or null and then
builds the final repartition topic name.

- Changelog Topics and State Store Names:

There are several scenarios where these are created:
  In the MaterializedInternal constructor.
  During KStream/KStream joins.
  During KStream/KTable joins with grace periods.
  With key-value buffers are used in suppressions.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Sophie Blee-Goldman <sophie@responsive.dev>
2025-02-24 11:41:42 +01:00
Shivsundar R 2880e04129
KAFKA-18779: Validate responses from broker in client for ShareFetch and ShareAcknowledge RPCs. (#18939)
- Currently if we received extraneous topic partitions in the response
or if the response was missing some partitions requested, we were
processing the response as it came and even populated the callback with
these partitions.

- These invalid responses should be parsed at the
`ShareConsumeRequestManager`.

- If the response missed any acknowledgements for partitions that were
requested, then we fail the request with `InvalidRecordStateException`
and populate the callbacks.

- For any extraneous partitions in the response, we log an error and
ignore them.

Some refactors are also done in this PR in ShareConsumeRequestManager to
make the code more readable.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-24 10:27:24 +00:00
mingdaoy 289e958c39
MINOR: Fix validateResourceNameIsNodeId's exception message (#19017)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-24 09:30:02 +08:00
Sushant Mahajan 6e76736890
KAFKA-18827: Initialize share group state persister impl [2/N]. (#18992)
* In this PR, we have provided implementation for the initialize share
group state RPC from the persister perspective.
* Tests have been added wherever applicable.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-23 08:03:13 +00:00
Calvin Liu a1372ced69
KAFKA-15583 doc update for the "strict min ISR" rule (#18880)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Dave Troiano <dtroiano@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-23 13:06:50 +08:00
Ismael Juma 13cb87c2d0
MINOR: Remove request log space added inadvertently (#19011)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-23 11:30:19 +08:00
Sanskar Jhajharia a206feb4ba
MINOR: Clean up share-coordinator (#19007)
Given that now we support Java 17 on our brokers, this PR replace the use of `Collections.singletonList()` and `Collections.emptyList()` with `List.of()`

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-23 11:27:38 +08:00
Sushant Mahajan 3fc103b48b
KAFKA-18629: ShareGroupDeleteState admin client impl. (#18928)
* In this PR, we add various infra classes needed to support the
`deleteShareGroups` functionality via the `kafka-share-groups.sh`
script, as well as the implementation of `kafka-share-groups.sh --delete`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:21:10 +00:00
Apoorv Mittal 6e45ab7d84
KAFKA-17351: Update tests and acquire API to allow discard batches from compacted topics (1/N) (#18978)
The PR does following:
1.  Adds `fetchOffset` to `acquire` API in SharePartition.
2. Adds a ShareFetchPartitionData class efficiently handle the
propagation of fetchOffset information.
3. Updates SharePartitionTests to make common code so such improvements
does not require all tests changes for future PRs.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:14:09 +00:00
Sushant Mahajan 4f28973bd1
KAFKA-18827: Initialize share state, share coordinator impl. [1/N] (#18968)
In this PR, we have added the share coordinator and KafkaApis side impl
of the intialize share group state RPC.
ref:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka#KIP932:QueuesforKafka-InitializeShareGroupStateAPI

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-22 16:12:08 +00:00
Ken Huang c6335c2ae8
MINOR: Fix fail e2e transactions_upgrade_test.py::TransactionsUpgradeTest.test_transactions_upgrade (#19004)
The main root cause is
3dba3125e9,
this PR remove the metadata version which is older than 3.3, thus this
test will fail when it use metadata version 3.2, 3.1

Reviewers: David Jacot <djacot@confluent.io>
2025-02-22 14:45:39 +01:00
David Jacot 407431499e
MINOR: Update version is doc (#19006)
This patch updates the version in the documentation.
2025-02-22 12:37:15 +01:00
TengYao Chi 1e9565788c
MINOR: Fix fail e2e TestUpgrade#test_combined_mode_upgrade and test_isolated_mode_upgrade (#19003)
#18845 assumed a baseline of 3.3 for server protocol versions so that
the lower version couldn't roll up to 4.0. Hence, the
`TestUpgrade#test_combined_mode_upgrad` and `test_isolated_mode_upgrade`
failed for the 3.1 and 3.2 versions.

e2e tests result with this patch on jenkins:
![Screenshot from 2025-02-22
13-22-17](https://github.com/user-attachments/assets/2de6f707-8281-4f30-b5d0-83dd4de9666d)
e2e tests result with this patch on local machine:
![Screenshot from 2025-02-22
13-28-16](https://github.com/user-attachments/assets/2e5e563a-1ac4-4894-ba30-593304697d1d)

Reviewers: David Jacot <djacot@confluent.io>
2025-02-22 08:53:34 +01:00
Ken Huang d820559751
MINOR: Fix fail e2e TransactionsMixedVersionsTest#test_transactions_mixed_versions (#19002)
The main root cause is
3dba3125e9,
this PR remove the metadata version which is older than 3.3, thus this
test will fail when it use metadata version 3.2, 3.1

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2025-02-22 08:52:22 +01:00
David Jacot 14ebec345a
MINOR: Update release script for 4.0 (#18999)
This patch updates the release script to use JDK 21 to build the
release. We could also use JDK 17 but using JDK 21 directly does not
change much. We have to verify anyway that the server works with 17 and
the client with 11.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-02-21 20:30:33 +01:00
xijiu 118818a7ca
KAFKA-18795 Remove `Records#downConvert` (#18897)
Since we no longer convert records to the old format for fetch requests, this code is no longer used in production.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-22 02:29:58 +08:00
Lianet Magrans c580874fc2
KAFKA-18813: [3/N] Client support for TopicAuthException in DescribeConsumerGroup path (#18996)
Reviewers: David Jacot <djacot@confluent.io>
2025-02-21 12:42:00 -05:00
Apoorv Mittal f543eac4fe
KAFKA-18733: Implemented fetch ratio and partition acquire time metrics (3/N) (#18959)
PR implements the final set of ShareGroupMetrics,
RequestTopicPartitionsFetchRatio and TopicPartitionsAcquireTimeMs, as
defined in KIP-1103:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption

Note: Metric `RequestTopicPartitionsFetchRatio` is calculated as
percentage as Histogram API doesn't record double.


Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
2025-02-21 17:01:39 +00:00
Calvin Liu 8f13e7c207
MINOR: Move the ELR default version to 4.1 (#18954)
- ELR is enabled (ELRV_1) by default if the cluster is created with its bootstrap metadata version >= IBP_4_1_IV0.
- ELRV_1 can be manually enabled iff the metadata version is >= IBP_4_0_IV1.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>, David Jacot <djacot@confluent.io>
2025-02-21 16:13:11 +01:00
Shivsundar R 7da1a6cbff
KAFKA-18033: Remove flaky tag in ShareConsumerTest (#18995)
3 tests which were marked flaky in ShareConsumerTest do not have any
failure on trunk since the test was converted to use `ClusterTestExtensions`.

Reviewers: Sushant Mahajan <smahajan@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-21 13:50:08 +00:00
Lianet Magrans c56c9faee2
KAFKA-18813: [2/N] Client support for TopicAuthException in HB path (#18986)
Reviewers: David Jacot <djacot@confluent.io>
2025-02-21 08:45:20 -05:00
TengYao Chi 767a62ade6
KAFKA-18737 KafkaDockerWrapper setup functions fails due to storage format command (#18844)
The current Docker Hub documentation for Kafka is based on the use of static voters. Since Kafka 4.0 utilizes dynamic voters, users following the doc of docker hub may encounter unexpected behavior. Due to the limited time available for the 4.0.0 release, a simple and quick solution is to revert to using static voters within the Docker image. This can be achieved by adding a configuration file with static voter definitions to the kafka/docker folder, keeping it separate from the main kafka/config directory. This approach allows us to encourage the use of dynamic voters in typical deployments while maintaining compatibility within the Docker image.

Reviewers: Vedarth Sharma <142404391+VedarthConfluent@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-21 20:43:41 +08:00
David Jacot 2124511431
MINOR: Rearrange configs in GroupCoordinatorConfigs (#18970)
I was looking into GroupCoordinatorConfigs to review configurations that
we will ship with Apache Kafka 4.0. I found out that it was pretty
disorganised. This patch cleans up the format and re-groups the
configurations which are related.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-21 13:20:58 +01:00
Sushant Mahajan c2cb543a1e
KAFKA-18629: Delete share group state RPC group coordinator impl. [3/N] (#18848)
* In this PR, we have added GC side impl to call the delete state share
coord RPC using the persister.
* We will be using the existing `GroupCoordinatorService.deleteGroups`.
The logic will be modified as follows:
* After sanitization, we will call a new
`runtime.scheduleWriteOperation` (not read for consistency) with
callback `GroupCoordinatorShard.sharePartitions`. This will return a Map
of share partitions of the groups which are of SHARE type. We need to
pass all groups as WE CANNOT DETERMINE the type of the group in the
service class.
* Then using the map we will create requests which could be passed to
the persister and make the appropriate calls.
* Once this future completes, we will continue with the existing flow of
group deletion.
* If the group under inspection is not share group - the read callback
should return an empty map.
* Tests have been added wherever applicable.

Reviewers: David Jacot <djacot@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-02-21 12:13:16 +00:00
TengYao Chi d31cbf59de
KAFKA-18831 Migrating to log4j2 introduce behavior changes of adjusting level dynamically (#18969)
fix the following behavior changes.

1) in log4j 1, users can't change the logger by parent if the logger is declared by properties explicitly. For example, `org.apache.kafka.controller` has level explicitly in the properties. Hence, we can't use "org.apache.kafka=INFO" to change the level of `org.apache.kafka.controller` to INFO. By contrast, log4j2 allows us to change all child loggers by the parent logger.

2) in log4j2, we can change the level of root to impact all loggers' level. By contrast, log4j 1 can't. 

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-21 16:12:58 +08:00
Matthias J. Sax acea35ddf3
MINOR: cleanup SinkNode generics (#18975)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Bill Bejeck <bill@confluent.io>
2025-02-20 17:47:39 -08:00
TengYao Chi 709bfc506a
KAFKA-18641: AsyncKafkaConsumer could lose records with auto offset commit (#18737)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Jun Rao <jun@confluent.io>, Kirk True <ktrue@confluent.io>
2025-02-20 12:11:01 -05:00
Calvin Liu 1eecd02ce8
MINOR: Deflake EligibleLeaderReplicasIntegrationTest (#18923)
Make sure to give enough time for the partition ISR updates.

Reviewers: David Jacot <djacot@confluent.io>
2025-02-20 05:14:15 -08:00
Sushant Mahajan c89fd2bff6
KAFKA-18828: Update share group metrics per new init and call mechanism. (#18962)
* Due to recent changes in the way group count metrics are initialized
and updated, the current share group count code has become obsolete as
well as non-functional.
* The update method for the share group count which should be called
from `ShareGroup` cannot be called either. This is because the
constructor has been changed to NOT accept the
`GroupCoordinatorShardMetrics` ref.
* In this PR, we remedy the situation by bringing share group count code
at par with consumer and streams groups.
* Additionally the metric name for share groups with group state
attributes was not aligned with streams and consumer groups as mentioned
in https://github.com/apache/kafka/pull/17011#discussion_r1960309578.
This PR aligns them too.

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-20 10:23:37 +00:00
Ken Huang eda8fc84ae
KAFKA-16918 TestUtils#assertFutureThrows should use future.get with timeout (#18891)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Luke Chen <showuon@gmail.com>, Parker Chang <45290853+Parkerhiphop@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-20 07:22:31 +08:00
Matthias J. Sax 9f23b25f6e
MINOR: fix Kafka Streams "smoke test" pass criteria (#18835)
Reviewers: Bill Bejeck <bill@confluent.io>, Bruno Cadonna <bruno@confluent.io>
2025-02-19 14:33:31 -08:00
Matthias J. Sax 538a60e1b3
MINOR: disallow rawtypes and fail build (#18877)
Cleanup code to avoid rawtype, and add suppressions where necessary.
Change the build to fail on rawtype warning.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-02-19 13:11:49 -08:00
Ismael Juma 3a59a526d9
MIINOR: Remove redundant quorum parameter from *AdminIntegrationTest classes (#18965)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-02-19 15:57:47 -05:00
David Arthur 7d628401e9
KAFKA-18791 Set default commit to PR title and description [2/n] (#18967)
Reviewers: Justine Olshan <jolshan@confluent.io>
2025-02-19 14:46:53 -05:00
Calvin Liu f85c7d4696
MINOR: Fix incorrect return value from upgradeFeatures #18958
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-02-19 09:41:06 -08:00
Shivsundar R 3603c8fe35
KAFKA-18829: Added check before converting to IMPLICIT mode (#18964)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-19 17:34:28 +00:00
Ismael Juma 6aab304542
MINOR: Update upgrade notes for 4.0.0 (#18960)
Details:
1. Upgrades to 4.0.x are only supported from 3.3.x and for kraft mode clusters
2. Add rolling upgrade instructions for 4.0.x
3. Clarify the message for zk to kraft migrations
4. Remove all the upgrade instructions for older versions (they can still be found in the upgrade notes for older releases)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
2025-02-19 09:32:50 -08:00
Kaushik Raina 469c55cf02
Add TransactionAbortableException and Timeout Exception handling instruction in docs (#18942)
Add instruction for handing TransactionAbortableException and TimeoutException at application side.

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-02-19 09:15:47 -08:00
David Arthur 9b29e91218
KAFKA-18791 Enable new asf.yaml parser [1/n] (#18955)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-19 10:48:23 -05:00
Ismael Juma 3dba3125e9
KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845)
3.3.0 was the first KRaft release that was deemed production-ready and also
when KIP-778 (KRaft to KRaft upgrades) landed. Given that, it's reasonable
for 4.x to only support upgrades from 3.3.0 or newer (the metadata version also
needs to be set to "3.3" or newer before upgrading).

Noteworthy changes:
1. `AlterPartition` no longer includes topic names, which makes it possible to
simplify `AlterParitionManager` logic.
2. Metadata versions older than `IBP_3_3_IV3` have been removed and
`IBP_3_3_IV3` is now the minimum version.
3. `MINIMUM_BOOTSTRAP_VERSION` has been removed.
4. Removed `isLeaderRecoverySupported`, `isNoOpsRecordSupported`,
`isKRaftSupported`, `isBrokerRegistrationChangeRecordSupported` and
`isInControlledShutdownStateSupported` - these are always `true` now.
Also removed related conditional code.
5. Removed default metadata version or metadata version fallbacks in
multiple places - we now fail-fast instead of potentially using an incorrect
metadata version.
6. Update `MetadataBatchLoader.resetToImage` to set `hasSeenRecord`
based on whether image is empty - this was a previously existing issue that
became more apparent after the changes in this PR.
7. Remove `ibp` parameter from `BootstrapDirectory`
8. A number of tests were not useful anymore and have been removed.

I will update the upgrade notes via a separate PR as there are a few things that
need changing and it would be easier to do so that way.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluen.io>, Ken Huang <s7133700@gmail.com>
2025-02-19 05:35:42 -08:00
ShivsundarR a6a588fbed
KAFKA-18198: Added check to prevent acknowledgements on initial ShareFetchRequest. (#18944)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-19 10:49:58 +00:00