In 3.1 we deprecated the eager rebalancing protocol and marked it for
removal in a later release. We aim to officially drop support and remove
the protocol from Streams in 4.0.
The effect of this PR is that it will no longer be possible to perform a
live upgrade of Kafka Streams directly to 4.0 from version 2.3 or below.
Users will have to go through a bridge release between 2.4 and 3.9
instead.
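For illustration, the bridge-release upgrade is typically a two-phase rolling bounce driven by the `upgrade.from` config. A minimal sketch, assuming the standard `StreamsConfig` keys (the application id and bootstrap servers are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class BridgeUpgradeConfig {
    public static Properties bridgeProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");         // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        // First rolling bounce on the bridge release (2.4 - 3.9): tell the app
        // which old version it is upgrading from so it stays on the eager protocol.
        props.put(StreamsConfig.UPGRADE_FROM_CONFIG, "2.3");
        // Second rolling bounce: remove upgrade.from so the app switches to the
        // cooperative protocol, after which a live upgrade to 4.0 is possible.
        return props;
    }
}
```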
Reviewers: Matthias J. Sax <matthias@confluent.io>
For the KRaft implementation there is a race between the network thread,
which reads bytes from the log segments, and the KRaft driver thread,
which truncates the log and appends records to the log. This race can
cause the network thread to send corrupted or inconsistent records. The
corrupted records case is handled by catching and logging the
CorruptRecordException. The inconsistent records case is handled by only
appending record batches whose partition leader epoch is less than or
equal to the fetching replica's epoch, and only when the epoch did not
change between the request and the response.
For the ISR implementation there is also a race between the network
thread and the replica fetcher thread, which truncates the log and
appends records to the log. This race can cause the network thread to
send corrupted or inconsistent records. The replica fetcher thread
already handles the corrupted records case. The inconsistent records
case is handled by only appending record batches whose partition leader
epoch is less than or equal to the leader epoch in the FETCH request.
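A minimal sketch combining the epoch guards described above, with hypothetical names standing in for the actual Kafka internals:

```java
// Illustrative only: condenses the inconsistent-records checks into one predicate.
final class FetchedBatchValidator {
    /**
     * Append a fetched batch only if its partition leader epoch is no newer
     * than the epoch the replica fetched with, and the local epoch did not
     * change between sending the request and handling the response.
     */
    static boolean shouldAppend(int batchLeaderEpoch,
                                int requestLeaderEpoch,
                                int currentLeaderEpoch) {
        boolean epochUnchanged = requestLeaderEpoch == currentLeaderEpoch;
        boolean batchNotNewer = batchLeaderEpoch <= requestLeaderEpoch;
        return epochUnchanged && batchNotNewer;
    }
}
```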
Reviewers: Jun Rao <junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai <chia7712@apache.org>
Adds two new workflows to help us enforce some uniform PR structure. The
first workflow runs in an unprivileged context and simply captures the
PR number into a text file and archives it. This is then used by another
workflow that runs in a privileged context using the code in `trunk` to
actually do the validation.
The validation is done using a new Python script. This script fetches a
PR using the GitHub CLI and validates its structure. For now this just
includes the title and body, but could perform other non-code related
checks in the future.
This validation is needed for the upcoming merge queue functionality.
Reviewers: Justine Olshan <jolshan@confluent.io>
The PR handles fetch for `compacted` topics. The fix is required only
when a complete batch disappears from the topic log and the same batch
is marked re-available in the Share Partition state cache. Subsequent
log reads will not return the disappeared batch in the read response,
hence the respective batch would be left as available in the state
cache.
The PR checks the base offset of the first fetched/read batch; if it is
greater than the position from where the read occurred (the fetch
offset), then any `available` batches in the state cache are archived.
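A rough sketch of that archival check, with hypothetical names and a string-valued state cache standing in for the real SharePartition structures:

```java
import java.util.NavigableMap;

// Illustrative only: archive cached batch state that compaction removed from the log.
final class CompactionGapHandler {
    /**
     * If the first batch returned by the log read starts after the fetch offset,
     * the offsets in between were compacted away; any state-cache entries in that
     * gap still marked available must be archived.
     */
    static void maybeArchiveGap(long fetchOffset,
                                long firstFetchedBatchBaseOffset,
                                NavigableMap<Long, String> stateCache) {
        if (firstFetchedBatchBaseOffset > fetchOffset) {
            stateCache.subMap(fetchOffset, true, firstFetchedBatchBaseOffset, false)
                      .replaceAll((offset, state) ->
                          "AVAILABLE".equals(state) ? "ARCHIVED" : state);
        }
    }
}
```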
Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
The last commit in this class mistakenly described the functions as
being for Streams Groups. Just a minor update.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Sushant Mahajan <smahajan@confluent.io>
When a cluster is configured with a dynamic controller quorum, a KRaft replica's endpoints are computed using the advertised.listeners property and not the controller.quorum.voters property. This change in configuration makes it difficult to keep all previous node configurations compatible with the new endpoint discovery functionality.
The least intrusive solution is to rely on Kafka's reverse hostname lookup when the hostname is not specified. The effective advertised controller listener now removes the '0.0.0.0' hostname if the endpoint came from the listeners configuration and not the advertised.listeners configuration.
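A sketch of that fallback, assuming the JDK's standard reverse lookup (illustrative, not the actual implementation):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ControllerEndpointResolver {
    /**
     * Illustrative: when a listener is bound to the wildcard address, fall back
     * to the machine's canonical hostname instead of advertising '0.0.0.0'.
     */
    static String effectiveHost(String listenerHost) throws UnknownHostException {
        if (listenerHost == null || listenerHost.isEmpty() || "0.0.0.0".equals(listenerHost)) {
            return InetAddress.getLocalHost().getCanonicalHostName();
        }
        return listenerHost;
    }
}
```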
Reviewers: José Armando García Sancio <jsancio@apache.org>, Alyssa Huang <ahuang@confluent.io>
Split the JUnit tests into "new", "flaky", and the remainder.
On PR builds, "new" tests are anything that do not exist on trunk. They are run with zero tolerance for flakiness.
On trunk builds, "new" tests are anything added in the last 7 days. They are run with some tolerance for flakiness.
Another change included here is that we will not update the test catalog if any test job fails on a trunk build. We have had difficulty determining whether all the tests had run (due to timeouts or failures in upstream Gradle tasks). By requiring green ":test" jobs, we can be sure that the resulting catalog will be valid.
---
The purpose of this change is to discourage contributors from adding flaky tests, but give some leeway for trunk so we have successful builds.
The "quarantinedTest" Gradle target has been consolidated into the regular "test" target. There are now some
runtime properties to control what tests are run.
* kafka.test.catalog.file: path to test catalog
* kafka.test.run.new: include new tests. This selection depends on the age of the loaded test catalog
* kafka.test.run.flaky: include tests marked as `@Flaky` (replaces the `excludeTags 'flaky'` directive)
* kafka.test.verbose: include additional logging from new JUnit classes (enabled by default if re-running GitHub workflow with debug logging enabled)
* maxTestRetries: how many retries to allow via Develocity retry plugin (default 0)
* maxTestRetryFailures: how many failures to allow before stopping retries (default 0)
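For example, marking a test as flaky might look like the sketch below (shown with a plain JUnit 5 tag; the `@Flaky` annotation mentioned above is effectively such a marker):

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class ExampleTest {
    @Test
    void stableCase() {
        // always part of the normal test run
    }

    @Tag("flaky") // excluded unless the kafka.test.run.flaky property enables it
    @Test
    void occasionallyFailingCase() {
        // only runs when flaky tests are explicitly included
    }
}
```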
Thanks to Jun Rao for inspiring the idea.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Reduce the minISR to 1 for the truncation test in order to skip the protection from KIP-966.
Reviewers: David Jacot <djacot@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
The PR handles slicing of fetched records based on the acquire response
for share fetch. There could be additional bytes fetched from the log,
but the acquired offsets can be a subset, typically with the `max fetch
records` configuration. Rather than sending the additional fetched bytes
to the client, we should slice the file and wire only the needed batches.
Note: If the acquired offsets are within a batch then we need to send
the entire batch within the file record. Hence, rather than checking
individual batches, the PR finds the first and last acquired offsets and
trims the file to all batches between (and including) these two offsets.
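A simplified sketch of the trimming, built on `FileRecords.slice` (the surrounding names are illustrative, and batch positions are assumed to be relative to the fetched file records):

```java
import java.io.IOException;
import org.apache.kafka.common.record.FileLogInputStream.FileChannelRecordBatch;
import org.apache.kafka.common.record.FileRecords;

final class AcquiredRecordsSlicer {
    /**
     * Trim the fetched file records to the span of whole batches covering
     * [firstAcquiredOffset, lastAcquiredOffset], since a batch cannot be split.
     */
    static FileRecords sliceAcquired(FileRecords fetched,
                                     long firstAcquiredOffset,
                                     long lastAcquiredOffset) throws IOException {
        int start = -1;
        int end = -1;
        for (FileChannelRecordBatch batch : fetched.batches()) {
            // first batch whose range reaches the first acquired offset
            if (start < 0 && batch.lastOffset() >= firstAcquiredOffset)
                start = batch.position();
            // extend through every batch starting at or before the last acquired offset
            if (batch.baseOffset() <= lastAcquiredOffset)
                end = batch.position() + batch.sizeInBytes();
        }
        if (start < 0 || end <= start)
            return fetched; // nothing to trim in this illustrative fallback
        return fetched.slice(start, end - start);
    }
}
```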
Reviewers: Christo Lolov <lolovc@amazon.com>, Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>
Pull request to implement KIP-1111, which aims to add a configuration
that prevents a Kafka Streams application from starting if any of its
internal topics have auto-generated names, thereby enforcing explicit
naming for all internal topics and enhancing the stability of the
application’s topology.
- Repartition Topics:
All repartition topics are created in the
KStreamImpl.createRepartitionedSource(...) static method. This method
receives either a name explicitly provided by the user or null, and
then builds the final repartition topic name.
- Changelog Topics and State Store Names:
There are several scenarios where these are created:
In the MaterializedInternal constructor.
During KStream/KStream joins.
During KStream/KTable joins with grace periods.
When key-value buffers are used in suppressions.
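For illustration, explicit names can be supplied through the existing named-operator APIs (topic and store names here are placeholders):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class ExplicitNamesExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");

        KTable<String, Long> counts = input
            // Grouped.as(...) fixes the repartition topic name.
            .groupBy((key, value) -> value, Grouped.as("word-grouping"))
            // Materialized.as(...) fixes the state store and changelog names.
            .count(Materialized.as("word-counts"));

        counts.toStream().to("output-topic");
        builder.build();
    }
}
```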
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Sophie Blee-Goldman <sophie@responsive.dev>
- Currently, if we received extraneous topic partitions in the response,
or if the response was missing some of the requested partitions, we were
processing the response as it came and even populating the callback with
these partitions.
- These invalid responses should be validated at the
`ShareConsumeRequestManager`.
- If the response missed any acknowledgements for partitions that were
requested, then we fail the request with `InvalidRecordStateException`
and populate the callbacks.
- For any extraneous partitions in the response, we log an error and
ignore them.
This PR also includes some refactoring in ShareConsumeRequestManager to
make the code more readable.
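A condensed sketch of the reconciliation (hypothetical types; the real logic lives in `ShareConsumeRequestManager`):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: reconcile response partitions with the requested set.
final class ShareResponseReconciler {
    static <P> Map<P, String> reconcile(Set<P> requested, Map<P, String> response) {
        Map<P, String> results = new HashMap<>();
        for (Map.Entry<P, String> entry : response.entrySet()) {
            if (!requested.contains(entry.getKey())) {
                // Extraneous partition: log an error and ignore it.
                System.err.println("Unexpected partition in response: " + entry.getKey());
                continue;
            }
            results.put(entry.getKey(), entry.getValue());
        }
        for (P partition : requested) {
            // Missing acknowledgement: fail it and still populate the callback,
            // mirroring the InvalidRecordStateException behavior described above.
            results.putIfAbsent(partition, "InvalidRecordStateException");
        }
        return results;
    }
}
```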
Reviewers: Andrew Schofield <aschofield@confluent.io>
* In this PR, we have provided an implementation of the initialize share
group state RPC from the persister's perspective.
* Tests have been added wherever applicable.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Given that we now support Java 17 on our brokers, this PR replaces the use of `Collections.singletonList()` and `Collections.emptyList()` with `List.of()`.
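For example:

```java
import java.util.Collections;
import java.util.List;

class Before {
    List<String> one = Collections.singletonList("a");
    List<String> none = Collections.emptyList();
}

class After {
    List<String> one = List.of("a"); // Java 9+, immutable
    List<String> none = List.of();   // Java 9+, immutable
}
```

One behavioural difference worth noting: `List.of(...)` rejects `null` elements, whereas `Collections.singletonList(null)` is allowed.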
Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
* In this PR, we add various infra classes needed to support the
`deleteShareGroups` functionality via the `kafka-share-groups.sh`
script, as well as the implementation of `kafka-share-groups.sh --delete`.
Reviewers: Andrew Schofield <aschofield@confluent.io>
The PR does the following:
1. Adds `fetchOffset` to the `acquire` API in SharePartition.
2. Adds a ShareFetchPartitionData class to efficiently handle the
propagation of fetchOffset information.
3. Updates SharePartitionTests to share common code so that such
improvements do not require changes to all tests in future PRs.
Reviewers: Andrew Schofield <aschofield@confluent.io>
The main root cause is
3dba3125e9,
which removed the metadata versions older than 3.3; thus this test fails
when it uses metadata version 3.2 or 3.1.
Reviewers: David Jacot <djacot@confluent.io>
The main root cause is
3dba3125e9,
which removed the metadata versions older than 3.3; thus this test fails
when it uses metadata version 3.2 or 3.1.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
This patch updates the release script to use JDK 21 to build the
release. We could also use JDK 17 but using JDK 21 directly does not
change much. We have to verify anyway that the server works with 17 and
the client with 11.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Since we no longer convert records to the old format for fetch requests, this code is no longer used in production.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The PR implements the final set of ShareGroupMetrics,
RequestTopicPartitionsFetchRatio and TopicPartitionsAcquireTimeMs, as
defined in KIP-1103:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption
Note: The metric `RequestTopicPartitionsFetchRatio` is calculated as a
percentage because the Histogram API doesn't record doubles.
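A sketch of the workaround, with illustrative names (the real metric records into a long-valued Histogram):

```java
// Illustrative only: a histogram limited to whole numbers cannot store a
// 0.0-1.0 ratio, so the ratio is scaled to a 0-100 percentage first.
final class FetchRatioScaler {
    static long asPercentage(int acquiredPartitions, int requestedPartitions) {
        if (requestedPartitions == 0) return 0L;
        return Math.round(100.0 * acquiredPartitions / requestedPartitions);
    }
}
```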
Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
- ELR is enabled (ELRV_1) by default if the cluster is created with its bootstrap metadata version >= IBP_4_1_IV0.
- ELRV_1 can be manually enabled iff the metadata version is >= IBP_4_0_IV1.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>, David Jacot <djacot@confluent.io>
Three tests marked flaky in ShareConsumerTest have shown no failures on
trunk since the tests were converted to use `ClusterTestExtensions`.
Reviewers: Sushant Mahajan <smahajan@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
The current Docker Hub documentation for Kafka is based on the use of static voters. Since Kafka 4.0 utilizes dynamic voters, users following the Docker Hub documentation may encounter unexpected behavior. Due to the limited time available for the 4.0.0 release, a simple and quick solution is to revert to using static voters within the Docker image. This can be achieved by adding a configuration file with static voter definitions to the kafka/docker folder, keeping it separate from the main kafka/config directory. This approach allows us to encourage the use of dynamic voters in typical deployments while maintaining compatibility within the Docker image.
Reviewers: Vedarth Sharma <142404391+VedarthConfluent@users.noreply.github.com>, Chia-Ping Tsai <chia7712@gmail.com>
I was looking into GroupCoordinatorConfigs to review configurations that
we will ship with Apache Kafka 4.0. I found out that it was pretty
disorganised. This patch cleans up the format and re-groups the
configurations which are related.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
* In this PR, we have added the GC-side implementation to call the share
coordinator's delete state RPC using the persister.
* We will be using the existing `GroupCoordinatorService.deleteGroups`.
The logic will be modified as follows:
* After sanitization, we will call a new
`runtime.scheduleWriteOperation` (a write rather than a read, for
consistency) with the callback `GroupCoordinatorShard.sharePartitions`.
This will return a map of the share partitions of the groups which are
of SHARE type. We need to pass all groups, as we cannot determine the
type of a group in the service class.
* Then, using the map, we will create requests that can be passed to
the persister and make the appropriate calls.
* Once this future completes, we will continue with the existing flow of
group deletion.
* If the group under inspection is not a share group, the read callback
should return an empty map.
* Tests have been added wherever applicable.
Reviewers: David Jacot <djacot@confluent.io>, Andrew Schofield <aschofield@confluent.io>
Fix the following behavior changes:
1) In log4j 1, users can't change a logger via its parent if the logger's level is declared explicitly in the properties. For example, `org.apache.kafka.controller` has an explicit level in the properties, hence we can't use "org.apache.kafka=INFO" to change the level of `org.apache.kafka.controller` to INFO. By contrast, log4j2 allows us to change all child loggers through the parent logger.
2) In log4j2, we can change the level of the root logger to impact the levels of all loggers. By contrast, log4j 1 can't.
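For example, both behaviours can be exercised programmatically with the log4j2 `Configurator` (a sketch; the same applies to properties-based configuration):

```java
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

public class LoggerLevelExample {
    public static void main(String[] args) {
        // (1) Setting a parent in log4j2 can cascade to explicitly configured
        //     children such as org.apache.kafka.controller.
        Configurator.setAllLevels("org.apache.kafka", Level.INFO);
        // (2) Changing root affects the effective level of every logger.
        Configurator.setRootLevel(Level.WARN);
    }
}
```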
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
* Due to recent changes in the way group count metrics are initialized
and updated, the current share group count code has become obsolete as
well as non-functional.
* The update method for the share group count, which should be called
from `ShareGroup`, cannot be called either. This is because the
constructor has been changed to NOT accept the
`GroupCoordinatorShardMetrics` ref.
* In this PR, we remedy the situation by bringing the share group count
code on par with the consumer and streams groups.
* Additionally, the metric name for share groups with group state
attributes was not aligned with the streams and consumer groups, as
mentioned in https://github.com/apache/kafka/pull/17011#discussion_r1960309578.
This PR aligns them too.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Clean up code to avoid raw types, and add suppressions where necessary.
Change the build to fail on raw type warnings.
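For example (illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class RawTypeExamples {
    // Preferred fix: parameterize instead of using the raw type.
    List<String> typed = new ArrayList<>();

    // Where a raw type is genuinely unavoidable (e.g. a legacy interface),
    // a targeted suppression keeps the stricter build green.
    @SuppressWarnings({"rawtypes", "unchecked"})
    List legacy = new ArrayList();
}
```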
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>