https://issues.apache.org/jira/browse/KAFKA-19383
When applying a ClearElrRecord, the controller may pick up the topicId
from the image without checking whether the topic has been deleted. This
can cause the creation of a new TopicRecord with an old topic ID.
Reviewers: Alyssa Huang <ahuang@confluent.io>, Artem Livshits <alivshits@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
No conflicts.
Previously, we could wait for up to half of the broker session timeout
for an RPC to complete, and then delay the next attempt by up to another
half of the broker session timeout. Taken together, these two delays
could add up to a full session timeout, leading brokers to erroneously
miss heartbeats.
This change removes exponential backoff for heartbeats sent from the
broker to the controller. The load caused by heartbeats is not heavy,
and controllers can easily time out heartbeats when the queue grows too
long. Additionally, we now cap the RPC time at the length of the broker
heartbeat period, which minimizes the impact of heavy load.
Reviewers: José Armando García Sancio <jsancio@apache.org>, David Arthur <mumrah@gmail.com>
If there are more deletion filters to process after we first hit the
`MAX_RECORDS_PER_USER_OP` bound, we would keep adding an additional
deletion record on top of that for each remaining filter.
The error message returned to the client is not useful either, so this
also adds logic to ensure the client doesn't just get
`UNKNOWN_SERVER_EXCEPTION` with no details, as sketched below.
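A minimal sketch of the bounds check under illustrative names (`Filter`, `DeletionRecord`, and the limit value are stand-ins, not the actual controller types):
```java
import java.util.ArrayList;
import java.util.List;

final class DeletionBoundsSketch {
    static final int MAX_RECORDS_PER_USER_OP = 10_000;

    record DeletionRecord(String resource) { }
    interface Filter { List<String> matchingResources(); }

    static List<DeletionRecord> buildRecords(List<Filter> filters) {
        List<DeletionRecord> records = new ArrayList<>();
        for (Filter filter : filters) {
            for (String resource : filter.matchingResources()) {
                // Re-check the bound for every record, so filters processed
                // after the limit is first hit cannot push us past it.
                if (records.size() >= MAX_RECORDS_PER_USER_OP) {
                    // Fail with a descriptive message instead of surfacing a
                    // bare UNKNOWN_SERVER_EXCEPTION to the client.
                    throw new IllegalStateException("Deletion would generate more than "
                        + MAX_RECORDS_PER_USER_OP + " records in one user operation");
                }
                records.add(new DeletionRecord(resource));
            }
        }
        return records;
    }
}
```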
## Summary
- Fix a potential race condition in
LogSegment#readMaxTimestampAndOffsetSoFar(), which may result in
non-monotonic offsets and cause replication to stop.
- See https://issues.apache.org/jira/browse/KAFKA-19407 for details on
how it happens.
Reviewers: Vincent PÉRICART <mauhiz@gmail.com>, Jun Rao
<junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This is a follow-up to https://github.com/apache/kafka/pull/19910.
The coordinator failed to write an epoch fence transition for producer
jt142 to the transaction log with error COORDINATOR_NOT_AVAILABLE. The
epoch was increased to 2 but not returned to the client
(kafka.coordinator.transaction.TransactionCoordinator). Since we don't
bump the epoch with this change, we should also update that message to
not say "increased" and remove the
`epochAndMetadata.transactionMetadata.hasFailedEpochFence = true` line.
In the test, the expected behavior is:
1. The first append of the transaction to the log fails with
COORDINATOR_NOT_AVAILABLE (epoch 1).
2. We try init_pid again; this time the single epoch bump succeeds, and
the following things happen simultaneously (epoch 2):
   - Transition to COMPLETE_ABORT
   - Return a CONCURRENT_TRANSACTION error to the client
3. The client retries, and there is another epoch bump; the state
transitions to EMPTY (epoch 3).
Reviewers: Justine Olshan <jolshan@confluent.io>
0b2e410d61
Bug fix in 4.0
**Conflicts:**
- The Transaction Coordinator had some conflicts, mainly with the
transaction states, e.g. `ongoing` in 4.0 is `TransactionState.ONGOING`
in 4.1.
- The TransactionCoordinatorTest file had conflicts with respect to the
2PC changes from KIP-939 in 4.1 and the above-mentioned state changes.
Reviewers: Justine Olshan <jolshan@confluent.io>, Artem Livshits
<alivshits@confluent.io>
Minor fix to correct the validation condition for GetTelemetryRequests.
Added respective tests as well.
Reviewers: Andrew Schofield <aschofield@confluent.io>
## Summary
- MetadataShell may delete the lock file unintentionally when it exits
or fails to acquire the lock. If there is a running server, this causes
unexpected results as below:
  * MetadataShell succeeds on the 2nd run unexpectedly
  * Even worse, LogManager/RaftManager's locks also no longer protect
against concurrent Kafka process startup
Reviewers: TengYao Chi <frankvicky@apache.org>
# Conflicts:
# shell/src/test/java/org/apache/kafka/shell/MetadataShellIntegrationTest.java
At the retry limit, `binaryExponentialElectionBackoffMs` becomes
statistically likely to return `electionBackoffMaxMs`. This is an issue
because multiple replicas can get stuck starting elections at the same
cadence.
This change fixes that by adding a random jitter to the max election
backoff, as sketched below.
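A minimal, self-contained sketch of exponential backoff with full jitter, assuming illustrative names (the real KRaft election code is structured differently):
```java
import java.util.concurrent.ThreadLocalRandom;

final class ElectionBackoffSketch {
    // Exponential backoff capped at maxMs, then randomized. Without the
    // jitter, every replica that reaches the cap retries on the same
    // cadence, and their elections keep colliding.
    static long backoffMs(int retries, long baseMs, long maxMs) {
        long exponential = Math.min(maxMs, baseMs << Math.min(retries, 20));
        return ThreadLocalRandom.current().nextLong(exponential + 1);
    }
}
```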
Reviewers: José Armando García Sancio <jsancio@apache.org>, TaiJuWu
<tjwu1217@gmail.com>, Yung <yungyung7654321@gmail.com>
### Issue:
API responses are missing their latest versions in the [Kafka protocol
guide](https://kafka.apache.org/protocol.html).
#### For example:
These are missing:
- ApiVersions Response (Version: 4) — Only versions 0–3 are documented,
though version 4 of the request is included.
- DescribeTopicPartitions Response — Not listed at all.
- Fetch Response (Version: 17) — Only versions 4–16 are documented,
though version 17 of the request is included.
#### After the fix:
docs/generated/protocol_messages.html
<img width="1045" alt="image"
src="https://github.com/user-attachments/assets/5ea79ced-aab5-4c47-8e09-9956047c9bf1"
/>
Reviewers: dengziming <dengziming1993@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This PR fixes a regression bug introduced with KAFKA-17203. We need to
pass mutable collections into `closeTaskClean(...)`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Bruno Cadonna <bruno@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
Group Coordinator Shards are not unloaded when `__consumer_offsets`
topic is deleted. The unloading is scheduled but it is ignored because
the epoch is equal to the current epoch:
```
[2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1]
Scheduling unloading of metadata for __consumer_offsets-0 with epoch
OptionalInt[0]
(org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime)
[2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Scheduling
unloading of metadata for __consumer_offsets-1 with epoch OptionalInt[0]
(org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime)
[2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Ignored unloading
metadata for __consumer_offsets-0 in epoch OptionalInt[0] since current
epoch is 0.
(org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime)
[2025-05-13 08:46:00,883] INFO [GroupCoordinator id=1] Ignored unloading
metadata for __consumer_offsets-1 in epoch OptionalInt[0] since current
epoch is 0.
(org.apache.kafka.coordinator.common.runtime.CoordinatorRuntime)
```
This patch fixes the issue by not setting the leader epoch in this case.
The coordinator expects the leader epoch to be incremented when the
resignation code is called, but when the topic is deleted, the epoch is
not incremented. Therefore, we must not use it. Note that this is
aligned with how deleted partitions are handled too.
Reviewers: Dongnuo Lyu <dlyu@confluent.io>, José Armando García Sancio <jsancio@apache.org>
Fix the issue where JMC is unable to correctly display the client-state
and thread-state metrics. The root cause is that these two metrics
directly return the `State` class to JMX. If the user has not set up the
RMI server, JMC and other monitoring tools are unable to interpret the
`State` class. To resolve this, we return a string representation of the
state instead of the `State` class in these two metrics, as sketched
below.
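A minimal sketch of the idea using Kafka's generic metrics API; the names (`StateMetricSketch`, the metric group, the `State` enum) are illustrative, not the actual Streams wiring:
```java
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Gauge;
import org.apache.kafka.common.metrics.Metrics;

final class StateMetricSketch {
    enum State { RUNNING, REBALANCING, DEAD }

    static void register(Metrics metrics, State state) {
        MetricName name = metrics.metricName("state", "stream-metrics");
        // Returning state.toString() keeps the JMX attribute a plain
        // java.lang.String, which any console can render without having
        // the State class on its classpath.
        metrics.addMetric(name, (Gauge<String>) (config, now) -> state.toString());
    }
}
```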
Reviewers: Luke Chen <showuon@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
When the number of partitions is not divisible by the number of members,
some members will end up with one more partition than others.
Previously, we required these to be the members at the start of the
iteration order, which meant that partitions could be reassigned even
when the previous assignment was already balanced.
Allow any member to have the extra partition, so that we do not move
partitions around when the previous assignment is already balanced.
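A minimal, self-contained sketch of the relaxed balance check (illustrative names, not the actual assignor code): with P partitions and M members, every member gets at least P / M partitions, and any P % M members, not a fixed prefix of the iteration order, may hold one extra.
```java
final class RangeBalanceSketch {
    static boolean isBalanced(int partitions, int members, int[] assignedCounts) {
        int quota = partitions / members;   // minimum per member
        int extras = partitions % members;  // members allowed one extra
        int withExtra = 0;
        for (int count : assignedCounts) {
            if (count == quota + 1) {
                withExtra++;                // any member may carry the extra
            } else if (count != quota) {
                return false;               // over- or under-assigned
            }
        }
        return withExtra == extras;
    }
}
```
If the previous assignment already satisfies this check, the assignor keeps it as-is instead of shuffling partitions toward the members at the head of the iteration order.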
Before the PR
```
Benchmark (assignmentType) (assignorType) (isRackAware) (memberCount) (partitionsToMemberRatio) (subscriptionType) (topicCount) Mode Cnt Score Error Units
ServerSideAssignorBenchmark.doAssignment FULL RANGE false 10000 50 HOMOGENEOUS 1000 avgt 2 26.175 ms/op
ServerSideAssignorBenchmark.doAssignment FULL RANGE false 10000 50 HETEROGENEOUS 1000 avgt 2 123.955 ms/op
ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 10000 50 HOMOGENEOUS 1000 avgt 2 24.408 ms/op
ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 10000 50 HETEROGENEOUS 1000 avgt 2 114.873 ms/op
```
After the PR
```
Benchmark (assignmentType) (assignorType) (isRackAware) (memberCount) (partitionsToMemberRatio) (subscriptionType) (topicCount) Mode Cnt Score Error Units
ServerSideAssignorBenchmark.doAssignment FULL RANGE false 10000 50 HOMOGENEOUS 1000 avgt 2 24.259 ms/op
ServerSideAssignorBenchmark.doAssignment FULL RANGE false 10000 50 HETEROGENEOUS 1000 avgt 2 118.513 ms/op
ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 10000 50 HOMOGENEOUS 1000 avgt 2 24.636 ms/op
ServerSideAssignorBenchmark.doAssignment INCREMENTAL RANGE false 10000 50 HETEROGENEOUS 1000 avgt 2 115.503 ms/op
```
Reviewers: David Jacot <djacot@confluent.io>
When a group has pending transactional offsets but no committed offsets,
we can accidentally delete it while cleaning up expired offsets. Add a
check to avoid this case.
Reviewers: David Jacot <djacot@confluent.io>
### Motivation
While investigating “events skipped in group
rebalancing” ([spring‑projects/spring‑kafka#3703](https://github.com/spring-projects/spring-kafka/issues/3703))
I discovered a race
condition between
- the main poll/commit thread, and
- the consumer‑coordinator heartbeat thread.
If the main thread enters
`ConsumerCoordinator.sendOffsetCommitRequest()` while the heartbeat
thread is finishing a rebalance (`SyncGroupResponseHandler.handle()`),
the group state transitions in the following order:
```
COMPLETING_REBALANCE → (race window) → STABLE
```
Because we read the state twice without a lock:
1. `generationIfStable()` returns `null` (state still
`COMPLETING_REBALANCE`),
2. the heartbeat thread flips the state to `STABLE`,
3. the main thread re‑checks with `rebalanceInProgress()` and wrongly
decides that a rebalance is still active,
4. a spurious `CommitFailedException` is returned even though the commit
could succeed.
For more details, please refer to the sequence diagram below. <img
width="1494" alt="image"
src="https://github.com/user-attachments/assets/90f19af5-5e2d-4566-aece-ef764df2d89c"
/>
### Impact
- The exception is semantically wrong: the consumer is in a stable
group, but reports failure.
- Frameworks and applications that rely on the semantics of
`CommitFailedException` and `RetryableCommitException` (for example
`Spring Kafka`) take the wrong code path, which can ultimately skip
events and break at-least-once guarantees.
### Fix
We enlarge the synchronized block in
`ConsumerCoordinator.sendOffsetCommitRequest()` so that the consumer
group state is examined atomically with respect to the heartbeat thread,
as the simplified sketch below shows:
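A hedged, self-contained sketch of the race and the fix; the class and state handling are simplified stand-ins for the ConsumerCoordinator internals, not the actual client code:
```java
final class CommitRaceSketch {
    enum GroupState { COMPLETING_REBALANCE, STABLE }

    private GroupState state = GroupState.COMPLETING_REBALANCE;
    private final Object stateLock = new Object();

    // Heartbeat thread: finishes the rebalance.
    void onSyncGroupResponse() {
        synchronized (stateLock) {
            state = GroupState.STABLE;
        }
    }

    // Main thread: both reads of the state now happen under one lock, so
    // the heartbeat thread cannot flip COMPLETING_REBALANCE -> STABLE
    // between the "is it stable?" and "is a rebalance in progress?" checks.
    String sendOffsetCommitRequest() {
        synchronized (stateLock) {
            if (state == GroupState.STABLE) {
                return "commit";   // stable generation: send the commit
            }
            return "retry";        // genuinely rebalancing: retriable error
        }
    }
}
```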
### Jira
https://issues.apache.org/jira/browse/KAFKA-19242
https://github.com/spring-projects/spring-kafka/issues/3703
Signed-off-by: chickenchickenlove <ojt90902@naver.com>
Reviewers: David Jacot <david.jacot@gmail.com>
When fetching stable offsets in the group coordinator, we iterate over
all requested partitions. For each partition, we iterate over the
group's ongoing transactions to check if there is a pending
transactional offset commit for that partition.
This can get slow when there are a large number of partitions and a
large number of pending transactions. Instead, maintain a list of
pending transactions per partition to speed up lookups, as sketched
below.
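A minimal sketch of the idea with illustrative names (the partition and producer-id types are simplified): index pending transactional producers by partition so the stable-offset check becomes a map lookup instead of a scan over every open transaction.
```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PendingTxnIndexSketch {
    private final Map<Integer, Set<Long>> pendingByPartition = new HashMap<>();

    void onTransactionalOffsetCommit(int partition, long producerId) {
        pendingByPartition.computeIfAbsent(partition, p -> new HashSet<>())
                          .add(producerId);
    }

    void onTransactionEnd(int partition, long producerId) {
        Set<Long> producers = pendingByPartition.get(partition);
        if (producers != null) {
            producers.remove(producerId);
            if (producers.isEmpty()) pendingByPartition.remove(partition);
        }
    }

    // O(1) per requested partition, instead of O(open transactions).
    boolean hasPendingTransactionalOffsets(int partition) {
        return pendingByPartition.containsKey(partition);
    }
}
```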
Reviewers: Shaan, Dongnuo Lyu <dlyu@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
In https://github.com/apache/kafka/pull/16578, we tried to exclude both
`checker-qual` and `error_prone_annotations`, but when excluding
`error_prone_annotations`, the compilation failed. So in the end, we
only excluded `checker-qual` and shipped `error_prone_annotations.jar`
to users. In Kafka v4.0.0, thanks to the JDK 8 removal, we upgraded
caffeine to the latest v3.1.8 instead of v2.x.x, and now we can
successfully pass compilation after excluding `error_prone_annotations`
from `caffeine`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ken Huang <s7133700@gmail.com>
The remote storage reader thread pool uses the same count for both the
maximum and core pool size. If users adjust the pool size to be larger
than the original value, it throws `IllegalArgumentException`. Updated
both values to fix the issue, as sketched below.
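A minimal, runnable sketch of the pitfall and the order-sensitive fix (the pool sizes are illustrative): `ThreadPoolExecutor.setCorePoolSize` throws `IllegalArgumentException` when the new core size exceeds the current maximum, so growing a core == max pool must raise the maximum first.
```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ReaderPoolResizeSketch {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            10, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // pool.setCorePoolSize(16) alone would throw IllegalArgumentException,
        // because 16 exceeds the current maximum pool size of 10.
        resize(pool, 16);
        pool.shutdown();
    }

    static void resize(ThreadPoolExecutor pool, int newSize) {
        if (newSize > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(newSize);  // grow: raise the max first
            pool.setCorePoolSize(newSize);
        } else {
            pool.setCorePoolSize(newSize);     // shrink: lower the core first
            pool.setMaximumPoolSize(newSize);
        }
    }
}
```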
cherry-pick PR: #19532
cherry-pick commit:
965743c35b
---------
Signed-off-by: PoAn Yang <payang@apache.org>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang
<payang@apache.org>
Co-authored-by: PoAn Yang <payang@apache.org>
Cherry-picked from e79f5f0f65
If a share or consumer group is described, all group IDs are sent to all
shards of the group coordinator. This change fixes it. It is tested in
the unit tests, since it's somewhat inconvenient to test the passed read
operation lambda.
Old bootstrap.metadata files cause problems with servers that include
KAFKA-18601. When the server tries to read the bootstrap.checkpoint
file, it will fail if the metadata.version is older than 3.3-IV3
(feature level 7). This causes problems when these clusters are
upgraded.
This PR makes it possible to represent older MVs in BootstrapMetadata
objects without causing an exception. An exception is thrown only if we
attempt to access the BootstrapMetadata. This ensures that only the code
path in which we start with an empty metadata log checks that the
metadata version is 7 or newer.
Reviewers: José Armando García Sancio <jsancio@apache.org>, Ismael Juma
<ismael@juma.me.uk>, PoAn Yang <payang@apache.org>, Liu Zeyu
<zeyu.luke@gmail.com>, Alyssa Huang <ahuang@confluent.io>
The release script was pushing the RC tag from a temporary branch that
was never merged back into the release branch. This meant that our RC
and release tags were detached from the rest of the repository.
This patch changes the release script to merge the RC tag back into the
release branch and to push both the tag and the branch.
Reviewers: Luke Chen <showuon@gmail.com>
Under the `SHUTDOWN_APPLICATION` configuration in Kafka Streams, a tight
loop in the shutdown process can flood logs with repeated messages. This
PR introduces a check to ensure that the shutdown log is emitted at most
once every 10 seconds, thereby preventing log flooding, as sketched
below.
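A minimal, illustrative sketch of such a time-based throttle (the names are hypothetical, not the actual Streams code): the looping thread calls `shouldLog` before logging, and at most one call per 10-second window returns true.
```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class ShutdownLogThrottleSketch {
    private static final long INTERVAL_MS = TimeUnit.SECONDS.toMillis(10);
    private final AtomicLong lastLogMs = new AtomicLong(0);

    boolean shouldLog(long nowMs) {
        long last = lastLogMs.get();
        // The CAS ensures only one caller wins each interval window, even
        // if several threads race through the tight shutdown loop.
        return nowMs - last >= INTERVAL_MS && lastLogMs.compareAndSet(last, nowMs);
    }
}
```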
Reviewers: PoAn Yang <payang@apache.org>, Matthias J. Sax <matthias@confluent.io>
[KAFKA-18813](https://issues.apache.org/jira/browse/KAFKA-18813) added
`Topic:Describe` authorization of topics matching regex patterns to the
group coordinator, since it was difficult to authorize these in the
broker when processing consumer heartbeats using the new protocol. But
the group coordinator is started in `BrokerServer` before the authorizer
is created, and hence the group coordinator doesn't have an authorizer
and never performs authorization. As a result, topics that are not
authorized for `Describe` may be assigned to consumers. This potentially
leaks information about topic existence, topic ID, and partition count
to users who are not authorized to describe a topic. This PR starts the
authorizer earlier to ensure that authorization is performed by the
group coordinator. It also adds integration tests for verification.
Note that we still have a second issue when members have different
permissions. If the regex is resolved by a member with permission to
more topics, unauthorized topics may be assigned to members with lower
permissions. In this case, we still return an assignment containing
topic IDs and partitions to a member without `Topic:Describe` access.
This is not addressed by this PR, but an integration test that
illustrates the issue has been added so that we can verify when the
issue is fixed.
Reviewers: David Jacot <david.jacot@gmail.com>
As of 3.9, Kafka allows disabling remote storage on a topic after it was
enabled. It allows subsequent enabling and disabling too.
However, the documentation says otherwise and needs to be corrected.
Doc:
https://kafka.apache.org/39/documentation/#topicconfigs_remote.storage.enable
Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>
Call StateRestoreListener#onBatchRestored with numRestored and not
totalRestored when reprocessing state.
See: https://issues.apache.org/jira/browse/KAFKA-18962
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias
Sax <mjsax@apache.org>
This PR fixes a potential issue where the `FetchResponse` returns
`divergingEndOffsets` with an older leader epoch. This can lead to
committed records being removed from the follower's log, potentially
causing data loss.
In detail:
`processFetchRequest` looks up the requested leader epoch of the
partition data by `topicPartition` and compares it with the leader epoch
of the current fetch state. If they don't match, the response is
ignored.
Reviewers: Jun Rao <junrao@gmail.com>
For the KRaft implementation, there is a race between the network
thread, which reads bytes from the log segments, and the KRaft driver
thread, which truncates the log and appends records to it. This race can
cause the network thread to send corrupted records or inconsistent
records. The corrupted records case is handled by catching and logging
the CorruptRecordException. The inconsistent records case is handled by
only appending record batches whose partition leader epoch is less than
or equal to the fetching replica's epoch, and only if the epoch didn't
change between the request and the response.
For the ISR implementation, there is also a race between the network
thread and the replica fetcher thread, which truncates the log and
appends records to it. This race can cause the network thread to send
corrupted records or inconsistent records. The replica fetcher thread
already handles the corrupted records case. The inconsistent records
case is handled by only appending record batches whose partition leader
epoch is less than or equal to the leader epoch in the FETCH request, as
sketched below.
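A hedged sketch of the validation idea with stand-in types (the real fetcher works on actual record batches, not this `Batch` record): keep only the prefix of fetched batches whose partition leader epoch is no newer than the epoch the fetcher knew when it issued the FETCH request.
```java
import java.util.ArrayList;
import java.util.List;

final class FetchedBatchFilterSketch {
    record Batch(long baseOffset, int partitionLeaderEpoch) { }

    static List<Batch> validBatches(List<Batch> fetched, int fetchEpoch) {
        List<Batch> result = new ArrayList<>();
        for (Batch batch : fetched) {
            // A batch with a newer epoch means the log changed underneath
            // the network thread mid-read; drop it and everything after it.
            if (batch.partitionLeaderEpoch() > fetchEpoch) break;
            result.add(batch);
        }
        return result;
    }
}
```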
Reviewers: Jun Rao <junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai <chia7712@apache.org>
`transform` and `through` are removed in 4.0. Since users cannot
reference them in the 4.0 documentation, it's not good to keep using
them as examples in the `repartition` description.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Fixes both KAFKA-16407 and KAFKA-16434.
Summary of existing issues:
- We ignore the new left record when its previous FK value is null
- We do not unset the foreign-key join result when the FK becomes null
Reviewers: Matthias J. Sax <matthias@confluent.io>
When a row in a FK-join left table is updated, we should send a "delete
subscription with no response" for the old FK to the right hand side, to
avoid getting two responses from the right hand side. Only the "new
subscription" for the new FK should request a response. If two responses
are requested, there is a race condition for which both responses could
be processed in the wrong order, leading to an incorrect join result.
This PR fixes the "delete subscription" case accordingly, to not request
a response.
Reviewers: Matthias J. Sax <matthias@confluent.io>
JIRA: KAFKA-18067
Fix producer client double-closing issue in Kafka Streams.
During StreamThread shutdown, TaskManager closes first, which closes the
producer client. Later, calling `unsubscribe` on the main consumer may
trigger the `onPartitionsLost` callback, attempting to reset
StreamsProducer when EOS is enabled. This causes an already closed
producer to be closed twice while the newly created producer is never
closed.
In detail:
This patch adds a flag to control the producer reset, along with a new
method to change this flag that is only invoked in
`ActiveTaskCreator#close`. This guarantees that the producer reset is
disabled only when the StreamThread shuts down.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias Sax <mjsax@apache.org>
Backports f24945b519 to 4.0
Instead of reopening the transaction index, it cancels the RemoteFetchTask without interrupting it, avoiding closing the TransactionIndex channel.
This lets the remote fetch run to completion while its results are ignored. Given that this is considered a rare case, we can live with this; if it becomes a performance issue, it could be optimized. A sketch of why the task must not be interrupted follows.
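An illustrative sketch of the reasoning (the executor and task names are hypothetical): interrupting a thread that is blocked in FileChannel I/O closes the channel with ClosedByInterruptException, which would break the shared TransactionIndex for every later reader, so the task is cancelled without interruption.
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class RemoteFetchCancelSketch {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> remoteFetchTask = executor.submit(() -> {
            // ... reads the transaction index via a shared FileChannel ...
        });

        // mayInterruptIfRunning = false: the fetch runs to completion and
        // its result is simply ignored, but the channel stays open.
        remoteFetchTask.cancel(false);
        executor.shutdown();
    }
}
```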
Reviewers: Jun Rao <junrao@gmail.com>
Currently, each log.append() adds at most one index entry, even when the
appended data is larger than log.index.interval.bytes. One potential
issue is that if a follower restarts after being down for a long time,
it may fetch data much bigger than log.index.interval.bytes at a time.
This means that fewer index entries are created, which can increase the
fetch time for consumers, as the example below illustrates.
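A minimal sketch of the indexing rule, with illustrative names (the real LogSegment logic tracks bytes since the last entry differently): add one index entry per log.index.interval.bytes of appended data, rather than at most one per append.
```java
final class IndexIntervalSketch {
    // How many index entries a single append should produce.
    static int indexEntriesFor(int bytesSinceLastEntry, int appendedBytes,
                               int indexIntervalBytes) {
        // e.g. a 4 KiB interval and a 1 MiB catch-up append yields ~256
        // entries instead of 1, keeping offset lookups fast afterwards.
        return (bytesSinceLastEntry + appendedBytes) / indexIntervalBytes;
    }
}
```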
(cherry picked from commit e124d3975b)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Currently, the "Notable changes in 4.0.0" for the client is very confusing. We should remove it.
Reviewers: mingdaoy <mingdaoy@gmail.com>, Luke Chen <showuon@gmail.com>, Ken Huang <s7133700@gmail.com>, David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Fixes two issues:
- only commit the TX if no revoked tasks need to be committed
- commit revoked tasks after punctuation is triggered
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Anna Sophie Blee-Goldman <sophie@responsive.dev>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bill@confluent.io>
Skip kraft.version when applying FeatureLevelRecord records. The kraft.version is stored in control records, not metadata records. This solution has the benefit of removing from snapshots any FeatureLevelRecord for kraft.version that was incorrectly written to the log, and it allows ApiVersions to report the correct finalized kraft.version.
Reviewers: Colin P. McCabe <cmccabe@apache.org>