kafka

Commit Graph

Author	SHA1	Message	Date
Lucas Brutschy	07a18478be	KAFKA-15326: [7/N] Processing thread non-busy waiting (#14180 ) Avoid busy waiting for processable tasks. We need to be a bit careful here to not have the task executors to sleep when work is available. We have to make sure to signal on the condition variable any time a task becomes "processable". Here are some situations where a task becomes processable: - Task is unassigned from another TaskExecutor. - Task state is changed (should only happen inside when a task is locked inside the polling phase). - When tasks are unlocked. - When tasks are added. - New records available. - A task is resumed. So in summary, we - We should probably lock tasks when they are paused and unlock them when they are resumed. We should also wake the task executors after every polling phase. This belongs to the StreamThread integration work (separate PR). We add DefaultTaskManager.signalProcessableTasks for this. - We need to awake the task executors in DefaultTaskManager.unassignTask, DefaultTaskManager.unlockTasks and DefaultTaskManager.add. Reviewers: Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@apache.org>	2023-09-11 09:58:20 +02:00
Ruslan Krivoshein	b72d92919f	KAFKA-14581: Moving GetOffsetShell to tools (#13562 ) This PR moves GetOffsetShell from core module to tools module with rewriting from Scala to Java. Reviewers: Federico Valeri fedevaleri@gmail.com, Ziming Deng dengziming1993@gmail.com, Mickael Maison mimaison@apache.org.	2023-09-11 10:30:22 +08:00
Kamal Chandraprakash	672ea644f0	MINOR: Removed the RSM and RLMM classpath config validator (#14358 ) - RSM and RLMM classpath can be empty since it's optional so removed the non-empty string validator - Fix getting the `localTieredStorage` by brokerId after stopping a broker. Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>	2023-09-09 19:02:42 +05:30
David Arthur	b24ccd65b7	KAFKA-15441 Allow broker heartbeats to complete in metadata transaction (#14351 ) This patch allows broker heartbeat events to be completed while a metadata transaction is in-flight. More generally, this patch allows any RUNS_IN_PREMIGRATION event to complete while the controller is in pre-migration mode even if the migration transaction is in-flight. We had a problem with broker heartbeats timing out because they could not be completed while a large ZK migration transaction was in-flight. This resulted in the controller fencing all the ZK brokers which has many undesirable downstream effects. Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@apache.org>	2023-09-08 16:36:13 -04:00
Nikhil Ramakrishnan	84c49c6a09	MINOR: Log when remote fetch execution is rejected (#14350 ) Log at warning level when remote fetch execution is rejected. This can be potentially useful to troubleshoot UNKNOWN_SERVER_ERROR responses for remote fetch requests. Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>	2023-09-08 19:51:12 +08:00
Lucia Cerchie	17862ffaf2	KAFKA-15418: update Kafka design docs with decompression information (#14322 ) Reviewers: Divij Vaidya <diviv@amazon.com>, Matthias J. Sax <mjsax@apache.org> --------- Co-authored-by: Cerchie <lcerchie@confluent.io>	2023-09-08 11:58:32 +02:00
Lucas Brutschy	01b91af59c	MINOR: fix currentLag javadoc (#14224 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-09-07 19:25:31 -07:00
atu-sharm	9ecf6f7f1c	KAFKA-15338: The metric group documentation for metrics added in KAFKA-13945 is incorrect (#14221 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-09-07 19:05:14 -07:00
Chris Egerton	54ab5b29e4	KAFKA-15416: Fix flaky TopicAdminTest::retryEndOffsetsShouldRetryWhenTopicNotFound test case (#14313 ) Reviewers: Philip Nee <pnee@confluent.io>, Greg Harris <greg.harris@aiven.io>	2023-09-07 19:24:17 -04:00
José Armando García Sancio	7b669e8806	KAFKA-14273; Close file before atomic move (#14354 ) In the Windows OS atomic move are not allowed if the file has another open handle. E.g __cluster_metadata-0\quorum-state: The process cannot access the file because it is being used by another process at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:403) at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:293) at java.base/java.nio.file.Files.move(Files.java:1430) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:949) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:932) at org.apache.kafka.raft.FileBasedStateStore.writeElectionStateToFile(FileBasedStateStore.java:152) This is fixed by first closing the temporary quorum-state file before attempting to move it. Reviewers: Colin Patrick McCabe <cmccabe@apache.org> Co-Authored-By: Renaldo Baur Filho <renaldobf@gmail.com>	2023-09-07 16:17:03 -07:00
Kirk True	a2de7d32c8	KAFKA-14274 #1 : basic refactoring (#14305 ) This change introduces some basic clean up and refactoring for forthcoming commits related to the revised fetch code for the consumer threading refactor project. Reviewers: Christo Lolov <lolovc@amazon.com>, Jun Rao <junrao@gmail.com>	2023-09-07 15:23:44 -07:00
Colin Patrick McCabe	41b695b6e3	KAFKA-15369: Implement KIP-919: Allow AC to Talk Directly with Controllers (#14306 ) Implement KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration. This KIP adds a new version of DescribeClusterRequest which is supported by KRaft controllers. It also teaches AdminClient how to use this new DESCRIBE_CLUSTER request to talk directly with the controller quorum. This is all gated behind a new MetadataVersion, IBP_3_7_IV0. In order to share the DESCRIBE_CLUSTER logic between broker and controller, this PR factors it out into AuthHelper.computeDescribeClusterResponse. The KIP adds three new errors codes: MISMATCHED_ENDPOINT_TYPE, UNSUPPORTED_ENDPOINT_TYPE, and UNKNOWN_CONTROLLER_ID. The endpoint type errors can be returned from DescribeClusterRequest On the controller side, the controllers now try to register themselves with the current active controller, by sending a CONTROLLER_REGISTRATION request. This, in turn, is converted into a RegisterControllerRecord by the active controller. ClusterImage, ClusterDelta, and all other associated classes have been upgraded to propagate the new metadata. In the metadata shell, the cluster directory now contains both broker and controller subdirectories. QuorumFeatures previously had a reference to the ApiVersions structure used by the controller's NetworkClient. Because this PR removes that reference, QuorumFeatures now contains only immutable data. Specifically, it contains the current node ID, the locally supported features, and the list of quorum node IDs in the cluster. Reviewers: David Arthur <mumrah@gmail.com>, Ziming Deng <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>	2023-09-07 15:21:52 -07:00
Chris Egerton	77a91be22e	KAFKA-15425: Fail fast in Admin::listOffsets when topic (but not partition) metadata is not found (#14314 ) This restores previous behavior for Admin::listOffsets, which was to fail immediately if topic metadata could not be found, and only retry if metadata for one or more specific partitions could not be found. There is a subtle difference here: prior to https://github.com/apache/kafka/pull/13432, the operation would be retried if any metadata error was reported for any individual topic partition, even if an error was also reported for the entire topic. With this change, the operation always fails if an error is reported for the entire topic, even if an error is also reported for one or more individual topic partitions. I am not aware of any cases where brokers might return both topic- and topic partition-level errors for a metadata request, and if there are none, then this change should be safe. However, if there are such cases, we may need to refine this PR to remove the discrepancy in behavior. Reviewers: Justine Olshan <jolshan@confluent.io>	2023-09-07 14:02:57 -07:00
Lucia Cerchie	01c7c7a399	KAFKA-15307: Removes non-existent configs (#14341 ) `partition.grouper` was removed in 3.0 release. Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-09-07 12:58:13 -07:00
Nikolay	0029bc4897	KAFKA-14595: ReassignPartitionsCommandArgsTest rewritten in java (#14217 ) Reviewers: Taras Ledkov <tledkov@apache.org>, Greg Harris <greg.harris@aiven.io>	2023-09-07 10:12:07 -07:00
Yash Mayya	88b554fdbd	KAFKA-15179: Add integration tests for the file sink and source connectors (#14279 ) Reviewers: Ashwin Pankaj <apankaj@confluent.io>, Chris Egerton <chrise@aiven.io>	2023-09-07 12:24:13 -04:00
Kamal Chandraprakash	6e818c6b02	KAFKA-15410: Delete records with tiered storage integration test (4/4) (#14330 ) * Added the integration test for DELETE_RECORDS API for tiered storage enabled topic * Added validation checks before removing remote log segments for log-start-offset breach Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>	2023-09-07 21:02:39 +05:30
Luke Chen	cd897e6c76	MINOR: Update the javadoc in RSM (#14352 ) Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>	2023-09-07 20:55:11 +05:30
Max Riedel	90e646052a	KAFKA-14509; [1/2] Define ConsumerGroupDescribe API request and response schemas and classes. (#14124 ) This patch adds the schemas of the new ConsumerGroupDescribe API (KIP-848) and adds an handler to the KafkaApis. Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>	2023-09-07 08:05:04 -07:00
Kamal Chandraprakash	6d762480c9	KAFKA-15351: Update log-start-offset after leader election for topics enabled with remote storage (#14340 ) On leadership failover, the new leader's start offset may be older than the start offset of old leader. This works fine for local storage scenario because the new leader still contains data associated with stale start offset. But in case of remote storage, although new leader has a stale offset, the data associated with it has been deleted from remote by the old leader. Hence, we end up in a situation where leader has a start offset but no data associated with it. This commit fixes the situation by ensuring that on every leadership failover, for topics with remote storage, the leader will update it's start offset from the base of first segment in current leader chain present in the remote storage (if any). Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>	2023-09-07 16:32:16 +02:00
David Arthur	65e2ecffab	KAFKA-15435 Fix counts in MigrationManifest (#14342 ) Reviewers: Liu Zeyu <zeyu.luke@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2023-09-06 13:02:13 -04:00
Lucas Brutschy	eb39c95080	MINOR: StoreChangelogReaderTest fails with log-level DEBUG (#14300 ) A mocked method is executed unexpectedly when we enable DEBUG log level, leading to confusing test failures during debugging. Since the log message itself seems useful, we adapt the test to take the additional mocked method call into account). Reviewer: Bruno Cadonna <cadonna@apache.org>	2023-09-06 14:49:48 +02:00
Ritika Reddy	cc289d04c7	MINOR: Fix trailing white spaces on reviewers.py (#14343 ) Fixing trailing white spaces on reviewers.py. Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Jacot <djacot@confluent.io>	2023-09-06 00:28:23 -07:00
David Jacot	7054625c45	KAFKA-14499: [6/N] Add MemberId and MemberEpoch to OffsetFetchRequest (#14321 ) This patch adds the MemberId and the MemberEpoch fields to the OffsetFetchRequest. Those fields will be populated when the new consumer group protocol is used to ensure that the member fetching the offset has the correct member id and epoch. If it does not, UNKNOWN_MEMBER_ID or STALE_MEMBER_EPOCH are returned to the client. Our initial idea was to implement the same for the old protocol. The field is called GenerationIdOrMemberEpoch in KIP-848 to materialize this. As a second though, I think that we should only do it for the new protocol. The effort to implement it in the old protocol is not worth it in my opinion. Reviewers: Ritika Reddy <rreddy@confluent.io>, Calvin Liu <caliu@confluent.io>, Justine Olshan <jolshan@confluent.io>	2023-09-05 23:36:38 -07:00
Kamal Chandraprakash	80982c4ae3	KAFKA-15410: Delete topic integration test with LocalTieredStorage and TBRLMM (3/4) (#14329 ) Added delete topic integration tests for tiered storage enabled topics with LocalTieredStorage and TBRLMM Reviewers: Satish Duggana <satishd@apache.org>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>	2023-09-06 05:50:12 +05:30
Andrew Schofield	b49013b73e	KAFKA-9800: Exponential backoff for Kafka clients - KIP-580 (#14111 ) Implementation of KIP-580 to add exponential back-off to situations in which retry.backoff.ms is used to delay backoff attempts. This KIP adds exponential backoff behavior with a maximum controlled by a new config retry.backoff.max.ms, together with a +/- 20% of jitter to spread the retry attempts of the client fleet. Reviewers: Mayank Shekhar Narula <mayanks.narula@gmail.com>, Milind Luthra <i.milind.luthra@gmail.com>, Kirk True <kirk@mustardgrain.com>, Jun Rao<junrao@gmail.com>	2023-09-05 11:57:51 -07:00
Yash Mayya	1f473ebb5e	KAFKA-14876: Add stopped state to Kafka Connect Administration docs section (#14336 ) Reviewers: Chris Egerton <chrise@aiven.io>	2023-09-05 14:39:49 -04:00
Yash Mayya	79598b49d6	MINOR: Update the documentation's table of contents to add missing headings for Kafka Connect (#14337 ) Reviewers: Chris Egerton <chrise@aiven.io>	2023-09-05 13:58:44 -04:00
Abhijeet Kumar	37a51e286d	KAFKA-15293 Added documentation for tiered storage metrics (#14331 ) Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>	2023-09-05 22:19:10 +05:30
Luke Chen	be0de2124a	MINOR: Update comment in consumeAction (#14335 ) Reviewers: Satish Duggana <satishd@apache.org>, Divij Vaidya <diviv@amazon.com>	2023-09-05 21:36:28 +05:30
Kamal Chandraprakash	9f2ac375c2	KAFKA-15410: Reassign replica expand, move and shrink integration tests (2/4) (#14328 ) - Updated the log-start-offset to the correct value while building the replica state in ReplicaFetcherTierStateMachine#buildRemoteLogAuxState Integration tests added: 1. ReassignReplicaExpandTest 2. ReassignReplicaMoveTest and 3. ReassignReplicaShrinkTest Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>	2023-09-05 19:28:17 +05:30
Justine Olshan	b892acae5e	KAFKA-15424: Make the transaction verification a dynamic configuration (#14324 ) This will allow enabling and disabling transaction verification (KIP-890 part 1) without having to roll the cluster. Tested that restarting the cluster persists the configuration. If a verification is disabled/enabled while we have an inflight request, depending on the step of the process, the change may or may not be seen in the inflight request (enabling will typically fail unverified requests, but we may still verify and reject when we first disable) Subsequent requests/retries will behave as expected for verification. Sequence checks will continue to take place after disabling until the first message is written to the partition (thus clearing the verification entry with the tentative sequence) or the broker restarts/partition is reassigned which will clear the memory. On enabling, we will only track sequences that for requests received after the verification is enabled. Reviewers: Jason Gustafson <jason@confluent.io>, Satish Duggana <satishd@apache.org>	2023-09-04 20:40:50 -07:00
Kamal Chandraprakash	caaa4c55fe	KAFKA-15410: Expand partitions, segment deletion by retention and enable remote log on topic integration tests (1/4) (#14307 ) Added the below integration tests with tiered storage - PartitionsExpandTest - DeleteSegmentsByRetentionSizeTest - DeleteSegmentsByRetentionTimeTest and - EnableRemoteLogOnTopicTest - Enabled the test for both ZK and Kraft modes. These are enabled for both ZK and Kraft modes. Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>	2023-09-05 05:13:16 +05:30
Yash Mayya	d34d84dbef	KAFKA-7438: Migrate WindowStoreBuilderTest from EasyMock to Mockito (#14152 ) Reviewers: Divij Vaidya <diviv@amazon.com>	2023-09-04 13:54:18 +02:00
Christo Lolov	7a516b0386	KAFKA-14133: Move AbstractStreamTest and RocksDBMetricsRecordingTriggerTest to Mockito (#14223 ) Reviewers: Divij Vaidya <diviv@amazon.com>	2023-09-04 12:58:50 +02:00
Dimitar Dimitrov	78c59cd2b0	KAFKA-15052 Fix the flaky testBalancePartitionLeaders - part II (#13908 ) A follow-up to https://github.com/apache/kafka/pull/13804. This follow-up adds the alternative fix approach mentioned in the PR above - bumping the session timeout used in the test with 1 second. Reproducing the flake-out locally has been much harder than on the CI runs, as neither Gradle with Java 11 or Java 14 nor IntelliJ with Java 14 could show it, but IntelliJ with Java 11 could occasionally reproduce the failure the first time immediately after a rebuild. While I was unable to see the failure with the bumped session timeout, the testing procedure definitely didn't provide sufficient reassurance for the fix as even without it often I'd see hundreds of consecutive successful test runs when the first run didn't fail. Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>	2023-09-04 17:02:32 +08:00
Proven Provenzano	a6409e8e61	KAFKA-15422: Update documenttion for delegation tokens when working with Kafka with KRaft (#14318 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2023-09-04 14:16:12 +05:30
Abhijeet Kumar	5074c8038e	KAFKA-15260: RLM Task should handle uninitialized RLMM for the associated topic-parititon (#14113 ) This change is about RLM task handling retriable exception when it tries to copy segments to remote but the RLMM is not yet initialized. On encountering the exception, we log the error and throw the exception back to the caller. We also make sure that the failure metrics are updated since this is a temporary error because RLMM is not yet initialized. Added unit tests to verify RLM task does not attempt to copy segments to remote on encountering the retriable exception and that failure metrics remain unchanged. Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>	2023-09-04 09:13:04 +05:30
Luke Chen	da99879df7	KAFKA-15421: fix network thread leak in testThreadPoolResize (#14320 ) In SocketServerTest, we create SocketServer and enableRequestProcessing on each test class initialization. That's fine since we shutdown it in @AfterEach. The issue we have is we disabled 2 tests in this test suite. And when running these disabled tests, we will go through class initialization, but without @AfterEach. That causes 2 network thread leaked. Compared the error message in DynamicBrokerReconfigurationTest#testThreadPoolResize test here: org.opentest4j.AssertionFailedError: Invalid threads: expected 6, got 8: List(data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0) ==> expected: <true> but was: <false> The 2 unexpected network threads are leaked from SocketServerTest. Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>, Kamal Chandraprakash <kchandraprakash@uber.com>, Chris Egerton <chrise@aiven.io>	2023-09-03 16:16:54 +08:00
Rohan	cc53889aaa	KAFKA-15429: reset transactionInFlight on StreamsProducer close (#14326 ) Resets the value of transactionInFlight to false when closing the StreamsProducer. This ensures we don't try to commit against a closed producer Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2023-09-02 18:14:14 -07:00
Rohan	d293cd0735	KAFKA-15429: catch+log errors from unsubscribe in streamthread shutdown (#14325 ) Preliminary fix for KAFKA-15429 which updates StreamThread.completeShutdown to catch-and-log errors from consumer.unsubscribe. Though this does not prevent the exception, it does preserve the original exception that caused the stream thread to exit. Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>	2023-09-02 18:13:16 -07:00
Lianet Magrans	1bb8c11f5a	KAFKA-14965 - OffsetsRequestsManager implementation & API integration (#14308 ) Implementation of the OffsetRequestsManager, responsible for building requests and processing responses for requests related to partition offsets. In this PR, the manager includes support for ListOffset requests, generated when the user makes any of the following consumer API calls: beginningOffsets endOffsets offsetsForTimes All previous consumer API calls interact with the OffsetsRequestsManager by generating a ListOffsetsApplicationEvent. Includes tests to cover the new functionality and to ensure no API level changes are introduced. This covers KAFKA-14965 and KAFKA-15081. Reviewers: Philip Nee <pnee@confluent.io>, Kirk True <kirk@mustardgrain.com>, Jun Rao<junrao@gmail.com>	2023-09-01 13:57:17 -07:00
Christo Lolov	134f6c07a4	KAFKA-15427: Fix resource leak in integration tests for tiered storage (#14319 ) Co-authored-by: Nikhil Ramakrishnan <nikrmk@amazon.com> Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>	2023-09-01 23:12:57 +05:30
Jeff Kim	6391c66035	KAFKA-14500; [7/7] Refactor GroupMetadataManagerTest (#14122 ) This patch makes the styling consistent inside GroupMetadataManagerTest. Also, it adds JoinResult to simplify the JoinGroup API responses in the tests. Reviewers: David Arthur <mumrah@gmail.com>, David Jacot <djacot@confluent.io>	2023-09-01 06:36:33 -07:00
David Jacot	dcff0878c4	KAFKA-14499: [5/N] Refactor GroupCoordinator.fetchOffsets and GroupCoordinator.fetchAllOffsets (#14310 ) This patch refactors the GroupCoordinator.fetchOffsets and GroupCoordinator.fetchAllOffsets methods to take an OffsetFetchRequestGroup and to return an OffsetFetchResponseGroup. It prepares the ground for adding the member id and the member epoch to the OffsetFetchRequest. This change also makes those two methods more aligned with the others in the interface. Reviewers: Calvin Liu <caliu@confluent.io>, Justine Olshan <jolshan@confluent.io>	2023-09-01 03:45:24 -07:00
Kamal Chandraprakash	d0f3cf1f9f	KAFKA-15351: Ensure log-start-offset not updated to local-log-start-offset when remote storage enabled (#14301 ) When tiered storage is enabled on the topic, and the last-standing-replica is restarted, then the log-start-offset should not reset its offset to first-local-log-segment-base-offset. Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>	2023-09-01 06:33:33 +05:30
Lucas Brutschy	16dc983ad6	Kafka Streams Threading: Timeout behavior (#14171 ) Implement setting and clearing task timeouts, as well as changing the output on exceptions to make it similar to the existing code path. Reviewer: Walker Carlson <wcarlson@apache.org>	2023-08-31 15:21:01 -05:00
Kamal Chandraprakash	43fe13350f	KAFKA-15404: Disable the flaky integration tests. (#14296 ) Disabled the below tests to fix the thread leak: 1. kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize() and 2. org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Justine Olshan <jolshan@confluent.io>	2023-08-31 11:39:26 -07:00
Luke Chen	c2bb8eb875	MINOR: Close topic based RLMM correctly in integration tests (#14315 ) Reviewers: Divij Vaidya <diviv@amazon.com>	2023-08-31 10:44:32 +02:00
A. Sophie Blee-Goldman	95e1cdc4ef	HOTFIX: avoid placement of unnecessary transient standby tasks & improve assignor logging (#14149 ) Minor fix to avoid creating unnecessary standby tasks, especially when these may be surprising or unexpected as in the case of an application with num.standby.replicas = 0 and warmup replicas disabled. The "bug" here was introduced during the fix for an issue with cooperative rebalancing and in-memory stores. The fundamental problem is that in-memory stores cannot be unassigned from a consumer for any period, however temporary, without being closed and losing all the accumulated state. This caused some grief when the new HA task assignor would assign an active task to a node based on the readiness of the standby version of that task, but would have to remove the active task from the initial assignment so it could first be revoked from its previous owner, as per the cooperative rebalancing protocol. This temporary gap in any version of that task among the consumer's assignment for that one intermediate rebalance would end up causing the consumer to lose all state for it, in the case of in-memory stores. To fix this, we simply began to place standby tasks on the intended recipient of an active task awaiting revocation by another consumer. However, the fix was a bit of an overreach, as we assigned these temporary standby tasks in all cases, regardless of whether there had previously been a standby version of that task. We can narrow this down without sacrificing any of the intended functionality by only assigning this kind of standby task where the consumer had previously owned some version of it that would otherwise potentially be lost. Also breaks up some of the long log lines in the StreamsPartitionAssignor and expands the summary info while moving it all to the front of the line (following reports of missing info due to truncation of long log lines in larger applications)	2023-08-30 13:29:38 -07:00

... 3 4 5 6 7 ...

11852 Commits All Branches Search

11852 Commits

All Branches