kafka

Commit Graph

Author	SHA1	Message	Date
Mickael Maison	74be72a559	MINOR: Various fixes in the docs (#14914 ) - Only use https links - Fix broken HTML tags - Replace usage of <tt> which is deprecated with <code> - Replace hardcoded version numbers Reviewers: Chris Egerton <fearthecellos@gmail.com>, Greg Harris <gharris1727@gmail.com>	2023-12-04 22:06:49 +01:00
Apoorv Mittal	7a6d2664cd	KAFKA-15663, KAFKA-15794: Telemetry reporter and request handling (KIP-714) (#14909 ) Part of KIP-714. Implements ClientTelemetryReporter which manages the lifecycle for client metrics collection. The reporter also defines TelemetrySender which will be used by Network clients to send API calls to broker. Reviewers: Andrew Schofield <aschofield@confluent.io>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2023-12-04 11:44:56 -08:00
David Jacot	ddf99880d7	MINOR: Fix ConsumerNetworkThread shutdown (#14913 ) This patch fixes a race condition in the shutdown logic of the `ConsumerNetworkThread`. The `running` variable could be set to `true` after `closeInternal` was called. Reviewers: Andrew Schofield <aschofield@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>	2023-12-04 11:01:59 -08:00
David Mao	bbe87322e6	MINOR: Fix flaky test RefreshingHttpsJwksTest.testBasicScheduleRefresh (#14888 ) This test is flaky because maybeExpediteRefresh schedules a refresh in a background thread. Instead pass through a mock executor service so that the refresh is executed directly. --------- Co-authored-by: ashwinpankaj <appankaj@amazon.com> Reviewers: Ashwin Pankaj <apankaj@confluent.io>, Kirk True <ktrue@confluent.io>, Justine Olshan <jolshan@confluent.io>	2023-12-04 09:52:38 -08:00
Christo Lolov	d4c95cfc2a	KAFKA-14133: Migrate ProcessorStateManagerTest and StreamThreadTest to Mockito (#13932 ) This pull request is an attempt to get what has started in #12524 to completion as part of the Streams project migration to Mockito. Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>	2023-12-04 18:37:57 +01:00
Colin Patrick McCabe	397582678b	MINOR: fix BrokerRegistrationRequest broken by KAFKA-15922 (#14887 ) Reviewers: David Arthur <mumrah@gmail.com>, Justine Olshan <jolshan@confluent.io>	2023-12-04 09:22:35 -08:00
Max Riedel	b7c99e22a7	KAFKA-14509: [2/N] Implement server side logic for ConsumerGroupDescribe API (#14544 ) This patch implements the ConsumerGroupDescribe API. Reviewers: David Jacot <djacot@confluent.io>	2023-12-04 07:19:28 -08:00
Andras Katona	270be2dea5	MINOR: Upgrade jetty to 9.4.53.v20231009 (#14877 )	2023-12-04 10:54:27 +01:00
Andrew Schofield	b6571a5f44	MINOR: Experimentally turn off consumer integration tests using new consumer (#14904 ) This is part of the investigation into recent build instability. It simply turns off the consumer integration tests that use the new AsyncKafkaConsumer to see whether the build runs smoothly. Reviewers: David Jacot <djacot@confluent.io>	2023-12-04 01:18:29 -08:00
Bruno Cadonna	0cf227dd4f	KAFKA-14438: Throw if async consumer configured with invalid group ID (#14872 ) Verifies that the group ID passed into the async consumer is valid. That is, if the group ID is not null, it must be neither empty nor consist only of whitespace. This change stores the group ID in the group metadata because KAFKA-15281 about the group metadata API will build on that. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>	2023-12-03 23:11:41 +01:00
Andrew Schofield	bce2d4a8b6	KAFKA-15953: Refactor polling delays (#14897 ) Caches the maximum time to wait in the consumer network thread so the application thread is better isolated from the request managers. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>	2023-12-03 23:09:12 +01:00
Lucas Brutschy	59ac9be21c	HOTFIX: fix ConsistencyVectorIntegrationTest failure (#14895 ) #14570 changed the result for KeyQuery from ValueAndTimestamp<V> to V, but forgot to update ConsistencyVectorIntegrationTest accordingly.	2023-12-03 23:06:41 +01:00
Matthias J. Sax	1a2f74be67	MINOR: fix typo	2023-12-01 15:39:32 -08:00
Matthias J. Sax	b22bbd656c	MINOR: cleanup internal Iterator impl (#14889 ) makeNext() is internal and visibility should not be extended to `public` Reviewers: Walker Carlson <wcarlson@confluent.io>	2023-12-01 11:53:07 -08:00
Lucas Brutschy	bfee3b3c6b	KAFKA-15690: Fix restoring tasks on partition loss, flaky EosIntegrationTest (#14869 ) The following race can happen in the state updater code path: (1) Task is restoring, owned by state updater. (2) We fall out of the consumer group, lose all partitions. (3) We therefore register a "TaskManager.pendingUpdateAction", to CLOSE_DIRTY. (4) We also register a "StateUpdater.taskAndAction" to remove the task. (5) We get the same task reassigned. Since it's still owned by the state updater, we don't do much. (6) The task completes restoration. (7) The "StateUpdater.taskAndAction" to remove will be ignored, since it's already restored. (8) Inside "handleRestoredTasksFromStateUpdater", we close the task dirty because of the pending update action. (9) We now have the task assigned, but it's closed. To fix this particular race, we cancel the "close" pending update action. Furthermore, since we may have made progress in other threads during the missed rebalance, we need to add the task back to the state updater, to at least check if we are still at the end of the changelog. Finally, it seems we do not need to close dirty here, it's enough to close clean when we lose the task, related to KAFKA-10532. This should fix the flaky EosIntegrationTest. Reviewers: Bruno Cadonna <cadonna@apache.org>	2023-12-01 18:57:27 +01:00
Jason Gustafson	a701c0e04f	MINOR: Fix flaky `DescribeClusterRequestTest.testDescribeClusterRequestIncludingClusterAuthorizedOperations` (#14890 ) Test startup does not assure that all brokers are registered. In flaky failures, the `DescribeCluster` API does not return a complete list of brokers. To fix the issue, we add a call to `ensureConsistentKRaftMetadata()` to ensure that all brokers are registered and have caught up to current metadata. Reviewers: David Jacot <djacot@confluent.io>	2023-12-01 09:33:17 -08:00
Jeff Kim	ba49006561	MINOR: disable test_transactions with new group coordinator https://issues.apache.org/jira/browse/KAFKA-14505 is not done yet so we need to disable the system test. Added a comment in the jira to re-enable once it's implemented. Reviewers: Justine Olshan <jolshan@confluent.io>	2023-12-01 08:47:12 -08:00
Andrew Schofield	21edb70788	KAFKA-15890: Consumer.poll with long timeout unaware of assigned partitions (#14835 ) In the new consumer, Consumer.poll(Duration timeout) blocks for the entire duration. If the consumer is joining a group and has not yet received its assignments, the poll begins before an assignment has been received. Because the poll is blocked, it does not notice when partitions are assigned, and it subsequently does not return any records. The old consumer only blocks for the duration of the heartbeat interval and loops until the poll timeout has passed, and is thus able to check for assignments received. Once this problem is fixed, there remains another which prevents the group from becoming stable. Because the consumer repeatedly sends the list of topic-partitions that it has been assigned to the group coordinator, the coordinator responds with the list of topic-partitions, which causes the consumer to remain reconciling indefinitely. By making the building of ConsumerGroupHeartbeatRequest stateful, the loop is ended and the group becomes stable as expected. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>, Lianet Magrans <lianetmr@gmail.com>	2023-12-01 15:41:30 +01:00
Andrew Schofield	1750d735cd	KAFKA-15842: Correct handling of KafkaConsumer.committed for new consumer (#14859 ) This PR fixes some details of the interface to KafkaConsumer.committed which were different between the existing consumer and the new consumer. Adds a unit test that validates the behaviour is the same for both consumer implementations. Reviewers: Kirk True <ktrue@confluent.io>, Bruno Cadonna <cadonna@apache.org>	2023-12-01 14:37:21 +01:00
David Jacot	5fdfb3afaf	MINOR: Disable FetchFromFollowerIntegrationTest.testRackAwareRangeAssignor (#14876 ) `FetchFromFollowerIntegrationTest.testRackAwareRangeAssignor` is extremely flaky and we have never been able to fix it. This patch disables it until we find a solution to make it reliable with https://issues.apache.org/jira/browse/KAFKA-15020. Reviewers: Stanislav Kozlovski <stanislav@confluent.io>	2023-12-01 00:05:46 -08:00
Ismael Juma	db308a9fe5	MINOR: Upgrade to gradle 8.5 (#14883 ) Reviewers: Satish Duggana <satishd@apache.org>	2023-12-01 09:35:45 +05:30
Igor Soarez	6b87c85291	KAFKA-15886: Always specify directories for new partition registrations When creating partition registrations directories must always be defined. If creating a partition from a PartitionRecord or PartitionChangeRecord from an older version that does not support directory assignments, then DirectoryId.MIGRATING is assumed. If creating a new partition, or triggering a change in assignment, DirectoryId.UNASSIGNED should be specified, unless the target broker has a single online directory registered, in which case the replica should be assigned directly to that single directory. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-11-30 14:10:47 -08:00
Hanyu Zheng	f1cd11dcc5	KAFKA-15629: Proposal to introduce IQv2 Query Types: TimestampedKeyQuery and TimestampedRangeQuery (#14570 ) Implements KIP-992. Adds TimestampedKeyQuery and TimestampedRangeQuery (IQv2) for ts-kv-store, plus changes semantics of existing KeyQuery and RangeQuery if issued against a ts-kv-store, now unwrapping value-and-timestamp and only returning the plain value. Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-11-30 12:14:23 -08:00
Luke Chen	37416e1aeb	KAFKA-15489: resign leadership when no fetch or fetch snapshot from majority voters (#14428 ) In KIP-595, we expect to piggy-back on the `quorum.fetch.timeout.ms` config, and if the leader did not receive Fetch requests from a majority of the quorum for that amount of time, it would begin a new election, to resolve the network partition in the quorum. But we missed this implementation in current KRaft. Fixed it in this PR. The commit includes: 1. Added a timer with timeout configuration in `LeaderState`, and check whether it has expired each time the leader is polled. If expired, resign the leadership and start a new election. 2. Added `fetchedVoters` in `LeaderState`, and update the value each time a FETCH or FETCH_SNAPSHOT request is received, and clear it and reset the timer if the majority - 1 of the remote voters sent such requests. Reviewers: José Armando García Sancio <jsancio@apache.org>	2023-11-30 11:34:44 -08:00
Colin Patrick McCabe	a94bc8d6d5	KAFKA-15922: Add a MetadataVersion for JBOD (#14860 ) Assign MetadataVersion.IBP_3_7_IV2 to JBOD. Move KIP-966 support to MetadataVersion.IBP_3_7_IV3. Create MetadataVersion.LATEST_PRODUCTION as the latest metadata version that can be used when formatting a new cluster, or upgrading a cluster using kafka-features.sh. This will allow us to clearly distinguish between stable and unstable metadata versions for the first time. Reviewers: Igor Soarez <soarez@apple.com>, Ron Dagostino <rndgstn@gmail.com>, Calvin Liu <caliu@confluent.io>, Proven Provenzano <pprovenzano@confluent.io>	2023-11-30 10:35:13 -08:00
Jason Gustafson	a35e021925	MINOR: Fix flaky `MetadataLoaderTest.testNoPublishEmptyImage` (#14875 ) There is a race in the assertion on `capturedImages`. Since the future is signaled first, it is still possible to see an empty list. By adding to the collection first, we can ensure the assertion will succeed. Reviewers: David Jacot <djacot@confluent.io>	2023-11-30 09:50:19 -08:00
Nick Telford	96b43bf16f	KAFKA-14412: Add ProcessingThread tag interface (#14839 ) This interface provides a common supertype for `StreamThread` and `DefaultTaskExecutor.TaskExecutorThread`, which will be used by KIP-892 to differentiate between "processing" threads and interactive query threads. This is needed because `DefaultTaskExecutor.TaskExecutorThread` is `private`, so cannot be seen directly from `RocksDBStore`. Reviewer: Bruno Cadonna <cadonna@apache.org>	2023-11-30 09:44:02 +01:00
Jason Gustafson	085f1d340b	MINOR: No need for response callback when applying controller mutation throttle (#14861 ) With `AbstractResponse.maybeSetThrottleTimeMs`, we don't need to use a callback to build the response with the respective throttle. Reviewers: David Jacot <djacot@confluent.io>	2023-11-29 16:33:05 -08:00
Colin Patrick McCabe	bd18551b32	MINOR: DirectoryId.MIGRATING should be all zeros (#14858 ) DirectoryId.MIGRATING should be all zeros. All zeros is the default Uuid value in KRPC, and MIGRATING is the default directory ID value. Reviewers: Ron Dagostino <rdagostino@confluent.io>	2023-11-29 13:12:33 -08:00
Greg Harris	9f896ed6c9	KAFKA-15816: Fix leaked sockets in streams tests (#14769 ) Signed-off-by: Greg Harris <greg.harris@aiven.io> Reviewers: Matthias J. Sax <mjsax@apache.org>	2023-11-29 11:53:34 -08:00
Hao Li	e7b9bd5a26	KAFKA-15022: add config for balance subtopology in rack aware task assignment (#14711 ) Part of KIP-925. Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-11-29 11:33:52 -08:00
Lucas Brutschy	c0ec8131d8	KAFKA-15865: Remove autocommit completion event (#14831 ) There is no callback associated with autocommit, so I do not think we need this event. This closes KAFKA-15865. Reviewers: Bruno Cadonna <cadonna@apache.org>	2023-11-29 19:02:08 +01:00
Okada Haruki	d71d0639d9	KAFKA-15046: Get rid of unnecessary fsyncs inside UnifiedLog.lock to stabilize performance (#14242 ) While any blocking operation performed while holding the UnifiedLog.lock could lead to serious performance (even availability) issues, there are currently several paths that call fsync(2) inside the lock. While the lock is held, all subsequent produce requests against the partition may block. This easily causes all request-handlers to be busy when disk performance is bad. Even worse, when a disk experiences tens of seconds of glitch (not rare in spinning drives), the broker becomes unable to process any requests while remaining unfenced from the cluster (i.e. a "zombie"-like status). This PR gets rid of 4 cases of essentially-unnecessary fsync(2) calls performed under the lock: (1) ProducerStateManager.takeSnapshot at UnifiedLog.roll: I moved the fsync(2) call to the scheduler thread as part of the existing "flush-log" job (before incrementing the recovery point). Since it's still ensured that the snapshot is flushed before incrementing the recovery point, this change shouldn't cause any problem. (2) ProducerStateManager.removeAndMarkSnapshotForDeletion as part of log segment deletion: This method calls Utils.atomicMoveWithFallback with needFlushParentDir = true internally, which calls fsync. I changed it to call Utils.atomicMoveWithFallback with needFlushParentDir = false (which is consistent with index file deletion, which also doesn't flush the parent dir). This change shouldn't cause problems either. (3) LeaderEpochFileCache.truncateFromStart when incrementing log-start-offset: This path is called from deleteRecords on request-handler threads. Here, we don't actually need fsync(2) either. On unclean shutdown, a few leader epochs might remain in the file, but they will be handled by LogLoader on start-up, so this is not a problem. (4) LeaderEpochFileCache.truncateFromEnd as part of log truncation: Likewise, we don't need fsync(2) here, since any epochs which are untruncated on unclean shutdown will be handled by the log loading procedure. Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>	2023-11-29 09:43:44 -08:00
Apoorv Mittal	f1819f4480	KAFKA-15778 & KAFKA-15779: Implement metrics manager (KIP-714) (#14699 ) The PR provides an implementation of the client metrics manager along with other classes. The manager is responsible for supporting 3 operations: (1) UpdateSubscription - from kafka-configs.sh and reload from metadata cache. (2) Process Get Telemetry Request - from KafkaApis.scala. (3) Process Push Telemetry Request - from KafkaApis.scala. The manager maintains an in-memory cache to keep track of client instances against their instance id. Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>	2023-11-29 09:20:07 -08:00
David Jacot	5ae0b49839	KAFKA-14505; [1/N] Add support for transactional writes to CoordinatorRuntime (#14844 ) This patch adds support for transactional writes to the CoordinatorRuntime framework. This mainly consists in adding CoordinatorRuntime#scheduleTransactionalWriteOperation and in adding the producerId and producerEpoch to various interfaces. The patch also extends the CoordinatorLoaderImpl and the CoordinatorPartitionWriter accordingly. Reviewers: Justine Olshan <jolshan@confluent.io>	2023-11-29 08:54:23 -08:00
Josep Prat	68f4c7e22e	Update NOTICE-binary with latest additions (#14865 ) Signed-off-by: Josep Prat <josep.prat@aiven.io> Reviewers: Mickael Maison <mickael.maison@gmail.com>	2023-11-29 11:20:21 +01:00
Philip Nee	7999fd35d7	KAFKA-15887: Ensure FindCoordinatorRequest is sent before closing (#14842 ) A few bugs were introduced by the previous issues. These are: * During testing or in some edge cases, the coordinator request manager might hold on to an inflight request forever. Therefore, when invoking coordinatorRequestManager.poll(), nothing would be returned. Here we explicitly create a FindCoordinatorRequest regardless of the current request state because we want to actively search for a coordinator. * ensureCoordinatorReady() might be stuck in an infinite loop if the client fails to find a coordinator. Even though the consumer would be able to shut down eventually, this is undesirable. * The current asyncConsumerTest mixes background/network thread shutdown with the consumer shutdown. As the goal of the module is unit testing, we should try to test the shutdown procedure separately. Therefore, this PR adds a Mockito.doAnswer call to the applicationEventHandler.close(). Tests that are testing shutdown are calling shutdown() explicitly. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>	2023-11-29 11:16:43 +01:00
Mickael Maison	a8d5007bfa	MINOR: Update LICENSE-binary for 3.7.0 (#14833 ) Reviewers: Josep Prat <josep.prat@aiven.io>	2023-11-29 11:00:22 +01:00
Proven Provenzano	14571054aa	KAFKA-15904: Only add directory.id to meta.properties when migrating or in kraft mode Only add directory.id to meta.properties when migrating to kraft mode, or already in kraft mode. This prevents incompatibilities with older Kafka releases, which checked that each directory in a JBOD ensemble had the same meta.properties values. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-11-28 23:14:10 -08:00
Apoorv Mittal	009b57d870	KAFKA-15618: Kafka metrics collector and supporting classes (KIP-714) (#14620 ) The PR outlines classes to collect client metrics through the KafkaMetricsCollector implementation. The MetricsCollector defines a mechanism to collect client metrics in sum and gauge metrics format. This requires defining cumulative and delta telemetry metrics while collecting raw metrics. The single point metric class helps create an OTLP format Metric object wrapped by the single point metric class itself. Reviewers: Andrew Schofield <aschofield@confluent.io>, Xavier Léauté <xavier@confluent.io>, Philip Nee <pnee@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2023-11-28 22:07:22 -08:00
Hao Li	10555ec6de	KAFKA-15022: Only relax edge when path exist (#14198 ) If there is no path from u to v, we should not represent it at Integer.MAX_VALUE but null instead. Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-11-28 20:44:12 -08:00
Kamal Chandraprakash	20b0bf063b	MINOR: Fix the flaky TBRLMM `testInternalTopicExists` test (#14840 ) The internal topic creation is asynchronous, so the test gets flaky. This test is meant to assert that doesTopicExist returns true when a topic exists, so to fix the flakiness it now creates a dummy internal topic. Reviewers: Luke Chen <showuon@gmail.com>, Jun Rao <jun@confluent.io>, Satish Duggana <satishd@apache.org>	2023-11-29 10:50:22 +08:00
Colin Patrick McCabe	4874bf818a	KAFKA-15311: Fix docs about reverting to ZooKeeper mode during KRaft migration (#14160 ) - Remove the outdated statement that delegation tokens aren't supported by KRaft. - Add an invitation to report migration bugs on JIRA. - Define terminology such as "zk migration phases". - Mention MV can't be changed during migration. - Explain how to revert to ZK mode. Reviewers: Ron Dagostino <rndgstn@gmail.com>, David Arthur <mumrah@gmail.com>	2023-11-28 14:03:59 -08:00
Andrew Schofield	161b94d196	KAFKA-15544: Enable integration tests for new consumer (#14758 ) This commit parameterizes the consumer integration tests so they can be run against the existing "generic" group protocol and the new "consumer" group protocol introduced in KIP-848. The KIP-848 client code is under construction so some of the tests do not run on both variants to start with, but the idea is that the tests can be enabled as the gaps in functionality are closed. Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>	2023-11-28 21:26:59 +01:00
Lucas Brutschy	f3e776fd34	MINOR: time-out hanging ZooKeeperClientTest (#14855 ) As described in KAFKA-9470, testBlockOnRequestCompletionFromStateChangeHandler will block for hours occasionally. If it passes, it takes 0.5 seconds, so a minute timeout should be safe. This is not a fix for KAFKA-9470, it's just aiming to make the CI more stable. Reviewers: David Jacot <djacot@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2023-11-28 12:04:53 -08:00
vamossagar12	bb1c4465c9	KAFKA-14516: [1/N] Static Member leave, join, re-join request using ConsumerGroupHeartbeats (#14432 ) This patch adds support for static membership to the new consumer group protocol. With it, a static member can join, re-join, temporarily leave and leave. When a member leaves with the expectation to rejoin, it must rejoin within the session timeout. It is kicked out of the consumer group otherwise. Reviewers: David Jacot <djacot@confluent.io>	2023-11-28 10:08:16 -08:00
Apoorv Mittal	38f2faf83f	KAFKA-15681: Add support of client-metrics in kafka-configs.sh (KIP-714) (#14632 ) The PR adds support of alter/describe configs for client-metrics as defined in KIP-714 Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>	2023-11-28 09:24:25 -08:00
Calvin Liu	db626a4804	KAFKA-15582 Unset the previous broker epoch if version < 2 (#14784 ) When using older versions of the broker registration RPC, make sure that the new PreviousBrokerEpoch field is set to the default value when building the request object. Reviewers: David Arthur <mumrah@gmail.com>	2023-11-28 10:36:59 -05:00
Mickael Maison	3c0840d28e	MINOR: Fix typo in 3.2.0 upgrade notes (#14851 ) Reviewers: Josep Prat <josep.prat@aiven.io>	2023-11-28 11:32:46 +01:00
Hao Li	bbd75b80ce	KAFKA-15022: Detect negative cycle from one source (#14696 ) Introduce a dummy node connected to every other node and run Bellman-Ford from the dummy node once instead of from every node in the graph. Reviewers: Qichao Chu (@ex172000), Matthias J. Sax <matthias@confluent.io>	2023-11-28 00:29:00 -08:00
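
The last entry above (KAFKA-15022, "Detect negative cycle from one source") describes running Bellman-Ford once from a dummy node instead of once per node. Below is a minimal sketch of that general technique only; it is not the code from the Kafka commit. The class name NegativeCycleSketch, the method hasNegativeCycle and the {from, to, weight} edge encoding are assumptions made for illustration. Initialising every distance to 0 stands in for a dummy source with a zero-weight edge to every node.

```java
public class NegativeCycleSketch {

    // Hypothetical helper: nodes are numbered 0..n-1 and each edge is
    // encoded as {from, to, weight}. Returns true if any negative-weight
    // cycle exists anywhere in the graph.
    static boolean hasNegativeCycle(int n, int[][] edges) {
        // Distance 0 for every node models a dummy source that has a
        // zero-weight edge to each node, so one run covers the whole graph.
        long[] dist = new long[n];

        for (int round = 1; round <= n; round++) {
            boolean changed = false;
            for (int[] e : edges) {
                long candidate = dist[e[0]] + e[2];
                if (candidate < dist[e[1]]) {
                    dist[e[1]] = candidate;
                    changed = true;
                }
            }
            if (!changed) {
                // Distances are stable, so no negative cycle can exist.
                return false;
            }
        }
        // Something was still relaxed in round n: without a negative cycle,
        // n - 1 rounds are always enough, so a negative cycle must exist.
        return true;
    }

    public static void main(String[] args) {
        int[][] negative = {{0, 1, 1}, {1, 2, -3}, {2, 0, 1}};  // cycle weight -1
        int[][] zeroSum = {{0, 1, 1}, {1, 2, -3}, {2, 0, 2}};   // cycle weight 0
        System.out.println(hasNegativeCycle(3, negative));
        System.out.println(hasNegativeCycle(3, zeroSum));
    }
}
```

Running from a single dummy source costs one O(V * E) pass, whereas starting a separate run from every node multiplies that by V.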

11999 Commits, All Branches