kafka

Commit Graph

Author	SHA1	Message	Date
Kamal Chandraprakash	f3dbd7ed08	KAFKA-16904: Metric to measure the latency of remote read requests (#16209 ) Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>	2024-06-11 21:07:12 +05:30
ShivsundarR	68070c94a6	KAFKA-16724: Added support for fractional throughput and monotonic payload in kafka-producer-perf-test.sh Added support for fractional throughput and monotonic payload in kafka-producer-perf-test.sh. https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka#KIP932:QueuesforKafka-kafka-producer-perf-test.sh Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>	2024-06-11 11:19:31 +05:30
gongxuanzhang	816209d187	KAFKA-10787 Apply spotless to transaction-coordinator and server-common (#16172 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-06-09 05:36:17 +08:00
Okada Haruki	3835515fea	KAFKA-16541 Fix potential leader-epoch checkpoint file corruption (#15993 ) A patch for KAFKA-15046 got rid of fsync on LeaderEpochFileCache#truncateFromStart/End for performance reasons, but it turned out this could corrupt the leader-epoch checkpoint file on ungraceful OS shutdown, i.e. when the OS shuts down while the kernel is writing dirty pages back to the device. To address this problem, this PR makes the following changes: (1) Revert LeaderEpochCheckpoint#write to always fsync (2) truncateFromStart/End now call LeaderEpochCheckpoint#write asynchronously on scheduler thread (3) UnifiedLog#maybeCreateLeaderEpochCache now loads epoch entries from checkpoint file only when current cache is absent Reviewers: Jun Rao <junrao@gmail.com>	2024-06-06 15:10:13 +09:00
David Jacot	53d592e369	MINOR: Fix type in MetadataVersion.IBP_4_0_IV0 (#16181 ) This patch fixes a typo in MetadataVersion.IBP_4_0_IV0. It should be 0 not O. Reviewers: Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2024-06-03 20:48:04 -07:00
David Jacot	ba61ff0cd9	KAFKA-16860; [1/2] Introduce group.version feature flag (#16120 ) This patch introduces the `group.version` feature flag with one version: 1) Version 1 enables the new consumer group rebalance protocol (KIP-848). Reviewers: Justine Olshan <jolshan@confluent.io>	2024-05-31 12:48:55 -07:00
Mickael Maison	b6d0fb055d	MINOR: Refactor DynamicConfig (#16133 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-06-01 01:09:46 +08:00
Justine Olshan	7c1bb1585f	KAFKA-16308 [2/N]: Allow unstable feature versions and rename unstable metadata config (#16130 ) As per KIP-1022, we will rename the unstable metadata versions enabled config to support all feature versions. Features is also updated to return latest production and latest testing versions of each feature. A feature is production ready when the corresponding metadata version (bootstrapMetadataVersion) is production ready. Adds tests for the feature usage of the unstableFeatureVersionsEnabled config Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>	2024-05-30 14:52:50 -07:00
Justine Olshan	5e3df22095	KAFKA-16308 [1/N]: Create FeatureVersion interface and add `--feature` flag and handling to StorageTool (#15685 ) As part of KIP-1022, I have created an interface for all the new features to be used when parsing the command line arguments, doing validations, getting default versions, etc. I've also added the --feature flag to the storage tool to show how it will be used. Created a TestFeatureVersion to show an implementation of the interface (besides MetadataVersion which is unique) and added tests using this new test feature. I will add the unstable config and tests in a followup. Reviewers: David Mao <dmao@confluent.io>, David Jacot <djacot@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jun Rao <junrao@apache.org>	2024-05-29 16:36:06 -07:00
Viktor Somogyi-Vass	5a4898450d	KAFKA-15649: Handle directory failure timeout (#15697 ) A broker that is unable to communicate with the controller will shut down after the configurable log.dir.failure.timeout.ms. The implementation adds a new event to the Kafka EventQueue. This event is deferred by the configured timeout and will execute the shutdown if the heartbeat communication containing the failed log dir is still pending with the controller. Reviewers: Igor Soarez <soarez@apple.com>	2024-05-23 16:36:39 +01:00
Mickael Maison	ab0cc72499	MINOR: Move parseCsvList to server-common (#16029 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-05-23 16:01:45 +02:00
Mickael Maison	affe8da54c	KAFKA-7632: Support Compression Levels (KIP-390) (#15516 ) Reviewers: Jun Rao <jun@confluent.io>, Luke Chen <showuon@gmail.com> Co-authored-by: Lee Dongjin <dongjin@apache.org>	2024-05-21 17:58:49 +02:00
Gaurav Narula	412b05df00	KAFKA-16789 Fix thread leak detection for event handler threads (#15984 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-05-19 18:21:56 +08:00
Jeff Kim	8a9dd2beda	KAFKA-16663; Cancel write timeout TimerTask on successful event completion (#15902 ) Write events create and add a TimerTask to schedule the timeout operation. The issue is that we pile up the number of timer tasks which are essentially no-ops if replication was successful. They stay in memory for 15 seconds (default write timeout) and as the rate of write increases, the impact on memory usage increases. Instead, cancel the corresponding write timeout task when the write event is committed to the log. This also applies to complete transaction events. Reviewers: David Jacot <djacot@confluent.io>	2024-05-13 00:18:32 -07:00
Gaurav Narula	510431a732	KAFKA-16688: Use helper method to shutdown ExecutorService (#15886 ) We observe some thread leaks in CI which point to the executor service thread. This change tries to shutdown the executor service using the helper method in `ThreadUtils`. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Igor Soarez <soarez@apple.com>	2024-05-10 10:54:31 +01:00
PoAn Yang	4825c89d14	KAFKA-16588 broker shutdown hangs when log.segment.delete.delay.ms is zero (#15773 ) Instead of staying pending forever, this PR invokes the next schedule after 1ms. However, the side effect is busy-waiting. Hence, this PR also updates the docs to remind users about the issue with small log.segment.delete.delay.ms values. Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2024-05-01 17:11:20 +08:00
Omnia Ibrahim	cfe5ab5cf2	KAFKA-15853 Move quota configs into server-common package (#15774 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2024-04-24 13:05:18 +08:00
Omnia Ibrahim	5e96e5c898	KAFKA-15853 Refactor KafkaConfig to use PasswordEncoderConfigs (#15770 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-04-22 00:47:57 +08:00
Kuan-Po (Cooper) Tseng	ced79ee12f	KAFKA-16552 Create an internal config to control InitialTaskDelayMs in LogManager to speed up tests (#15719 ) Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2024-04-20 20:34:02 +08:00
Omnia Ibrahim	ecb2dd4cdc	KAFKA-15853 Move KafkaConfig log properties and docs out of core (#15569 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Nikolay <nizhikov@apache.org>, Federico Valeri <fvaleri@redhat.com>, Chia-Ping Tsai <chia7712@gmail.com>	2024-04-20 04:14:23 +08:00
Mickael Maison	2b9729ba77	MINOR: Various cleanups in server and server-common (#15710 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-04-16 15:20:49 +08:00
Igor Soarez	15c4ade06a	MINOR: Improve logging in AssignmentsManager (#15522 ) At the moment it can be a bit difficult to troubleshoot issues related to the AssignmentsManager. Mainly because: Topic partitions are logged with topic ID and partition index but without the topic name. Directory IDs are logged without the directory path. Assignment reasons aren't tracked. This patch addresses the three issues. Reviewers: Luke Chen <showuon@gmail.com>	2024-04-12 14:13:40 +08:00
Kuan-Po (Cooper) Tseng	169ed60fe1	KAFKA-16477 Detect thread leaked client-metrics-reaper in tests (#15668 ) After profiling the Kafka tests, we found that many client-metrics-reaper threads are not cleaned up after BrokerServer shutdown. The client-metrics-reaper thread comes from ClientMetricsManager#expirationTimer, and BrokerServer#shutdown doesn't close ClientMetricsManager, which lets the thread keep running in the background. Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2024-04-09 05:07:33 +08:00
Erik van Oosten	8e61f04228	MINOR: Fix usage of none in javadoc (#15674 ) - Use `Empty` instead of 'none' when referring to `Optional` values. - `Headers.lastHeader` returns `null` when no header is found. - Fix minor spelling mistakes. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-04-08 08:43:05 +08:00
Greg Harris	bf5e04e416	KAFKA-16349: Prevent race conditions in Exit class from stopping test JVM (#15484 ) Signed-off-by: Greg Harris <greg.harris@aiven.io> Reviewers: Chris Egerton <chrise@aiven.io>	2024-03-28 20:07:42 -07:00
Colin Patrick McCabe	8d914b543d	KAFKA-16411: Correctly migrate default client quota entities (#15584 ) KAFKA-16222 fixed a bug whereby we didn't undo the name sanitization used on client quota entity names stored in ZooKeeper. However, it incorrectly claimed to fix the handling of default client quota entities. It also failed to correctly re-sanitize when synchronizing the data back to ZooKeeper. This PR fixes ZkConfigMigrationClient to do the sanitization correctly on both the read and write paths. We do de-sanitization before invoking the visitors, since after all it does not make sense to do the same de-sanitization step in each and every visitor. Additionally, this PR fixes a bug causing default entities to be converted incorrectly. For example, ClientQuotaEntity(user -> null) is stored under the /config/users/<default> znode in ZooKeeper. In KRaft it appears as a ClientQuotaRecord with EntityData(entityType=users, entityName=null). Prior to this PR, this was being converted to a ClientQuotaRecord with EntityData(entityType=users, entityName=""). That represents a quota on the user whose name is the empty string (yes, we allow users to name themselves with the empty string, sadly.) The confusion appears to have arisen because for TOPIC and BROKER configurations, the default ConfigResource is indeed the one named with the empty (not null) string. For example, the default topic configuration resource is ConfigResource(name="", type=TOPIC). However, things are different for client quotas. Default client quota entities in KRaft (and also in AdminClient) are represented by maps with null values. For example, the default User entity is represented by Map("user" -> null). In retrospect, using a map with null values was a poor choice; a Map<String, Optional<String>> would have made more sense. However, this is the way the API currently is and we have to convert correctly. 
There was an additional level of confusion present in KAFKA-16222 where someone thought that using the ZooKeeper placeholder string "<default>" in the AdminClient API would yield a default client quota entity. This seems to have been suggested by the ConfigEntityName class that was created recently. In fact, <default> is not part of any public API in Kafka. Accordingly, this PR also renames ConfigEntityName.DEFAULT to ZooKeeperInternals.DEFAULT_STRING, to make it clear that the string <default> is just a detail of the ZooKeeper implementation. It is not used in the Kafka API to indicate defaults. Hopefully this will avoid confusion in the future. Finally, the PR also creates KRaftClusterTest.testDefaultClientQuotas to get extra test coverage of setting default client quotas. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Igor Soarez <soarez@apple.com>	2024-03-26 16:49:38 -07:00
PoAn Yang	6f8d4fe26b	KAFKA-15949: Unify metadata.version format in log and error message (#15505 ) There were different words for metadata.version like metadata version or metadataVersion. Unify format as metadata.version. Reviewers: Luke Chen <showuon@gmail.com>	2024-03-26 20:09:29 +08:00
Igor Soarez	f8ce7feebc	KAFKA-15950: Serialize heartbeat requests (#14903 ) In between HeartbeatRequest being sent and the response being handled, i.e. while a HeartbeatRequest is in flight, an extra request may be immediately scheduled if propagateDirectoryFailure, setReadyToUnfence, or beginControlledShutdown is called. To prevent the extra request, we check whether a request is in flight and delay the scheduling if necessary. Some of the tests in BrokerLifecycleManagerTest are also improved to remove race conditions and reduce flakiness. Reviewers: Colin McCabe <colin@cmccabe.xyz>, Ron Dagostino <rdagostino@confluent.io>, Jun Rao <junrao@gmail.com>	2024-03-25 10:31:19 -07:00
Kuan-Po (Cooper) Tseng	bf9a27fefd	KAFKA-16388 add production-ready test of 3.3 - 3.6 release to MetadataVersionTest.testFromVersionString (#15563 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-03-24 13:09:21 +08:00
Kuan-Po (Cooper) Tseng	12a1d85362	KAFKA-12187 replace assertTrue(obj instanceof X) with assertInstanceOf (#15512 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-03-20 10:36:25 +08:00
Chris Holland	e878654e95	MINOR: Cleanup BoundedList to Make Constructors More Safe (#15507 ) Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-03-15 21:18:24 +08:00
Kamal Chandraprakash	e4c53d093e	KAFKA-15206: Fix the flaky RemoteIndexCacheTest.testClose test (#15523 ) It is possible that due to resource constraint, ShutdownableThread#run might be called later than the ShutdownableThread#close method. Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>	2024-03-15 10:33:40 +08:00
David Jacot	f5c4d522fd	MINOR: Add read/write all operation (#15462 ) There are a few cases in the group coordinator service where we want to read from or write to each of the known coordinators (each of __consumer_offsets partitions). The current implementation needs to get the list of the known coordinators, then schedule the operation, and finally aggregate the results. This patch is an attempt to streamline this by adding multi read/write to the runtime. Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>	2024-03-07 07:51:04 -08:00
Nikolay	eea369af94	KAFKA-14588 Log cleaner configuration move to CleanerConfig (#15387 ) In order to move ConfigCommand to tools we must move all its dependencies, which include KafkaConfig and other core classes, to Java. This PR moves log cleaner configuration to CleanerConfig class of storage module. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>	2024-03-05 18:11:56 +08:00
David Jacot	0472db2cd3	MINOR: Uniformize error handling/transformation in GroupCoordinatorService (#15196 ) This patch uniformizes the error handling in the GroupCoordinatorService with the aim to reuse the same error translation for all operations. It also ensures that exceptions are unwrapped if needed. Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>	2024-01-30 23:23:58 -08:00
Mickael Maison	3e9ef70853	KAFKA-15853: Move PasswordEncoder to server-common (#15246 ) Reviewers: Luke Chen <showuon@gmail.com>, Omnia Ibrahim <o.g.h.ibrahim@gmail.com>	2024-01-30 19:08:50 +01:00
Gaurav Narula	4c6f975ab3	KAFKA-16162: resend broker registration on metadata update to IBP 3.7-IV2 We update metadata update handler to resend broker registration when metadata has been updated to >= 3.7IV2 so that the controller becomes aware of the log directories in the broker. We also update DirectoryId::isOnline to return true on an empty list of log directories while the controller awaits broker registration. Co-authored-by: Proven Provenzano <pprovenzano@confluent.io> Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2024-01-30 10:00:07 -08:00
Apoorv Mittal	208f9e7765	KAFKA-15813: Evict client instances from cache (KIP-714) (#15234 ) KIP-714 requires client instance cache in broker which should also have a time-based eviction policy where client instances which are not actively sending metrics should be evicted. The KIP mentions: "This client instance specific state is maintained in broker memory up to MAX(60*1000, PushIntervalMs * 3) milliseconds." Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>	2024-01-23 15:06:02 -08:00
David Arthur	7bf7fd99a5	KAFKA-16078: Be more consistent about getting the latest MetadataVersion This PR creates MetadataVersion.latestTesting to represent the highest metadata version (which may be unstable) and MetadataVersion.latestProduction to represent the latest version that should be used in production. It fixes a few cases where the broker was advertising that it supported the testing versions even when unstable metadata versions had not been configured. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>	2024-01-17 14:59:22 -08:00
Nikolay	da2aa68269	KAFKA-14588: Move ConfigEntityName to server-common (#14868 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2024-01-08 12:41:43 +01:00
Nikolay	45bd19f2ef	KAFKA-14588: Move ConfigType to server-common (#14867 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2023-12-22 18:35:27 +01:00
Proven Provenzano	b0e99b5593	KAFKA-15922: Bump MetadataVersion to support JBOD with KRaft (#14984 ) Moves ELR from MetadataVersion IBP_3_7_IV3 into the new IBP_3_8_IV0 because the ELR feature was not completed before 3.7 reached feature freeze. Leaves IBP_3_7_IV3 empty -- it is a no-op and is not reused for anything. Adds the new MetadataVersion IBP_3_7_IV4 for the FETCH request changes from KIP-951, which were mistakenly never associated with a MetadataVersion. Updates the LATEST_PRODUCTION MetadataVersion to IBP_3_7_IV4 to declare both KRaft JBOD and the KIP-951 changes ready for production use. Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@apache.org>, Justine Olshan <jolshan@confluent.io>	2023-12-14 10:08:54 -05:00
Bruno Cadonna	87e3cbe4da	MINOR: Add junit properties to display parameterized test names (#14983 ) In many parameterized tests, the display name is broken. Example - testMetadataFetch appears as [1] true, [2] false. This is because of the constant in @ParameterizedTest: String DEFAULT_DISPLAY_NAME = "[{index}] {argumentsWithNames}"; This PR adds a new junit-platform.properties which overrides it to add a {displayName}, which shows the display name of the method. For existing tests which override the name, this should work as is. The precedence rules are: (1) name attribute in @ParameterizedTest, if present (2) value of the junit.jupiter.params.displayname.default configuration parameter, if present (3) DEFAULT_DISPLAY_NAME constant defined in @ParameterizedTest Source: https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests-display-names Sample test run output Before: [1] true After: testMetadataExpiry(boolean).false This commit is an extension of `bdf6d46b41`, which needed to be reverted because it introduced test failures. Reviewers: David Jacot <djacot@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>	2023-12-13 09:42:18 +01:00
David Jacot	b96ded9859	Revert "MINOR: Add junit properties to display parameterized test names (#14687 )" (#14961 ) This reverts commit `bdf6d46b41`. We found out that this commit introduced flakiness in Streams' tests. We will revise it. Reviewers: Bruno Cadonna <cadonna@apache.org>	2023-12-07 23:20:03 -08:00
Omnia Ibrahim	ec92410e59	KAFKA-15363: Broker log directory failure changes (#14790 ) Part of JBOD KIP-858, https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft Reviewers: Igor Soarez <i@soarez.me>, Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rdagostino@confluent.io>	2023-12-07 20:44:56 -05:00
Igor Soarez	c515bf51f8	KAFKA-15426: Process and persist directory assignments Handle AssignReplicasToDirs requests, persist metadata changes with new directory assignments and possible leader elections. Reviewers: Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rndgstn@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2023-12-07 11:44:45 -08:00
Alok Thatikunta	bdf6d46b41	MINOR: Add junit properties to display parameterized test names (#14687 ) In many parameterized tests, the display name is broken. Example - `testMetadataFetch` appears as `[1] true`, `[2] false` [link](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14607/9/testReport/junit/org.apache.kafka.clients.producer/KafkaProducerTest/) This is because the constant in `@ParameterizedTest` ```java String DEFAULT_DISPLAY_NAME = "[{index}] {argumentsWithNames}"; ``` This PR adds a new `junit-platform.properties` which overrides to add a `{displayName}` which shows the `the display name of the method` For existing tests which override the name, should work as is. The precedence rules are explained > 1. `name` attribute in `@ParameterizedTest`, if present > 2. value of the `junit.jupiter.params.displayname.default` configuration parameter, if present > 3. `DEFAULT_DISPLAY_NAME` constant defined in `@ParameterizedTest` Source: https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests-display-names Sample test run output Before: `[1] true` [link](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14607/9/testReport/junit/org.apache.kafka.clients.producer/KafkaProducerTest/) After: `testMetadataExpiry(boolean).false` [link](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14687/1/testReport/junit/org.apache.kafka.clients.producer/KafkaProducerTest/) Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>, David Jacot <djacot@confluent.io>	2023-12-06 08:42:45 -08:00
Igor Soarez	6b87c85291	KAFKA-15886: Always specify directories for new partition registrations When creating partition registrations directories must always be defined. If creating a partition from a PartitionRecord or PartitionChangeRecord from an older version that does not support directory assignments, then DirectoryId.MIGRATING is assumed. If creating a new partition, or triggering a change in assignment, DirectoryId.UNASSIGNED should be specified, unless the target broker has a single online directory registered, in which case the replica should be assigned directly to that single directory. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-11-30 14:10:47 -08:00
Colin Patrick McCabe	a94bc8d6d5	KAFKA-15922: Add a MetadataVersion for JBOD (#14860 ) Assign MetadataVersion.IBP_3_7_IV2 to JBOD. Move KIP-966 support to MetadataVersion.IBP_3_7_IV3. Create MetadataVersion.LATEST_PRODUCTION as the latest metadata version that can be used when formatting a new cluster, or upgrading a cluster using kafka-features.sh. This will allow us to clearly distinguish between stable and unstable metadata versions for the first time. Reviewers: Igor Soarez <soarez@apple.com>, Ron Dagostino <rndgstn@gmail.com>, Calvin Liu <caliu@confluent.io>, Proven Provenzano <pprovenzano@confluent.io>	2023-11-30 10:35:13 -08:00
Colin Patrick McCabe	bd18551b32	MINOR: DirectoryId.MIGRATING should be all zeros (#14858 ) DirectoryId.MIGRATING should be all zeros. All zeros is the default Uuid value in KRPC, and MIGRATING is the default directory ID value. Reviewers: Ron Dagostino <rdagostino@confluent.io>	2023-11-29 13:12:33 -08:00
Okada Haruki	d71d0639d9	KAFKA-15046: Get rid of unnecessary fsyncs inside UnifiedLog.lock to stabilize performance (#14242 ) While any blocking operation performed while holding the UnifiedLog.lock could lead to serious performance (even availability) issues, currently there are several paths that call fsync(2) inside the lock. While the lock is held, all subsequent produces against the partition may block. This easily causes all request-handlers to be busy on bad disk performance. Even worse, when a disk experiences tens of seconds of glitch (it's not rare in spinning drives), it makes the broker unable to process any requests while remaining unfenced from the cluster (i.e. "zombie" like status). This PR gets rid of 4 cases of essentially-unnecessary fsync(2) calls performed under the lock: (1) ProducerStateManager.takeSnapshot at UnifiedLog.roll: I moved the fsync(2) call to the scheduler thread as part of the existing "flush-log" job (before incrementing recovery point). Since it's still ensured that the snapshot is flushed before incrementing recovery point, this change shouldn't cause any problem. (2) ProducerStateManager.removeAndMarkSnapshotForDeletion as part of log segment deletion: This method calls Utils.atomicMoveWithFallback with needFlushParentDir = true internally, which calls fsync. I changed it to call Utils.atomicMoveWithFallback with needFlushParentDir = false (which is consistent with index file deletion, which also doesn't flush the parent dir). This change shouldn't cause problems either. (3) LeaderEpochFileCache.truncateFromStart when incrementing log-start-offset: This path is called from deleteRecords on request-handler threads. Here, we don't actually need fsync(2) either. 
On unclean shutdown, a few leader epochs might remain in the file, but they will be handled by LogLoader on start-up, so this is not a problem. (4) LeaderEpochFileCache.truncateFromEnd as part of log truncation: Likewise, we don't need fsync(2) here, since any epochs which are untruncated on unclean shutdown will be handled by the log loading procedure. Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>	2023-11-29 09:43:44 -08:00
Igor Soarez	a03a71d7b5	KAFKA-15357: Aggregate and propagate assignments A new AssignmentsManager accumulates, batches, and sends KIP-858 assignment events to the Controller. Assignments are sent via AssignReplicasToDirs requests. Move QuorumTestHarness.formatDirectories into TestUtils so it can be used in other test contexts. Fix a bug in ControllerRegistration.java where the wrong version of the record was being generated in ControllerRegistration.toRecord. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>	2023-11-16 16:19:49 -08:00
Colin P. McCabe	e3dd60ef3c	HOTFIX: fix checkstyle	2023-11-02 11:35:44 -07:00
Colin P. McCabe	a672a19e80	MINOR: small optimization for DirectoryId.random DirectoryId.random doesn't need to instantiate the first 100 IDs to check if an ID is one of them. Reviewers: Ismael Juma <ismael@juma.me.uk>, Justine Olshan <jolshan@confluent.io>, Proven Provenzano <93720617+pprovenzano@users.noreply.github.com>	2023-11-02 11:29:11 -07:00
Igor Soarez	0390d5b1a2	KAFKA-15355: Message schema changes (#14290 ) Reviewers: Christo Lolov <lolovc@amazon.com>, Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rdagostino@confluent.io>	2023-11-02 09:46:05 -04:00
Crispin Bernier	c8f687ac15	KAFKA-15661: KIP-951: protocol changes (#14627 ) Separating out the protocol changes from #14444 in an effort to more quickly unblock the client side PR. This is the protocol changes to populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE the new leader ID and epoch are included in the response. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump is required. https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client Reviewers: Justine Olshan <jolshan@confluent.io>, Mayank Shekhar Narula <mayanks.narula@gmail.com>	2023-10-31 17:16:11 -07:00
Igor Soarez	9dbee599f1	MINOR: Rename log dir UUIDs (#14517 ) After a late discussion in the voting thread for KIP-858 we decided to improve the names for the designated reserved log directory UUID values. Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Ziming Deng <dengziming1993@gmail.com>.	2023-10-30 19:10:57 +08:00
Josep Prat	eed5e68880	MINOR: Server-Commons cleanup (#14572 ) MINOR: Server-Commons cleanup Fixes Javadoc and minor issues in the Java files of Server-Commons modules. Javadoc is now formatted as intended by the author of the doc itself. Signed-off-by: Josep Prat <josep.prat@aiven.io> Reviewers: Mickael Maison <mickael.maison@gmail.com>	2023-10-20 21:04:04 +02:00
Calvin Liu	af747fbfed	KAFKA-15581: Introduce ELR (#14312 ) This patch introduces preliminary changes for Eligible Leader Replicas (KIP-966) * New MetadataVersion 16 (3.7-IV1) * New record versions for PartitionRecord and PartitionChangeRecord * New tagged fields on PartitionRecord and PartitionChangeRecord * New static config "eligible.leader.replicas.enable" to gate the whole feature Reviewers: Artem Livshits <alivshits@confluent.io>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2023-10-19 14:05:15 -04:00
Ismael Juma	4cf86c5d2f	KAFKA-15492: Upgrade and enable spotbugs when building with Java 21 (#14533 ) Spotbugs was temporarily disabled as part of KAFKA-15485 to support Kafka build with JDK 21. This PR upgrades the spotbugs version to 4.8.0, which adds support for JDK 21, and enables its usage in the build again. Reviewers: Divij Vaidya <diviv@amazon.com>	2023-10-12 14:09:10 +02:00
Ritika Reddy	bcfc9543d1	MINOR: Move TopicIdPartition class to server-common (#14418 ) This patch moves the TopicIdPartition from the metadata module to the server-common module so it can be used by the group-coordinator module as well. Reviewers: Sagar Rao <sagarmeansocean@gmail.com>, David Jacot <djacot@confluent.io>	2023-09-28 13:55:44 -07:00
Colin Patrick McCabe	fcac880fd5	KAFKA-15466: Add KIP-919 support for some admin APIs (#14399 ) Add support for --bootstrap-controller in the following command-line tools: - kafka-cluster.sh - kafka-configs.sh - kafka-features.sh - kafka-metadata-quorum.sh To implement this, the following AdminClient APIs now support the new bootstrap.controllers configuration: - Admin.alterConfigs - Admin.describeCluster - Admin.describeConfigs - Admin.describeFeatures - Admin.describeMetadataQuorum - Admin.incrementalAlterConfigs - Admin.updateFeatures Command-line tool changes: - Add CommandLineUtils.initializeBootstrapProperties to handle parsing --bootstrap-controller in addition to --bootstrap-server. - Add --bootstrap-controller to ConfigCommand.scala, ClusterTool.java, FeatureCommand.java, and MetadataQuorumCommand.java. KafkaAdminClient changes: - Add the AdminBootstrapAddresses class to handle extracting bootstrap.servers or bootstrap.controllers from the config map for KafkaAdminClient. - In AdminMetadataManager, store the new usingBootstrapControllers boolean. Generalize authException to encompass the concept of fatal exceptions in general. (For example, the fatal exception where we talked to the wrong node type.) Treat MismatchedEndpointTypeException and UnsupportedEndpointTypeException as fatal exceptions. - Extend NodeProvider to include information about whether bootstrap.controllers is supported. - Modify the APIs described above to support bootstrap.controllers. Server-side changes: - Support DescribeConfigsRequest on kcontrollers. - Add KRaftMetadataCache to the kcontroller to simplify implementing describeConfigs (and probably more APIs in the future). It's mainly a wrapper around MetadataImage, so there is essentially no extra resource consumption. - Split RuntimeLoggerManager out of ConfigAdminManager to handle the incrementalAlterConfigs support for BROKER_LOGGER. This is now supported on kcontrollers as well as brokers. 
- Fix bug in AuthHelper.computeDescribeClusterResponse that resulted in us always sending back BROKER as the endpoint type, even on the kcontroller. Miscellaneous: - Fix a few places in exceptions and log messages where we wrote "broker" instead of "node". For example, an exception in NodeApiVersions.java, and a log message in NetworkClient.java. - Fix the slf4j log prefix used by KafkaRequestHandler logging so that request handlers on a controller don't look like they're on a broker. - Make the FinalizedVersionRange constructor public for the sake of a junit test. - Add unit and integration tests for the above. Reviewers: David Arthur <mumrah@gmail.com>, Doguscan Namal <namal.doguscan@gmail.com>	2023-09-26 14:43:42 -07:00
Ismael Juma	98febb989a	KAFKA-15485: Fix "this-escape" compiler warnings introduced by JDK 21 (1/N) (#14427 ) This is one of the steps required for kafka to compile with Java 21. For each case, one of the following fixes was applied: 1. Suppress warning if fixing would potentially result in an incompatible change (for public classes) 2. Add final to one or more methods so that the escape is not possible 3. Replace method calls with direct field access. In addition, we also fix a couple of compiler warnings related to deprecated references in the `core` module. See the following for more details regarding the new lint warning: https://www.oracle.com/java/technologies/javase/21-relnote-issues.html#JDK-8015831 Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Chris Egerton <chrise@aiven.io>	2023-09-24 05:59:29 -07:00
Colin Patrick McCabe	41b695b6e3	KAFKA-15369: Implement KIP-919: Allow AC to Talk Directly with Controllers (#14306 ) Implement KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add Controller Registration. This KIP adds a new version of DescribeClusterRequest which is supported by KRaft controllers. It also teaches AdminClient how to use this new DESCRIBE_CLUSTER request to talk directly with the controller quorum. This is all gated behind a new MetadataVersion, IBP_3_7_IV0. In order to share the DESCRIBE_CLUSTER logic between broker and controller, this PR factors it out into AuthHelper.computeDescribeClusterResponse. The KIP adds three new error codes: MISMATCHED_ENDPOINT_TYPE, UNSUPPORTED_ENDPOINT_TYPE, and UNKNOWN_CONTROLLER_ID. The endpoint type errors can be returned from DescribeClusterRequest. On the controller side, the controllers now try to register themselves with the current active controller, by sending a CONTROLLER_REGISTRATION request. This, in turn, is converted into a RegisterControllerRecord by the active controller. ClusterImage, ClusterDelta, and all other associated classes have been upgraded to propagate the new metadata. In the metadata shell, the cluster directory now contains both broker and controller subdirectories. QuorumFeatures previously had a reference to the ApiVersions structure used by the controller's NetworkClient. Because this PR removes that reference, QuorumFeatures now contains only immutable data. Specifically, it contains the current node ID, the locally supported features, and the list of quorum node IDs in the cluster. Reviewers: David Arthur <mumrah@gmail.com>, Ziming Deng <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>	2023-09-07 15:21:52 -07:00
Mehari Beyene	25b128de81	KAFKA-14991: KIP-937-Improve message timestamp validation (#14135 ) This implementation introduces two new configurations `log.message.timestamp.before.max.ms` and `log.message.timestamp.after.max.ms` and deprecates `log.message.timestamp.difference.max.ms`. The default value for all these three configs is maintained to be Long.MAX_VALUE for backward compatibility but with the newly added configurations we can have a finer control when validating message timestamps that are in the past and the future compared to the broker's timestamp. To maintain backward compatibility if the default value of `log.message.timestamp.before.max.ms` is not changed, we are assuming users are still using the deprecated config `log.message.timestamp.difference.max.ms` and validation is done using its value. This ensures that existing customers who have customized the value of `log.message.timestamp.difference.max.ms` will continue to see no change in behavior. Reviewers: Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>	2023-08-24 12:04:55 +02:00
Ron Dagostino	8394ddc0d2	MINOR: Move delegation token support to Metadata Version 3.6-IV2 (#14270 ) #14083 added support for delegation tokens in KRaft and attached that support to the existing MetadataVersion 3.6-IV1. This patch moves that support into a separate MetadataVersion 3.6-IV2. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-08-22 16:04:53 -07:00
David Arthur	418b8a6e59	KAFKA-14538 Metadata transactions in MetadataLoader (#14208 ) This PR contains three main changes: - Support for transactions in MetadataLoader - Abort in-progress transaction during controller failover - Utilize transactions for ZK to KRaft migration A new MetadataBatchLoader class is added to decouple the loading of record batches from the publishing of metadata in MetadataLoader. Since a transaction can span across multiple batches (or multiple transactions could exist within one batch), some buffering of metadata updates was needed before publishing out to the MetadataPublishers. MetadataBatchLoader accumulates changes into a MetadataDelta, and uses a callback to publish to the publishers when needed. One small oddity with this approach is that since we can be "splitting" batches in some cases, the number of bytes returned in the LogDeltaManifest has new semantics. The number of bytes included in a batch is now only included in the last metadata update that is published as a result of a batch. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-08-21 16:02:14 -07:00
Proven Provenzano	c2759df067	KAFKA-15219: KRaft support for DelegationTokens (#14083 ) Reviewers: David Arthur <mumrah@gmail.com>, Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Viktor Somogyi <viktor.somogyi@cloudera.com>	2023-08-19 14:01:08 -04:00
Colin Patrick McCabe	adc16d0f31	KAFKA-14538: Implement KRaft metadata transactions in QuorumController Implement the QuorumController side of KRaft metadata transactions. As specified in KIP-868, this PR creates a new metadata version, IBP_3_6_IV1, which contains the three new records: AbortTransactionRecord, BeginTransactionRecord, EndTransactionRecord. In order to make offset management unit-testable, this PR moves it out of QuorumController.java and into OffsetControlManager.java. The general approach here is to track the "last stable offset," which is calculated by looking at the latest committed offset and the in-progress transaction (if any). When a transaction is aborted, we revert back to this last stable offset. We also revert back to it when the controller is transitioning from active to inactive. In a follow-up PR, we will add support for the transaction records in MetadataLoader. We will also add support for automatically aborting pending transactions after a controller failover. Reviewers: David Arthur <mumrah@gmail.com>	2023-08-14 16:58:56 -07:00
Nikolay	1fd58e30cf	KAFKA-14595: Move classes from ReassignPartitionsCommand to tools (#14172 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2023-08-11 14:52:14 +02:00
Federico Valeri	8de3e0436a	KAFKA-15239: Fix system tests using producer performance service (#14092 ) Reviewers: Greg Harris <greg.harris@aiven.io>	2023-08-10 14:23:43 -07:00
Nikolay	ddeb89f4a9	KAFKA-14595: Move AdminUtils to server-common (#14096 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2023-08-09 10:32:45 +02:00
Colin Patrick McCabe	9bc4a2d4d1	KAFKA-15271: HistoricalIterator can expose elements that are too new (#14125 ) A HistoricalIterator at epoch N is supposed to only reveal elements at epoch N or earlier. However, due to a bug, we sometimes will reveal elements which are at a newer epoch than N. The bug does not affect elements that are in the latest epoch (aka topTier). It only affects elements that are newer than N, but which do not persist until the latest epoch. This PR fixes the bug and adds a unit test for this case. Reviewers: David Arthur <mumrah@gmail.com>	2023-08-08 16:36:59 -07:00
Kamal Chandraprakash	d89b26ff44	KAFKA-12969: Add broker level config synonyms for topic level tiered storage configs (#14114 ) KAFKA-12969: Add broker level config synonyms for topic level tiered storage configs. Topic -> Broker Synonym: local.retention.bytes -> log.local.retention.bytes local.retention.ms -> log.local.retention.ms We cannot add synonym for `remote.storage.enable` topic property as it depends on KIP-950 Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>	2023-08-03 13:56:00 +05:30
Federico Valeri	1bf73d89d0	KAFKA-15232: Move ToolsUtils to tools (#14066 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2023-07-21 20:27:44 +02:00
David Jacot	2528dd4116	KAFKA-14499: [2/N] Add OffsetCommit record & related (#14047 ) This patch does a few things: 1) It introduces the `OffsetAndMetadata` class which holds the committed offsets in the group coordinator. 2) It adds methods to deal with OffsetCommit records to `RecordHelpers`. 3) It adds `MetadataVersion#offsetCommitValueVersion` to get the version of the OffsetCommit value record that should be used. Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Arthur <mumrah@gmail.com>, Justine Olshan <jolshan@confluent.io>	2023-07-21 20:09:06 +02:00
Nikolay	4bba2c8a32	KAFKA-14591: Move DeleteRecordsCommand to tools (#13278 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>	2023-07-21 17:30:28 +02:00
Omnia G H Ibrahim	0c6b1a4e9a	KAFKA-14737: Move kafka.utils.json to server-common (#13585 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>	2023-07-18 11:02:40 +02:00
vamossagar12	fa5b493241	KAFKA-14647: Move TopicFilter to server-common/utils (#13158 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>	2023-07-18 10:38:56 +02:00
Justine Olshan	ea0bb00126	KAFKA-14884: Include check transaction is still ongoing right before append (take 2) (#13787 ) Introduced extra mapping to track verification state. When verifying, there is a race condition that the add partitions verification response returns that the partition is in the ongoing transaction, but an abort marker is written before we get to append. Therefore, we track any given transaction we are verifying with an object unique to that transaction. We check this unique state upon the first append to the log. After that, we can rely on currentTransactionFirstOffset. We remove the verification state on appending to the log with a transactional data record or marker. We will also clean up lingering verification state entries via the producer state entry expiration mechanism. We do not update the timestamp on retrying a verification for a transaction, so each entry must be verified before producer.id.expiration.ms. There were a few other fixes: - Moved the transaction manager handling for failed batch into the future completed exceptionally block to avoid processing it twice (this caused issues in unit tests) - handle interrupted exceptions encountered when callback thread encountered them - change handling to throw error if we try to set verification state and leaderLogIfLocal is None. Reviewers: David Jacot <djacot@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>	2023-07-14 15:18:11 -07:00
David Jacot	bd1f02b2be	MINOR: Move MockTimer to server-common (#13954 ) This patch rewrites MockTimer in Java and moves it from core to server-common. This continues the work started in https://github.com/apache/kafka/pull/13820. Reviewers: Divij Vaidya <diviv@amazon.com>	2023-07-06 14:56:05 +02:00
David Arthur	fc7d912e8b	KAFKA-15109 Ensure the leader epoch bump occurs for older MetadataVersions (#13910 ) This fixes a regression introduced by the previous KAFKA-15109 commit (`d0457f7360` on trunk). Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>	2023-06-27 11:49:20 -04:00
Jeff Kim	1dbcb7da9e	KAFKA-14694: RPCProducerIdManager should not wait on new block (#13267 ) RPCProducerIdManager initiates an async request to the controller to grab a block of producer IDs and then blocks waiting for a response from the controller. This is done in the request handler threads while holding a global lock. This means that if many producers are requesting producer IDs and the controller is slow to respond, many threads can get stuck waiting for the lock. This patch aims to: * resolve the deadlock scenario mentioned above by not waiting for a new block and returning an error immediately * remove synchronization usages in RpcProducerIdManager.generateProducerId() * handle errors returned from generateProducerId() so that KafkaApis does not log unexpected errors * confirm producers backoff before retrying * introduce backoff if manager fails to process AllocateProducerIdsResponse Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>	2023-06-22 10:19:39 -07:00
Divij Vaidya	88e784f7c6	KAFKA-15084: Remove lock contention from RemoteIndexCache (#13850 ) Use thread safe Caffeine to cache indexes fetched from RemoteTier locally. This PR removes a lock contention that led to higher fetch latencies as the IO threads spent time unnecessarily waiting on global cache lock while a single thread fetches the index from remote tier. See PR #13850 for details and rejected alternatives. Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>	2023-06-21 18:22:49 +02:00
minjian.cai	af678a563d	MINOR: fix typos for server common (#13887 ) Reviewers: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Divij Vaidya <diviv@amazon.com>	2023-06-20 22:56:01 +02:00
Dimitar Dimitrov	b100f1efac	KAFKA-15087 Move/rewrite InterBrokerSendThread to server-commons (#13856 ) The Java rewrite is kept relatively close to the Scala original to minimize potential newly introduced bugs and to make reviewing simpler. The following details might be of note: - The `Logging` trait moved to InterBrokerSendThread with the rewrite of ShutdownableThread has been similarly moved to any subclasses that currently use it. InterBrokerSendThread's own logging has been made to use ShutdownableThread's logger which mimics the prefix/log identifier that the trait provided. - The case RequestAndCompletionHandler class has been made a separate POJO class and the internal-use UnsentRequests class has been kept as a static nested class. - The relatively commonly used but internal (not part of the public API) clients classes that InterBrokerSendThread relies on have been allowlisted in the server-common import control. - The accompanying test class has also been moved and rewritten with one new test added and most of the pre-existing tests made stricter. Reviewers: David Jacot <djacot@confluent.io>	2023-06-20 16:50:46 +02:00
Colin P. McCabe	cd3c0ab1a3	KAFKA-15060: fix the ApiVersionManager interface This PR expands the scope of ApiVersionManager a bit to include returning the current MetadataVersion and features that are in effect. This is useful in general because that information needs to be returned in an ApiVersionsResponse. It also allows us to fix the ApiVersionManager interface so that all subclasses implement all methods of the interface. Having subclasses that don't implement some methods is dangerous because they could cause exceptions at runtime in unexpected scenarios. On the KRaft controller, we were previously performing a read operation in the QuorumController thread to get the current metadata version and features. With this PR, we now read a volatile variable maintained by a separate MetadataVersionContextPublisher object. This will improve performance and simplify the code. It should not change the guarantees we are providing; in both the old and new scenarios, we need to be robust against version skew scenarios during updates. Add a Features class which just has a 3-tuple of metadata version, features, and feature epoch. Remove MetadataCache.FinalizedFeaturesAndEpoch, since it just duplicates the Features class. (There are some additional feature-related classes that can be consolidated in a follow-on PR.) Create a java class, EndpointReadyFutures, for managing the futures associated with individual authorizer endpoints. This avoids code duplication between ControllerServer and BrokerServer and makes this code unit-testable. Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>	2023-06-19 16:46:44 -07:00
Joobi S B	f4981790c4	KAFKA-15085: Make Timer.java implement AutoCloseable (#13872 ) Change Timer.java to implement AutoCloseable because automatic bug finders will flag a warning if an object of a class is marked as AutoCloseable but is not closed properly in the code. Reviewers: Divij Vaidya <diviv@amazon.com>	2023-06-19 15:50:30 +02:00
David Jacot	45a279ec70	MINOR: Move Timer/TimingWheel to server-common (#13820 ) This patch rewrites `Timer` and the related classes in Java and moves them to `server-common` module. It is basically a one to one rewrite of the Scala code. Note that `MockTimer` is not moved as part of this patch. It will be done separately. Reviewers: Divij Vaidya <diviv@amazon.com>	2023-06-14 18:21:30 +02:00
David Jacot	7eea2a3908	MINOR: Move MockTime to server-common (#13823 ) This patch rewrites `MockTime` in Java and moves it to `server-common` module. This is a prerequisite to move `MockTimer` later on to `server-common` as well. Reviewers: David Arthur <mumrah@gmail.com>	2023-06-09 08:54:25 +02:00
José Armando García Sancio	8ad0ed3e61	KAFKA-15021; Skip leader epoch bump on ISR shrink (#13765 ) When the KRaft controller removes a replica from the ISR because of the controlled shutdown there is no need for the leader epoch to be increased by the KRaft controller. This is accurate as long as the topic partition leader doesn't add the removed replica back to the ISR. This change also fixes a bug when computing the HWM. When computing the HWM, replicas that are not eligible to join the ISR but are caught up should not be included in the computation. Otherwise, the HWM will never increase for replica.lag.time.max.ms because the shutting down replica is not sending FETCH request. Without this additional fix PRODUCE requests would timeout if the request timeout is greater than replica.lag.time.max.ms. Because of the bug above the KRaft controller needs to check the MV to guarantee that all brokers support this bug fix before skipping the leader epoch bump. Reviewers: David Mao <47232755+splett2@users.noreply.github.com>, Divij Vaidya <diviv@amazon.com>, David Jacot <djacot@confluent.io>	2023-06-07 07:20:40 -07:00
Colin Patrick McCabe	b74204fa0a	KAFKA-14996: Handle overly large user operations on the kcontroller (#13742 ) Previously, if a user tried to perform an overly large batch operation on the KRaft controller (such as creating a million topics), we would create a very large number of records in memory. Our attempt to write these records to the Raft layer would fail, because there were too many to fit in an atomic batch. This failure, in turn, would trigger a controller failover. (Note: I am assuming here that no topic creation policy was in place that would prevent the creation of a million topics. I am also assuming that the user operation must be done atomically, which is true for all current user operations, since we have not implemented KIP-868 yet.) With this PR, we fail immediately when the number of records we have generated exceeds the threshold that we can apply. This failure does not generate a controller failover. We also now fail with a PolicyViolationException rather than an UnknownServerException. In order to implement this in a simple way, this PR adds the BoundedList class, which wraps any list and adds a maximum length. Attempts to grow the list beyond this length cause an exception to be thrown. Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ijuma@apache.org>, Divij Vaidya <diviv@amazon.com>	2023-05-26 13:16:17 -07:00
Jeff Kim	c98c1ed41c	KAFKA-14500; [3/N] add GroupMetadataKey/Value record helpers (#13704 ) This patch enables the new group metadata manager to generate GroupMetadataKey/Value records. Reviewers: David Jacot <djacot@confluent.io>	2023-05-23 10:42:13 +02:00
Colin P. McCabe	63f9f23ec0	MINOR: improve QuorumController logging #13540 When creating the QuorumController, log whether ZK migration is enabled. When applying a feature level record which sets the metadata version, log the metadata version enum rather than the numeric feature level. Improve the logging when we replay snapshots in QuorumController. Log both the beginning and the end of replay. When TRACE is enabled, log every record that is replayed in QuorumController. Since some records may contain sensitive information, create RecordRedactor to assist in logging only what is safe to put in the log4j file. Add logging to ControllerPurgatory. Successful completions are logged at DEBUG; failures are logged at INFO, and additions are logged at TRACE. Remove SnapshotReason.java, SnapshotReasonTest.java, and QuorumController#generateSnapshotScheduled. They are dead code now that snapshot generation moved to org.apache.kafka.image.publisher.SnapshotGenerator. Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>	2023-05-04 11:18:03 -07:00
Luke Chen	b620c03ccf	KAFKA-14946: fix NPE when merging the deltatable (#13653 ) Fix NPE while merging the deltatable. Because it's possible that hashTier is not null but deltatable is null (ex: removing data), we should have null check while merging for deltatable like other places did. Also added tests that will fail without this change. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-05-03 10:08:25 -07:00
Luke Chen	21af1918ea	MINOR: Add reason to exceptions in QuorumController (#13648 ) Saw this error message in log: ERROR [QuorumController id=1] writeNoOpRecord: unable to start processing because of RejectedExecutionException. Reason: null (org.apache.kafka.controller.QuorumController) The null reason is not helpful with only RejectedExecutionException. Adding the reason to it. Reviewers: David Arthur <mumrah@gmail.com>, Divij Vaidya <diviv@amazon.com>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>	2023-05-02 09:54:12 +08:00
David Jacot	2d0b816150	MINOR: Move `ControllerPurgatory` to `server-common` (#13555 ) This patch renames from `ControllerPurgatory` to `DeferredEventQueue` and moves it from the `metadata` module to `server-common` module. Reviewers: Alexandre Dupriez <alexandre.dupriez@gmail.com>, Ziming Deng <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@apache.org>	2023-04-21 11:19:04 +02:00
Purshotam Chauhan	df13775254	KAFKA-14828: Remove R/W locks using persistent data structures (#13437 ) Currently, StandardAuthorizer uses a R/W lock for maintaining the consistency of data. For the clusters with very high traffic, we will typically see an increase in latencies whenever a write operation comes. The intent of this PR is to get rid of the R/W lock with the help of immutable or persistent collections. Basically, new object references are used to hold the intermediate state of the write operation. After the completion of the operation, the main reference to the cache is changed to point to the new object. Also, for the read operation, the code is changed such that all accesses to the cache for a single read operation are done to a particular cache object only. In the PR description, you can find the performance of various libraries at the time of both read and write. Read performance is checked with the existing AuthorizerBenchmark. For write performance, a new AuthorizerUpdateBenchmark has been added which evaluates the performance of the addAcl operation. Reviewers: Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Divij Vaidya <diviv@amazon.com>	2023-04-21 14:08:23 +05:30
Proven Provenzano	abca86511e	KAFKA-14881: Rework UserScramCredentialRecord (#13513 ) Rework UserScramCredentialRecord to store serverKey and StoredKey rather than saltedPassword. This is necessary to support migration from ZK, since those are the fields we stored in ZK. Update latest MetadataVersion to IBP_3_5_IV2 and make SCRAM support conditional on this version. Moved ScramCredentialData.java from org.apache.kafka.image to org.apache.kafka.metadata, which seems more appropriate. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-04-18 09:41:38 -07:00
Ron Dagostino	e27926f92b	KAFKA-14735: Improve KRaft metadata image change performance at high … (#13280 ) topic counts. Introduces the use of persistent data structures in the KRaft metadata image to avoid copying the entire TopicsImage upon every change. Performance that was O(<number of topics in the cluster>) is now O(<number of topics changing>), which has dramatic time and GC improvements for the most common topic-related metadata events. We abstract away the chosen underlying persistent collection library via ImmutableMap<> and ImmutableSet<> interfaces and static factory methods. Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>, Purshotam Chauhan <pchauhan@confluent.io>	2023-04-17 17:52:28 -04:00
Colin Patrick McCabe	cfd0503006	MINOR: fix some flaky KRaft-related tests (#13543 ) In SharedServer, fix some cases where a volatile variable could change to null while we were using it, during shutdown. This is mainly a junit test issue, although it could also cause ugly error messages during shutdown when running the server in a production context. Fix a race in KafkaEventQueueTest.testSize. Reviewers: David Arthur <mumrah@gmail.com>	2023-04-14 13:39:08 -04:00
Satish Duggana	e99984248d	KAFKA-9550 Copying log segments to tiered storage in RemoteLogManager (#13487 ) Added functionality to copy log segments, indexes to the target remote storage for each topic partition enabled with tiered storage. This involves creating scheduled tasks for all leader partition replicas to copy their log segments in sequence to tiered storage. Reviewers: Jun Rao <junrao@gmail.com>, Luke Chen <showuon@gmail.com>	2023-04-12 13:55:36 +08:00
Chia-Ping Tsai	3bbff167fa	MINOR: fix invalid usage in java docs (#13506 ) Reviewers: Luke Chen <showuon@gmail.com>	2023-04-06 16:01:14 +08:00
Luke Chen	31f9a54cba	KAFKA-14850: introduce InMemoryLeaderEpochCheckpoint (#13456 ) The motivation for introducing InMemoryLeaderEpochCheckpoint is to allow remote log manager to create the RemoteLogSegmentMetadata(RLSM) with the correct leader epoch info for a specific segment. To do that, we need to rely on the LeaderEpochCheckpointCache to truncate from start and end, to get the epoch info. However, we don't really want to truncate the epochs in cache (and write to checkpoint file in the end). So, we introduce this InMemoryLeaderEpochCheckpoint to feed into LeaderEpochCheckpointCache, and when we truncate the epoch for RLSM, we can do them in memory without affecting the checkpoint file, and without interacting with file system. Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>	2023-04-05 20:11:32 +08:00
Colin Patrick McCabe	09e59bc776	KAFKA-14857: Fix some MetadataLoader bugs (#13462 ) The MetadataLoader is not supposed to publish metadata updates until we have loaded up to the high water mark. Previously, this logic was broken, and we published updates immediately. This PR fixes that and adds a junit test. Another issue is that the MetadataLoader previously assumed that we would periodically get callbacks from the Raft layer even if nothing had happened. We relied on this to install new publishers in a timely fashion, for example. However, in older MetadataVersions that don't include NoOpRecord, this is not a safe assumption. Aside from the above changes, also fix a deadlock in SnapshotGeneratorTest, fix the log prefix for BrokerLifecycleManager, and remove metadata publishers on brokerserver shutdown (like we do for controllers). Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>	2023-03-29 12:30:12 -07:00
Colin Patrick McCabe	ddd652c672	MINOR: Standardize KRaft logging, thread names, and terminology (#13390 ) Standardize KRaft thread names. - Always use kebab case. That is, "my-thread-name". - Thread prefixes are just strings, not Option[String] or Optional<String>. If you don't want a prefix, use the empty string. - Thread prefixes end in a dash (except the empty prefix). Then you can calculate thread names as $prefix + "my-thread-name" - Broker-only components get "broker-$id-" as a thread name prefix. For example, "broker-1-" - Controller-only components get "controller-$id-" as a thread name prefix. For example, "controller-1-" - Shared components get "kafka-$id-" as a thread name prefix. For example, "kafka-0-" - Always pass a prefix to KafkaEventQueue, so that threads have names like "broker-0-metadata-loader-event-handler" rather than "event-handler". Prior to this PR, we had several threads just named "EventHandler" which was not helpful for debugging. - QuorumController thread name is "quorum-controller-123-event-handler" - Don't set a thread prefix for replication threads started by ReplicaManager. They run only on the broker, and already include the broker ID. Standardize KRaft slf4j log prefixes. - Names should be of the form "[ComponentName id=$id] ". So for a ControllerServer with ID 123, we will have "[ControllerServer id=123] " - For the QuorumController class, use the prefix "[QuorumController id=$id] " rather than "[Controller <nodeId] ", to make it clearer that this is a KRaft controller. - In BrokerLifecycleManager, add isZkBroker=true to the log prefix for the migration case. Standardize KRaft terminology. - All synonyms of combined mode (colocated, coresident, etc.) should be replaced by "combined" - All synonyms of isolated mode (remote, non-colocated, distributed, etc.) should be replaced by "isolated".	2023-03-16 15:33:03 -07:00
Calvin Liu	79b5f7f1ce	KAFKA-14617: Add ReplicaState to FetchRequest (KIP-903) (#13323 ) This patch is the first part of KIP-903. It updates the FetchRequest to include the new tagged ReplicaState field which replaces the now deprecated ReplicaId field. The FetchRequest version is bumped to version 15 and the MetadataVersion to 3.5-IV1. Reviewers: David Jacot <djacot@confluent.io>	2023-03-16 14:04:34 +01:00
Colin Patrick McCabe	aaa976a340	MINOR: Some metadata publishing fixes and refactors (#13337 ) This PR refactors MetadataPublisher's interface a bit. There is now an onControllerChange callback. This is something that some publishers might want. A good example is ZkMigrationClient. Instead of two different publish functions (one for snapshots, one for log deltas), we now have a single onMetadataUpdate function. Most publishers didn't want to do anything different in those two cases. The ones that do want to do something different for snapshots can always check the manifest type. The close function now has a default empty implementation, since most publishers didn't need to do anything there. Move the SCRAM logic out of BrokerMetadataPublisher and run it on the controller as well. On the broker, simply use dynamicClientQuotaPublisher to handle dynamic client quotas changes. That is what the controller already does, and the code is exactly the same in both cases. Fix the logging in FutureUtils.waitWithLogging a bit. Previously, when invoked from BrokerServer or ControllerServer, it did not include the standard "[Controller 123] " style prefix indicating server name and ID. This was confusing, especially when debugging junit tests. Reviewers: Ron Dagostino <rdagostino@confluent.io>, David Arthur <mumrah@gmail.com>	2023-03-09 14:52:40 -08:00
Ivan Yurchenko	e28e0bf0f2	KAFKA-14524: Rewrite KafkaMetricsGroup in Java (#13067 ) * KAFKA-14524: Rewrite KafkaMetricsGroup in Java Instead of being a base trait for classes, `KafkaMetricsGroup` is now an independent object. User classes could override methods in it to adjust its behavior like they used to with the trait model. Some classes were extending the `KafkaMetricsGroup` trait, but it wasn't actually used. Reviewers: Ismael Juma <ismael@juma.me.uk>, lbownik <lukasz.bownik@gmail.com>, Satish Duggana <satishd@pache.org>	2023-03-08 15:59:51 +05:30
David Jacot	6d37b0f07f	KAFKA-14462; [2/N] Add ConsumerGroupHeartbeat to GroupCoordinator interface (#13329 ) This patch adds ConsumerGroupHeartbeat to the GroupCoordinator interface and implements the API in KafkaApis. Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>	2023-03-07 09:20:03 +01:00
Christo Lolov	5b295293c0	MINOR: Remove unnecessary toString(); fix comment references (#13212 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>, Lucas Brutschy <lbrutschy@confluent.io>	2023-03-06 18:39:04 +01:00
Proven Provenzano	38c409cf33	KAFKA-14084: SCRAM support in KRaft. (#13114 ) This commit adds support to store the SCRAM credentials in a cluster with KRaft quorum servers and no ZK cluster backing the metadata. This includes creating ScramControlManager in the controller, and adding support for SCRAM to MetadataImage and MetadataDelta. Change UserScramCredentialRecord to contain only a single tuple (name, mechanism, salt, pw, iter) rather than a mapping between name and a list. This will avoid creating an excessively large record if a single user has many entries. Because record ID 11 (UserScramCredentialRecord) has not been used before, this is a compatible change. SCRAM will be supported in 3.5-IV0 and later. This commit does not include KIP-900 SCRAM bootstrapping support, or updating the credential cache on the controller (as opposed to broker). We will implement these in follow-on commits. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2023-03-03 10:23:34 -08:00
Kowshik Prakasam	9f55945270	MINOR: Introduce OffsetAndEpoch in LeaderEndpoint interface return values (#13268 ) Reviewers: Satish Duggana <satishd@apache.org>, Alexandre Dupriez <alexandre.dupriez@gmail.com>, Jun Rao <junrao@gmail.com>	2023-02-23 17:29:32 -08:00
Satish Duggana	322ac86ba2	KAFKA-14706: Move/rewrite ShutdownableThread to server-common module. (#13234 ) Move/rewrite ShutdownableThread to server-common module. Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>	2023-02-17 11:51:17 +08:00
Christo Lolov	ba0c5b0902	MINOR: Simplify JUnit assertions in tests; remove accidental unnecessary code in tests (#13219 ) * assertEquals called on array * Method is identical to its super method * Simplifiable assertions * Unused imports Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>	2023-02-16 16:13:31 +01:00
José Armando García Sancio	10164a6d2e	KAFKA-14693; Kafka node should halt instead of exit (#13227 ) Extend the implementation of ProcessTerminatingFaultHandler to support calling either Exit.halt or Exit.exit. Change the fault handler used by the Controller thread and the KRaft thread to use a halting fault handler. Those threads cannot call Exit.exit because Runtime.exit joins on the default shutdown hook thread. The shutdown hook thread joins on the controller and kraft thread terminating. This causes a deadlock. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>	2023-02-14 09:53:38 -08:00
David Arthur	cb4d9d1abf	KAFKA-14668 Avoid unnecessary UMR during ZK migration (#13183 ) Only send UMR to ZK brokers if the cluster metadata or topic metadata has changed. Reviewers: Akhilesh C <akhileshchg@users.noreply.github.com>, Colin P. McCabe <cmccabe@apache.org>	2023-02-09 13:24:02 -05:00
Christo Lolov	a0a9b6ffea	MINOR: Remove unnecessary code (#13210 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>	2023-02-07 17:37:45 +01:00
Ron Dagostino	6d11261d5d	MINOR: IBP_3_4_IV1 should be IBP_3_5_IV0 because it is not in 3.4 (#13198 ) The KIP-405 MetadataVersion changes will be released as part of AK 3.5, but were added as IBP_3_4_IV1. This change fixes them to be IBP_3_5_IV0. There is no incompatibility because this feature has not yet been released. Also set didMetadataChange to false because KRaft metadata log records did not change. Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <christo_lolov@yahoo.com>, Colin P. McCabe <cmccabe@apache.org>	2023-02-06 10:37:50 -08:00
Colin Patrick McCabe	6625214c52	KAFKA-14658: Do not open broker ports until we are ready to accept traffic (#13169 ) When we are listening on fixed ports, we should defer opening ports until we're ready to accept traffic. If we open the broker port too early, it can confuse monitoring and deployment systems. This is a particular concern when in KRaft mode, since in that mode, we create the SocketServer object earlier in the startup process than when in ZK mode. The approach taken in this PR is to defer opening the acceptor port until Acceptor.start is called. Note that when we are listening on a random port, we continue to open the port "early," in the SocketServer constructor. The reason for doing this is that there is no other way to find the random port number the kernel has selected. Since random port assignment is not used in production deployments, this should be reasonable. FutureUtils.java: add chainFuture and tests. SocketServerTest.scala: add timeouts to cases where we call get() on futures. Reviewers: David Arthur <mumrah@gmail.com>, Alexandre Dupriez <hangleton@users.noreply.github.com>	2023-02-01 09:42:03 -08:00
Colin Patrick McCabe	eb7d5cbf15	MINOR: add startup timeouts to KRaft integration tests (#13153 ) When running junit tests, it is not good to block forever on CompletableFuture objects. When there are bugs, this can lead to junit tests hanging forever. Jenkins does not deal with this well -- it often brings down the whole multi-hour test run. Therefore, when running integration tests in JUnit, set some reasonable time limits on broker and controller startup time. Reviewers: Jason Gustafson <jason@confluent.io>	2023-01-30 11:29:30 -08:00
Federico Valeri	72cfc994f5	KAFKA-14628: Move CommandLineUtils and CommandDefaultOptions to tools (#13131 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Christo Lolov <christololov@gmail.com>, Sagar Rao <sagarmeansocean@gmail.com>	2023-01-26 20:06:09 +01:00
Colin Patrick McCabe	8478bbb589	KAFKA-14601: Improve exception handling in KafkaEventQueue #13089 If KafkaEventQueue gets an InterruptedException while waiting for a condition variable, it currently exits immediately. Instead, it should complete the remaining events exceptionally and then execute the cleanup event. This will allow us to finish any necessary cleanup steps. In order to do this, we require the cleanup event to be provided when the queue is constructed, rather than when it's being shut down. Also, handle cases where Event#handleException itself throws an exception. Remove timed shutdown from the event queue code since nobody was using it, and it adds complexity. Add server-common/src/test/resources/test/log4j.properties since this gradle module somehow avoided having a test log4j.properties up to this point. Reviewers: David Arthur <mumrah@gmail.com>	2023-01-12 10:03:14 -08:00
Ismael Juma	8ac644d2b1	KAFKA-14607: Move Scheduler/KafkaScheduler to server-common (#13092 ) There were some concurrency inconsistencies in `KafkaScheduler` flagged by spotBugs that had to be fixed, summary of changes below: * Executor is `volatile` * We always synchronize and check `isStarted` as the first thing within the critical section when a mutating operation is performed. * We don't synchronize (but ensure the executor is not null in a safe way) in read-only operations that operate on the executor. With regards to `MockScheduler/MockTask`: * Set the type of `nextExecution` to `AtomicLong` and replaced inconsistent synchronization * Extracted logic into `MockTask.rescheduleIfPeriodic` Tweaked the `Scheduler` interface a bit: * Removed `unit` parameter since we always used `ms` except one invocation * Introduced a couple of `scheduleOnce` overloads to replace the usage of default arguments in Scala * Pulled up `resizeThreadPool` to the interface and removed `isStarted` from the interface. Other cleanups: * Removed spotBugs exclusion affecting `kafka.log.LogConfig`, which no longer exists. For broader context, see: * KAFKA-14470: Move log layer to storage module Reviewers: Jun Rao <junrao@gmail.com>	2023-01-10 23:51:58 -08:00
Ismael Juma	96d9710c17	KAFKA-14478: Move LogConfig/CleanerConfig and related to storage module (#13049 ) Additional notable changes to fix multiple dependency ordering issues: * Moved `ConfigSynonym` to `server-common` * Moved synonyms from `LogConfig` to `ServerTopicConfigSynonyms ` * Removed `LogConfigDef` `define` overrides and rely on `ServerTopicConfigSynonyms` instead. * Moved `LogConfig.extractLogConfigMap` to `KafkaConfig` * Consolidated relevant defaults from `KafkaConfig`/`LogConfig` in the latter * Consolidate relevant config name definitions in `TopicConfig` * Move `ThrottledReplicaListValidator` to `storage` Reviewers: Satish Duggana <satishd@apache.org>, Mickael Maison <mickael.maison@gmail.com>	2023-01-04 02:42:52 -08:00
Josep Prat	5f1810209f	MINOR: Fix small warning on javadoc and scaladoc (#11049 ) Escape the `>` character in javadoc Escape the `$` character when part of `${}` in scaladoc as this is the way to reference a variable Reviewers: Matthias J. Sax <matthias@confluent.io>	2022-12-28 13:41:45 -08:00
Ismael Juma	7b634c7034	KAFKA-14521: Replace BrokerCompressionCodec with BrokerCompressionType (#13011 ) This is a requirement for: * KAFKA-14477: Move LogValidator to storage module. For broader context on this change, please check: * KAFKA-14470: Move log layer to storage module Reviewers: dengziming <dengziming1993@gmail.com>	2022-12-20 11:53:49 -08:00
José Armando García Sancio	44b3177a08	KAFKA-14457; Controller metrics should only expose committed data (#12994 ) The controller metrics in the controllers has three problems. 1) the active controller exposes uncommitted data in the metrics. 2) the active controller doesn't update the metrics when the uncommitted data gets aborted. 3) the controller doesn't update the metrics when the entire state gets reset. We fix these issues by only updating the metrics when processing committed metadata records and reset the metrics when the metadata state is reset. This change adds a new type `ControllerMetricsManager` which processes committed metadata records and updates the metrics accordingly. This change also removes metrics updating responsibilities from the rest of the controller managers. Reviewers: Ron Dagostino <rdagostino@confluent.io>	2022-12-20 10:55:14 -08:00
Satish Duggana	7146ac57ba	[KAFKA-13369] Follower fetch protocol changes for tiered storage. (#11390 ) This PR implements the follower fetch protocol as mentioned in KIP-405. Added a new version for ListOffsets protocol to receive local log start offset on the leader replica. This is used by follower replicas to find the local log start offset on the leader. Added a new version for FetchRequest protocol to receive OffsetMovedToTieredStorageException error. This is part of the enhanced fetch protocol as described in KIP-405. We introduced a new field localLogStartOffset to maintain the log start offset in the local logs. Existing logStartOffset will continue to be the log start offset of the effective log that includes the segments in remote storage. When a follower receives OffsetMovedToTieredStorage, then it tries to build the required state from the leader and remote storage so that it can be ready to move to fetch state. Introduced RemoteLogManager which is responsible for initializing RemoteStorageManager and RemoteLogMetadataManager instances. It receives leader and follower replica events and partition stop events, acts on them, and also provides APIs to fetch indexes and metadata about remote log segments. Followup PRs will add more functionality like copying segments to tiered storage, retention checks to clean local and remote log segments. This will change the local log start offset and make sure the follower fetch protocol works fine for several cases. 
You can look at the detailed protocol changes in KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-FollowerReplication Co-authors: satishd@apache.org, kamal.chandraprakash@gmail.com, yingz@uber.com Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Cong Ding <cong@ccding.com>, Tirtha Chatterjee <tirtha.p.chatterjee@gmail.com>, Yaodong Yang <yangyaodong88@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Jun Rao <junrao@gmail.com>	2022-12-17 09:36:44 -08:00
Daniel Scanteianu	e3585a4cd5	MINOR: Document Offset and Partition 0-indexing, fix typo (#12753 ) Add comments to clarify that both offsets and partitions are 0-indexed, and fix a minor typo. Clarify which offset will be retrieved by poll() after seek() is used in various circumstances. Also added integration tests. Reviewers: Luke Chen <showuon@gmail.com>	2022-12-16 17:12:40 +08:00
Akhilesh C	8b045dcbf6	KAFKA-14446: API forwarding support from zkBrokers to the Controller (#12961 ) This PR enables brokers which are upgrading from ZK mode to KRaft mode to forward certain metadata change requests to the controller instead of applying them directly through ZK. To facilitate this, we now support EnvelopeRequest on zkBrokers (instead of only on KRaft nodes.) In BrokerToControllerChannelManager, we can now reinitialize our NetworkClient. This is needed to handle the case when we transition from forwarding requests to a ZK-based broker over the inter-broker listener, to forwarding requests to a quorum node over the controller listener. In MetadataCache.scala, distinguish between KRaft and ZK controller nodes with a new type, CachedControllerId. In LeaderAndIsrRequest, StopReplicaRequest, and UpdateMetadataRequest, switch from sending both a zk and a KRaft controller ID to sending a single controller ID plus a boolean to express whether it is KRaft. The previous scheme was ambiguous as to whether the system was in KRaft or ZK mode when both IDs were -1 (although this case is unlikely to come up in practice). The new scheme avoids this ambiguity and is simpler to understand. Reviewers: dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2022-12-15 14:16:41 -08:00
David Arthur	67c72596af	KAFKA-14448 Let ZK brokers register with KRaft controller (#12965 ) Prior to starting a KIP-866 migration, the ZK brokers must register themselves with the active KRaft controller. The controller waits for all brokers to register in order to verify that all the brokers can A) Communicate with the quorum B) Have the migration config enabled C) Have the proper IBP set This patch uses the new isMigratingZkBroker field in BrokerRegistrationRequest and RegisterBrokerRecord. The type was changed from int8 to bool for BrokerRegistrationRequest (a mistake from #12860). The ZK brokers use the existing BrokerLifecycleManager class to register and heartbeat with the controllers. Reviewers: Mickael Maison <mickael.maison@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2022-12-13 13:15:21 -08:00
Ismael Juma	88725669e7	MINOR: Move MetadataQuorumCommand from `core` to `tools` (#12951 ) `core` should only be used for legacy cli tools and tools that require access to `core` classes instead of communicating via the kafka protocol (typically by using the client classes). Summary of changes: 1. Convert the command implementation and tests to Java and move it to the `tools` module. 2. Introduce mechanism to capture stdout and stderr from tests. 3. Change `kafka-metadata-quorum.sh` to point to the new command class. 4. Adjusted the test classpath of the `tools` module so that it supports tests that rely on the `@ClusterTests` annotation. 5. Improved error handling when an exception different from `TerseFailure` is thrown. 6. Changed `ToolsUtils` to avoid usage of arrays in favor of `List`. Reviewers: dengziming <dengziming1993@gmail.com>	2022-12-09 09:22:58 -08:00
David Arthur	7b7e40a536	KAFKA-14304 Add RPC changes, records, and config from KIP-866 (#12928 ) Reviewers: Colin Patrick McCabe <cmccabe@apache.org>	2022-12-02 19:59:52 -05:00
Colin Patrick McCabe	5514f372b3	MINOR: extract jointly owned parts of BrokerServer and ControllerServer (#12837 ) Extract jointly owned parts of BrokerServer and ControllerServer into SharedServer. Shut down SharedServer when the last component using it shuts down. But make sure to stop the raft manager before closing the ControllerServer's sockets. This PR also fixes a memory leak where ReplicaManager was not removing some topic metric callbacks during shutdown. Finally, we now release memory from the BatchMemoryPool in KafkaRaftClient#close. These changes should reduce memory consumption while running junit tests. Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>	2022-12-02 00:27:22 -08:00
Colin Patrick McCabe	a3f5eb6e35	MINOR: Implement EventQueue#size and EventQueue#empty (#12930 ) Implement functions to measure the number of events in the event queue. Reviewers: David Arthur <mumrah@gmail.com>	2022-12-01 09:04:04 -08:00
David Jacot	bc780c7c32	MINOR: Move timeline data structures from metadata to server-common (#12811 ) This patch moves the timeline data structures from metadata module to server-common module as those will be used in the new group coordinator. Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@apache.org>	2022-11-04 08:52:32 +01:00
Colin Patrick McCabe	dac81161db	MINOR; Introduce ImageWriter and ImageWriterOptions (#12715 ) This PR adds a new ImageWriter interface which replaces the generic Consumer interface which accepted lists of records. It is better to do batching in the ImageWriter than to try to deal with that complexity in the MetadataImage#write functions, especially since batching is not semantically meaningful in KRaft snapshots. The new ImageWriter interface also supports freeze and close, which more closely matches the semantics of the underlying Raft classes. The PR also adds an ImageWriterOptions class which we can use to pass parameters to control how the new image is written. Right now, the parameters that we are interested in are the target metadata version (which may be more or less than the original image's version) and a handler function which is invoked whenever metadata is lost due to the target version. Convert over the MetadataImage#write function (and associated functions) to use the new ImageWriter and ImageWriterOptions. In particular, we now have a way to handle metadata losses by invoking ImageWriterOptions#handleLoss. This allows us to handle writing an image at a lower version, for the first time. This support is still not enabled externally by this PR, though. That will come in a future PR. Get rid of the use of SOME_RECORD_TYPE.highestSupportedVersion() in several places. In general, we do not want to "silently" change the version of a record that we output, just because a new version was added. We should be explicit about what record version numbers we are outputting. Implement ProducerIdsDelta#toString, to make debug logs look better. Move MockRandom to the server-common package so that other internal broker packages can use it. Reviewers: José Armando García Sancio <jsancio@apache.org>	2022-10-13 09:56:19 -07:00
Colin Patrick McCabe	f0f918b242	KAFKA-14177: Correctly support older kraft versions without FeatureLevelRecord (#12513 ) The main changes here are ensuring that we always have a metadata.version record in the log, making sure that the bootstrap file can be used for records other than the metadata.version record (for example, we will want to put SCRAM initialization records there), and fixing some bugs. If no feature level record is in the log and the IBP is less than 3.3IV0, then we assume the minimum KRaft version for all records in the log. Fix some issues related to initializing new clusters. If there are no records in the log at all, then insert the bootstrap records in a single batch. If there are records, but no metadata version, process the existing records as though they were metadata.version 3.3IV0 and then append a metadata version record setting version 3.3IV0. Previously, we were not clearly distinguishing between the case where the metadata log was empty, and the case where we just needed to add a metadata.version record. Refactor BootstrapMetadata into an immutable class which contains a 3-tuple of metadata version, record list, and source. The source field is used to log where the bootstrap metadata was obtained from. This could be a bootstrap file, the static configuration, or just the software defaults. Move the logic for reading and writing bootstrap files into BootstrapDirectory.java. Add LogReplayTracker, which tracks whether the log is empty. Fix a bug in FeatureControlManager where it was possible to use a "downgrade" operation to transition to a newer version. Do not store whether we have seen a metadata version or not in FeatureControlManager, since that is now handled by LogReplayTracker. Introduce BatchFileReader, which is a simple way of reading a file containing batches of snapshots that does not require spawning a thread. 
Rename SnapshotFileWriter to BatchFileWriter to be consistent, and to reflect the fact that bootstrap files aren't snapshots. QuorumController#processBrokerHeartbeat: add an explanatory comment. Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>	2022-08-25 18:12:31 -07:00
dengziming	150fd5b0b1	KAFKA-13914: Add command line tool kafka-metadata-quorum.sh (#12469 ) Add `MetadataQuorumCommand` to describe quorum status, I'm trying to use arg4j style command format, currently, we only support one sub-command which is "describe" and we can specify 2 arguments which are --status and --replication. ``` # describe quorum status kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --replication ReplicaId LogEndOffset Lag LastFetchTimeMs LastCaughtUpTimeMs Status 0 10 0 -1 -1 Leader 1 10 0 -1 -1 Follower 2 10 0 -1 -1 Follower kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status ClusterId: fMCL8kv1SWm87L_Md-I2hg LeaderId: 3002 LeaderEpoch: 2 HighWatermark: 10 MaxFollowerLag: 0 MaxFollowerLagTimeMs: -1 CurrentVoters: [3000,3001,3002] CurrentObservers: [0,1,2] # specify AdminClient properties kafka-metadata-quorum.sh --bootstrap-server localhost:9092 --command-config config.properties describe --status ``` Reviewers: Jason Gustafson <jason@confluent.io>	2022-08-20 08:37:26 -07:00
Niket Goel	ac64693434	KAFKA-14114: Add Metadata Error Related Metrics This PR adds in 3 metrics as described in KIP-859: kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count kafka.controller:type=KafkaController,name=MetadataErrorCount These metrics are incremented by fault handlers when the appropriate fault happens. Broker-side load errors happen in BrokerMetadataListener. Broker-side apply errors happen in the BrokerMetadataPublisher. The metric on the controller is incremented when the standby controller (not active) encounters a metadata error. In BrokerMetadataPublisher, try to limit the damage caused by an exception by introducing more catch blocks. The only fatal failures here are those that happen during initialization, when we initialize the manager objects (these would also be fatal in ZK mode). In BrokerMetadataListener, try to improve the logging of faults, especially ones that happen when replaying a snapshot. Try to limit the damage caused by an exception. Replace MetadataFaultHandler with LoggingFaultHandler, which is more flexible and takes a Runnable argument. Add LoggingFaultHandlerTest. Make QuorumControllerMetricsTest stricter. Fix a bug where we weren't cleaning up some metrics from the yammer registry on close in QuorumControllerMetrics. Co-author: Colin P. McCabe <cmccabe@apache.org>	2022-08-09 15:22:15 -07:00
Colin Patrick McCabe	555744da70	KAFKA-14124: improve quorum controller fault handling (#12447 ) Before trying to commit a batch of records to the __cluster_metadata log, the active controller should try to apply them to its current in-memory state. If this application process fails, the active controller process should exit, allowing another node to take leadership. This will prevent most bad metadata records from ending up in the log and help to surface errors during testing. Similarly, if the active controller attempts to renounce leadership, and the renunciation process itself fails, the process should exit. This will help avoid bugs where the active controller continues in an undefined state. In contrast, standby controllers that experience metadata application errors should continue on, in order to avoid a scenario where a bad record brings down the whole controller cluster. The intended effect of these changes is to make it harder to commit a bad record to the metadata log, but to continue to ride out the bad record as well as possible if such a record does get committed. This PR introduces the FaultHandler interface to implement these concepts. In junit tests, we use a FaultHandler implementation which does not exit the process. This allows us to avoid terminating the gradle test runner, which would be very disruptive. It also allows us to ensure that the test surfaces these exceptions, which we previously were not doing (the mock fault handler stores the exception). In addition to the above, this PR fixes a bug where RaftClient#resign was not being called from the renounce() function. This bug could have resulted in the raft layer not being informed of an active controller resigning. Reviewers: David Arthur <mumrah@gmail.com>	2022-08-04 22:49:45 -07:00
David Arthur	cc384054c6	KAFKA-13935 Fix static usages of IBP in KRaft mode (#12250 ) * Set the minimum supported MetadataVersion to 3.0-IV1 * Remove MetadataVersion.UNINITIALIZED * Relocate RPC version mapping for fetch protocols into MetadataVersion * Replace static IBP calls with dynamic calls to MetadataCache A side effect of removing the UNINITIALIZED metadata version is that the FeatureControlManager and FeatureImage will initialize themselves with the minimum KRaft version (3.0-IV1). The rationale for setting the minimum version to 3.0-IV1 is so that we can avoid any cases of KRaft mode running with an old log message format (KIP-724 was introduced in 3.0-IV1). As a side-effect of increasing this minimum version, the feature level values decreased by one. Reviewers: Jason Gustafson <jason@confluent.io>, Jun Rao <junrao@gmail.com>	2022-06-13 14:23:28 -04:00
Divij Vaidya	4426b05e54	MINOR: Use Exit.addShutdownHook instead of directly adding hooks to Runtime (#12283 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Igor Soarez <soarez@apple.com>, Kvicii <kvicii.yu@gmail.com>	2022-06-13 17:25:40 +02:00
David Jacot	151ca12a56	KAFKA-13916; Fenced replicas should not be allowed to join the ISR in KRaft (#12240 ) This PR implements the first part of KIP-841. Specifically, it implements the following: 1. Adds a new metadata version. 2. Adds the InControlledShutdown field to the BrokerRegistrationRecord and BrokerRegistrationChangeRecord and bump their versions. The newest versions are only used if the new metadata version is enabled. 3. Writes a BrokerRegistrationChangeRecord with InControlledShutdown set when a broker requests a controlled shutdown. 4. Ensures that fenced and in controlled shutdown replicas are not picked as leaders nor included in the ISR. 5. Adds or extends unit tests. Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>	2022-06-07 10:37:20 -07:00
Colin Patrick McCabe	65b4374203	MINOR: implement BrokerRegistrationChangeRecord (#12195 ) Implement BrokerRegistrationChangeRecord as specified in KIP-746. This is a more flexible record than the single-purpose Fence / Unfence records. Reviewers: José Armando García Sancio <jsancio@gmail.com>, dengziming <dengziming1993@gmail.com>	2022-06-01 16:33:01 -07:00
José Armando García Sancio	7d1b0926fa	KAFKA-13883: Implement NoOpRecord and metadata metrics (#12183 ) Implement NoOpRecord as described in KIP-835. This is controlled by the new metadata.max.idle.interval.ms configuration. The KRaft controller schedules an event to write NoOpRecord to the metadata log if the metadata version supports this feature. This event is scheduled at the interval defined in metadata.max.idle.interval.ms. Brokers and controllers were improved to ignore the NoOpRecord when replaying the metadata log. This PR also adds four new metrics to the KafkaController metric group, as described in KIP-835. Finally, there are some small fixes to leader recovery. This PR fixes a bug where metadata version 3.3-IV1 was not marked as changing the metadata. It also changes the ReplicaControlManager to accept a metadata version supplier to determine if the leader recovery state is supported. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2022-06-01 10:48:24 -07:00
dengziming	54d60ced86	KAFKA-13833: Remove the min_version_level from the finalized version range written to ZooKeeper (#12062 ) Reviewers: David Arthur <mumrah@gmail.com>	2022-05-25 14:02:34 -04:00
David Arthur	1135f22eaf	KAFKA-13830 MetadataVersion integration for KRaft controller (#12050 ) This patch builds on #12072 and adds controller support for metadata.version. The kafka-storage tool now allows a user to specify a specific metadata.version to bootstrap into the cluster, otherwise the latest version is used. Upon the first leader election of the KRaft quorum, this initial metadata.version is written into the metadata log. When writing snapshots, a FeatureLevelRecord for metadata.version will be written out ahead of other records so we can decode things at the correct version level. This also includes additional validation in the controller when setting feature levels. It will now check that a given metadata.version is supportable by the quorum, not just the brokers. Reviewers: José Armando García Sancio <jsancio@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>, Alyssa Huang <ahuang@confluent.io>	2022-05-18 12:08:36 -07:00
José Armando García Sancio	e94934b6b7	MINOR; DeleteTopics version tests (#12141 ) Add a DeleteTopics test for all supported versions. Convert the DeleteTopicsRequestTest to run against both ZK and KRaft mode. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>	2022-05-12 13:04:48 -07:00

263 Commits