263 Commits

Author SHA1 Message Date
Kamal Chandraprakash f3dbd7ed08
KAFKA-16904: Metric to measure the latency of remote read requests (#16209)
Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>
2024-06-11 21:07:12 +05:30
ShivsundarR 68070c94a6
KAFKA-16724: Added support for fractional throughput and monotonic payload in kafka-producer-perf-test.sh
Added support for fractional throughput and monotonic payload in kafka-producer-perf-test.sh.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka#KIP932:QueuesforKafka-kafka-producer-perf-test.sh

Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
2024-06-11 11:19:31 +05:30
gongxuanzhang 816209d187
KAFKA-10787 Apply spotless to transaction-coordinator and server-common (#16172)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-09 05:36:17 +08:00
Okada Haruki 3835515fea
KAFKA-16541 Fix potential leader-epoch checkpoint file corruption (#15993)
A patch for KAFKA-15046 removed the fsync from LeaderEpochFileCache#truncateFromStart/End for performance reasons, but this turned out to risk corrupting the leader-epoch checkpoint file on an ungraceful OS shutdown, i.e. when the OS shuts down while the kernel is still writing dirty pages back to the device.

To address this problem, this PR makes the following changes:
(1) Revert LeaderEpochCheckpoint#write to always fsync
(2) truncateFromStart/End now call LeaderEpochCheckpoint#write asynchronously on the scheduler thread
(3) UnifiedLog#maybeCreateLeaderEpochCache now loads epoch entries from the checkpoint file only when the current cache is absent
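
The durability concern behind change (1) can be illustrated with a minimal sketch. It assumes a write-to-temp, fsync, then rename pattern; `DurableCheckpointSketch`, `writeDurably`, and the file contents are hypothetical names for illustration and are not taken from Kafka's `LeaderEpochCheckpoint` code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class DurableCheckpointSketch {
    // Hypothetical helper: write to a temp file, fsync it, then rename it over the target.
    static void writeDurably(Path target, List<String> lines) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        byte[] bytes = String.join("\n", lines).getBytes(StandardCharsets.UTF_8);
        try (FileChannel channel = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Force data and metadata to the device before the rename makes the file visible.
            channel.force(true);
        }
        // Whether an existing target is replaced by ATOMIC_MOVE is implementation specific.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checkpoint-sketch");
        Path file = dir.resolve("leader-epoch-checkpoint");
        writeDurably(file, List.of("0 0", "1 42"));
        System.out.println(Files.readAllLines(file));
    }
}
```

Skipping the `force` call is what makes the write cheap but leaves the file contents at the mercy of the page cache on an unclean shutdown, which is the trade-off the commit message describes.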

Reviewers: Jun Rao <junrao@gmail.com>
2024-06-06 15:10:13 +09:00
David Jacot 53d592e369
MINOR: Fix typo in MetadataVersion.IBP_4_0_IV0 (#16181)
This patch fixes a typo in MetadataVersion.IBP_4_0_IV0. It should be 0 not O.

Reviewers: Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>,  Chia-Ping Tsai <chia7712@gmail.com>
2024-06-03 20:48:04 -07:00
David Jacot ba61ff0cd9
KAFKA-16860; [1/2] Introduce group.version feature flag (#16120)
This patch introduces the `group.version` feature flag with one version:
1) Version 1 enables the new consumer group rebalance protocol (KIP-848).

Reviewers: Justine Olshan <jolshan@confluent.io>
2024-05-31 12:48:55 -07:00
Mickael Maison b6d0fb055d
MINOR: Refactor DynamicConfig (#16133)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-01 01:09:46 +08:00
Justine Olshan 7c1bb1585f
KAFKA-16308 [2/N]: Allow unstable feature versions and rename unstable metadata config (#16130)
As per KIP-1022, we will rename the unstable metadata versions enabled config to support all feature versions.

Features is also updated to return latest production and latest testing versions of each feature.

A feature is production ready when the corresponding metadata version (bootstrapMetadataVersion) is production ready.

Adds tests for the feature usage of the unstableFeatureVersionsEnabled config

Reviewers: David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
2024-05-30 14:52:50 -07:00
Justine Olshan 5e3df22095
KAFKA-16308 [1/N]: Create FeatureVersion interface and add `--feature` flag and handling to StorageTool (#15685)
As part of KIP-1022, I have created an interface for all the new features to be used when parsing the command line arguments, doing validations, getting default versions, etc.

I've also added the --feature flag to the storage tool to show how it will be used.

Created a TestFeatureVersion to show an implementation of the interface (besides MetadataVersion which is unique) and added tests using this new test feature.

I will add the unstable config and tests in a followup.

Reviewers: David Mao <dmao@confluent.io>, David Jacot <djacot@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jun Rao <junrao@apache.org>
2024-05-29 16:36:06 -07:00
Viktor Somogyi-Vass 5a4898450d
KAFKA-15649: Handle directory failure timeout (#15697)
A broker that is unable to communicate with the controller will shut down
after the configurable log.dir.failure.timeout.ms.

The implementation adds a new event to the Kafka EventQueue. This event
is deferred by the configured timeout and will execute the shutdown
if the heartbeat communication containing the failed log dir is still
pending with the controller.

Reviewers: Igor Soarez <soarez@apple.com>
2024-05-23 16:36:39 +01:00
Mickael Maison ab0cc72499
MINOR: Move parseCsvList to server-common (#16029)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-23 16:01:45 +02:00
Mickael Maison affe8da54c
KAFKA-7632: Support Compression Levels (KIP-390) (#15516)
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
2024-05-21 17:58:49 +02:00
Gaurav Narula 412b05df00
KAFKA-16789 Fix thread leak detection for event handler threads (#15984)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-05-19 18:21:56 +08:00
Jeff Kim 8a9dd2beda
KAFKA-16663; Cancel write timeout TimerTask on successful event completion (#15902)
Write events create and add a TimerTask to schedule the timeout operation. The issue is that we pile up the number of timer tasks which are essentially no-ops if replication was successful. They stay in memory for 15 seconds (default write timeout) and as the rate of write increases, the impact on memory usage increases.

Instead, cancel the corresponding write timeout task when the write event is committed to the log. This also applies to complete transaction events.
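
A minimal sketch of the idea, assuming a JDK `ScheduledThreadPoolExecutor` as the timer rather than Kafka's own timer classes; `WriteTimeoutSketch`, `PendingWrite`, and `onCommitted` are hypothetical names. The timeout task is cancelled as soon as the write commits, and the remove-on-cancel policy drops it from the queue instead of letting it wait out the full delay.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class WriteTimeoutSketch {
    static ScheduledThreadPoolExecutor newTimer() {
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
        // Without this, cancelled tasks stay queued until their delay elapses.
        timer.setRemoveOnCancelPolicy(true);
        return timer;
    }

    // Hypothetical stand-in for a write event: a result future plus the timeout guarding it.
    static final class PendingWrite {
        final CompletableFuture<Void> result = new CompletableFuture<>();
        final ScheduledFuture<?> timeoutTask;

        PendingWrite(ScheduledThreadPoolExecutor timer, long timeoutMs) {
            Runnable onTimeout =
                () -> result.completeExceptionally(new TimeoutException("write timed out"));
            this.timeoutTask = timer.schedule(onTimeout, timeoutMs, TimeUnit.MILLISECONDS);
        }

        // Called when the write is committed: the timeout is no longer needed.
        void onCommitted() {
            timeoutTask.cancel(false);
            result.complete(null);
        }
    }

    public static void main(String[] args) {
        ScheduledThreadPoolExecutor timer = newTimer();
        PendingWrite write = new PendingWrite(timer, 15_000L);
        write.onCommitted();
        System.out.println("cancelled=" + write.timeoutTask.isCancelled()
            + ", queued=" + timer.getQueue().size());
        timer.shutdownNow();
    }
}
```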

Reviewers: David Jacot <djacot@confluent.io>
2024-05-13 00:18:32 -07:00
Gaurav Narula 510431a732
KAFKA-16688: Use helper method to shutdown ExecutorService (#15886)
We observe some thread leaks in CI which point to the executor service
thread. This change tries to shutdown the executor service using the
helper method in `ThreadUtils`.
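
For readers unfamiliar with the pattern, a shutdown helper of this kind typically looks like the sketch below. This is the generic two-phase shutdown from the `ExecutorService` documentation, not the actual `ThreadUtils` method; `ExecutorShutdownSketch` and `shutdownAndAwait` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ExecutorShutdownSketch {
    // Returns true if the executor terminated within the allotted time.
    static boolean shutdownAndAwait(ExecutorService executor, long timeoutMs) {
        executor.shutdown(); // stop accepting new tasks, let queued ones finish
        try {
            if (executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            executor.shutdownNow(); // interrupt tasks that are still running
            return executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(() -> { });
        System.out.println("terminated=" + shutdownAndAwait(executor, 5_000L));
    }
}
```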

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Igor Soarez <soarez@apple.com>
2024-05-10 10:54:31 +01:00
PoAn Yang 4825c89d14
KAFKA-16588 broker shutdown hangs when log.segment.delete.delay.ms is zero (#15773)
Instead of staying pending forever, this PR invokes the next schedule after 1ms. However, the side effect is busy-waiting. Hence, this PR also updates the docs to remind users about the issue with a smaller log.segment.delete.delay.ms.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-05-01 17:11:20 +08:00
Omnia Ibrahim cfe5ab5cf2
KAFKA-15853 Move quota configs into server-common package (#15774)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-24 13:05:18 +08:00
Omnia Ibrahim 5e96e5c898
KAFKA-15853 Refactor KafkaConfig to use PasswordEncoderConfigs (#15770)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-22 00:47:57 +08:00
Kuan-Po (Cooper) Tseng ced79ee12f
KAFKA-16552 Create an internal config to control InitialTaskDelayMs in LogManager to speed up tests (#15719)
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-20 20:34:02 +08:00
Omnia Ibrahim ecb2dd4cdc
KAFKA-15853 Move KafkaConfig log properties and docs out of core (#15569)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Nikolay <nizhikov@apache.org>, Federico Valeri <fvaleri@redhat.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-20 04:14:23 +08:00
Mickael Maison 2b9729ba77
MINOR: Various cleanups in server and server-common (#15710)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-16 15:20:49 +08:00
Igor Soarez 15c4ade06a
MINOR: Improve logging in AssignmentsManager (#15522)
At the moment it can be a bit difficult to troubleshoot issues related to the AssignmentsManager, mainly because:

    Topic partitions are logged with topic ID and partition index but without the topic name.
    Directory IDs are logged without the directory path.
    Assignment reasons aren't tracked.

This patch addresses the three issues.

Reviewers: Luke Chen <showuon@gmail.com>
2024-04-12 14:13:40 +08:00
Kuan-Po (Cooper) Tseng 169ed60fe1
KAFKA-16477 Detect thread leaked client-metrics-reaper in tests (#15668)
After profiling the Kafka tests, we found that many client-metrics-reaper threads are not cleaned up after BrokerServer shutdown.
The client-metrics-reaper thread comes from ClientMetricsManager#expirationTimer, and BrokerServer#shutdown doesn't close ClientMetricsManager, which leaves the thread running in the background.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-04-09 05:07:33 +08:00
Erik van Oosten 8e61f04228
MINOR: Fix usage of none in javadoc (#15674)
- Use `Empty` instead of 'none' when referring to `Optional` values.
- `Headers.lastHeader` returns `null` when no header is found.
- Fix minor spelling mistakes.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-08 08:43:05 +08:00
Greg Harris bf5e04e416
KAFKA-16349: Prevent race conditions in Exit class from stopping test JVM (#15484)
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewers: Chris Egerton <chrise@aiven.io>
2024-03-28 20:07:42 -07:00
Colin Patrick McCabe 8d914b543d
KAFKA-16411: Correctly migrate default client quota entities (#15584)
KAFKA-16222 fixed a bug whereby we didn't undo the name sanitization used on client quota entity names
stored in ZooKeeper. However, it incorrectly claimed to fix the handling of default client quota
entities. It also failed to correctly re-sanitize when synchronizing the data back to ZooKeeper.

This PR fixes ZkConfigMigrationClient to do the sanitization correctly on both the read and write
paths. We do de-sanitization before invoking the visitors, since after all it does not make sense to
do the same de-sanitization step in each and every visitor.

Additionally, this PR fixes a bug causing default entities to be converted incorrectly. For example,
ClientQuotaEntity(user -> null) is stored under the /config/users/<default> znode in ZooKeeper. In
KRaft it appears as a ClientQuotaRecord with EntityData(entityType=users, entityName=null).
Prior to this PR, this was being converted to a ClientQuotaRecord with EntityData(entityType=users,
entityName=""). That represents a quota on the user whose name is the empty string (yes, we allow
users to name themselves with the empty string, sadly.)

The confusion appears to have arisen because for TOPIC and BROKER configurations, the default
ConfigResource is indeed the one named with the empty (not null) string. For example, the default
topic configuration resource is ConfigResource(name="", type=TOPIC).  However, things are different
for client quotas. Default client quota entities in KRaft (and also in AdminClient) are represented
by maps with null values. For example, the default User entity is represented by Map("user" ->
null).  In retrospect, using a map with null values was a poor choice; a Map<String,
Optional<String>> would have made more sense. However, this is the way the API currently is and we
have to convert correctly.

There was an additional level of confusion present in KAFKA-16222 where someone thought that using
the ZooKeeper placeholder string "<default>" in the AdminClient API would yield a default client
quota entity. This seems to have been suggested by the ConfigEntityName class that was created
recently. In fact, <default> is not part of any public API in Kafka. Accordingly, this PR also
renames ConfigEntityName.DEFAULT to ZooKeeperInternals.DEFAULT_STRING, to make it clear that the
string <default> is just a detail of the ZooKeeper implementation.  It is not used in the Kafka API
to indicate defaults. Hopefully this will avoid confusion in the future.
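
The distinction between the ZooKeeper placeholder, the null default, and the empty-string user can be sketched as follows. This is an illustration only: `DefaultQuotaEntitySketch` and its methods are hypothetical, and the name sanitization step discussed above is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

final class DefaultQuotaEntitySketch {
    // ZooKeeper-internal placeholder; per the message above it is not part of any public API.
    static final String ZK_DEFAULT = "<default>";

    // Read path: the placeholder znode name becomes null, every other name is kept as-is.
    static String fromZnodeName(String znodeName) {
        return ZK_DEFAULT.equals(znodeName) ? null : znodeName;
    }

    // Write path: null (the default entity) maps back to the placeholder.
    static String toZnodeName(String entityName) {
        return entityName == null ? ZK_DEFAULT : entityName;
    }

    // Builds the map-with-null-value representation of a user entity.
    static Map<String, String> userEntity(String znodeName) {
        Map<String, String> entity = new HashMap<>(); // Map.of would reject the null value
        entity.put("user", fromZnodeName(znodeName));
        return entity;
    }

    public static void main(String[] args) {
        System.out.println(userEntity("<default>")); // default user entity
        System.out.println(userEntity(""));          // the user named with the empty string
    }
}
```

The point of keeping the two conversions in one place is that the empty string never gets mistaken for the default entity in either direction.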

Finally, the PR also creates KRaftClusterTest.testDefaultClientQuotas to get extra test coverage of
setting default client quotas.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Igor Soarez <soarez@apple.com>
2024-03-26 16:49:38 -07:00
PoAn Yang 6f8d4fe26b
KAFKA-15949: Unify metadata.version format in log and error message (#15505)
There were different words for metadata.version like metadata version or metadataVersion. Unify format as metadata.version.

Reviewers: Luke Chen <showuon@gmail.com>
2024-03-26 20:09:29 +08:00
Igor Soarez f8ce7feebc
KAFKA-15950: Serialize heartbeat requests (#14903)
In between HeartbeatRequest being sent and the response being handled,
i.e. while a HeartbeatRequest is in flight, an extra request may be
immediately scheduled if propagateDirectoryFailure, setReadyToUnfence,
or beginControlledShutdown is called.

To prevent the extra request, we check whether a request is in flight
and delay the scheduling if necessary.
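
A minimal sketch of this in-flight check, assuming a simple flag-based scheduler; `HeartbeatSchedulerSketch` and its methods are hypothetical and do not mirror the actual BrokerLifecycleManager code.

```java
final class HeartbeatSchedulerSketch {
    private boolean inFlight = false;
    private boolean immediateRequested = false;
    private int sent = 0;

    // Called when some state change wants a heartbeat sent right away.
    synchronized void requestImmediateHeartbeat() {
        if (inFlight) {
            // A request is already on the wire: remember the wish, do not send a second one.
            immediateRequested = true;
        } else {
            send();
        }
    }

    // Called when the response to the in-flight heartbeat has been handled.
    synchronized void onResponse() {
        inFlight = false;
        if (immediateRequested) {
            immediateRequested = false;
            send();
        }
    }

    private void send() {
        inFlight = true;
        sent++;
    }

    synchronized int sentCount() {
        return sent;
    }

    public static void main(String[] args) {
        HeartbeatSchedulerSketch scheduler = new HeartbeatSchedulerSketch();
        scheduler.requestImmediateHeartbeat();
        scheduler.requestImmediateHeartbeat(); // deferred, one request is in flight
        scheduler.onResponse();                // deferred request goes out now
        System.out.println("sent=" + scheduler.sentCount());
    }
}
```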

Some of the tests in BrokerLifecycleManagerTest are also improved to
remove race conditions and reduce flakiness.

Reviewers: Colin McCabe <colin@cmccabe.xyz>, Ron Dagostino <rdagostino@confluent.io>, Jun Rao <junrao@gmail.com>
2024-03-25 10:31:19 -07:00
Kuan-Po (Cooper) Tseng bf9a27fefd
KAFKA-16388 add production-ready test of 3.3 - 3.6 release to MetadataVersionTest.testFromVersionString (#15563)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-24 13:09:21 +08:00
Kuan-Po (Cooper) Tseng 12a1d85362
KAFKA-12187 replace assertTrue(obj instanceof X) with assertInstanceOf (#15512)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-20 10:36:25 +08:00
Chris Holland e878654e95
MINOR: Cleanup BoundedList to Make Constructors More Safe (#15507)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-15 21:18:24 +08:00
Kamal Chandraprakash e4c53d093e
KAFKA-15206: Fix the flaky RemoteIndexCacheTest.testClose test (#15523)
It is possible that, due to resource constraints, ShutdownableThread#run might be called later than the ShutdownableThread#close method.

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>
2024-03-15 10:33:40 +08:00
David Jacot f5c4d522fd
MINOR: Add read/write all operation (#15462)
There are a few cases in the group coordinator service where we want to read from or write to each of the known coordinators (each of the __consumer_offsets partitions). The current implementation needs to get the list of the known coordinators, then schedule the operation, and finally aggregate the results. This patch is an attempt to streamline this by adding multi read/write to the runtime.

Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-03-07 07:51:04 -08:00
Nikolay eea369af94
KAFKA-14588 Log cleaner configuration move to CleanerConfig (#15387)
In order to move ConfigCommand to tools we must move all its dependencies, which include KafkaConfig and other core classes, to Java. This PR moves the log cleaner configuration to the CleanerConfig class of the storage module.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-05 18:11:56 +08:00
David Jacot 0472db2cd3
MINOR: Uniformize error handling/transformation in GroupCoordinatorService (#15196)
This patch uniformizes the error handling in the GroupCoordinatorService with the aim to reuse the same error translation for all operations. It also ensures that exceptions are unwrapped if needed.

Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2024-01-30 23:23:58 -08:00
Mickael Maison 3e9ef70853
KAFKA-15853: Move PasswordEncoder to server-common (#15246)
Reviewers: Luke Chen <showuon@gmail.com>, Omnia Ibrahim <o.g.h.ibrahim@gmail.com>
2024-01-30 19:08:50 +01:00
Gaurav Narula 4c6f975ab3
KAFKA-16162: resend broker registration on metadata update to IBP 3.7-IV2
We update the metadata update handler to resend the broker registration when
metadata has been updated to >= 3.7-IV2 so that the controller becomes
aware of the log directories in the broker.

We also update DirectoryId::isOnline to return true on an empty list of
log directories while the controller awaits broker registration.

Co-authored-by: Proven Provenzano <pprovenzano@confluent.io>

Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2024-01-30 10:00:07 -08:00
Apoorv Mittal 208f9e7765
KAFKA-15813: Evict client instances from cache (KIP-714) (#15234)
KIP-714 requires a client instance cache in the broker, which should also have a time-based eviction policy where client instances which are not actively sending metrics are evicted. The KIP states: "This client instance specific state is maintained in broker memory up to MAX(60*1000, PushIntervalMs * 3) milliseconds."
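
The retention bound quoted from the KIP works out as in this small sketch; `ClientInstanceTtlSketch` and `cacheTtlMs` are hypothetical names, and only the formula itself comes from the text above.

```java
final class ClientInstanceTtlSketch {
    // MAX(60*1000, PushIntervalMs * 3), in milliseconds.
    static long cacheTtlMs(long pushIntervalMs) {
        return Math.max(60 * 1000L, pushIntervalMs * 3);
    }

    public static void main(String[] args) {
        System.out.println(cacheTtlMs(5_000L));  // short push interval: the 60 s floor applies
        System.out.println(cacheTtlMs(30_000L)); // long push interval: three intervals apply
    }
}
```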

Reviewers: Andrew Schofield <aschofield@confluent.io>, Jun Rao <junrao@gmail.com>
2024-01-23 15:06:02 -08:00
David Arthur 7bf7fd99a5
KAFKA-16078: Be more consistent about getting the latest MetadataVersion
This PR creates MetadataVersion.latestTesting to represent the highest metadata version (which may be unstable) and MetadataVersion.latestProduction to represent the latest version that should be used in production. It fixes a few cases where the broker was advertising that it supported the testing versions even when unstable metadata versions had not been configured.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2024-01-17 14:59:22 -08:00
Nikolay da2aa68269
KAFKA-14588: Move ConfigEntityName to server-common (#14868)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2024-01-08 12:41:43 +01:00
Nikolay 45bd19f2ef
KAFKA-14588: Move ConfigType to server-common (#14867)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-12-22 18:35:27 +01:00
Proven Provenzano b0e99b5593
KAFKA-15922: Bump MetadataVersion to support JBOD with KRaft (#14984)
Moves ELR from MetadataVersion IBP_3_7_IV3 into the new IBP_3_8_IV0 because the ELR feature was not completed before 3.7 reached feature freeze.  Leaves IBP_3_7_IV3 empty -- it is a no-op and is not reused for anything.  Adds the new MetadataVersion IBP_3_7_IV4 for the FETCH request changes from KIP-951, which were mistakenly never associated with a MetadataVersion.  Updates the LATEST_PRODUCTION MetadataVersion to IBP_3_7_IV4 to declare both KRaft JBOD and the KIP-951 changes ready for production use.

Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@apache.org>, Justine Olshan <jolshan@confluent.io>
2023-12-14 10:08:54 -05:00
Bruno Cadonna 87e3cbe4da
MINOR: Add junit properties to display parameterized test names (#14983)
In many parameterized tests, the display name is broken. Example - testMetadataFetch appears as [1] true, [2] false
This is because of the constant in @ParameterizedTest:

String DEFAULT_DISPLAY_NAME = "[{index}] {argumentsWithNames}";

This PR adds a new junit-platform.properties which overrides the default to add a {displayName}, which shows the display name of the method.

Existing tests which override the name should work as is. The precedence rules are:

    name attribute in @ParameterizedTest, if present
    value of the junit.jupiter.params.displayname.default configuration parameter, if present
    DEFAULT_DISPLAY_NAME constant defined in @ParameterizedTest

Source: https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests-display-names

Sample test run output
Before: [1] true
After: testMetadataExpiry(boolean).false

This commit is an extension of bdf6d46b41, which needed to be reverted because it introduced test failures.

Reviewers: David Jacot <djacot@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2023-12-13 09:42:18 +01:00
David Jacot b96ded9859
Revert "MINOR: Add junit properties to display parameterized test names (#14687)" (#14961)
This reverts commit bdf6d46b41. We found out that this commit introduced flakiness in Streams' tests. We will revise it.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2023-12-07 23:20:03 -08:00
Omnia Ibrahim ec92410e59
KAFKA-15363: Broker log directory failure changes (#14790)
Part of JBOD KIP-858, https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft

Reviewers: Igor Soarez <i@soarez.me>, Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rdagostino@confluent.io>
2023-12-07 20:44:56 -05:00
Igor Soarez c515bf51f8
KAFKA-15426: Process and persist directory assignments
Handle AssignReplicasToDirs requests, persist metadata changes
with new directory assignments and possible leader elections.

Reviewers: Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rndgstn@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-12-07 11:44:45 -08:00
Alok Thatikunta bdf6d46b41
MINOR: Add junit properties to display parameterized test names (#14687)
In many parameterized tests, the display name is broken. Example - `testMetadataFetch` appears as `[1] true`, `[2] false` [link](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14607/9/testReport/junit/org.apache.kafka.clients.producer/KafkaProducerTest/)
This is because of the constant in `@ParameterizedTest`:
```java
String DEFAULT_DISPLAY_NAME = "[{index}] {argumentsWithNames}";
```

This PR adds a new `junit-platform.properties` which overrides the default to add a `{displayName}`, which shows the display name of the method.

Existing tests which override the name should work as is. The precedence rules are:

> 1. `name` attribute in `@ParameterizedTest`, if present
> 2. value of the `junit.jupiter.params.displayname.default` configuration parameter, if present
> 3. `DEFAULT_DISPLAY_NAME` constant defined in `@ParameterizedTest`

Source: https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests-display-names

Sample test run output 
Before: `[1] true` [link](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14607/9/testReport/junit/org.apache.kafka.clients.producer/KafkaProducerTest/)
After: `testMetadataExpiry(boolean).false` [link](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14687/1/testReport/junit/org.apache.kafka.clients.producer/KafkaProducerTest/)

Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>, David Jacot <djacot@confluent.io>
2023-12-06 08:42:45 -08:00
Igor Soarez 6b87c85291
KAFKA-15886: Always specify directories for new partition registrations
When creating partition registrations, directories must always be defined.

If creating a partition from a PartitionRecord or PartitionChangeRecord from an older version that
does not support directory assignments, then DirectoryId.MIGRATING is assumed.

If creating a new partition, or triggering a change in assignment, DirectoryId.UNASSIGNED should be
specified, unless the target broker has a single online directory registered, in which case the
replica should be assigned directly to that single directory.
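
The rule for new partitions and assignment changes can be sketched as below. This is a sketch under stated assumptions: it uses `java.util.UUID` in place of Kafka's own ID type, the `UNASSIGNED` value is a placeholder chosen for illustration, and `DirectoryChoiceSketch` and `directoryForNewReplica` are hypothetical names.

```java
import java.util.List;
import java.util.UUID;

final class DirectoryChoiceSketch {
    // Placeholder sentinel; the concrete value here is an assumption for illustration only.
    static final UUID UNASSIGNED = new UUID(0L, 1L);

    // Picks the directory for a newly placed replica on the target broker.
    static UUID directoryForNewReplica(List<UUID> onlineDirsOnBroker) {
        // With exactly one online directory there is no choice to defer to the broker.
        return onlineDirsOnBroker.size() == 1 ? onlineDirsOnBroker.get(0) : UNASSIGNED;
    }

    public static void main(String[] args) {
        UUID only = UUID.randomUUID();
        System.out.println(directoryForNewReplica(List.of(only)).equals(only));
        System.out.println(directoryForNewReplica(List.of(only, UUID.randomUUID())).equals(UNASSIGNED));
    }
}
```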

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-11-30 14:10:47 -08:00
Colin Patrick McCabe a94bc8d6d5
KAFKA-15922: Add a MetadataVersion for JBOD (#14860)
Assign MetadataVersion.IBP_3_7_IV2 to JBOD.

Move KIP-966 support to MetadataVersion.IBP_3_7_IV3.

Create MetadataVersion.LATEST_PRODUCTION as the latest metadata version that can be used when formatting a
new cluster, or upgrading a cluster using kafka-features.sh. This will allow us to clearly distinguish between stable
and unstable metadata versions for the first time.

Reviewers: Igor Soarez <soarez@apple.com>, Ron Dagostino <rndgstn@gmail.com>, Calvin Liu <caliu@confluent.io>, Proven Provenzano <pprovenzano@confluent.io>
2023-11-30 10:35:13 -08:00
Colin Patrick McCabe bd18551b32
MINOR: DirectoryId.MIGRATING should be all zeros (#14858)
DirectoryId.MIGRATING should be all zeros. All zeros is the default Uuid value in KPRC, and
MIGRATING is the default directory ID value.

Reviewers: Ron Dagostino <rdagostino@confluent.io>
2023-11-29 13:12:33 -08:00
Okada Haruki d71d0639d9
KAFKA-15046: Get rid of unnecessary fsyncs inside UnifiedLog.lock to stabilize performance (#14242)
Any blocking operation performed while holding UnifiedLog.lock can lead to serious performance (even availability) issues, yet there are currently several paths that call fsync(2) inside the lock.
While the lock is held, all subsequent produce requests against the partition may block.
This easily causes all request handlers to be busy when disk performance is bad.
Even worse, when a disk glitches for tens of seconds (not rare in spinning drives), the broker becomes unable to process any requests while remaining unfenced from the cluster (i.e. a "zombie"-like status).
This PR gets rid of 4 cases of essentially unnecessary fsync(2) calls performed under the lock:
(1) ProducerStateManager.takeSnapshot at UnifiedLog.roll
I moved the fsync(2) call to the scheduler thread as part of the existing "flush-log" job (before incrementing the recovery point).
Since it's still ensured that the snapshot is flushed before incrementing the recovery point, this change shouldn't cause any problems.
(2) ProducerStateManager.removeAndMarkSnapshotForDeletion as part of log segment deletion
This method calls Utils.atomicMoveWithFallback with needFlushParentDir = true internally, which calls fsync.
I changed it to call Utils.atomicMoveWithFallback with needFlushParentDir = false (which is consistent with index file deletion, which also doesn't flush the parent dir).
This change shouldn't cause problems either.
(3) LeaderEpochFileCache.truncateFromStart when incrementing log-start-offset
This path is called from deleteRecords on request-handler threads.
Here, we don't actually need fsync(2) either.
On unclean shutdown, a few leader epochs might remain in the file, but they will be handled by LogLoader on start-up, so this is not a problem.
(4) LeaderEpochFileCache.truncateFromEnd as part of log truncation
Likewise, we don't need fsync(2) here, since any epochs which are untruncated on unclean shutdown will be handled by the log loading procedure.

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>
2023-11-29 09:43:44 -08:00
Igor Soarez a03a71d7b5
KAFKA-15357: Aggregate and propagate assignments
A new AssignmentsManager accumulates, batches, and sends KIP-858
assignment events to the Controller. Assignments are sent via
AssignReplicasToDirs requests.

Move QuorumTestHarness.formatDirectories into TestUtils so it can be
used in other test contexts.

Fix a bug in ControllerRegistration.java where the wrong version of the
record was being generated in ControllerRegistration.toRecord.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>
2023-11-16 16:19:49 -08:00
Colin P. McCabe e3dd60ef3c
HOTFIX: fix checkstyle
2023-11-02 11:35:44 -07:00
Colin P. McCabe a672a19e80
MINOR: small optimization for DirectoryId.random
DirectoryId.random doesn't need to instantiate the first 100 IDs to check if an ID is one of them.
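
One way to avoid materializing the reserved IDs is a range check on the two halves of the ID, sketched below. This assumes the "first 100 IDs" are those whose most significant bits are zero and whose least significant bits are 0 to 99; that layout, the use of `java.util.UUID`, and the names `ReservedIdSketch`, `isReserved`, and `random` are assumptions for illustration, not the actual `DirectoryId` code.

```java
import java.util.UUID;

final class ReservedIdSketch {
    static final long RESERVED_COUNT = 100L;

    // True if the ID falls in the assumed reserved range, without building that range.
    static boolean isReserved(UUID id) {
        long low = id.getLeastSignificantBits();
        return id.getMostSignificantBits() == 0L && low >= 0L && low < RESERVED_COUNT;
    }

    // Draws random IDs until one lands outside the reserved range.
    static UUID random() {
        UUID id;
        do {
            id = UUID.randomUUID();
        } while (isReserved(id));
        return id;
    }

    public static void main(String[] args) {
        System.out.println(isReserved(new UUID(0L, 5L)));
        System.out.println(isReserved(random()));
    }
}
```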

Reviewers: Ismael Juma <ismael@juma.me.uk>, Justine Olshan <jolshan@confluent.io>, Proven Provenzano <93720617+pprovenzano@users.noreply.github.com>
2023-11-02 11:29:11 -07:00
Igor Soarez 0390d5b1a2
KAFKA-15355: Message schema changes (#14290)
Reviewers: Christo Lolov <lolovc@amazon.com>, Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rdagostino@confluent.io>
2023-11-02 09:46:05 -04:00
Crispin Bernier c8f687ac15
KAFKA-15661: KIP-951: protocol changes (#14627)
Separating out the protocol changes from #14444 in an effort to more quickly unblock the client side PR.

These are the protocol changes that populate the fields in KIP-951. On NOT_LEADER_OR_FOLLOWER errors in both FETCH and PRODUCE, the new leader ID and epoch are included in the response. The endpoint for the new leader is retrieved from the metadata cache. The new fields are all optional (tagged) and an IBP bump is required.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client

Reviewers: Justine Olshan <jolshan@confluent.io>, Mayank Shekhar Narula <mayanks.narula@gmail.com>
2023-10-31 17:16:11 -07:00
Igor Soarez 9dbee599f1
MINOR: Rename log dir UUIDs (#14517)
After a late discussion in the voting thread for KIP-858 we
decided to improve the names for the designated reserved
log directory UUID values.

Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Ziming Deng <dengziming1993@gmail.com>
2023-10-30 19:10:57 +08:00
Josep Prat eed5e68880
MINOR: Server-Commons cleanup (#14572)
Fixes Javadoc and minor issues in the Java files of Server-Commons modules.

Javadoc is now formatted as intended by the author of the doc itself.

Signed-off-by: Josep Prat <josep.prat@aiven.io>

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-10-20 21:04:04 +02:00
Calvin Liu af747fbfed
KAFKA-15581: Introduce ELR (#14312)
This patch introduces preliminary changes for Eligible Leader Replicas (KIP-966)

* New MetadataVersion 16 (3.7-IV1)
* New record versions for PartitionRecord and PartitionChangeRecord
* New tagged fields on PartitionRecord and PartitionChangeRecord
* New static config "eligible.leader.replicas.enable" to gate the whole feature

Reviewers: Artem Livshits <alivshits@confluent.io>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-10-19 14:05:15 -04:00
Ismael Juma 4cf86c5d2f
KAFKA-15492: Upgrade and enable spotbugs when building with Java 21 (#14533)
Spotbugs was temporarily disabled as part of KAFKA-15485 to support building Kafka with JDK 21. This PR upgrades the spotbugs version to 4.8.0, which adds support for JDK 21, and enables its usage in the build again.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-10-12 14:09:10 +02:00
Ritika Reddy bcfc9543d1
MINOR: Move TopicIdPartition class to server-common (#14418)
This patch moves the TopicIdPartition from the metadata module to the server-common module so it can be used by the group-coordinator module as well.

Reviewers: Sagar Rao <sagarmeansocean@gmail.com>, David Jacot <djacot@confluent.io>
2023-09-28 13:55:44 -07:00
Colin Patrick McCabe fcac880fd5
KAFKA-15466: Add KIP-919 support for some admin APIs (#14399)
Add support for --bootstrap-controller in the following command-line tools:
    - kafka-cluster.sh
    - kafka-configs.sh
    - kafka-features.sh
    - kafka-metadata-quorum.sh

To implement this, the following AdminClient APIs now support the new bootstrap.controllers
configuration:
    - Admin.alterConfigs
    - Admin.describeCluster
    - Admin.describeConfigs
    - Admin.describeFeatures
    - Admin.describeMetadataQuorum
    - Admin.incrementalAlterConfigs
    - Admin.updateFeatures

Command-line tool changes:
    - Add CommandLineUtils.initializeBootstrapProperties to handle parsing --bootstrap-controller
      in addition to --bootstrap-server.
    - Add --bootstrap-controller to ConfigCommand.scala, ClusterTool.java, FeatureCommand.java, and
      MetadataQuorumCommand.java.

KafkaAdminClient changes:
    - Add the AdminBootstrapAddresses class to handle extracting bootstrap.servers or
      bootstrap.controllers from the config map for KafkaAdminClient.
    - In AdminMetadataManager, store the new usingBootstrapControllers boolean. Generalize
      authException to encompass the concept of fatal exceptions in general. (For example, the
      fatal exception where we talked to the wrong node type.) Treat
      MismatchedEndpointTypeException and UnsupportedEndpointTypeException as fatal exceptions.
    - Extend NodeProvider to include information about whether bootstrap.controllers is supported.
    - Modify the APIs described above to support bootstrap.controllers.

Server-side changes:
    - Support DescribeConfigsRequest on kcontrollers.
    - Add KRaftMetadataCache to the kcontroller to simplify implementing describeConfigs (and
      probably more APIs in the future). It's mainly a wrapper around MetadataImage, so there is
      essentially no extra resource consumption.
    - Split RuntimeLoggerManager out of ConfigAdminManager to handle the incrementalAlterConfigs
      support for BROKER_LOGGER. This is now supported on kcontrollers as well as brokers.
    - Fix bug in AuthHelper.computeDescribeClusterResponse that resulted in us always sending back
      BROKER as the endpoint type, even on the kcontroller.

Miscellaneous:
    - Fix a few places in exceptions and log messages where we wrote "broker" instead of "node".
      For example, an exception in NodeApiVersions.java, and a log message in NetworkClient.java.
    - Fix the slf4j log prefix used by KafkaRequestHandler logging so that request handlers on a
      controller don't look like they're on a broker.
    - Make the FinalizedVersionRange constructor public for the sake of a junit test.
    - Add unit and integration tests for the above.

Reviewers: David Arthur <mumrah@gmail.com>, Doguscan Namal <namal.doguscan@gmail.com>
2023-09-26 14:43:42 -07:00
Ismael Juma 98febb989a
KAFKA-15485: Fix "this-escape" compiler warnings introduced by JDK 21 (1/N) (#14427)
This is one of the steps required for kafka to compile with Java 21.

For each case, one of the following fixes was applied:
1. Suppress warning if fixing would potentially result in an incompatible change (for public classes)
2. Add final to one or more methods so that the escape is not possible
3. Replace method calls with direct field access.
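
As a rough illustration of fix 2, here is a minimal sketch of the pattern the lint targets. All class names below are hypothetical and are not taken from the Kafka code base: a constructor that calls an overridable method lets subclass code run before the subclass's fields are initialized, and making the method final closes that path.

```java
// Hypothetical classes for illustration only; not from the Kafka code base.
class UnsafeBase {
    final String label;

    UnsafeBase() {
        // Calls an overridable method: a subclass override runs here, before
        // the subclass's own field initializers have executed.
        this.label = describe();
    }

    String describe() {
        return "base";
    }
}

class UnsafeChild extends UnsafeBase {
    // Not a compile-time constant, so it is assigned only after super() returns.
    private final String name = String.valueOf("child");

    @Override
    String describe() {
        // Invoked from UnsafeBase's constructor, while 'name' is still null.
        return "name=" + name;
    }
}

class SafeBase {
    final String label;

    SafeBase() {
        this.label = describe();
    }

    // Fix 2: a final method cannot be overridden, so no subclass-defined
    // code can observe the partially constructed object through this call.
    final String describe() {
        return "base";
    }
}
```

Constructing `UnsafeChild` stores a label built from an uninitialized field, which is the kind of surprise the warning is meant to surface.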

In addition, we also fix a couple of compiler warnings related to deprecated references in the `core` module.

See the following for more details regarding the new lint warning:
https://www.oracle.com/java/technologies/javase/21-relnote-issues.html#JDK-8015831

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Chris Egerton <chrise@aiven.io>
2023-09-24 05:59:29 -07:00
Colin Patrick McCabe 41b695b6e3
KAFKA-15369: Implement KIP-919: Allow AC to Talk Directly with Controllers (#14306)
Implement KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum and add
Controller Registration. This KIP adds a new version of DescribeClusterRequest which is supported
by KRaft controllers. It also teaches AdminClient how to use this new DESCRIBE_CLUSTER request to
talk directly with the controller quorum. This is all gated behind a new MetadataVersion,
IBP_3_7_IV0.

In order to share the DESCRIBE_CLUSTER logic between broker and controller, this PR factors it out
into AuthHelper.computeDescribeClusterResponse.

The KIP adds three new error codes: MISMATCHED_ENDPOINT_TYPE, UNSUPPORTED_ENDPOINT_TYPE, and
UNKNOWN_CONTROLLER_ID. The endpoint type errors can be returned from DescribeClusterRequest.

On the controller side, the controllers now try to register themselves with the current active
controller, by sending a CONTROLLER_REGISTRATION request. This, in turn, is converted into a
RegisterControllerRecord by the active controller. ClusterImage, ClusterDelta, and all other
associated classes have been upgraded to propagate the new metadata. In the metadata shell, the
cluster directory now contains both broker and controller subdirectories.

QuorumFeatures previously had a reference to the ApiVersions structure used by the controller's
NetworkClient. Because this PR removes that reference, QuorumFeatures now contains only immutable
data. Specifically, it contains the current node ID, the locally supported features, and the list
of quorum node IDs in the cluster.

Reviewers: David Arthur <mumrah@gmail.com>, Ziming Deng <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>
2023-09-07 15:21:52 -07:00
Mehari Beyene 25b128de81
KAFKA-14991: KIP-937-Improve message timestamp validation (#14135)
This implementation introduces two new configurations `log.message.timestamp.before.max.ms` and `log.message.timestamp.after.max.ms` and deprecates `log.message.timestamp.difference.max.ms`.

The default value for all these three configs is maintained to be Long.MAX_VALUE for backward compatibility but with the newly added configurations we can have a finer control when validating message timestamps that are in the past and the future compared to the broker's timestamp.

To maintain backward compatibility if the default value of `log.message.timestamp.before.max.ms` is not changed, we are assuming users are still using the deprecated config `log.message.timestamp.difference.max.ms` and validation is done using its value. This ensures that existing customers who have customized the value of `log.message.timestamp.difference.max.ms` will continue to see no change in behavior.
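
The rule described above can be sketched roughly as follows. This is an illustrative sketch only: the class and method names are hypothetical, timestamps are assumed to be non-negative epoch milliseconds, and only the fallback for the "before" bound is modeled because that is the one the description spells out.

```java
// Hypothetical sketch; class and method names are not Kafka's.
final class TimestampBoundsSketch {
    // Default shared by the three configs, per the commit description.
    static final long DEFAULT_MAX_MS = Long.MAX_VALUE;

    // If log.message.timestamp.before.max.ms is left at its default, fall back to
    // the deprecated log.message.timestamp.difference.max.ms value.
    static long effectiveBeforeMaxMs(long beforeMaxMs, long differenceMaxMs) {
        return beforeMaxMs == DEFAULT_MAX_MS ? differenceMaxMs : beforeMaxMs;
    }

    // A record timestamp is acceptable when it is no more than beforeMaxMs in the
    // past and no more than afterMaxMs in the future, relative to the broker clock.
    // Assumes non-negative timestamps, so the subtractions cannot overflow.
    static boolean isTimestampAcceptable(long recordTimestampMs, long brokerNowMs,
                                         long beforeMaxMs, long afterMaxMs) {
        if (recordTimestampMs <= brokerNowMs) {
            return brokerNowMs - recordTimestampMs <= beforeMaxMs;
        }
        return recordTimestampMs - brokerNowMs <= afterMaxMs;
    }
}
```

Separating the two bounds lets an operator, for example, accept old records for replay while still rejecting timestamps far in the future.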

Reviewers: Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>
2023-08-24 12:04:55 +02:00
Ron Dagostino 8394ddc0d2
MINOR: Move delegation token support to Metadata Version 3.6-IV2 (#14270)
#14083 added support for delegation tokens in KRaft and attached that support to the existing
MetadataVersion 3.6-IV1. This patch moves that support into a separate MetadataVersion 3.6-IV2.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-08-22 16:04:53 -07:00
David Arthur 418b8a6e59
KAFKA-14538 Metadata transactions in MetadataLoader (#14208)
This PR contains three main changes:

- Support for transactions in MetadataLoader
- Abort in-progress transaction during controller failover
- Utilize transactions for ZK to KRaft migration

A new MetadataBatchLoader class is added to decouple the loading of record batches from the
publishing of metadata in MetadataLoader. Since a transaction can span across multiple batches (or
multiple transactions could exist within one batch), some buffering of metadata updates was needed
before publishing out to the MetadataPublishers. MetadataBatchLoader accumulates changes into a
MetadataDelta, and uses a callback to publish to the publishers when needed.

One small oddity with this approach is that since we can end up "splitting" batches in some cases, the
number of bytes returned in the LogDeltaManifest has new semantics. The number of bytes included in
a batch is now only included in the last metadata update that is published as a result of a batch.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-08-21 16:02:14 -07:00
Proven Provenzano c2759df067
KAFKA-15219: KRaft support for DelegationTokens (#14083)
Reviewers: David Arthur <mumrah@gmail.com>, Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Viktor Somogyi <viktor.somogyi@cloudera.com>
2023-08-19 14:01:08 -04:00
Colin Patrick McCabe adc16d0f31
KAFKA-14538: Implement KRaft metadata transactions in QuorumController
Implement the QuorumController side of KRaft metadata transactions.

As specified in KIP-868, this PR creates a new metadata version, IBP_3_6_IV1, which contains the
three new records: AbortTransactionRecord, BeginTransactionRecord, EndTransactionRecord.

In order to make offset management unit-testable, this PR moves it out of QuorumController.java and
into OffsetControlManager.java. The general approach here is to track the "last stable offset," which is
calculated by looking at the latest committed offset and the in-progress transaction (if any). When
a transaction is aborted, we revert back to this last stable offset. We also revert back to it when
the controller is transitioning from active to inactive.

In a follow-up PR, we will add support for the transaction records in MetadataLoader. We will also
add support for automatically aborting pending transactions after a controller failover.

Reviewers: David Arthur <mumrah@gmail.com>
2023-08-14 16:58:56 -07:00
Nikolay 1fd58e30cf
KAFKA-14595: Move classes from ReassignPartitionsCommand to tools (#14172)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-08-11 14:52:14 +02:00
Federico Valeri 8de3e0436a
KAFKA-15239: Fix system tests using producer performance service (#14092)
Reviewers: Greg Harris <greg.harris@aiven.io>
2023-08-10 14:23:43 -07:00
Nikolay ddeb89f4a9
KAFKA-14595: Move AdminUtils to server-common (#14096)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-08-09 10:32:45 +02:00
Colin Patrick McCabe 9bc4a2d4d1
KAFKA-15271: HistoricalIterator can expose elements that are too new (#14125)
A HistoricalIterator at epoch N is supposed to only reveal elements at epoch N or earlier. However,
due to a bug, we sometimes will reveal elements which are at a newer epoch than N. The bug does
not affect elements that are in the latest epoch (aka topTier). It only affects elements that are
newer than N, but which do not persist until the latest epoch.  This PR fixes the bug and adds a
unit test for this case.

Reviewers: David Arthur <mumrah@gmail.com>
2023-08-08 16:36:59 -07:00
Kamal Chandraprakash d89b26ff44
KAFKA-12969: Add broker level config synonyms for topic level tiered storage configs (#14114)
KAFKA-12969: Add broker level config synonyms for topic level tiered storage configs.

Topic -> Broker Synonym:
local.retention.bytes -> log.local.retention.bytes
local.retention.ms -> log.local.retention.ms

We cannot add synonym for `remote.storage.enable` topic property as it depends on KIP-950

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
2023-08-03 13:56:00 +05:30
Federico Valeri 1bf73d89d0
KAFKA-15232: Move ToolsUtils to tools (#14066)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-07-21 20:27:44 +02:00
David Jacot 2528dd4116
KAFKA-14499: [2/N] Add OffsetCommit record & related (#14047)
This patch does a few things:
1) It introduces the `OffsetAndMetadata` class which holds the committed offsets in the group coordinator.
2) It adds methods to deal with OffsetCommit records to `RecordHelpers`.
3) It adds `MetadataVersion#offsetCommitValueVersion` to get the version of the OffsetCommit value record that should be used.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, David Arthur <mumrah@gmail.com>, Justine Olshan <jolshan@confluent.io>
2023-07-21 20:09:06 +02:00
Nikolay 4bba2c8a32
KAFKA-14591: Move DeleteRecordsCommand to tools (#13278)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-07-21 17:30:28 +02:00
Omnia G H Ibrahim 0c6b1a4e9a
KAFKA-14737: Move kafka.utils.json to server-common (#13585)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-07-18 11:02:40 +02:00
vamossagar12 fa5b493241
KAFKA-14647: Move TopicFilter to server-common/utils (#13158)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-07-18 10:38:56 +02:00
Justine Olshan ea0bb00126
KAFKA-14884: Include check transaction is still ongoing right before append (take 2) (#13787)
Introduced extra mapping to track verification state.

When verifying, there is a race condition that the add partitions verification response returns that the partition is in the ongoing transaction, but an abort marker is written before we get to append. Therefore, we track any given transaction we are verifying with an object unique to that transaction.

We check this unique state upon the first append to the log. After that, we can rely on currentTransactionFirstOffset. We remove the verification state on appending to the log with a transactional data record or marker.

We will also clean up lingering verification state entries via the producer state entry expiration mechanism. We do not update the timestamp on retrying a verification for a transaction, so each entry must be verified before producer.id.expiration.ms.

There were a few other fixes:
- Moved the transaction manager handling for failed batch into the future completed exceptionally block to avoid processing it twice (this caused issues in unit tests)
- Handle interrupted exceptions encountered by the callback thread
- Change handling to throw an error if we try to set verification state and leaderLogIfLocal is None.

Reviewers: David Jacot <djacot@confluent.io>, Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>
2023-07-14 15:18:11 -07:00
David Jacot bd1f02b2be
MINOR: Move MockTimer to server-common (#13954)
This patch rewrites MockTimer in Java and moves it from core to server-common. This continues the work started in https://github.com/apache/kafka/pull/13820.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-07-06 14:56:05 +02:00
David Arthur fc7d912e8b
KAFKA-15109 Ensure the leader epoch bump occurs for older MetadataVersions (#13910)
This fixes a regression introduced by the previous KAFKA-15109 commit (d0457f7360 on trunk).

Reviewers: Colin P. McCabe <cmccabe@apache.org>, José Armando García Sancio <jsancio@apache.org>
2023-06-27 11:49:20 -04:00
Jeff Kim 1dbcb7da9e
KAFKA-14694: RPCProducerIdManager should not wait on new block (#13267)
RPCProducerIdManager initiates an async request to the controller to grab a block of producer IDs and then blocks waiting for a response from the controller.

This is done in the request handler threads while holding a global lock. This means that if many producers are requesting producer IDs and the controller is slow to respond, many threads can get stuck waiting for the lock.

This patch aims to:
* resolve the deadlock scenario mentioned above by not waiting for a new block and returning an error immediately
* remove synchronization usages in RpcProducerIdManager.generateProducerId()
* handle errors returned from generateProducerId() so that KafkaApis does not log unexpected errors
* confirm producers back off before retrying
* introduce backoff if manager fails to process AllocateProducerIdsResponse

Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>
2023-06-22 10:19:39 -07:00
Divij Vaidya 88e784f7c6
KAFKA-15084: Remove lock contention from RemoteIndexCache (#13850)
Use thread safe Caffeine to cache indexes fetched from RemoteTier locally. This PR removes a lock contention that led to higher fetch latencies as the IO threads spent time unnecessarily waiting on global cache lock while a single thread fetches the index from remote tier. See PR #13850 for details and rejected alternatives.

Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2023-06-21 18:22:49 +02:00
minjian.cai af678a563d
MINOR: fix typos for server common (#13887)
Reviewers: Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-06-20 22:56:01 +02:00
Dimitar Dimitrov b100f1efac
KAFKA-15087 Move/rewrite InterBrokerSendThread to server-commons (#13856)
The Java rewrite is kept relatively close to the Scala original
to minimize potential newly introduced bugs and to make reviewing
simpler. The following details might be of note:
- The `Logging` trait moved to InterBrokerSendThread with the
rewrite of ShutdownableThread has been similarly moved to any
subclasses that currently use it. InterBrokerSendThread's own
logging has been made to use ShutdownableThread's logger which
mimics the prefix/log identifier that the trait provided.
- The case RequestAndCompletionHandler class has been made a
separate POJO class and the internal-use UnsentRequests class
has been kept as a static nested class.
- The relatively commonly used but internal (not part of the
public API) clients classes that InterBrokerSendThread relies on
have been allowlisted in the server-common import control.
- The accompanying test class has also been moved and rewritten
with one new test added and most of the pre-existing tests made
stricter.

Reviewers: David Jacot <djacot@confluent.io>
2023-06-20 16:50:46 +02:00
Colin P. McCabe cd3c0ab1a3
KAFKA-15060: fix the ApiVersionManager interface
This PR expands the scope of ApiVersionManager a bit to include returning the current
MetadataVersion and features that are in effect. This is useful in general because that information
needs to be returned in an ApiVersionsResponse. It also allows us to fix the ApiVersionManager
interface so that all subclasses implement all methods of the interface. Having subclasses that
don't implement some methods is dangerous because they could cause exceptions at runtime in
unexpected scenarios.

On the KRaft controller, we were previously performing a read operation in the QuorumController
thread to get the current metadata version and features. With this PR, we now read a volatile
variable maintained by a separate MetadataVersionContextPublisher object. This will improve
performance and simplify the code. It should not change the guarantees we are providing; in both
the old and new scenarios, we need to be robust against version skew scenarios during updates.

Add a Features class which just has a 3-tuple of metadata version, features, and feature epoch.
Remove MetadataCache.FinalizedFeaturesAndEpoch, since it just duplicates the Features class.
(There are some additional feature-related classes that can be consolidated in a follow-on PR.)

Create a java class, EndpointReadyFutures, for managing the futures associated with individual
authorizer endpoints. This avoids code duplication between ControllerServer and BrokerServer and
makes this code unit-testable.

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Luke Chen <showuon@gmail.com>
2023-06-19 16:46:44 -07:00
Joobi S B f4981790c4
KAFKA-15085: Make Timer.java implement AutoCloseable (#13872)
Change Timer.java to implement AutoCloseable because automatic bug finders will flag a warning if an object of a class is marked as AutoCloseable but is not closed properly in the code.

Reviewers:  Divij Vaidya <diviv@amazon.com>
2023-06-19 15:50:30 +02:00
David Jacot 45a279ec70
MINOR: Move Timer/TimingWheel to server-common (#13820)
This patch rewrites `Timer` and the related classes in Java and moves them to the `server-common` module. It is basically a one-to-one rewrite of the Scala code. Note that `MockTimer` is not moved as part of this patch. It will be done separately.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-06-14 18:21:30 +02:00
David Jacot 7eea2a3908
MINOR: Move MockTime to server-common (#13823)
This patch rewrites `MockTime` in Java and moves it to the `server-common` module. This is a prerequisite to moving `MockTimer` to `server-common` later on as well.

Reviewers: David Arthur <mumrah@gmail.com>
2023-06-09 08:54:25 +02:00
José Armando García Sancio 8ad0ed3e61
KAFKA-15021; Skip leader epoch bump on ISR shrink (#13765)
When the KRaft controller removes a replica from the ISR because of the controlled shutdown there is no need for the leader epoch to be increased by the KRaft controller. This is accurate as long as the topic partition leader doesn't add the removed replica back to the ISR.

This change also fixes a bug when computing the HWM. When computing the HWM, replicas that are not eligible to join the ISR but are caught up should not be included in the computation. Otherwise, the HWM will never increase for replica.lag.time.max.ms because the shutting down replica is not sending FETCH requests. Without this additional fix PRODUCE requests would time out if the request timeout is greater than replica.lag.time.max.ms.

Because of the bug above the KRaft controller needs to check the MV to guarantee that all brokers support this bug fix before skipping the leader epoch bump.

Reviewers: David Mao <47232755+splett2@users.noreply.github.com>, Divij Vaidya <diviv@amazon.com>, David Jacot <djacot@confluent.io>
2023-06-07 07:20:40 -07:00
Colin Patrick McCabe b74204fa0a
KAFKA-14996: Handle overly large user operations on the kcontroller (#13742)
Previously, if a user tried to perform an overly large batch operation on the KRaft controller
(such as creating a million topics), we would create a very large number of records in memory. Our
attempt to write these records to the Raft layer would fail, because there were too many to fit in
an atomic batch. This failure, in turn, would trigger a controller failover.

(Note: I am assuming here that no topic creation policy was in place that would prevent the
creation of a million topics. I am also assuming that the user operation must be done atomically,
which is true for all current user operations, since we have not implemented KIP-868 yet.)

With this PR, we fail immediately when the number of records we have generated exceeds the
threshold that we can apply. This failure does not generate a controller failover. We also now
fail with a PolicyViolationException rather than an UnknownServerException.

In order to implement this in a simple way, this PR adds the BoundedList class, which wraps any
list and adds a maximum length. Attempts to grow the list beyond this length cause an exception to
be thrown.
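
A minimal sketch of the bounded-list idea follows. It is not Kafka's BoundedList: the class name, constructor, and the choice of IllegalStateException are assumptions made for illustration.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a length-capped list wrapper; not Kafka's BoundedList.
final class BoundedListSketch<E> extends AbstractList<E> {
    private final int maxLength;
    private final List<E> underlying;

    BoundedListSketch(int maxLength, List<E> underlying) {
        if (underlying.size() > maxLength) {
            throw new IllegalArgumentException("Initial list already exceeds " + maxLength + " elements");
        }
        this.maxLength = maxLength;
        this.underlying = underlying;
    }

    @Override
    public E get(int index) {
        return underlying.get(index);
    }

    @Override
    public int size() {
        return underlying.size();
    }

    // AbstractList routes add(E) and addAll(...) through this method, so the
    // length check here covers every way of growing the list.
    @Override
    public void add(int index, E element) {
        if (underlying.size() >= maxLength) {
            throw new IllegalStateException("Cannot grow list beyond " + maxLength + " elements");
        }
        underlying.add(index, element);
    }

    @Override
    public E set(int index, E element) {
        return underlying.set(index, element);
    }

    @Override
    public E remove(int index) {
        return underlying.remove(index);
    }
}
```

Failing at the moment the list would outgrow its limit means the caller never materializes the oversized batch in memory, which matches the "fail immediately" behavior described above.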

Reviewers: David Arthur <mumrah@gmail.com>, Ismael Juma <ijuma@apache.org>, Divij Vaidya <diviv@amazon.com>
2023-05-26 13:16:17 -07:00
Jeff Kim c98c1ed41c
KAFKA-14500; [3/N] add GroupMetadataKey/Value record helpers (#13704)
This patch enables the new group metadata manager to generate GroupMetadataKey/Value records.

Reviewers: David Jacot <djacot@confluent.io>
2023-05-23 10:42:13 +02:00
Colin P. McCabe 63f9f23ec0
MINOR: improve QuorumController logging #13540
When creating the QuorumController, log whether ZK migration is enabled.

When applying a feature level record which sets the metadata version, log the metadata version enum
rather than the numeric feature level.

Improve the logging when we replay snapshots in QuorumController. Log both the beginning and the
end of replay.

When TRACE is enabled, log every record that is replayed in QuorumController. Since some records
may contain sensitive information, create RecordRedactor to assist in logging only what is safe to
put in the log4j file.

Add logging to ControllerPurgatory. Successful completions are logged at DEBUG; failures are logged
at INFO, and additions are logged at TRACE.

Remove SnapshotReason.java, SnapshotReasonTest.java, and
QuorumController#generateSnapshotScheduled. They are dead code now that snapshot generation moved to
org.apache.kafka.image.publisher.SnapshotGenerator.

Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2023-05-04 11:18:03 -07:00
Luke Chen b620c03ccf
KAFKA-14946: fix NPE when merging the deltatable (#13653)
Fix NPE while merging the deltatable. Because it's possible that hashTier is
not null but deltatable is null (e.g. when removing data), we should null-check
the deltatable while merging, as other places do. Also added tests that
fail without this change.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-05-03 10:08:25 -07:00
Luke Chen 21af1918ea
MINOR: Add reason to exceptions in QuorumController (#13648)
Saw this error message in log:

ERROR [QuorumController id=1] writeNoOpRecord: unable to start processing because of RejectedExecutionException. Reason: null (org.apache.kafka.controller.QuorumController)

The null reason is not helpful with only RejectedExecutionException. Adding the reason to it.

Reviewers: David Arthur <mumrah@gmail.com>, Divij Vaidya <diviv@amazon.com>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>
2023-05-02 09:54:12 +08:00
David Jacot 2d0b816150
MINOR: Move `ControllerPurgatory` to `server-common` (#13555)
This patch renames `ControllerPurgatory` to `DeferredEventQueue` and moves it from the `metadata` module to the `server-common` module.

Reviewers: Alexandre Dupriez <alexandre.dupriez@gmail.com>, Ziming Deng <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2023-04-21 11:19:04 +02:00
Purshotam Chauhan df13775254
KAFKA-14828: Remove R/W locks using persistent data structures (#13437)
Currently, StandardAuthorizer uses a R/W lock for maintaining the consistency of data. For the clusters with very high traffic, we will typically see an increase in latencies whenever a write operation comes. The intent of this PR is to get rid of the R/W lock with the help of immutable or persistent collections. Basically, new object references are used to hold the intermediate state of the write operation. After the completion of the operation, the main reference to the cache is changed to point to the new object. Also, for the read operation, the code is changed such that all accesses to the cache for a single read operation are done to a particular cache object only.
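
The reference-swap idea can be sketched as below. This is an illustration under stated assumptions, not the StandardAuthorizer code: the class and method names are hypothetical, and a full copy of a HashSet stands in for a persistent collection, which would share structure with the previous version instead of copying it.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the StandardAuthorizer implementation.
final class SnapshotCacheSketch {
    // Readers only ever see a fully built, unmodifiable set through this reference.
    private volatile Set<String> acls = Collections.emptySet();

    // Writers build the next state off to the side, then publish it with one
    // reference assignment. Only writers synchronize with each other.
    synchronized void addAcl(String acl) {
        Set<String> next = new HashSet<>(acls);
        next.add(acl);
        acls = Collections.unmodifiableSet(next);
    }

    // A read that needs several lookups captures the reference once, so every
    // lookup in the operation is answered by the same version of the cache.
    boolean containsBoth(String first, String second) {
        Set<String> snapshot = acls;
        return snapshot.contains(first) && snapshot.contains(second);
    }
}
```

Readers take no lock in this sketch; the cost moves to the writer, which is why the choice of persistent collection library matters for write performance.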

In the PR description, you can find the performance of various libraries at the time of both read and write. Read performance is checked with the existing AuthorizerBenchmark. For write performance, a new AuthorizerUpdateBenchmark has been added which evaluates the performance of the addAcl operation.


Reviewers:  Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>,  Divij Vaidya <diviv@amazon.com>
2023-04-21 14:08:23 +05:30
Proven Provenzano abca86511e
KAFKA-14881: Rework UserScramCredentialRecord (#13513)
Rework UserScramCredentialRecord to store serverKey and StoredKey rather than saltedPassword. This
is necessary to support migration from ZK, since those are the fields we stored in ZK.  Update
latest MetadataVersion to IBP_3_5_IV2 and make SCRAM support conditional on this version.  Moved
ScramCredentialData.java from org.apache.kafka.image to org.apache.kafka.metadata, which seems more
appropriate.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-04-18 09:41:38 -07:00
Ron Dagostino e27926f92b
KAFKA-14735: Improve KRaft metadata image change performance at high … (#13280)
topic counts.

Introduces the use of persistent data structures in the KRaft metadata image to avoid copying the entire TopicsImage upon every change.  Performance that was O(<number of topics in the cluster>) is now O(<number of topics changing>), which has dramatic time and GC improvements for the most common topic-related metadata events.  We abstract away the chosen underlying persistent collection library via ImmutableMap<> and ImmutableSet<> interfaces and static factory methods.

Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>, Purshotam Chauhan <pchauhan@confluent.io>
2023-04-17 17:52:28 -04:00
Colin Patrick McCabe cfd0503006
MINOR: fix some flaky KRaft-related tests (#13543)
In SharedServer, fix some cases where a volatile variable could change to null while we were using
it, during shutdown. This is mainly a junit test issue, although it could also cause ugly error
messages during shutdown when running the server in a production context.

Fix a race in KafkaEventQueueTest.testSize.

Reviewers: David Arthur <mumrah@gmail.com>
2023-04-14 13:39:08 -04:00
Satish Duggana e99984248d
KAFKA-9550 Copying log segments to tiered storage in RemoteLogManager (#13487)
Added functionality to copy log segments, indexes to the target remote storage for each topic partition enabled with tiered storage. This involves creating scheduled tasks for all leader partition replicas to copy their log segments in sequence to tiered storage.

Reviewers: Jun Rao <junrao@gmail.com>, Luke Chen <showuon@gmail.com>
2023-04-12 13:55:36 +08:00
Chia-Ping Tsai 3bbff167fa
MINOR: fix invalid usage in java docs (#13506)
Reviewers: Luke Chen <showuon@gmail.com>
2023-04-06 16:01:14 +08:00
Luke Chen 31f9a54cba
KAFKA-14850: introduce InMemoryLeaderEpochCheckpoint (#13456)
The motivation for introducing InMemoryLeaderEpochCheckpoint is to allow remote log manager to create the RemoteLogSegmentMetadata(RLSM) with the correct leader epoch info for a specific segment. To do that, we need to rely on the LeaderEpochCheckpointCache to truncate from start and end, to get the epoch info. However, we don't really want to truncate the epochs in cache (and write to checkpoint file in the end). So, we introduce this InMemoryLeaderEpochCheckpoint to feed into LeaderEpochCheckpointCache, and when we truncate the epoch for RLSM, we can do them in memory without affecting the checkpoint file, and without interacting with file system.

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-04-05 20:11:32 +08:00
Colin Patrick McCabe 09e59bc776
KAFKA-14857: Fix some MetadataLoader bugs (#13462)
The MetadataLoader is not supposed to publish metadata updates until we have loaded up to the high
water mark. Previously, this logic was broken, and we published updates immediately. This PR fixes
that and adds a junit test.

Another issue is that the MetadataLoader previously assumed that we would periodically get
callbacks from the Raft layer even if nothing had happened. We relied on this to install new
publishers in a timely fashion, for example. However, in older MetadataVersions that don't include
NoOpRecord, this is not a safe assumption.

Aside from the above changes, also fix a deadlock in SnapshotGeneratorTest, fix the log prefix for
BrokerLifecycleManager, and remove metadata publishers on brokerserver shutdown (like we do for
controllers).

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>
2023-03-29 12:30:12 -07:00
Colin Patrick McCabe ddd652c672
MINOR: Standardize KRaft logging, thread names, and terminology (#13390)
Standardize KRaft thread names.

- Always use kebab case. That is, "my-thread-name".

- Thread prefixes are just strings, not Option[String] or Optional<String>.
  If you don't want a prefix, use the empty string.

- Thread prefixes end in a dash (except the empty prefix). Then you can
  calculate thread names as $prefix + "my-thread-name"

- Broker-only components get "broker-$id-" as a thread name prefix. For example, "broker-1-"

- Controller-only components get "controller-$id-" as a thread name prefix. For example, "controller-1-"

- Shared components get "kafka-$id-" as a thread name prefix. For example, "kafka-0-"

- Always pass a prefix to KafkaEventQueue, so that threads have names like
  "broker-0-metadata-loader-event-handler" rather than "event-handler". Prior to this PR, we had
  several threads just named "EventHandler" which was not helpful for debugging.

- QuorumController thread name is "quorum-controller-123-event-handler"

- Don't set a thread prefix for replication threads started by ReplicaManager. They run only on the
  broker, and already include the broker ID.
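
The naming rules above could be captured in a small helper along these lines. The class and method names are hypothetical and exist only to illustrate the convention; they are not part of Kafka.

```java
// Hypothetical helper illustrating the thread naming convention; not Kafka code.
final class ThreadNamingSketch {
    // Prefixes always end in a dash; the empty string means "no prefix".
    static String brokerPrefix(int nodeId) {
        return "broker-" + nodeId + "-";
    }

    static String controllerPrefix(int nodeId) {
        return "controller-" + nodeId + "-";
    }

    static String sharedPrefix(int nodeId) {
        return "kafka-" + nodeId + "-";
    }

    // Thread names are the prefix followed by a kebab-case name.
    static String threadName(String prefix, String kebabCaseName) {
        return prefix + kebabCaseName;
    }
}
```

For example, `threadName(brokerPrefix(0), "metadata-loader-event-handler")` yields "broker-0-metadata-loader-event-handler", and an empty prefix leaves the name unchanged.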

Standardize KRaft slf4j log prefixes.

- Names should be of the form "[ComponentName id=$id] ". So for a ControllerServer with ID 123, we
  will have "[ControllerServer id=123] "

- For the QuorumController class, use the prefix "[QuorumController id=$id] " rather than
  "[Controller <nodeId>] ", to make it clearer that this is a KRaft controller.

- In BrokerLifecycleManager, add isZkBroker=true to the log prefix for the migration case.

Standardize KRaft terminology.

- All synonyms of combined mode (colocated, coresident, etc.) should be replaced by "combined"

- All synonyms of isolated mode (remote, non-colocated, distributed, etc.) should be replaced by
  "isolated".
2023-03-16 15:33:03 -07:00
Calvin Liu 79b5f7f1ce
KAFKA-14617: Add ReplicaState to FetchRequest (KIP-903) (#13323)
This patch is the first part of KIP-903. It updates the FetchRequest to include the new tagged ReplicaState field which replaces the now deprecated ReplicaId field. The FetchRequest version is bumped to version 15 and the MetadataVersion to 3.5-IV1.

Reviewers: David Jacot <djacot@confluent.io>
2023-03-16 14:04:34 +01:00
Colin Patrick McCabe aaa976a340
MINOR: Some metadata publishing fixes and refactors (#13337)
This PR refactors MetadataPublisher's interface a bit. There is now an onControllerChange
callback. This is something that some publishers might want. A good example is ZkMigrationClient.
Instead of two different publish functions (one for snapshots, one for log deltas), we now have a single onMetadataUpdate function. Most publishers didn't want to do anything different in those two cases.
The ones that do want to do something different for snapshots can always check the manifest type.
The close function now has a default empty implementation, since most publishers didn't need to do
anything there.

Move the SCRAM logic out of BrokerMetadataPublisher and run it on the controller as well.

On the broker, simply use dynamicClientQuotaPublisher to handle dynamic client quotas changes.
That is what the controller already does, and the code is exactly the same in both cases.

Fix the logging in FutureUtils.waitWithLogging a bit. Previously, when invoked from BrokerServer
or ControllerServer, it did not include the standard "[Controller 123] " style prefix indicating server
name and ID. This was confusing, especially when debugging junit tests.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, David Arthur <mumrah@gmail.com>
2023-03-09 14:52:40 -08:00
Ivan Yurchenko e28e0bf0f2
KAFKA-14524: Rewrite KafkaMetricsGroup in Java (#13067)
* KAFKA-14524: Rewrite KafkaMetricsGroup in Java

Instead of being a base trait for classes, `KafkaMetricsGroup` is now an independent object. User classes could override methods in it to adjust its behavior like they used to with the trait model.

Some classes were extending the `KafkaMetricsGroup` trait, but it wasn't actually used.

Reviewers: Ismael Juma <ismael@juma.me.uk>, lbownik <lukasz.bownik@gmail.com>, Satish Duggana <satishd@apache.org>
2023-03-08 15:59:51 +05:30
David Jacot 6d37b0f07f
KAFKA-14462; [2/N] Add ConsumerGroupHeartbeat to GroupCoordinator interface (#13329)
This patch adds ConsumerGroupHeartbeat to the GroupCoordinator interface and implements the API in KafkaApis.

Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>
2023-03-07 09:20:03 +01:00
Christo Lolov 5b295293c0
MINOR: Remove unnecessary toString(); fix comment references (#13212)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>, Lucas Brutschy <lbrutschy@confluent.io>
2023-03-06 18:39:04 +01:00
Proven Provenzano 38c409cf33
KAFKA-14084: SCRAM support in KRaft. (#13114)
This commit adds support to store the SCRAM credentials in a cluster with KRaft quorum servers and
no ZK cluster backing the metadata. This includes creating ScramControlManager in the controller,
and adding support for SCRAM to MetadataImage and MetadataDelta.

Change UserScramCredentialRecord to contain only a single tuple (name, mechanism, salt, pw, iter)
rather than a mapping between name and a list. This will avoid creating an excessively large record
if a single user has many entries. Because record ID 11 (UserScramCredentialRecord) has not been
used before, this is a compatible change. SCRAM will be supported in 3.5-IV0 and later.

This commit does not include KIP-900 SCRAM bootstrapping support, or updating the credential cache
on the controller (as opposed to broker). We will implement these in follow-on commits.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-03-03 10:23:34 -08:00
Kowshik Prakasam 9f55945270
MINOR: Introduce OffsetAndEpoch in LeaderEndpoint interface return values (#13268)
Reviewers: Satish Duggana <satishd@apache.org>, Alexandre Dupriez <alexandre.dupriez@gmail.com>, Jun Rao <junrao@gmail.com>
2023-02-23 17:29:32 -08:00
Satish Duggana 322ac86ba2
KAFKA-14706: Move/rewrite ShutdownableThread to server-common module. (#13234)
Move/rewrite ShutdownableThread to server-common module.

Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2023-02-17 11:51:17 +08:00
Christo Lolov ba0c5b0902
MINOR: Simplify JUnit assertions in tests; remove accidental unnecessary code in tests (#13219)
* assertEquals called on array
* Method is identical to its super method
* Simplifiable assertions
* Unused imports

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-02-16 16:13:31 +01:00
José Armando García Sancio 10164a6d2e
KAFKA-14693; Kafka node should halt instead of exit (#13227)
Extend the implementation of ProcessTerminatingFaultHandler to support calling either Exit.halt or Exit.exit. Change the fault handler used by the Controller thread and the KRaft thread to use a halting fault handler.

Those threads cannot call Exit.exit because Runtime.exit joins on the default shutdown hook thread. The shutdown hook thread joins on the controller and kraft thread terminating. This causes a deadlock.
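
A minimal sketch of the exit-versus-halt choice, assuming a handler that receives its termination procedure at construction time. The names `TerminatingFaultHandlerSketch`, `exiting`, `halting`, and `withTerminator` are hypothetical; this is not the actual ProcessTerminatingFaultHandler. It relies only on the documented JDK behavior that `Runtime.exit` runs shutdown hooks and waits for them, while `Runtime.halt` terminates without running them.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch: a fault handler that terminates via an injected procedure.
final class TerminatingFaultHandlerSketch {
    private final IntConsumer terminator;

    private TerminatingFaultHandlerSketch(IntConsumer terminator) {
        this.terminator = terminator;
    }

    // Runtime.exit starts the shutdown hooks and waits for them to finish.
    // If a hook joins on the calling thread, the two wait on each other.
    static TerminatingFaultHandlerSketch exiting() {
        return new TerminatingFaultHandlerSketch(status -> Runtime.getRuntime().exit(status));
    }

    // Runtime.halt terminates immediately without running shutdown hooks,
    // so it is safe to call from a thread that a hook would join on.
    static TerminatingFaultHandlerSketch halting() {
        return new TerminatingFaultHandlerSketch(status -> Runtime.getRuntime().halt(status));
    }

    // Lets tests substitute a procedure that does not end the JVM.
    static TerminatingFaultHandlerSketch withTerminator(IntConsumer terminator) {
        return new TerminatingFaultHandlerSketch(terminator);
    }

    void handleFault(String message, Throwable cause) {
        System.err.println("Fatal fault: " + message + (cause == null ? "" : " (" + cause + ")"));
        terminator.accept(1);
    }
}
```

Under these assumptions, the controller and KRaft threads would be given the `halting()` variant, and other threads could keep the `exiting()` variant.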

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
2023-02-14 09:53:38 -08:00
David Arthur cb4d9d1abf
KAFKA-14668 Avoid unnecessary UMR during ZK migration (#13183)
Only send UMR to ZK brokers if the cluster metadata or topic metadata has changed.

Reviewers: Akhilesh C <akhileshchg@users.noreply.github.com>, Colin P. McCabe <cmccabe@apache.org>
2023-02-09 13:24:02 -05:00
Christo Lolov a0a9b6ffea
MINOR: Remove unnecessary code (#13210)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-02-07 17:37:45 +01:00
Ron Dagostino 6d11261d5d
MINOR: IBP_3_4_IV1 should be IBP_3_5_IV0 because it is not in 3.4 (#13198)
The KIP-405 MetadataVersion changes will be released as part of AK 3.5, but were added as IBP_3_4_IV1.
This change fixes them to be IBP_3_5_IV0. There is no incompatibility because this feature has not yet
been released. Also set didMetadataChange to false because KRaft metadata log records did not change.

Reviewers: Satish Duggana <satishd@apache.org>, Christo Lolov <christo_lolov@yahoo.com>, Colin P. McCabe <cmccabe@apache.org>
2023-02-06 10:37:50 -08:00
Colin Patrick McCabe 6625214c52
KAFKA-14658: Do not open broker ports until we are ready to accept traffic (#13169)
When we are listening on fixed ports, we should defer opening ports until we're ready to accept
traffic. If we open the broker port too early, it can confuse monitoring and deployment systems.
This is a particular concern when in KRaft mode, since in that mode, we create the SocketServer
object earlier in the startup process than when in ZK mode.

The approach taken in this PR is to defer opening the acceptor port until Acceptor.start is called.
Note that when we are listening on a random port, we continue to open the port "early," in the
SocketServer constructor. The reason for doing this is that there is no other way to find the
random port number the kernel has selected. Since random port assignment is not used in production
deployments, this should be reasonable.
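
A minimal sketch of the deferred-bind approach, using only JDK NIO classes. `DeferredAcceptorSketch` is a hypothetical class and is not Kafka's Acceptor; it only illustrates binding early for a random port (0) and binding in `start()` for a fixed port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical sketch: bind in the constructor only when the port is random.
final class DeferredAcceptorSketch implements AutoCloseable {
    private final int requestedPort;
    private final ServerSocketChannel channel;

    DeferredAcceptorSketch(int requestedPort) throws IOException {
        this.requestedPort = requestedPort;
        this.channel = ServerSocketChannel.open();
        if (requestedPort == 0) {
            // Port 0 asks the kernel to choose; binding is the only way to learn the choice.
            bind();
        }
    }

    private void bind() throws IOException {
        channel.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), requestedPort));
    }

    // For a fixed port, the socket is opened only once the server is ready for traffic.
    synchronized void start() throws IOException {
        if (!isBound()) {
            bind();
        }
    }

    boolean isBound() {
        return channel.socket().isBound();
    }

    // Returns -1 while the socket is not bound.
    int localPort() {
        return channel.socket().getLocalPort();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```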

FutureUtils.java: add chainFuture and tests.

SocketServerTest.scala: add timeouts to cases where we call get() on futures.

Reviewers: David Arthur <mumrah@gmail.com>, Alexandre Dupriez <hangleton@users.noreply.github.com>
2023-02-01 09:42:03 -08:00
Colin Patrick McCabe eb7d5cbf15
MINOR: add startup timeouts to KRaft integration tests (#13153)
When running junit tests, it is not good to block forever on CompletableFuture objects.  When there
are bugs, this can lead to junit tests hanging forever. Jenkins does not deal with this well -- it
often brings down the whole multi-hour test run.  Therefore, when running integration tests in
JUnit, set some reasonable time limits on broker and controller startup time.
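
A minimal sketch of bounding the wait on a startup future, using the standard `CompletableFuture.get(long, TimeUnit)` call. The helper name `awaitStartup` and the choice of `IllegalStateException` are assumptions made for illustration, not the code added by this commit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical test helper: never block forever on a startup future.
final class StartupTimeoutSketch {
    private StartupTimeoutSketch() { }

    // Waits for startup and fails after the timeout instead of hanging the test run.
    static <T> T awaitStartup(CompletableFuture<T> started, long timeout, TimeUnit unit) {
        try {
            return started.get(timeout, unit);
        } catch (TimeoutException e) {
            throw new IllegalStateException("startup did not finish within " + timeout + " " + unit, e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for startup", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("startup failed", e.getCause());
        }
    }
}
```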

Reviewers: Jason Gustafson <jason@confluent.io>
2023-01-30 11:29:30 -08:00
Federico Valeri 72cfc994f5
KAFKA-14628: Move CommandLineUtils and CommandDefaultOptions to tools (#13131)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Christo Lolov <christololov@gmail.com>, Sagar Rao <sagarmeansocean@gmail.com>
2023-01-26 20:06:09 +01:00
Colin Patrick McCabe 8478bbb589
KAFKA-14601: Improve exception handling in KafkaEventQueue (#13089)
If KafkaEventQueue gets an InterruptedException while waiting for a condition variable, it
currently exits immediately. Instead, it should complete the remaining events exceptionally and
then execute the cleanup event. This will allow us to finish any necessary cleanup steps.

In order to do this, we require the cleanup event to be provided when the queue is constructed,
rather than when it's being shut down.

Also, handle cases where Event#handleException itself throws an exception.

Remove timed shutdown from the event queue code since nobody was using it, and it adds complexity.

Add server-common/src/test/resources/test/log4j.properties since this gradle module somehow avoided
having a test log4j.properties up to this point.

Reviewers: David Arthur <mumrah@gmail.com>
2023-01-12 10:03:14 -08:00
Ismael Juma 8ac644d2b1
KAFKA-14607: Move Scheduler/KafkaScheduler to server-common (#13092)
There were some concurrency inconsistencies in `KafkaScheduler` flagged by spotBugs
that had to be fixed, summary of changes below:
* Executor is `volatile`
* We always synchronize and check `isStarted` as the first thing within the critical
   section when a mutating operation is performed.
* We don't synchronize (but ensure the executor is not null in a safe way) in read-only
   operations that operate on the executor.
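
A minimal sketch of the three rules above, assuming a scheduler backed by a `ScheduledThreadPoolExecutor`. `SchedulerSketch` is a hypothetical, heavily reduced class and does not reproduce `KafkaScheduler`.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the locking discipline, not the real KafkaScheduler.
final class SchedulerSketch {
    // volatile: read-only operations may read this field without taking the lock.
    private volatile ScheduledThreadPoolExecutor executor;

    // Mutating operation: synchronize, then check isStarted first.
    void startup() {
        synchronized (this) {
            if (isStarted()) {
                throw new IllegalStateException("already started");
            }
            executor = new ScheduledThreadPoolExecutor(1);
        }
    }

    // Mutating operation: same pattern; the slow wait happens outside the lock.
    void shutdown() throws InterruptedException {
        ScheduledThreadPoolExecutor toStop;
        synchronized (this) {
            if (!isStarted()) {
                return;
            }
            toStop = executor;
            executor = null;
        }
        toStop.shutdown();
        toStop.awaitTermination(1, TimeUnit.SECONDS);
    }

    // Read-only operation: copy the field once so a concurrent shutdown
    // cannot turn it into null between the check and the use.
    int threadPoolSize() {
        ScheduledThreadPoolExecutor current = executor;
        return current == null ? 0 : current.getCorePoolSize();
    }

    boolean isStarted() {
        return executor != null;
    }
}
```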

With regards to `MockScheduler/MockTask`:
* Set the type of `nextExecution` to `AtomicLong` and replaced inconsistent synchronization
* Extracted logic into `MockTask.rescheduleIfPeriodic`

Tweaked the `Scheduler` interface a bit:
* Removed `unit` parameter since we always used `ms` except one invocation
* Introduced a couple of `scheduleOnce` overloads to replace the usage of default
   arguments in Scala
* Pulled up `resizeThreadPool` to the interface and removed `isStarted` from the
  interface.

Other cleanups:
* Removed spotBugs exclusion affecting `kafka.log.LogConfig`, which no longer exists.

For broader context, see:
* KAFKA-14470: Move log layer to storage module

Reviewers: Jun Rao <junrao@gmail.com>
2023-01-10 23:51:58 -08:00
Ismael Juma 96d9710c17
KAFKA-14478: Move LogConfig/CleanerConfig and related to storage module (#13049)
Additional notable changes to fix multiple dependency ordering issues:

* Moved `ConfigSynonym` to `server-common`
* Moved synonyms from `LogConfig` to `ServerTopicConfigSynonyms`
* Removed `LogConfigDef` `define` overrides and rely on
   `ServerTopicConfigSynonyms` instead.
* Moved `LogConfig.extractLogConfigMap` to `KafkaConfig`
* Consolidated relevant defaults from `KafkaConfig`/`LogConfig` in the latter
* Consolidate relevant config name definitions in `TopicConfig`
* Move `ThrottledReplicaListValidator` to `storage`

Reviewers: Satish Duggana <satishd@apache.org>, Mickael Maison <mickael.maison@gmail.com>
2023-01-04 02:42:52 -08:00
Josep Prat 5f1810209f
MINOR: Fix small warning on javadoc and scaladoc (#11049)
Escape the `>` character in javadoc
Escape the `$` character when part of `${}` in scaladoc as this is the way to reference a variable

Reviewers: Matthias J. Sax <matthias@confluent.io>
2022-12-28 13:41:45 -08:00
Ismael Juma 7b634c7034
KAFKA-14521: Replace BrokerCompressionCodec with BrokerCompressionType (#13011)
This is a requirement for:

* KAFKA-14477: Move LogValidator to storage module.

For broader context on this change, please check:

* KAFKA-14470: Move log layer to storage module

Reviewers: dengziming <dengziming1993@gmail.com>
2022-12-20 11:53:49 -08:00
José Armando García Sancio 44b3177a08
KAFKA-14457; Controller metrics should only expose committed data (#12994)
The controller metrics have three problems: 1) the active controller exposes uncommitted data in the metrics; 2) the active controller doesn't update the metrics when the uncommitted data gets aborted; 3) the controller doesn't update the metrics when the entire state gets reset.

We fix these issues by only updating the metrics when processing committed metadata records and by resetting the metrics when the metadata state is reset.

This change adds a new type `ControllerMetricsManager` which processes committed metadata records and updates the metrics accordingly. This change also removes metrics updating responsibilities from the rest of the controller managers. 

Reviewers: Ron Dagostino <rdagostino@confluent.io>
2022-12-20 10:55:14 -08:00
Satish Duggana 7146ac57ba
[KAFKA-13369] Follower fetch protocol changes for tiered storage. (#11390)
This PR implements the follower fetch protocol as mentioned in KIP-405.

Added a new version for ListOffsets protocol to receive local log start offset on the leader replica. This is used by follower replicas to find the local log start offset on the leader.

Added a new version for FetchRequest protocol to receive OffsetMovedToTieredStorageException error. This is part of the enhanced fetch protocol as described in KIP-405.

We introduced a new field localLogStartOffset to maintain the log start offset in the local logs. Existing logStartOffset will continue to be the log start offset of the effective log that includes the segments in remote storage.

When a follower receives OffsetMovedToTieredStorage, then it tries to build the required state from the leader and remote storage so that it can be ready to move to fetch state.

Introduced RemoteLogManager, which is responsible for:

- initializing RemoteStorageManager and RemoteLogMetadataManager instances;
- receiving leader and follower replica events and partition stop events, and acting on them;
- providing APIs to fetch indexes and metadata about remote log segments.

Follow-up PRs will add more functionality like copying segments to tiered storage, retention checks to clean local and remote log segments. This will change the local log start offset and make sure the follower fetch protocol works fine for several cases.

You can look at the detailed protocol changes in KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-FollowerReplication

Co-authors: satishd@apache.org, kamal.chandraprakash@gmail.com, yingz@uber.com

Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Cong Ding <cong@ccding.com>, Tirtha Chatterjee <tirtha.p.chatterjee@gmail.com>, Yaodong Yang <yangyaodong88@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Jun Rao <junrao@gmail.com>
2022-12-17 09:36:44 -08:00
Daniel Scanteianu e3585a4cd5
MINOR: Document Offset and Partition 0-indexing, fix typo (#12753)
Add comments to clarify that both offsets and partitions are 0-indexed, and fix a minor typo. Clarify which offset will be retrieved by poll() after seek() is used in various circumstances. Also added integration tests.

Reviewers: Luke Chen <showuon@gmail.com>
2022-12-16 17:12:40 +08:00
Akhilesh C 8b045dcbf6
KAFKA-14446: API forwarding support from zkBrokers to the Controller (#12961)
This PR enables brokers which are upgrading from ZK mode to KRaft mode to forward certain metadata
change requests to the controller instead of applying them directly through ZK. To facilitate this,
we now support EnvelopeRequest on zkBrokers (instead of only on KRaft nodes.)

In BrokerToControllerChannelManager, we can now reinitialize our NetworkClient. This is needed to
handle the case when we transition from forwarding requests to a ZK-based broker over the
inter-broker listener, to forwarding requests to a quorum node over the controller listener.

In MetadataCache.scala, distinguish between KRaft and ZK controller nodes with a new type,
CachedControllerId.

In LeaderAndIsrRequest, StopReplicaRequest, and UpdateMetadataRequest, switch from sending both a
zk and a KRaft controller ID to sending a single controller ID plus a boolean to express whether it
is KRaft. The previous scheme was ambiguous as to whether the system was in KRaft or ZK mode when
both IDs were -1 (although this case is unlikely to come up in practice). The new scheme avoids
this ambiguity and is simpler to understand.

Reviewers: dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2022-12-15 14:16:41 -08:00
David Arthur 67c72596af
KAFKA-14448 Let ZK brokers register with KRaft controller (#12965)
Prior to starting a KIP-866 migration, the ZK brokers must register themselves with the active
KRaft controller. The controller waits for all brokers to register in order to verify that all the
brokers can

A) Communicate with the quorum
B) Have the migration config enabled
C) Have the proper IBP set

This patch uses the new isMigratingZkBroker field in BrokerRegistrationRequest and
RegisterBrokerRecord. The type was changed from int8 to bool for BrokerRegistrationRequest (a
mistake from #12860). The ZK brokers use the existing BrokerLifecycleManager class to register and
heartbeat with the controllers.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2022-12-13 13:15:21 -08:00
Ismael Juma 88725669e7
MINOR: Move MetadataQuorumCommand from `core` to `tools` (#12951)
`core` should only be used for legacy cli tools and tools that require
access to `core` classes instead of communicating via the kafka protocol
(typically by using the client classes).

Summary of changes:
1. Convert the command implementation and tests to Java and move it to
    the `tools` module.
2. Introduce mechanism to capture stdout and stderr from tests.
3. Change `kafka-metadata-quorum.sh` to point to the new command class.
4. Adjusted the test classpath of the `tools` module so that it supports tests
    that rely on the `@ClusterTests` annotation.
5. Improved error handling when an exception different from `TerseFailure` is
    thrown.
6. Changed `ToolsUtils` to avoid usage of arrays in favor of `List`.

Reviewers: dengziming <dengziming1993@gmail.com>
2022-12-09 09:22:58 -08:00
David Arthur 7b7e40a536
KAFKA-14304 Add RPC changes, records, and config from KIP-866 (#12928)
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
2022-12-02 19:59:52 -05:00
Colin Patrick McCabe 5514f372b3
MINOR: extract jointly owned parts of BrokerServer and ControllerServer (#12837)
Extract jointly owned parts of BrokerServer and ControllerServer into SharedServer. Shut down
SharedServer when the last component using it shuts down. But make sure to stop the raft manager
before closing the ControllerServer's sockets.

This PR also fixes a memory leak where ReplicaManager was not removing some topic metric callbacks
during shutdown. Finally, we now release memory from the BatchMemoryPool in KafkaRaftClient#close.
These changes should reduce memory consumption while running junit tests.

Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
2022-12-02 00:27:22 -08:00
Colin Patrick McCabe a3f5eb6e35
MINOR: Implement EventQueue#size and EventQueue#empty (#12930)
Implement functions to measure the number of events in the event queue.

Reviewers: David Arthur <mumrah@gmail.com>
2022-12-01 09:04:04 -08:00
David Jacot bc780c7c32
MINOR: Move timeline data structures from metadata to server-common (#12811)
This patch moves the timeline data structures from metadata module to server-common module as those will be used in the new group coordinator.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@apache.org>
2022-11-04 08:52:32 +01:00
Colin Patrick McCabe dac81161db
MINOR; Introduce ImageWriter and ImageWriterOptions (#12715)
This PR adds a new ImageWriter interface which replaces the generic Consumer interface which
accepted lists of records. It is better to do batching in the ImageWriter than to try to deal with
that complexity in the MetadataImage#write functions, especially since batching is not semantically
meaningful in KRaft snapshots. The new ImageWriter interface also supports freeze and close, which
more closely matches the semantics of the underlying Raft classes.

The PR also adds an ImageWriterOptions class which we can use to pass parameters to control how the
new image is written. Right now, the parameters that we are interested in are the target metadata
version (which may be more or less than the original image's version) and a handler function which
is invoked whenever metadata is lost due to the target version.

Convert over the MetadataImage#write function (and associated functions) to use the new ImageWriter
and ImageWriterOptions. In particular, we now have a way to handle metadata losses by invoking
ImageWriterOptions#handleLoss. This allows us to handle writing an image at a lower version, for
the first time. This support is still not enabled externally by this PR, though. That will come in
a future PR.

Get rid of the use of SOME_RECORD_TYPE.highestSupportedVersion() in several places. In general, we
do not want to "silently" change the version of a record that we output, just because a new version
was added. We should be explicit about what record version numbers we are outputting.

Implement ProducerIdsDelta#toString, to make debug logs look better.

Move MockRandom to the server-common package so that other internal broker packages can use it.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2022-10-13 09:56:19 -07:00
Colin Patrick McCabe f0f918b242
KAFKA-14177: Correctly support older kraft versions without FeatureLevelRecord (#12513)
The main changes here are ensuring that we always have a metadata.version record in the log, making
sure that the bootstrap file can be used for records other than the metadata.version record (for
example, we will want to put SCRAM initialization records there), and fixing some bugs.

If no feature level record is in the log and the IBP is less than 3.3IV0, then we assume the minimum KRaft
version for all records in the log.

Fix some issues related to initializing new clusters. If there are no records in the log at all,
then insert the bootstrap records in a single batch. If there are records, but no metadata version,
process the existing records as though they were metadata.version 3.3IV0 and then append a metadata
version record setting version 3.3IV0.  Previously, we were not clearly distinguishing between the
case where the metadata log was empty, and the case where we just needed to add a metadata.version
record.
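
A minimal sketch of the two startup cases distinguished above, written as a pure decision function. The type `BootstrapDecisionSketch`, its `Action` enum, and the two boolean inputs are hypothetical names chosen for illustration; the real logic lives in the controller and is more involved.

```java
// Hypothetical sketch: choose what to write when the controller first becomes active.
final class BootstrapDecisionSketch {
    enum Action {
        // The log has no records at all: write the bootstrap records in a single batch.
        INSERT_BOOTSTRAP_BATCH,
        // The log has records but no metadata.version: append a metadata version record.
        APPEND_METADATA_VERSION_RECORD,
        // The log already contains a metadata.version record.
        NOTHING
    }

    private BootstrapDecisionSketch() { }

    static Action decide(boolean logIsEmpty, boolean sawMetadataVersionRecord) {
        if (logIsEmpty) {
            return Action.INSERT_BOOTSTRAP_BATCH;
        }
        if (!sawMetadataVersionRecord) {
            return Action.APPEND_METADATA_VERSION_RECORD;
        }
        return Action.NOTHING;
    }
}
```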

Refactor BootstrapMetadata into an immutable class which contains a 3-tuple of metadata version,
record list, and source. The source field is used to log where the bootstrap metadata was obtained
from. This could be a bootstrap file, the static configuration, or just the software defaults.
Move the logic for reading and writing bootstrap files into BootstrapDirectory.java.

Add LogReplayTracker, which tracks whether the log is empty.

Fix a bug in FeatureControlManager where it was possible to use a "downgrade" operation to
transition to a newer version. Do not store whether we have seen a metadata version or not in
FeatureControlManager, since that is now handled by LogReplayTracker.

Introduce BatchFileReader, which is a simple way of reading a file containing batches of snapshots
that does not require spawning a thread. Rename SnapshotFileWriter to BatchFileWriter to be
consistent, and to reflect the fact that bootstrap files aren't snapshots.

QuorumController#processBrokerHeartbeat: add an explanatory comment.

Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
2022-08-25 18:12:31 -07:00
dengziming 150fd5b0b1
KAFKA-13914: Add command line tool kafka-metadata-quorum.sh (#12469)
Add `MetadataQuorumCommand` to describe the quorum status. The command uses an arg4j-style format; currently it supports only one sub-command, "describe", which accepts two arguments: --status and --replication.

```
# describe quorum status
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --replication

ReplicaId  LogEndOffset  Lag  LastFetchTimeMs  LastCaughtUpTimeMs  Status
0          10            0    -1               -1                  Leader
1          10            0    -1               -1                  Follower
2          10            0    -1               -1                  Follower

kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
ClusterId:              fMCL8kv1SWm87L_Md-I2hg
LeaderId:               3002
LeaderEpoch:            2
HighWatermark:          10
MaxFollowerLag:         0
MaxFollowerLagTimeMs:   -1
CurrentVoters:          [3000,3001,3002]
CurrentObservers:       [0,1,2]

# specify AdminClient properties
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 --command-config config.properties describe --status
```

Reviewers: Jason Gustafson <jason@confluent.io>
2022-08-20 08:37:26 -07:00
Niket Goel ac64693434
KAFKA-14114: Add Metadata Error Related Metrics
This PR adds in 3 metrics as described in KIP-859:
 kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count
 kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count
 kafka.controller:type=KafkaController,name=MetadataErrorCount

These metrics are incremented by fault handlers when the appropriate fault happens. Broker-side
load errors happen in BrokerMetadataListener. Broker-side apply errors happen in the
BrokerMetadataPublisher. The metric on the controller is incremented when the standby controller
(not active) encounters a metadata error.

In BrokerMetadataPublisher, try to limit the damage caused by an exception by introducing more
catch blocks. The only fatal failures here are those that happen during initialization, when we
initialize the manager objects (these would also be fatal in ZK mode).

In BrokerMetadataListener, try to improve the logging of faults, especially ones that happen when
replaying a snapshot. Try to limit the damage caused by an exception.

Replace MetadataFaultHandler with LoggingFaultHandler, which is more flexible and takes a Runnable
argument. Add LoggingFaultHandlerTest.

Make QuorumControllerMetricsTest stricter. Fix a bug where we weren't cleaning up some metrics from
the yammer registry on close in QuorumControllerMetrics.

Co-author: Colin P. McCabe <cmccabe@apache.org>
2022-08-09 15:22:15 -07:00
Colin Patrick McCabe 555744da70
KAFKA-14124: improve quorum controller fault handling (#12447)
Before trying to commit a batch of records to the __cluster_metadata log, the active controller
should try to apply them to its current in-memory state. If this application process fails, the
active controller process should exit, allowing another node to take leadership. This will prevent
most bad metadata records from ending up in the log and help to surface errors during testing.

Similarly, if the active controller attempts to renounce leadership, and the renunciation process
itself fails, the process should exit. This will help avoid bugs where the active controller
continues in an undefined state.

In contrast, standby controllers that experience metadata application errors should continue on, in
order to avoid a scenario where a bad record brings down the whole controller cluster.  The
intended effect of these changes is to make it harder to commit a bad record to the metadata log,
but to continue to ride out the bad record as well as possible if such a record does get committed.

This PR introduces the FaultHandler interface to implement these concepts. In junit tests, we use a
FaultHandler implementation which does not exit the process. This allows us to avoid terminating
the gradle test runner, which would be very disruptive. It also allows us to ensure that the test
surfaces these exceptions, which we previously were not doing (the mock fault handler stores the
exception).
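
A minimal sketch of the test-side idea described above: a fault handler that records the fault instead of exiting the process, so that the test can surface it later. The names `FaultHandlerSketch` and `RecordingFaultHandlerSketch` and their method signatures are hypothetical and do not reproduce Kafka's FaultHandler interface or its mock.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for a fault handler interface.
interface FaultHandlerSketch {
    void handleFault(String message, Throwable cause);
}

// Test implementation: stores the first fault rather than terminating the JVM.
final class RecordingFaultHandlerSketch implements FaultHandlerSketch {
    private final AtomicReference<Throwable> firstFault = new AtomicReference<>();

    @Override
    public void handleFault(String message, Throwable cause) {
        // Only the first fault is kept; later ones are usually consequences of it.
        firstFault.compareAndSet(null, new RuntimeException(message, cause));
    }

    Throwable firstFault() {
        return firstFault.get();
    }

    // Called at the end of a test so that a recorded fault fails the test.
    void maybeRethrowFirstFault() {
        Throwable fault = firstFault.get();
        if (fault != null) {
            throw new AssertionError("fault handler was invoked", fault);
        }
    }
}
```

A production implementation of the same interface would instead terminate the process, which is why the two are kept behind one interface in this sketch.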

In addition to the above, this PR fixes a bug where RaftClient#resign was not being called from the
renounce() function. This bug could have resulted in the raft layer not being informed of an active
controller resigning.

Reviewers: David Arthur <mumrah@gmail.com>
2022-08-04 22:49:45 -07:00
David Arthur cc384054c6
KAFKA-13935 Fix static usages of IBP in KRaft mode (#12250)
* Set the minimum supported MetadataVersion to 3.0-IV1
* Remove MetadataVersion.UNINITIALIZED
* Relocate RPC version mapping for fetch protocols into MetadataVersion
* Replace static IBP calls with dynamic calls to MetadataCache

A side effect of removing the UNINITIALIZED metadata version is that the FeatureControlManager and FeatureImage will initialize themselves with the minimum KRaft version (3.0-IV1).

The rationale for setting the minimum version to 3.0-IV1 is so that we can avoid any cases of KRaft mode running with an old log message format (KIP-724 was introduced in 3.0-IV1). As a side-effect of increasing this minimum version, the feature level values decreased by one.

Reviewers: Jason Gustafson <jason@confluent.io>, Jun Rao <junrao@gmail.com>
2022-06-13 14:23:28 -04:00
Divij Vaidya 4426b05e54
MINOR: Use Exit.addShutdownHook instead of directly adding hooks to Runtime (#12283)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Igor Soarez <soarez@apple.com>, Kvicii <kvicii.yu@gmail.com>
2022-06-13 17:25:40 +02:00
David Jacot 151ca12a56
KAFKA-13916; Fenced replicas should not be allowed to join the ISR in KRaft (#12240)
This PR implements the first part of KIP-841. Specifically, it implements the following:

1. Adds a new metadata version.
2. Adds the InControlledShutdown field to the BrokerRegistrationRecord and BrokerRegistrationChangeRecord and bump their versions. The newest versions are only used if the new metadata version is enabled.
3. Writes a BrokerRegistrationChangeRecord with InControlledShutdown set when a broker requests a controlled shutdown.
4. Ensures that fenced and in controlled shutdown replicas are not picked as leaders nor included in the ISR.
5. Adds or extends unit tests.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>
2022-06-07 10:37:20 -07:00
Colin Patrick McCabe 65b4374203
MINOR: implement BrokerRegistrationChangeRecord (#12195)
Implement BrokerRegistrationChangeRecord as specified in KIP-746. This is a more flexible record than the
single-purpose Fence / Unfence records.

Reviewers: José Armando García Sancio <jsancio@gmail.com>, dengziming <dengziming1993@gmail.com>
2022-06-01 16:33:01 -07:00
José Armando García Sancio 7d1b0926fa
KAFKA-13883: Implement NoOpRecord and metadata metrics (#12183)
Implement NoOpRecord as described in KIP-835. This is controlled by the new
metadata.max.idle.interval.ms configuration.

The KRaft controller schedules an event to write NoOpRecord to the metadata log if the metadata
version supports this feature. This event is scheduled at the interval defined in
metadata.max.idle.interval.ms. Brokers and controllers were improved to ignore the NoOpRecord when
replaying the metadata log.

This PR also adds four new metrics to the KafkaController metric group, as described in KIP-835.

Finally, there are some small fixes to leader recovery. This PR fixes a bug where metadata version
3.3-IV1 was not marked as changing the metadata. It also changes the ReplicaControlManager to
accept a metadata version supplier to determine if the leader recovery state is supported.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2022-06-01 10:48:24 -07:00
dengziming 54d60ced86
KAFKA-13833: Remove the min_version_level from the finalized version range written to ZooKeeper (#12062)
Reviewers: David Arthur <mumrah@gmail.com>
2022-05-25 14:02:34 -04:00
David Arthur 1135f22eaf
KAFKA-13830 MetadataVersion integration for KRaft controller (#12050)
This patch builds on #12072 and adds controller support for metadata.version. The kafka-storage tool now allows a
user to specify a specific metadata.version to bootstrap into the cluster, otherwise the latest version is used.

Upon the first leader election of the KRaft quorum, this initial metadata.version is written into the metadata log. When
writing snapshots, a FeatureLevelRecord for metadata.version will be written out ahead of other records so we can
decode things at the correct version level.

This also includes additional validation in the controller when setting feature levels. It will now check that a given
metadata.version is supportable by the quorum, not just the brokers.

Reviewers: José Armando García Sancio <jsancio@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>, Alyssa Huang <ahuang@confluent.io>
2022-05-18 12:08:36 -07:00
José Armando García Sancio e94934b6b7
MINOR; DeleteTopics version tests (#12141)
Add a DeleteTopics test for all supported versions. Convert the
DeleteTopicsRequestTest to run against both ZK and KRaft mode.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>
2022-05-12 13:04:48 -07:00