Commit Graph

148 Commits

Author SHA1 Message Date
Ismael Juma 3dba3125e9
KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845)
3.3.0 was the first KRaft release that was deemed production-ready and also
when KIP-778 (KRaft to KRaft upgrades) landed. Given that, it's reasonable
for 4.x to only support upgrades from 3.3.0 or newer (the metadata version also
needs to be set to "3.3" or newer before upgrading).

Noteworthy changes:
1. `AlterPartition` no longer includes topic names, which makes it possible to
simplify `AlterParitionManager` logic.
2. Metadata versions older than `IBP_3_3_IV3` have been removed and
`IBP_3_3_IV3` is now the minimum version.
3. `MINIMUM_BOOTSTRAP_VERSION` has been removed.
4. Removed `isLeaderRecoverySupported`, `isNoOpsRecordSupported`,
`isKRaftSupported`, `isBrokerRegistrationChangeRecordSupported` and
`isInControlledShutdownStateSupported` - these are always `true` now.
Also removed related conditional code.
5. Removed default metadata version or metadata version fallbacks in
multiple places - we now fail-fast instead of potentially using an incorrect
metadata version.
6. Update `MetadataBatchLoader.resetToImage` to set `hasSeenRecord`
based on whether image is empty - this was a previously existing issue that
became more apparent after the changes in this PR.
7. Remove `ibp` parameter from `BootstrapDirectory`
8. A number of tests were not useful anymore and have been removed.

I will update the upgrade notes via a separate PR as there are a few things that
need changing and it would be easier to do so that way.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluen.io>, Ken Huang <s7133700@gmail.com>
2025-02-19 05:35:42 -08:00
Chirag Wadhwa 63229a768c
KAFKA-16718 [1/n]: Added DeleteShareGroupOffsets request and response schema (#18927)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-18 14:06:24 +00:00
Apoorv Mittal 06ce3e890b
KAFKA-18733: Updating share group record acks metric (2/N) (#18924)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-02-17 18:12:58 +00:00
Jimmy Wang 6a6b80215d
KAFKA-16717 [1/2]: Add AdminClient.alterShareGroupOffsets (#18819)
KAFKA-16720 aims to add the support for the AlterShareGroupOffsets AdminClient. Key Changes in the PR:

1. Added handing of alterShareGroupOffsets() in KafkaAdminClient and introduce AlterShareGroupOffsetRequest/AlterShareGroupOffsetResponse/AlterShareGroupOffsetsOptions classes.
2. Corresponding test in KafkaAdminClientTest.
3. Added ALTER_SHARE_GROUP_OFFSETS API (will finish it in next PR and the share coordinator pieces)

Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-15 02:35:46 +08:00
Apoorv Mittal 53543bcf63
KAFKA-18733: Updating share group metrics (1/N) (#18826)
Reviewers: Sushant Mahajan <smahajan@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2025-02-14 08:48:41 +00:00
Calvin Liu 9cb271f1e1
KAFKA-18654[2/2]: Transction V2 retry add partitions on the server side when handling produce request. (#18810)
During the transaction commit phase, it is normal to hit CONCURRENT_TRANSACTION error before the transaction markers are fully propagated. Instead of letting the client to retry the produce request, it is better to retry on the server side.

Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
2025-02-13 09:30:58 -08:00
Ken Huang 9494bebee6
KAFKA-18728 Move ListOffsetsPartitionStatus to server module (#18807)
Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2025-02-13 10:36:46 +05:30
Jhen-Yung Hsu 4e36368d08
KAFKA-18743 Remove leader.imbalance.per.broker.percentage as it is not supported by Kraft (#18821)
Remove `leader.imbalance.per.broker.percentage` from config.
Add `leader.imbalance.per.broker.percentage` to release note

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-02-11 04:01:57 +08:00
Colin Patrick McCabe b2b2408692
KAFKA-18360 Remove zookeeper configurations (#18566)
Remove broker.id.generation.enable and reserved.broker.max.id, which are not used in KRaft mode.
Remove inter.broker.protocol.version, which is not used in KRaft mode.

Reviewers: PoAn Yang <payang@apache.org>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-06 22:22:11 +08:00
TengYao Chi aac62a32d9
KAFKA-18698: Migrate suitable classes to records in server and server-common modules (#18783)
Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Christo Lolov <lolovc@amazon.com>
2025-02-05 10:00:11 +00:00
kevin-wu24 184b891871
KAFKA-16524; Metrics for KIP-853 (#18304)
This change implement some of the metrics enumerated in KIP-853.

The KafkaRaftMetrics object now exposes number-of-voters, number-of-observers and uncommitted-voter-change. The number-of-observers and uncommitted-voter-change metrics are only present on the active controller or leader, since it does not make sense for other replicas to report these metrics.

In order to make these two metrics thread-safe, KafkaRaftMetrics needs to be passed into LeaderState, and therefore QuorumState. This introduces a circularity since the KafkaRaftMetrics constructor takes in QuorumState. To break the circularity for now, the logic using QuorumState will be moved to the KafkaRaftMetrics#initialize method.

The BrokerServerMetrics object now exposes ignored-static-voters. The ControllerServerMetrics object now exposes IgnoredStaticVoters. To implement both metrics for "ignored static voters", this PR introduces the ExternalKRaftMetrics interface, which allows for higher layer metrics objects to be accessible within the raft module.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2025-01-30 18:35:01 -05:00
PoAn Yang 4dd0bcbde8
KAFKA-18383 Remove reserved.broker.max.id and broker.id.generation.enable (#18478)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-30 02:55:09 +08:00
Apoorv Mittal c7619ef8d1
KAFKA-17951: Share parition rotate strategy (#18651)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
2025-01-28 11:44:48 +00:00
Chung, Ming-Yen a8f6fc9cc4
KAFKA-18631 Remove ZkConfigs (#18693)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-26 04:37:49 +08:00
TengYao Chi 9bab9dc023
MINOR: Remove EMPTY_NODE_ID (#18707)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-26 04:28:01 +08:00
Ken Huang c40e7a1341
KAFKA-18533 Remove KafkaConfig zookeeper related logic (#18547)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-25 22:52:21 +08:00
Apoorv Mittal 70eab7778d
KAFKA-17894: Implemented broker topic metrics for Share Group 1/N (KIP-1103) (#18444)
The PR implements the BrokerTopicMetrics defined in KIP-1103.

The PR also corrected the share-acknowledgement-rate and share-acknowledgement-count metrics defined in KIP-932 as they are moved to BrokerTopicMetrics, necessary changes to KIP-932 broker metrics will be done once we complete KIP-1103.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
2025-01-24 09:34:54 -08:00
Ken Huang 341e535942
KAFKA-18519: Remove Json.scala, cleanup AclEntry.scala (#18614)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-22 16:12:06 +01:00
Ismael Juma 87b37a4065
KAFKA-14552: Assume a baseline of 3.0 for server protocol versions (#18497)
Kafka 4.0 will remove support for zk mode and will require conversion to kraft
before upgrading to 4.0. The minimum kraft version is 3.0 (aka 3.0-IV1).

This provides an opportunity to remove exclusively server side protocols versions
that only exist to allow direct upgrades from versions older than 3.0 or that are
used only by zk mode.

Since KRaft became production ready in 3.3, we should consider setting the
baseline to 3.3. But that requires more discussion and it can be done via a
separate change (KAFKA-18601).

Protocol changes:
* Remove RequestHeader v0 (only used by ControlledShutdown v0)
* Remove WriteTxnMarkers v0
* Remove all versions of ControlledShutdown, LeaderAndIsr, StopReplica, UpdateMetadata

In order to remove all versions safely, extend generator to support setting
"versions" to "none". In this case, we no longer generate the `*Data` classes,
but we still reserve the id for the relevant protocol api (so it doesn't get
accidentally used for something else). The protocol documentation is correct
after these changes.

We kept a simplified version of `LeaderAndIsr{Request|Response}` because
it's used by many tests that are still relevant in kraft mode. Once
KAFKA-18486 is done, it may be possible to remove it (I left a comment on
the ticket). Similarly, KAFKA-18487 may make it possible to remove
the introduced `StopReplicaPartitionState` (left a comment on that ticket too).

There are a number of places that were adjusted to include an
`ApiKeys.hasValidVersion` check.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-20 13:51:44 -08:00
Ken Huang d0bca67b27
KAFKA-18363: Remove ZooKeeper mentiosn in broker configs (#18346)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, TengYao Chi <kitingiao@gmail.com>
2025-01-20 18:22:40 +01:00
TengYao Chi 0b07db572b
KAFKA-18595: Remove AuthorizerUtils#sessionToRequestContext (#18631)
Reviewers: Christo Lolov <lolovc@amazon.com>
2025-01-20 15:35:53 +00:00
TengYao Chi 3871ec4268
MINOR: Remove unused CredentialProvider#updateCredentials (#18622)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-19 23:02:29 +08:00
Apoorv Mittal a1f7457389
KAFKA-18514: Refactor share module code to server and server-common (#18524)
As per the discussion with @ijuma and @mumrah, the `share` module seems not required and it's advised to user `server` and `server-common` instead. The PR moves the classes from `share` module to respective server related modules.

Following has been refactored in the PR:

- Moved Share Fetch, Acknowledge, Session, Context and Cache related classes to `server` module as the classes are used by `core` and `tools` modules.
- Moved `Persister` releated classes from `share` to `server-common` as the Persister classes though currently just being used by `core` module but in [near future](https://github.com/apache/kafka/pull/17775) will also be used by `group-coordinator`. Hence the Persister classes shouldn't go in `server`. The debate is mostly between `coordinator-common` vs `server-common`. We have kept the Persister in `server-common` for now, the classes are more related to the server than the coordinator. Persister is basically an abstraction in the server to let you choose how you want to persist the share group progress.
- Updated build.gradle to remove `share` module.
- Removed `import-control-share.xml`

Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-16 00:14:25 -08:00
PoAn Yang 14daa23b59
KAFKA-18331: Make process.roles and node.id required configs (#18414)
In 4.0, there is no ZK mode and both of these configs are required in kraft mode.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2025-01-15 23:55:51 -08:00
Apoorv Mittal e12db663f0
KAFKA-18514 Remove server dependency on share coordinator (#18536)
The PR removes dependency of server module on share-coordinator, rather it should be other way. Moved the ShareCoordinatorConfig class from server to share-coordinator.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-16 00:47:01 +08:00
Sanskar Jhajharia e3e4c17959
Add DescribeShareGroupOffsets API [KIP-932] (#18500)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-14 14:33:39 +00:00
TaiJuWu 0c435e3855
KAFKA-18353 Remove zk config `control.plane.listener.name` (#18329)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 18:38:37 +08:00
Jhen-Yung Hsu f95726a211
KAFKA-18417 Remove controlled.shutdown.max.retries and controlled.shutdown.retry.backoff.ms (#18431)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-08 17:13:42 +08:00
David Arthur c4840f5e93
KAFKA-16446: Improve controller event duration logging (#15622)
There are times when the controller has a high event processing time, such as during startup, or when creating a topic with many partitions. We can see these processing times in the p99 metric (kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs), however it's difficult to see exactly which event is causing high processing time.

With DEBUG logs, we see every event along with its processing time. Even with this, it's a bit tedious to find the event with a high processing time.

This PR logs all events which take longer than 2 seconds at ERROR level. This will help identify events that are taking far too long, and which could be disruptive to the operation of the controller. The slow event logging looks like this:

```
[2024-12-20 15:03:39,754] ERROR [QuorumController id=1] Exceptionally slow controller event createTopics took 5240 ms.  (org.apache.kafka.controller.EventPerformanceMonitor)
```

Also, every 60 seconds, it logs some event time statistics, including average time, maximum time, and the name of the event which took the longest. This periodic message looks like this:

```
[2024-12-20 15:35:04,798] INFO [QuorumController id=1] In the last 60000 ms period, 333 events were completed, which took an average of 12.34 ms each. The slowest event was handleCommit[baseOffset=0], which took 41.90 ms. (org.apache.kafka.controller.EventPerformanceMonitor)
```

An operator can disable these logs by adding the following to their log4j config:

```
org.apache.kafka.controller.EventPerformanceMonitor=OFF
```

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-01-06 13:34:46 -08:00
ClarkChen b9354d6886
MINOR: Fix newline not working in LISTENERS_DOC (#18388)
Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-06 14:04:23 +08:00
Ismael Juma 409a43eff7
MINOR: Collection/Option usage simplification via methods introduced in Java 9 & 11 (#18305)
Relevant methods:
1. `List.of`, `Set.of`, `Map.of` and similar (introduced in Java 9)
2. Optional: `isEmpty` (introduced in Java 11), `stream` (introduced in Java 9).

Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
2025-01-03 16:13:39 -08:00
Ismael Juma 73ab7ee4ea
MINOR: Use `Files.readString/writeString` and `String.repeat` to simplify code (#18372)
The 3 methods were introduced in Java 11.

Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-02 17:50:27 -08:00
Ismael Juma d6f24d3665
Use `instanceof` pattern to avoid explicit cast (#18373)
This feature was introduced in Java 16.

Reviewers: David Arthur <mumrah@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
2025-01-02 09:32:51 -08:00
TengYao Chi 3161115ada
KAFKA-18361 Remove PasswordEncoderConfigs (#18347)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-31 00:18:23 +08:00
Ismael Juma dfb178a1d8
KAFKA-18272: Deprecated protocol api usage should be logged at info level (#18313)
This makes it possible to enable request logs for deprecated protocol api versions without enabling it for the rest. Combined with the ability to enable/disable dynamically, it makes it a bit easier to collect the information about deprecated clients that is not available via metrics.

This isn't particularly useful in trunk/4.0 since there are no deprecated api versions in these versions, but it will be useful for older branches. I intend to backport to those branches and add a release note in the backport regarding the change in behavior.

I manually verified that:
1. If the request logger is configured at `INFO` level, only deprecated protocol api versions are logged and they are logged at `INFO` level.
2. If the request logger is configured at `DEBUG` level, all requests are logged but the log level is `INFO` for deprecated protocol api versions and `DEBUG` for the rest.
3. If the request logger is configured at `WARN` level (the default), no requests are logged.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-27 00:04:19 -08:00
Lucas Brutschy 0055ef0a49
KAFKA-18283: Add StreamsGroupDescribe RPC definitions (#18230)
Adds a new RPC StreamsGroupDescribe that returns, given the group ID, all metadata related to the streams group, such as

 - The topology metadata of the group.
 - The topology epoch of the group.
 - The latest member metadata that each member provided through the StreamsGroupHeartbeat API.
 - The current target assignment generated by the assignor.
 - This just adds the JSON as defined in KIP-1071, together with some plumbing.

Reviewers: Bill Bejeck <bbejeck@gmail.com>
2024-12-18 19:38:01 +01:00
Lucas Brutschy ec32c8a376
KAFKA-18282: Add StreamsGroupHeartbeat RPC definitions (#18227)
The StreamsGroupHeartbeat API is the new core API used by streams application to form a group. The API allows members to initialize a topology, advertise their state, and their owned tasks. The group coordinator uses it to assign/revoke tasks to/from members. This API is also used as a liveness check.

This change adds the JSON definition of the RPC, as defined in KIP-1071.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2024-12-18 11:43:44 +01:00
TengYao Chi 772aa241b2
KAFKA-18136: Remove zk migration from code base (#18016)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-12-12 18:34:29 +01:00
Sushant Mahajan 4c5ea05ec8
KAFKA-18058: Share group state record pruning impl. (#18014)
In this PR, we've added a class ShareCoordinatorOffsetsManager, which tracks the last redundant offset for each share group state topic partition. We have also added a periodic timer job in ShareCoordinatorService which queries for the redundant offset at regular intervals and if a valid value is found, issues the deleteRecords call to the ReplicaManager via the PartitionWriter. In this way the size of the partitions is kept manageable.

Reviewers: Jun Rao <junrao@gmail.com>, David Jacot <djacot@confluent.io>, Andrew Schofield <aschofield@confluent.io>
2024-12-12 07:38:03 +00:00
Dongnuo Lyu e30edb3eff
KAFKA-18052: Decouple the dependency of feature stable version to the metadata version (#17886)
Currently the validation of feature upgrade relies on the supported version range generated during registration. For a given feature, its max supported feature version in production is set to be the default version value (the latest feature version with bootstrap metadata value smaller or equal to the latest production metadata value).

This patch introduces a LATEST_PRODUCTION value independent from the metadata version to each feature so that the highest supported feature version can be customized by the feature owner.

The change only applies to dynamic feature upgrade. During formatting, we still use the default value associated the metadata version.

Reviewers: Justine Olshan <jolshan@confluent.io>, Jun Rao <junrao@gmail.com>
2024-12-05 11:07:47 -08:00
Ken Huang 2b43c49f51
KAFKA-18050 Upgrade the checkstyle version to 10.20.2 (#17999)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-05 10:59:18 +08:00
David Jacot 44cb90222c
MINOR: Refactor configs in GroupMetadataManager (#17982)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-12-02 02:26:28 +08:00
Chia-Chuan Yu 7ca02fd908
KAFKA-16617 Add KRaft info for the `advertised.listeners` doc description (#17552)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-30 03:12:31 +08:00
Ken Huang c32a49549d
MINOR: Remove duplicate valid value in document (#17947)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-27 21:07:36 +08:00
Calvin Liu 2b2b3cd355
KAFKA-18062: use feature version to enable ELR (#17867)
Replace the ELR static config with feature version.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-11-26 14:40:23 -08:00
Cheryl Simmons 6dcca01efc
MINOR: Fixing typos in property definition: Adding an apostrophe and capitalizing ISR
Reviewers: Justine Olshan <jolshan@confluent.io>
2024-11-19 12:09:15 -08:00
kevin-wu24 ebb3202e01
KAFKA-16964 Integration tests for adding and removing voters (#17582)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-11-05 03:09:37 +08:00
Ken Huang 30b1bdfc74
KAFKA-17835 Move ProducerIdManager and RPCProducerIdManager to transaction-coordinator module (#17562)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-31 02:40:47 +08:00
Kuan-Po Tseng f0a3960e3e
KAFKA-17867 Consider using zero-copy for PushTelemetryRequest (#17622)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-30 20:49:56 +08:00
Dmitry Werner 12a60b8cd9
KAFKA-17878 Move ActionQueue to server module (#17602)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-10-28 20:35:23 +08:00