3.3.0 was the first KRaft release that was deemed production-ready and also
when KIP-778 (KRaft to KRaft upgrades) landed. Given that, it's reasonable
for 4.x to only support upgrades from 3.3.0 or newer (the metadata version also
needs to be set to "3.3" or newer before upgrading).
Noteworthy changes:
1. `AlterPartition` no longer includes topic names, which makes it possible to
simplify `AlterParitionManager` logic.
2. Metadata versions older than `IBP_3_3_IV3` have been removed and
`IBP_3_3_IV3` is now the minimum version.
3. `MINIMUM_BOOTSTRAP_VERSION` has been removed.
4. Removed `isLeaderRecoverySupported`, `isNoOpsRecordSupported`,
`isKRaftSupported`, `isBrokerRegistrationChangeRecordSupported` and
`isInControlledShutdownStateSupported` - these are always `true` now.
Also removed related conditional code.
5. Removed default metadata version or metadata version fallbacks in
multiple places - we now fail-fast instead of potentially using an incorrect
metadata version.
6. Update `MetadataBatchLoader.resetToImage` to set `hasSeenRecord`
based on whether image is empty - this was a previously existing issue that
became more apparent after the changes in this PR.
7. Remove `ibp` parameter from `BootstrapDirectory`
8. A number of tests were not useful anymore and have been removed.
I will update the upgrade notes via a separate PR as there are a few things that
need changing and it would be easier to do so that way.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluen.io>, Ken Huang <s7133700@gmail.com>
- update reflection-config.json and resource-config.json to include log4j2 and jackson
- remove unused jackson scala library
- fix the incorrect path of log4j2.yaml
- adopt workaround (--standalone) to make this PR work and it will be fixed by KAFKA-18737)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
- Fixed the RemoteIndexCacheTest that fails with caffeine > 3.1.1
Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
The stale/invalid files that ends-with ".deleted" and ".tmp" should be cleaned when the broker gets restarted.
- fix the remote-index-cache test to use the logDir instead of topicDir
- fix the flaky test
Reviewers: Luke Chen <showuon@gmail.com>
In PR #18267, we removed old message format for cases in ConsumerWithLegacyMessageFormatIntegrationTest. Although test cases can pass, they don't fulfill original purpose. We can't send old message format since 4.0, so I change cases to append old records by ReplicaManager directly.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Frequently updating the trust store can cause unexpected termination of the AsyncConsumer background thread.
1. To resolve this issue, reuse the same AdminClient instead of recreating it.
2. Add error logging when fail to initialize resources for the consumer network thread.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
KAFKA-16720 aims to add the support for the AlterShareGroupOffsets AdminClient. Key Changes in the PR:
1. Added handing of alterShareGroupOffsets() in KafkaAdminClient and introduce AlterShareGroupOffsetRequest/AlterShareGroupOffsetResponse/AlterShareGroupOffsetsOptions classes.
2. Corresponding test in KafkaAdminClientTest.
3. Added ALTER_SHARE_GROUP_OFFSETS API (will finish it in next PR and the share coordinator pieces)
Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
During the transaction commit phase, it is normal to hit CONCURRENT_TRANSACTION error before the transaction markers are fully propagated. Instead of letting the client to retry the produce request, it is better to retry on the server side.
Reviewers: Artem Livshits <alivshits@confluent.io>, Justine Olshan <jolshan@confluent.io>
Rewrite BrokerMetricNamesTest using ReplicaManager.MetricNames, ensuring that all metrics are always included. This helps prevent issues like PartitionsWithLateTransactionsCount not being correctly included in the test before.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The case testDescribeQuorumRequestToControllers shutdowns raft client but not the controller. This makes client has chance to send a request to the controller and get NOT_LEADER_OR_FOLLOWER error. However, if the raft client finishes shutdown before handling the request, the request will not be handled. Shutdown the controller before doing KafkaFuture#get for the client request, so we can make sure the request is handled by another controller eventually.
Signed-off-by: PoAn Yang <payang@apache.org>
Reviewers: Luke Chen <showuon@gmail.com>
Sometimes we didn't get into abortable state before aborting, so the epoch didn't get bumped. Now we force abortable state with an attempt to send before aborting so the epoch bump occurs as expected.
Reviewers: Jeff Kim <jeff.kim@confluent.io>
* KAFKA-18758: NullPointerException in shutdown following InvalidConfigurationException
Add checks for null in shutdown as BrokerLifecycleManager is not instantiaited if LogManager constructor throws an Exception
After log4j migration, we need to update the logging configuration in KafkaDockerWrapper from log4j1 to log4j2.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
it's crucial to utilize a try-finally block to ensure proper closure of the ReplicaManager. Failing to do so can result in an unreleased thread from the purgatory, potentially leading to errors in subsequent integration tests that incorporate thread leak detection.
Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This commit ensures that the ClientQuotaCallback#updateClusterMetadata method is executed in KRaft mode. This method is triggered whenever a topic or cluster metadata change occurs. However, in KRaft mode, the current implementation of the updateClusterMetadata API is inefficient due to the requirement of creating a full Cluster object. To address this, a follow-up issue (KAFKA-18239) has been created to explore more efficient mechanisms for providing cluster information to the ClientQuotaCallback without incurring the overhead of a full Cluster object creation.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, TaiJuWu <tjwu1217@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
If Log4j Core is absent, most calls to Log4jController and Loggers will end up with a NoClassDefFoundError.
This changeset:
- Profits from the major version bump to rename k.util.Log4jController to LoggingController.
- Removes o.a.l.l.Level from the signature of public methods of o.a.k.connect.runtime.Loggers and replaces it with String.
- Provides an additional no-op implementation of k.util.LoggingController and o.a.k.connect.runtime.Loggers: if Log4j Core is not present on the runtime classpath the no-op implementation will be used.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Remove broker.id.generation.enable and reserved.broker.max.id, which are not used in KRaft mode.
Remove inter.broker.protocol.version, which is not used in KRaft mode.
Reviewers: PoAn Yang <payang@apache.org>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
Since zk has been removed in 4.0, config handlers no longer need to handle the "<default>" value. This PR streamlines the config update process by eliminating the unnecessary string checks for "<default>"
Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
related to KAFKA-18206, set features in EmbeddedKafkaCluster in both streams and connect module, note that this PR also fix potential transaction with empty records in sendPrivileged method as transaction version 2 doesn't allow this kind of scenario.
Reviewers: Justine Olshan <jolshan@confluent.io>
It appears this test was failing because the transaction was never aborting and the concurrent transactions errors would not go away.
ccab9eb introduced the test failure because it requires the transaction to complete, but I suspect the lack of completion was happening before the change.
The timeout for the write is based on the transactional timeout, and 100ms seemed too small -- thus the requests to update the state would often repeatedly time out.
Also removed the loop since it was not necessary.
Reviewers: Jeff Kim <jeff.kim@confluent.io>, Calvin Liu <caliu@confluent.io>
We need to re-enable the unclean shutdown detection when in ELR mode, which was inadvertently removed during the development process.
Reviewers: David Mao <dmao@confluent.io>, Jun Rao <junrao@gmail.com>