We have observed an issue where inter broker SSL listener is not coming up when running with TLSv3/JDK 17 .
SSL debug logs shows that TLSv3 post handshake messages >16K are not getting read and causing SslEngineValidator process to stuck while validating the provided trust/key store.
- Right now, WRAP returns if there is already data in the buffer. But if we need more data to be wrapped for UNWRAP to succeed, we end up looping forever. To fix this, now we always attempt WRAP and only return early on BUFFER_OVERFLOW.
- Update SslEngineValidator to unwrap post-handshake messages from peer when local handshake status is FINISHED.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
I've added details for VerificationFailureRate and VerificationTimeMs.
I considered adding the documentation for the AddPartitionsToTxnVerification metrics, but I noticed that all the request metrics simply listed Produce|FetchConsumer|FetchFollower. If we don't already report the AddPartitionsToTxn request metrics in this file, it doesn't make sense to add the verification variant. (As well as all the other APIs we report)
Filed a followup jira if we want to redo that whole section.
Reviewers: Reviewers: Divij Vaidya <diviv@amazon.com>
A bug in the RemoteIndexCache leads to a situation where the cache does not replace the corrupted index with a new index instance fetched from remote storage. This commit fixes the bug by adding correct handling for `CorruptIndexException`.
Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Alexandre Dupriez <duprie@amazon.com>
DeleteSegmentsDueToLogStartOffsetBreach configures the segment such that it can hold at-most 2 record-batches. And, it asserts that the local-log-start-offset based on the assumption that each segment will contain exactly two messages.
During leader switch, the segment can get rotated and may not always contain two records. Previously, we were checking whether the expected local-log-start-offset is equal to the base-offset-of-the-first-local-log-segment. With this patch, we will scan the first local-log-segment for the expected offset.
Reviewers: Divij Vaidya <diviv@amazon.com>
bump snappy-java version to 1.1.10.4, and add more tests to verify the compressed data can be correctly decompressed and read.
For LogCleanerParameterizedIntegrationTest, we increased the message size for snappy decompression since in the new version of snappy, the decompressed size is increasing compared with the previous version. But since the compression algorithm is not kafka's scope, all we need to do is to make sure the compressed data can be successfully decompressed and parsed/read.
Reviewers: Divij Vaidya <diviv@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Josep Prat <josep.prat@aiven.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
As part of validating 3.6.0 RC0, I ran the ZK migration system tests at the RC tag. Pretty much all of them failed due to recent changes (particularly, disallowing migrations with JBOD). All of the changes here are test fixes, so not a release blocker.
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.11.3
session_id: 2023-09-19--007
run time: 8 minutes 51.147 seconds
tests run: 5
passed: 5
flaky: 0
failed: 0
ignored: 0
Reviewers: Luke Chen <showuon@gmail.com>
This test extends the existing TransactionsTest. It configures the broker and topic with tiered storage and expects at-least one log segment to be uploaded to the remote storage.
Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>, Divij Vaidya <diviv@amazon.com>
KIP-890 Part 1 tries to address hanging transactions on old clients. Thus, the produce version can not be bumped and no new errors can be added. Before we used the java client's notion of retriable and abortable errors -- retriable errors are defined as such by extending the retriable error class, fatal errors are defined explicitly, and abortable errors are the remaining. However, many other clients treat non specified errors as fatal and that means many retriable errors kill the application.
Stuck between having specific errors for Java clients that are handled correctly (ie we retry) or specific fatal errors for cases that should not be fatal, we opted for a middle ground of non-specific error, but a message in the response to specify.
Converting some of the coordinator error codes to NOT_ENOUGH_REPLICAS which is a known produce response.
Also correctly add the old errors to the produce response. (We were not doing this correctly before)
Added tests for the new errors and messages.
Reviewers: Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>
In "ZooKeeper to KRaft Migration" documentation, we are still reporting 3.4 as metadata version. Reworking that phrase to make it more clear and avoid the need to update it in the future.
Signed-off-by: Federico Valeri <fedevaleri@gmail.com>
Reviewers: Luke Chen <showuon@gmail.com>
Reduce default remote.log.metadata.initialization.retry.interval.ms value to 100ms.
Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>
- Updated the contract for RSM's fetchIndex to throw a ResourceNotFoundException instead of returning an empty InputStream when it does not have a TransactionIndex.
- Updated the LocalTieredStorage implementation to adhere to the new contract.
- Added Unit Tests for the change.
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>
This change is about the current leader updating the log-start-offset before the segments are deleted from remote storage. This will do a best-effort mechanism for followers to receive log-start-offset from the leader and they can update their log-start-offset before it becomes a leader.
Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
- RSM and RLMM classpath can be empty since it's optional so removed the non-empty string validator
- Fix getting the `localTieredStorage` by brokerId after stopping a broker.
Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
This patch allows broker heartbeat events to be completed while a metadata transaction is in-flight.
More generally, this patch allows any RUNS_IN_PREMIGRATION event to complete while the controller
is in pre-migration mode even if the migration transaction is in-flight.
We had a problem with broker heartbeats timing out because they could not be completed while a large
ZK migration transaction was in-flight. This resulted in the controller fencing all the ZK brokers which
has many undesirable downstream effects.
Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@apache.org>
* Added the integration test for DELETE_RECORDS API for tiered storage enabled topic
* Added validation checks before removing remote log segments for log-start-offset breach
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>
In the Windows OS atomic move are not allowed if the file has another open handle. E.g
__cluster_metadata-0\quorum-state: The process cannot access the file because it is being used by another process
at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:403)
at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:293)
at java.base/java.nio.file.Files.move(Files.java:1430)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:949)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:932)
at org.apache.kafka.raft.FileBasedStateStore.writeElectionStateToFile(FileBasedStateStore.java:152)
This is fixed by first closing the temporary quorum-state file before attempting to move it.
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
Co-Authored-By: Renaldo Baur Filho <renaldobf@gmail.com>
This restores previous behavior for Admin::listOffsets, which was to fail immediately if topic metadata could not be found, and only retry if metadata for one or more specific partitions could not be found.
There is a subtle difference here: prior to https://github.com/apache/kafka/pull/13432, the operation would be retried if any metadata error was reported for any individual topic partition, even if an error was also reported for the entire topic. With this change, the operation always fails if an error is reported for the entire topic, even if an error is also reported for one or more individual topic partitions.
I am not aware of any cases where brokers might return both topic- and topic partition-level errors for a metadata request, and if there are none, then this change should be safe. However, if there are such cases, we may need to refine this PR to remove the discrepancy in behavior.
Reviewers: Justine Olshan <jolshan@confluent.io>
On leadership failover, the new leader's start offset may be older than the start offset of old leader. This works fine for local storage scenario because the new leader still contains data associated with stale start offset. But in case of remote storage, although new leader has a stale offset, the data associated with it has been deleted from remote by the old leader. Hence, we end up in a situation where leader has a start offset but no data associated with it.
This commit fixes the situation by ensuring that on every leadership failover, for topics with remote storage, the leader will update it's start offset from the base of first segment in current leader chain present in the remote storage (if any).
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>
- Updated the log-start-offset to the correct value while building the replica state in ReplicaFetcherTierStateMachine#buildRemoteLogAuxState
Integration tests added:
1. ReassignReplicaExpandTest
2. ReassignReplicaMoveTest and
3. ReassignReplicaShrinkTest
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
Added the below integration tests with tiered storage
- PartitionsExpandTest
- DeleteSegmentsByRetentionSizeTest
- DeleteSegmentsByRetentionTimeTest and
- EnableRemoteLogOnTopicTest
- Enabled the test for both ZK and Kraft modes.
These are enabled for both ZK and Kraft modes.
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>
This will allow enabling and disabling transaction verification (KIP-890 part 1) without having to roll the cluster.
Tested that restarting the cluster persists the configuration.
If a verification is disabled/enabled while we have an inflight request, depending on the step of the process, the change may or may not be seen in the inflight request (enabling will typically fail unverified requests, but we may still verify and reject when we first disable) Subsequent requests/retries will behave as expected for verification.
Sequence checks will continue to take place after disabling until the first message is written to the partition (thus clearing the verification entry with the tentative sequence) or the broker restarts/partition is reassigned which will clear the memory. On enabling, we will only track sequences that for requests received after the verification is enabled.
Reviewers: Jason Gustafson <jason@confluent.io>, Satish Duggana <satishd@apache.org>
A follow-up to https://github.com/apache/kafka/pull/13804.
This follow-up adds the alternative fix approach mentioned in
the PR above - bumping the session timeout used in the test
with 1 second.
Reproducing the flake-out locally has been much harder than
on the CI runs, as neither Gradle with Java 11 or Java 14 nor
IntelliJ with Java 14 could show it, but IntelliJ with Java 11
could occasionally reproduce the failure the first time
immediately after a rebuild. While I was unable to see the
failure with the bumped session timeout, the testing procedure
definitely didn't provide sufficient reassurance for the
fix as even without it often I'd see hundreds of consecutive
successful test runs when the first run didn't fail.
Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>
This change is about RLM task handling retriable exception when it tries to copy segments to remote but the RLMM is not yet initialized. On encountering the exception, we log the error and throw the exception back to the caller. We also make sure that the failure metrics are updated since this is a temporary error because RLMM is not yet initialized.
Added unit tests to verify RLM task does not attempt to copy segments to remote on encountering the retriable exception and that failure metrics remain unchanged.
Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>