This implementation introduces two new configurations `log.message.timestamp.before.max.ms` and `log.message.timestamp.after.max.ms` and deprecates `log.message.timestamp.difference.max.ms`.
The default value for all three of these configs remains Long.MAX_VALUE for backward compatibility, but the newly added configurations give finer control when validating message timestamps that lie in the past or in the future relative to the broker's timestamp.
To maintain backward compatibility, if the default value of `log.message.timestamp.before.max.ms` is unchanged, we assume users are still relying on the deprecated config `log.message.timestamp.difference.max.ms` and perform the validation using its value. This ensures that existing users who have customized the value of `log.message.timestamp.difference.max.ms` will continue to see no change in behavior.
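For illustration, the fallback can be thought of along these lines (a minimal sketch with assumed field names, not the actual broker code):

```java
import org.apache.kafka.common.errors.InvalidTimestampException;

// Illustrative sketch of the backward-compatible validation; the field names
// are assumptions, not the real broker implementation.
class TimestampValidation {
    long beforeMaxMs;      // log.message.timestamp.before.max.ms
    long afterMaxMs;       // log.message.timestamp.after.max.ms
    long differenceMaxMs;  // deprecated log.message.timestamp.difference.max.ms

    void validate(long brokerTimeMs, long messageTimestampMs) {
        // If the new "before" config was left at its default, fall back to the
        // deprecated config so existing customizations keep working.
        long effectiveBeforeMaxMs =
            beforeMaxMs == Long.MAX_VALUE ? differenceMaxMs : beforeMaxMs;
        if (brokerTimeMs - messageTimestampMs > effectiveBeforeMaxMs
                || messageTimestampMs - brokerTimeMs > afterMaxMs) {
            throw new InvalidTimestampException(
                "Timestamp " + messageTimestampMs + " is out of the allowed range");
        }
    }
}
```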
Reviewers: Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>
A race condition between an async flush and a segment rename (for deletion purposes) might cause the entire log directory to be marked offline when we delete a topic. This PR fixes the bug by ignoring NoSuchFileException when we flush a directory.
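The fix amounts to tolerating a vanished directory during fsync, roughly like this sketch (class and method names are illustrative; Kafka's own utility differs in detail):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// A minimal sketch of the tolerant directory flush.
class DirFlush {
    static void flushDirIfExists(Path dir) throws IOException {
        try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
            channel.force(true);  // fsync the directory metadata
        } catch (NoSuchFileException e) {
            // The directory was concurrently renamed/deleted (e.g. topic
            // deletion); treat as a no-op instead of failing the log dir.
        }
    }
}
```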
Reviewers: Divij Vaidya <diviv@amazon.com>
"The test RefreshingHttpsJwksTest#testSecondaryRefreshAfterElapsedDelay relies on the actual system clock, which makes it frequently fail. The fix adds a second constructor that allows for passing a ScheduledExecutorService to manually execute the scheduled tasks before refreshing. The fixed task is much more robust and stable.
Co-authored-by: Fei Xie <feixie@MacBook-Pro.attlocal.net>
Reviewers: Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>
The main refactoring in this PR relates to:
1. serializers/deserializers used in clients - unified in a Deserializers class (see the sketch after this list)
2. logic for configuring ClusterResourceListeners moved to ClientUtils
3. misc refactoring of the new async consumer in preparation for upcoming Request Managers
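For item 1, the unification can be pictured as a small holder class along these lines (a sketch, not the exact internal API):

```java
import org.apache.kafka.common.serialization.Deserializer;

// A minimal sketch of the idea: keep the key/value deserializers in one place
// and manage their lifecycle together.
public class Deserializers<K, V> implements AutoCloseable {
    public final Deserializer<K> keyDeserializer;
    public final Deserializer<V> valueDeserializer;

    public Deserializers(Deserializer<K> keyDeserializer, Deserializer<V> valueDeserializer) {
        this.keyDeserializer = keyDeserializer;
        this.valueDeserializer = valueDeserializer;
    }

    @Override
    public void close() {
        keyDeserializer.close();
        valueDeserializer.close();
    }
}
```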
Reviewers: Jun Rao <junrao@gmail.com>
#14083 added support for delegation tokens in KRaft and attached that support to the existing
MetadataVersion 3.6-IV1. This patch moves that support into a separate MetadataVersion 3.6-IV2.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This PR contains three main changes:
- Support for transactions in MetadataLoader
- Abort in-progress transaction during controller failover
- Utilize transactions for ZK to KRaft migration
A new MetadataBatchLoader class is added to decouple the loading of record batches from the
publishing of metadata in MetadataLoader. Since a transaction can span across multiple batches (or
multiple transactions could exist within one batch), some buffering of metadata updates was needed
before publishing out to the MetadataPublishers. MetadataBatchLoader accumulates changes into a
MetadataDelta, and uses a callback to publish to the publishers when needed.
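A rough sketch of that buffering idea, with hypothetical names standing in for the real classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not the real MetadataBatchLoader: accumulate records
// across batches and publish only at safe points, so a transaction spanning
// multiple batches is published atomically.
class BatchLoaderSketch {
    enum RecordType { BEGIN_TXN, END_TXN, ABORT_TXN, DATA }

    private final Consumer<List<Object>> publish;   // callback to the publishers
    private final List<Object> pendingDelta = new ArrayList<>();
    private boolean inTransaction = false;

    BatchLoaderSketch(Consumer<List<Object>> publish) {
        this.publish = publish;
    }

    void handleRecord(RecordType type, Object record) {
        switch (type) {
            case BEGIN_TXN -> inTransaction = true;
            case END_TXN   -> { inTransaction = false; flush(); }
            case ABORT_TXN -> { inTransaction = false; pendingDelta.clear(); }
            case DATA      -> pendingDelta.add(record);
        }
    }

    // Called at the end of each record batch; outside a transaction the batch
    // boundary is a safe publish point.
    void onBatchEnd() {
        if (!inTransaction) flush();
    }

    private void flush() {
        if (!pendingDelta.isEmpty()) {
            publish.accept(new ArrayList<>(pendingDelta));
            pendingDelta.clear();
        }
    }
}
```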
One small oddity with this approach is that since we can "split" batches in some cases, the number of bytes returned in the LogDeltaManifest has new semantics: the number of bytes in a batch is now only included in the last metadata update that is published as a result of that batch.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
KIP-890 part 1 introduced the callback request type. It is used to execute a callback after KafkaApis.handle has returned. We did not account for tryCompleteActions at the end of handle when making this change.
In tests, we saw produce p99 increase dramatically (likely because we have to wait for another request before we can complete DelayedProduce). As a result, we should add the tryCompleteActions after the callback as well. In testing, this improved the produce performance.
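Sketched in simplified form (ActionQueue and the handler here are stand-ins for the broker internals):

```java
// A simplified sketch; ActionQueue stands in for the broker's delayed-action queue.
class CallbackRequestHandler {
    interface ActionQueue { void tryCompleteActions(); }

    private final ActionQueue actionQueue;

    CallbackRequestHandler(ActionQueue actionQueue) {
        this.actionQueue = actionQueue;
    }

    void handleCallbackRequest(Runnable callback) {
        callback.run();
        // The fix: drain completable delayed operations (e.g. DelayedProduce)
        // right after the callback instead of waiting for the next request.
        actionQueue.tryCompleteActions();
    }
}
```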
Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>
Noticed that there was a dangling unused class (LongRef, replaced by PrimitiveRef.LongRef), and the LogOffsetMetadata toString was a little oddly formatted.
Reviewers: Justine Olshan <jolshan@confluent.io>
`TieredStorageTestHarness` is a base class for integration tests exercising the tiered storage functionality. It uses a `LocalTieredStorage` instance as the second-tier storage system and `TopicBasedRemoteLogMetadataManager` as the remote log metadata manager.
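For context, a harness like this plugs those implementations in via the standard tiered-storage broker configs; a sketch:

```java
import java.util.Properties;

// A sketch of the broker settings such a harness wires up; the class names
// are the upstream implementations named above.
class TieredStorageTestProps {
    static Properties brokerProps() {
        Properties props = new Properties();
        props.put("remote.log.storage.system.enable", "true");
        props.put("remote.log.storage.manager.class.name",
            "org.apache.kafka.server.log.remote.storage.LocalTieredStorage");
        props.put("remote.log.metadata.manager.class.name",
            "org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager");
        return props;
    }
}
```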
Co-authored-by: Alexandre Dupriez <alexandre.dupriez@gmail.com>
Co-authored-by: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
Reviewers: David Arthur <mumrah@gmail.com>, Ron Dagostino <rndgstn@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Viktor Somogyi <viktor.somogyi@cloudera.com>
A stream thread should only change to RUNNING if there are no
active tasks in restoration in the state updater and if there
are no pending tasks to recycle and to init.
Usually all pending tasks to init are added to the state updater
in the same poll iteration that handles the assignment. However,
if a LockException occurs during the initialization of a task, the
task is re-added to the tasks to init and initialization is retried
in the next poll iteration.
A LockException might occur when a state directory is still locked
by another thread, e.g. when a rebalance has just happened.
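The resulting transition check can be pictured like this (illustrative names, not the real Streams internals):

```java
import java.util.Set;

// Hypothetical sketch; StateUpdater and Tasks stand in for the real internals.
class RunningTransitionCheck {
    interface StateUpdater { Set<Object> restoringActiveTasks(); }
    interface Tasks {
        Set<Object> pendingTasksToRecycle();
        Set<Object> pendingTasksToInit();
    }

    // The thread may become RUNNING only when restoration is done and no
    // pending recycles/inits remain; a task re-added after a LockException
    // stays in pendingTasksToInit and correctly blocks the transition.
    static boolean canTransitToRunning(StateUpdater stateUpdater, Tasks tasks) {
        return stateUpdater.restoringActiveTasks().isEmpty()
            && tasks.pendingTasksToRecycle().isEmpty()
            && tasks.pendingTasksToInit().isEmpty();
    }
}
```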
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Walker Carlson <wcarlson@confluent.io>
Currently, Kafka Streams only tries to purge records whose
offsets are committed from a repartition topic when at
least one offset was committed in the current commit.
The coupling between committing some offsets and purging
records is not needed and might delay purging of records.
For example, if an in-flight call for purging records has not
completed yet when a commit happens, a new call
is not issued.
If then the earlier in-flight call for purging records
finally completes but the next commit does not commit any
offsets, Streams does not issue the call for purging records
whose offset were committed in the previous commit
because the purging call was still in-flight.
This change issues calls for purging records during any commit
if the purge interval passed, even if no offsets were committed
in the current commit.
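A minimal sketch of the decoupled scheduling (names are illustrative):

```java
// Hypothetical sketch of decoupling purging from committing.
class PurgeScheduler {
    private final long purgeIntervalMs;
    private long lastPurgeMs = 0L;
    private boolean purgeInFlight = false;

    PurgeScheduler(long purgeIntervalMs) {
        this.purgeIntervalMs = purgeIntervalMs;
    }

    // Called on every commit, whether or not offsets were committed this round.
    boolean shouldPurge(long nowMs) {
        if (purgeInFlight || nowMs - lastPurgeMs < purgeIntervalMs) {
            return false;
        }
        lastPurgeMs = nowMs;
        purgeInFlight = true;   // cleared when the deleteRecords call completes
        return true;
    }

    void onPurgeCompleted() {
        purgeInFlight = false;
    }
}
```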
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Walker Carlson <wcarlson@confluent.io>
KIP-904 introduced a backward incompatible change that requires a 2-bounce rolling upgrade.
The new "3.4" upgrade config value is not recognized by `AssignorConfiguration` though and thus crashed Kafka Streams if use.
Reviewers: Farooq Qaiser <fqaiser94@gmail.com>, Bruno Cadonna <bruno@confluent.io>
When collecting the set of broker IDs during the migration, don't try to parse the default broker resource `""` as a broker ID.
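A minimal sketch of the fix:

```java
import java.util.HashSet;
import java.util.Set;

// Ignore the default broker resource "" when collecting broker IDs from
// per-broker config resource names.
class BrokerIdCollector {
    static Set<Integer> collectBrokerIds(Iterable<String> configResourceNames) {
        Set<Integer> brokerIds = new HashSet<>();
        for (String name : configResourceNames) {
            if (name.isEmpty()) {
                continue;  // "" addresses the cluster-wide default, not a broker
            }
            brokerIds.add(Integer.parseInt(name));
        }
        return brokerIds;
    }
}
```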
Reviewers: Colin P. McCabe <cmccabe@apache.org>
* Delete remote segments when deleting a topic
Co-authored-by: Kamal Chandraprakash <kchandraprakash@uber.com>
Co-authored-by: d00791190 <dinglan6@huawei.com>
In a non-empty log the KRaft leader only notifies the listener of leadership when it has read to the leader's epoch start offset. This guarantees that the leader epoch has been committed and that the listener has read all committed offsets/records.
Unfortunately, the KRaft leader doesn't do this when the log is empty. When the log is empty the listener is notified immediately when it has become leader. This makes the API inconsistent and harder to program against.
This change fixes that by having the KRaft leader wait for the listener's nextOffset to be greater than the leader's epochStartOffset before calling handleLeaderChange.
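Sketched with illustrative names:

```java
// Hypothetical sketch of the gating condition, not the real KafkaRaftClient code.
class LeaderChangeNotifier {
    interface ListenerContext {
        long nextOffset();
        void fireHandleLeaderChange();
    }

    // Only notify once the listener has read past the leader's epoch start
    // offset; with this change the rule also holds when the log was empty
    // at election time.
    static void maybeFireLeaderChange(ListenerContext ctx, long epochStartOffset) {
        if (ctx.nextOffset() > epochStartOffset) {
            ctx.fireHandleLeaderChange();
        }
    }
}
```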
The RecordsBatchReader implementation is also changed to include control records. This makes it possible for the state machine learn about committed control records. This additional information can be used to compute the committed offset or for counting those bytes when determining when to snapshot the partition.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
Reusing an admin client across tests can cause false positives in leak checkers, so don't do it.
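The safer pattern is to create and close a client per test, along these lines:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

// Create a fresh Admin client per test and close it, rather than sharing one
// across tests. "localhost:9092" is a stand-in for the test cluster address.
class AdminPerTest {
    static void pingCluster() throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            admin.describeCluster().nodes().get();
        }
    }
}
```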
Reviewers: Divij Vaidya <diviv@amazon.com>, Matthias J. Sax <matthias@confluent.io>
As described in the KIP here the default value of remote.log.metadata.manager.class.name should be TopicBasedRemoteLogMetadataManager
Reviewers: Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kchandraprakash@uber.com>, Divij Vaidya <diviv@amazon.com>
During testing we discovered a flaky bug: the commit request manager would occasionally throw a NOT_COORDINATOR exception, which means the request was routed to a non-coordinator node. We found that if we don't check the coordinator node in the commitRequestManager, it passes an empty node to the NetworkClientDelegate, which implies the request can be sent to any node in the cluster. This behavior is incorrect, as commit requests need to be routed to the coordinator node.
Because the timing of coordinator discovery during integration testing isn't entirely deterministic, the test became extremely flaky. After this fix, the coordinator node is mandatory before attempting to enqueue these commit requests to the NetworkClient.
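A simplified sketch of the gating behavior (illustrative structure, not the actual consumer internals):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import org.apache.kafka.common.Node;

// Commit requests are only handed to the network layer once the coordinator
// node is known; until then they stay buffered.
class CommitRequestGate {
    private final List<Runnable> pendingCommits = new ArrayList<>();

    void enqueue(Runnable commit) {
        pendingCommits.add(commit);
    }

    List<Runnable> drainIfCoordinatorKnown(Optional<Node> coordinator) {
        if (coordinator.isEmpty()) {
            // FindCoordinator has not completed yet; keep commits buffered
            // instead of sending them to an arbitrary node.
            return List.of();
        }
        List<Runnable> ready = new ArrayList<>(pendingCommits);
        pendingCommits.clear();
        return ready;
    }
}
```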
Reviewers: Jun Rao <junrao@gmail.com>
This patch is a follow-up to #14169 that installs the metadata publishers before blocking on the authorizer future.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This patch fixes an issue for ZK controllers where we were emitting the ZkMigrationState enum
rather than a value. This can lead to downstream failures with JMX metrics since the RMI protocol
will marshal the ZkMigrationState object returned by the gauge. Any downstream consumer of this
metric (like jconsole or a metrics exporter) will not be able to unmarshal the value since the
ZkMigrationState class will not be present.
The fix is simply to emit the byte value of this enum.
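In sketch form (the enum and its byte values here are illustrative):

```java
import java.util.function.Supplier;

// Illustrative sketch: the gauge must report a primitive, not the enum object.
class MigrationStateGauge {
    enum ZkMigrationStateSketch {
        NONE((byte) 0), MIGRATION((byte) 1);   // placeholder values
        private final byte value;
        ZkMigrationStateSketch(byte value) { this.value = value; }
        byte value() { return value; }
    }

    // Before: () -> state          (RMI clients cannot unmarshal the enum class)
    // After:  () -> state.value()  (a plain byte is always unmarshalable)
    static Supplier<Byte> gaugeFor(ZkMigrationStateSketch state) {
        return state::value;
    }
}
```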
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Alok Thatikunta <athatikunta@confluent.io>
Add config validation that verifies that remote storage is enabled at the system level when enabling remote storage for a topic. If the validation fails, an InvalidConfigurationException is thrown.
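A minimal sketch of the check:

```java
import org.apache.kafka.common.errors.InvalidConfigurationException;

// Reject per-topic remote storage when the broker-level feature is disabled.
class RemoteStorageValidation {
    static void validate(boolean systemRemoteStorageEnabled, boolean topicRemoteStorageEnabled) {
        if (topicRemoteStorageEnabled && !systemRemoteStorageEnabled) {
            throw new InvalidConfigurationException(
                "Tiered storage is disabled on the broker; " +
                "cannot enable remote storage for this topic.");
        }
    }
}
```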
Reviewers: Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>