Docker tests rely on Docker Compose. In recent runs it has been observed that GitHub Actions does not provide Docker Compose, so we install it explicitly in the workflow.
When Kafka Streams skips over corrupted messages, it might not resume previously paused partitions
if more than one record is skipped at once and the buffer drops below the max-buffer limit at the same time.
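As a rough illustration of the intended resume behaviour, here is a minimal sketch; the class, method, and field names are hypothetical and do not match the actual Kafka Streams internals:
```
import java.util.*;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

// Hypothetical sketch: resume paused partitions once the buffer is back under the
// limit, independent of how many corrupted records were skipped in one batch.
final class ResumeSketch {
    static void maybeResume(final Consumer<byte[], byte[]> consumer,
                            final Map<TopicPartition, Integer> bufferedRecords,
                            final int maxBufferedSize,
                            final Set<TopicPartition> pausedPartitions) {
        for (final TopicPartition tp : new ArrayList<>(pausedPartitions)) {
            if (bufferedRecords.getOrDefault(tp, 0) < maxBufferedSize) {
                consumer.resume(Collections.singleton(tp));
                pausedPartitions.remove(tp);
            }
        }
    }
}
```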
Reviewers: Matthias J. Sax <matthias@confluent.io>
This is a cherry-pick of #17258 to 3.7.2
This commit differs from the original by using the old (i.e., 3.7) references to the configurations and by not changing as many unit tests.
Reviewers: Divij Vaidya <diviv@amazon.com>, Colin Patrick McCabe <cmccabe@apache.org>
This is a backport of #17686 merged to trunk and cherry-picked to 3.9. Need to do a standalone PR due to merge conflicts.
Reviewers: Matthias Sax <mjsax@apache.org>
TimestampExtractor allows dropping records by returning a timestamp of -1. For this case, we still need to update consumed offsets to allow us to commit progress.
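For context, a custom extractor that drops such records might look like the following sketch; the class name and the "drop if the record timestamp is negative" policy are illustrative:
```
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Illustrative extractor: returning -1 tells Kafka Streams to drop the record.
// Before this fix, such dropped records did not advance the consumed offsets.
public class DropInvalidTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
        final long ts = record.timestamp();
        return ts >= 0 ? ts : -1L; // -1 signals "drop this record"
    }
}
```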
Reviewers: Bill Bejeck <bill@confluent.io>
Each KafkaStreams instance maintains a map from threadId to state,
which it uses to aggregate an overall KafkaStreams app state. The map
is updated on every state change and when a new thread is created.
State-change updates are done in synchronized blocks, but the update
that happens on thread creation is not, which can raise a
ConcurrentModificationException. This patch moves that update
into the listener object and protects it using the object's lock.
It also moves ownership of the state map into the listener so that
it's less likely that future changes access it without locking.
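A minimal sketch of the locking pattern described above; the class, field, and method names are illustrative, not the actual KafkaStreams internals:
```
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: the listener owns the thread-state map and every
// mutation (state change or new thread registration) happens under its lock.
final class StreamStateListenerSketch {
    private final Map<Long, String> threadState = new HashMap<>();

    synchronized void onThreadStateChange(final long threadId, final String newState) {
        threadState.put(threadId, newState);
        // ... aggregate the per-thread states into the overall app state here ...
    }

    synchronized void registerNewThread(final long threadId, final String initialState) {
        // Previously this update happened outside any synchronized block, which could
        // race with onThreadStateChange and surface as ConcurrentModificationException.
        threadState.put(threadId, initialState);
    }
}
```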
Reviewers: Matthias J. Sax <matthias@confluent.io>
When client telemetry is configured in a cluster, Kafka producers and consumers push metrics to the brokers periodically. There is a special push of metrics that occurs when the client is terminating. A state machine in the client telemetry reporter controls its behaviour in different states.
Sometimes, when a client was terminating, it attempted an invalid state transition from TERMINATING_PUSH_IN_PROGRESS to PUSH_NEEDED when it received a response to a PushTelemetry RPC. This was essentially harmless because the state transition did not occur, but it did cause unsightly log lines to be generated. This PR checks for the terminating states when receiving the response and simply remains in the current state.
I added a test to validate the state management in this case. Actually, the test passes before the code change in the PR, but with unsightly log lines.
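In outline, the guard behaves like this sketch; the enum and method names below are paraphrased from the description, not the exact client telemetry reporter code:
```
// Illustrative sketch of the guard added when handling a PushTelemetry response.
enum TelemetryStateSketch { PUSH_NEEDED, PUSH_IN_PROGRESS, TERMINATING_PUSH_IN_PROGRESS, TERMINATED }

final class TelemetryReporterSketch {
    private TelemetryStateSketch state = TelemetryStateSketch.TERMINATING_PUSH_IN_PROGRESS;

    void onPushTelemetryResponse() {
        // If the client is already terminating, stay in the current state instead of
        // attempting the invalid transition back to PUSH_NEEDED.
        if (state == TelemetryStateSketch.TERMINATING_PUSH_IN_PROGRESS
                || state == TelemetryStateSketch.TERMINATED) {
            return;
        }
        state = TelemetryStateSketch.PUSH_NEEDED;
    }
}
```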
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <amittal@confluent.io>
There is a race condition between KRaftMigrationDriver running its first poll() and being notified by Raft about a leader change. If onControllerChange is called before RecoverMigrationStateFromZKEvent is run, we will end up getting stuck in the INACTIVE state.
This patch fixes the race by enqueuing a RecoverMigrationStateFromZKEvent from onControllerChange if the driver has not yet initialized. If another RecoverMigrationStateFromZKEvent was already enqueued, the second one to run will just be ignored.
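Roughly, the fix looks like the following sketch; the names are paraphrased from the description above and are not the exact KRaftMigrationDriver code:
```
// Illustrative sketch: onControllerChange enqueues recovery if the driver has not
// initialized yet, so the driver cannot get stuck in the INACTIVE state.
final class MigrationDriverSketch {
    private volatile boolean initialized = false;

    void onControllerChange() {
        if (!initialized) {
            // A duplicate recovery event is harmless: the second one to run
            // sees the already-recovered state and becomes a no-op.
            enqueue(this::recoverMigrationStateFromZk);
        }
        enqueue(this::handleLeaderChange);
    }

    private void recoverMigrationStateFromZk() {
        if (initialized) return;
        // ... read the migration state from ZooKeeper ...
        initialized = true;
    }

    private void handleLeaderChange() { /* ... */ }

    private void enqueue(final Runnable event) { /* single-threaded event queue */ }
}
```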
Reviewers: Luke Chen <showuon@gmail.com>
(cherry picked from commit 166d9e8)
KAFKA-15751 and KAFKA-15752 enable the kraft mode in SaslSslAdminIntegrationTest, and that is useful in writing security-related IT. Without the backport, the fixes to security could be obstructed from backport due to IT (KAFKA-17315, for example).
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Fix the flakiness of LogDirFailureTest by setting a separate metadata.log.dir for brokers in KRaft mode.
The test was flaky because when we call causeLogDirFailure we sometimes impact the first log.dir, which is also KafkaConfig.metadataLogDir when metadata.log.dir is not set. To fix the flakiness, we need to explicitly set metadata.log.dir to a different log dir than the ones we could potentially fail in the tests.
This is part 1 of the fixes. Delivering them separately as the other issues were not as clear cut.
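Illustratively, the broker overrides for the test end up along these lines; the paths and helper class are placeholders, only the log.dirs and metadata.log.dir config names are real:
```
import java.util.Properties;

// Illustrative broker overrides for the test: keep the metadata log in its own
// directory so that failing any entry in log.dirs cannot hit the metadata log.
final class LogDirFailureTestConfigSketch {
    static Properties brokerOverrides() {
        final Properties props = new Properties();
        props.setProperty("log.dirs", "/tmp/kafka-logs-0,/tmp/kafka-logs-1"); // dirs the test may fail
        props.setProperty("metadata.log.dir", "/tmp/kafka-metadata-log");      // separate metadata dir
        return props;
    }
}
```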
Reviewers: Gaurav Narula <gaurav_narula2@apple.com>, Justine Olshan <jolshan@confluent.io>, Greg Harris <greg.harris@aiven.io>
This patch raises the minimum MetadataVersion for migrations to 3.6-IV1 (metadata transactions). This is only enforced on the controller during bootstrap (when the log is empty). If the log is not empty on controller startup, as in the case of a software upgrade, we allow the migration to continue where it left off.
The broker will log an ERROR message if migrations are enabled and the IBP is not at least 3.6-IV1.
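In sketch form, the bootstrap-time check amounts to something like the following; the class and method names are illustrative, while IBP_3_6_IV1 is the MetadataVersion level that introduced metadata transactions:
```
import org.apache.kafka.server.common.MetadataVersion;

// Illustrative check, applied only when bootstrapping an empty controller log.
final class MigrationBootstrapCheckSketch {
    static void checkMigrationSupported(final boolean migrationEnabled, final MetadataVersion mv) {
        if (migrationEnabled && mv.isLessThan(MetadataVersion.IBP_3_6_IV1)) {
            throw new IllegalArgumentException(
                "ZK migration requires metadata.version " + MetadataVersion.IBP_3_6_IV1
                    + " or higher, but got " + mv);
        }
    }
}
```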
Reviewers: Colin P. McCabe <cmccabe@apache.org>
1) When local.retention.ms/bytes is set to -2, we didn't replace it with the server-side retention.ms/bytes config, so the -2 local retention doesn't take effect.
2) When setting retention.ms/bytes to -2, we can notice this log message:
```
Deleting segment LogSegment(baseOffset=10045, size=1037087, lastModifiedTime=1724040653922, largestRecordTimestamp=1724040653835) due to local log retention size -2 breach. Local log size after deletion will be 13435280. (kafka.log.UnifiedLog) [kafka-scheduler-6]
```
This is not helpful for users. We should replace -2 with the real retention value when logging.
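A simplified sketch of the intended resolution; the class and method names are illustrative, not the actual UnifiedLog code:
```
// Illustrative: -2 means "fall back to the complete retention settings", both when
// applying local retention and when logging the reason for a retention breach.
final class LocalRetentionSketch {
    static long effectiveLocalRetentionMs(final long localRetentionMs, final long retentionMs) {
        return localRetentionMs == -2L ? retentionMs : localRetentionMs;
    }

    static long effectiveLocalRetentionBytes(final long localRetentionBytes, final long retentionBytes) {
        return localRetentionBytes == -2L ? retentionBytes : localRetentionBytes;
    }
}
```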
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
In MetadataVersion 3.7-IV2 and above, the broker's AssignmentsManager sends an RPC to the
controller informing it about which directory we have chosen to place each new replica on.
Unfortunately, the code does not check to see if the topic still exists in the MetadataImage before
sending the RPC. It will also retry infinitely. Therefore, after a topic is created and deleted in
rapid succession, we can get stuck including the now-defunct replica in our subsequent
AssignReplicasToDirsRequests forever.
Reviewers: Igor Soarez <i@soarez.me>, Ron Dagostino <rndgstn@gmail.com>
Conflicts: the original PR in trunk and 3.9 was large and fixed some other issues, like batching.
In order to avoid too much disruption to this older branch, this cherry-pick is minimal and just
stops retrying if the AssignmentsManager receives Errors.UNKNOWN_TOPIC_ID from the controller.
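In outline, the minimal fix on this branch behaves like the following sketch; the class and method names are illustrative:
```
import org.apache.kafka.common.Uuid;
import org.apache.kafka.common.protocol.Errors;

// Illustrative sketch: drop the pending assignment instead of retrying forever
// when the controller reports the topic ID as unknown (the topic was deleted).
final class AssignmentRetrySketch {
    static boolean shouldRetry(final Uuid topicId, final Errors error) {
        if (error == Errors.UNKNOWN_TOPIC_ID) {
            return false; // topic no longer exists; stop propagating this assignment
        }
        return error != Errors.NONE;
    }
}
```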
* KAFKA-17227: Update zstd-jni lib
* Add note in upgrade docs
* Change zstd-jni version in docker native file and add warning in dependencies.gradle file
* Add reference to snappy in upgrade
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
Currently, a few documentation files are generated automatically, for example by the genConnectMetricsDocs task.
However, unwanted log output is also added to them,
and the format is not aligned with the other sections, which have the MBean name in the third column.
I modified the code so the format follows the other sections in ops.html.
I also disabled logging during generation, since everything written to stdout is taken as documentation.
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
When the broker configuration is incompatible with the current MetadataVersion, the broker should log an error-level message but avoid shutting down.
Reviewers: Luke Chen <showuon@gmail.com>
* Fix imports as the config files were moved after the 3.7 release.
* MINOR: Add integration tag to AdminFenceProducersIntegrationTest (#16326)
Add @Tag("integration") to AdminFenceProducersIntegrationTest
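For reference, the JUnit 5 tagging looks like this; the test method shown is illustrative and the body is elided:
```
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// Tagging the class lets the build include or exclude it via the "integration" tag.
@Tag("integration")
class AdminFenceProducersIntegrationTest {
    @Test
    void testFenceProducers() {
        // ... integration test body ...
    }
}
```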
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Mickael Maison <mickael.maison@gmail.com>, Edoardo Comar <ecomar@uk.ibm.com>
KAFKA-16570 FenceProducers API returns "unexpected error" when successful
* Client handling of ConcurrentTransactionsException as retriable
* Unit test
* Integration test
Reviewers: Chris Egerton <chrise@aiven.io>, Justine Olshan <jolshan@confluent.io>
JBOD Brokers keep the Controller up to date with replica-to-directory
placement via AssignReplicasToDirsRequest. These requests are queued,
compacted and sent by AssignmentsManager.
The Controller returns the error NOT_LEADER_OR_FOLLOWER when handling
an AssignReplicasToDirsRequest from a broker that is not a replica.
A partition reassignment can take place, removing the Broker
as a replica before the AssignReplicasToDirsRequest successfully
reaches the Controller. AssignmentsManager retries failed
requests, and will continuously try to propagate this assignment,
until the Broker either shuts down, or is added back as a replica.
When encountering a NOT_LEADER_OR_FOLLOWER error, AssignmentsManager
should assume that the broker is no longer a replica, and stop
trying to propagate the directory assignment for that partition.
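As a rough sketch of the resulting behaviour; the class and method names are illustrative:
```
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.protocol.Errors;

// Illustrative: NOT_LEADER_OR_FOLLOWER from the controller means the broker is no
// longer a replica of the partition, so the pending directory assignment is dropped.
final class AssignmentErrorHandlingSketch {
    static boolean keepPendingAssignment(final TopicPartition partition, final Errors error) {
        if (error == Errors.NOT_LEADER_OR_FOLLOWER) {
            return false; // stop propagating; a new assignment arrives if we become a replica again
        }
        return error != Errors.NONE; // other errors are retried as before
    }
}
```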
Reviewers: Luke Chen <showuon@gmail.com>
KAFKA-16606 (#15834) introduced a change that broke
ReassignPartitionsCommandTest.testReassignmentCompletionDuringPartialUpgrade.
The point was to validate that the MetadataVersion supports JBOD
in KRaft when multiple log directories are configured.
We do that by checking the version used in
kafka-features.sh upgrade --metadata, and the version discovered
via a FeatureRecord for metadata.version in the cluster metadata.
There's no point in checking inter.broker.protocol.version in
KafkaConfig, since in KRaft, that configuration is deprecated
and ignored — always assuming the value of MINIMUM_KRAFT_VERSION.
The test that was broken sets inter.broker.protocol.version in
KRaft mode and configures 3 directories. So alternatively, we
could change the test to not configure this property.
Since the property isn't forbidden in KRaft mode, just ignored,
and operators may forget to remove it, it seems better to remove
the fail condition in KafkaConfig.
Reviewers: Luke Chen <showuon@gmail.com>
Support for multiple log directories in KRaft exists from
MetadataVersion 3.7-IV2.
When migrating a ZK broker to KRaft, we already check that
the IBP is high enough before allowing the broker to startup.
With KIP-584 and KIP-778, Brokers in KRaft mode do not require
the IBP configuration - the configuration is deprecated.
In KRaft mode inter.broker.protocol.version defaults to
MetadataVersion.MINIMUM_KRAFT_VERSION (IBP_3_0_IV1).
Instead, KRaft brokers discover the MetadataVersion by reading
the "metadata.version" FeatureLevelRecord from the cluster metadata.
This change adds a new configuration validation step upon discovering
the "metadata.version" from the cluster metadata.
Reviewers: Mickael Maison <mickael.maison@gmail.com>