kafka/tests/kafkatest
David Arthur 8aee314a46
KAFKA-16667 Avoid stale read in KRaftMigrationDriver (#15918)
When becoming the active KRaftMigrationDriver, there is another race condition similar to KAFKA-16171. This time, the race is due to a stale read from ZK. After writing to /controller and /controller_epoch, it is possible that a read on /migration is not linearized with the writes that were just made. In other words, we get a stale read on /migration. This leads to an inability to sync metadata to ZK due to incorrect zkVersion on the migration ZNode.

The non-linearizability of reads is in fact documented behavior for ZK, so we need to handle it.

To fix the stale read, this patch adds a write to /migration after updating /controller and /controller_epoch. This allows us to learn the correct zkVersion for the migration ZNode before leaving the BECOME_CONTROLLER state.
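
As an illustration only, the idea of learning the zkVersion from a write rather than from a possibly stale read can be sketched with a toy in-memory model. This is not Kafka or ZooKeeper client code: `ToyZNode`, `BadVersion`, and `set_data` are hypothetical names. The only behavior borrowed from ZooKeeper is that a conditional write fails on a version mismatch and that a version of -1 means "any version".

```python
class BadVersion(Exception):
    """Raised when a conditional write names the wrong version (hypothetical)."""


class ToyZNode:
    """Hypothetical in-memory stand-in for a ZNode; not a ZooKeeper client."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version=-1):
        # -1 means "any version", mirroring ZooKeeper's convention.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersion(
                "expected %d, actual %d" % (expected_version, self.version)
            )
        self.data = data
        self.version += 1
        # The write result carries the new version, so the writer learns it
        # without relying on a separate read.
        return self.version


migration = ToyZNode(b"state-0")

# A lagging read returns version 0, then another write bumps the node to 1.
stale_version = migration.version
migration.set_data(b"state-1")

# A conditional write using the stale version is rejected.
try:
    migration.set_data(b"sync", expected_version=stale_version)
    stale_write_failed = False
except BadVersion:
    stale_write_failed = True

# Sketch of the fix: write first and take the version from the write result.
fresh_version = migration.set_data(migration.data)
synced_version = migration.set_data(b"sync", expected_version=fresh_version)
```

In this toy run, the write based on the stale version fails, while the write that uses the version returned by the preceding write succeeds.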

This patch also adds a check on the current leader epoch when running certain events in KRaftMigrationDriver. Historically, we did not include this check because it is not necessary for correctness. Writes to ZK are gated on the /controller_epoch zkVersion, and RPCs sent to brokers are gated on the controller epoch. However, during a time of rapid failover, there is a lot of processing happening on the controller (i.e., full metadata sync to ZK and full UMRs sent to brokers), so it is best to avoid running events we know will fail.
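
The leader epoch check described above can be pictured with a small sketch. It is an assumption-laden toy, not the KRaftMigrationDriver implementation: `EpochGatedEvent`, `ToyDriver`, and their fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EpochGatedEvent:
    """Hypothetical event that remembers the leader epoch it was created in."""
    name: str
    expected_epoch: int
    action: Callable[[], None]


@dataclass
class ToyDriver:
    """Hypothetical driver that skips events from an older leader epoch."""
    current_epoch: int
    ran: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)

    def run(self, event: EpochGatedEvent) -> bool:
        # Skip work that is known to fail downstream because the epoch moved on.
        if event.expected_epoch != self.current_epoch:
            self.skipped.append(event.name)
            return False
        event.action()
        self.ran.append(event.name)
        return True


driver = ToyDriver(current_epoch=5)
sync_event = EpochGatedEvent("sync-metadata", 5, lambda: None)
rpc_event = EpochGatedEvent("send-rpcs", 5, lambda: None)

first_result = driver.run(sync_event)   # epochs match, so the event runs

driver.current_epoch = 6                # a failover happens before the next event
second_result = driver.run(rpc_event)   # stale epoch, so the event is skipped
```

The check is an optimization in this sketch, as in the commit message: the skipped event would have been rejected later anyway, but skipping it avoids wasted processing during rapid failover.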

There is also a small fix in here to improve the logging of ZK operations. The log messages are changed to past tense to reflect the fact that the operations have already happened by the time the log message is created.

Reviewers: Igor Soarez <soarez@apple.com>
2024-07-15 09:32:06 -04:00
benchmarks KAFKA-15155: Follow PEP 8 best practice in Python to check if a container is empty (#13974) 2023-07-11 11:01:50 +02:00
directory_layout KAFKA-15226: Add connect-plugin-path and plugin.discovery system test (#14230) 2023-08-18 15:28:43 -07:00
sanity_checks KAFKA-14760: Move ThroughputThrottler from tools to clients, remove tools dependency from connect-runtime (#13313) 2023-07-20 12:58:48 -07:00
services KAFKA-17096 Fix kafka_log4j_appender.py (#16559) 2024-07-12 22:35:55 +08:00
tests KAFKA-16667 Avoid stale read in KRaftMigrationDriver (#15918) 2024-07-15 09:32:06 -04:00
utils KAFKA-16992: InvalidRequestException: ADD_PARTITIONS_TO_TXN with version 4 which is not enabled when upgrading from kafka (#15971) 2024-05-17 21:35:28 -07:00
__init__.py MINOR: Bump trunk to 3.9.0-SNAPSHOT (#16150) 2024-05-31 16:41:44 +02:00
version.py KAFKA-17083: Update LATEST_STABLE_METADATA_VERSION in system tests (#16533) 2024-07-05 21:29:35 +01:00