kafka/storage
Kamal Chandraprakash a056672f7c
KAFKA-19599: Reduce the frequency of ReplicaNotAvailableException thrown to clients when RLMM is not ready (#20345)
During broker restarts, the topic-based RemoteLogMetadataManager (RLMM)
constructs the state by reading the internal `__remote_log_metadata`
topic. When a partition is not yet ready to perform remote storage
operations, a ReplicaNotAvailableException is thrown back to the
consumer, and the client retries the request immediately.

This results in a large number of FETCH requests on the broker, which
ties up the request handler threads. This patch uses a CountDownLatch to
reduce the frequency of ReplicaNotAvailableException thrown back to the
clients, which improves request handler thread usage on the broker.
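
The latch-based approach described above can be sketched roughly as
follows. This is a minimal illustration, not the code from the patch:
the class names `PartitionReadinessGate` and `NotReadyException`, the
method names, and the wait timeout are all hypothetical, and
`NotReadyException` merely stands in for ReplicaNotAvailableException so
that the example is self-contained.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ReplicaNotAvailableException (assumption, not Kafka's class).
final class NotReadyException extends RuntimeException {
    NotReadyException(String message) {
        super(message);
    }
}

// Hypothetical per-partition gate; the real patch may structure this differently.
final class PartitionReadinessGate {
    // Released once, when the partition's remote log metadata has been loaded.
    private final CountDownLatch initialized = new CountDownLatch(1);

    // Called by the thread that finishes building the partition state.
    void markReady() {
        initialized.countDown();
    }

    // Waits up to maxWaitMs for the partition to become ready instead of
    // failing immediately, so a retrying client is slowed down to at most
    // one error per wait interval.
    void ensureReady(long maxWaitMs) {
        try {
            if (!initialized.await(maxWaitMs, TimeUnit.MILLISECONDS)) {
                throw new NotReadyException("partition is not ready for remote storage operations");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the partition as not ready.
            Thread.currentThread().interrupt();
            throw new NotReadyException("interrupted while waiting for partition readiness");
        }
    }
}
```

Once `markReady()` has been called, `ensureReady` returns immediately, so
the bounded wait only affects requests that arrive while the state is
still being built.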

Previously, for one consumer, when RLMM was not ready for a partition,
the broker received ~9K FetchConsumer requests / sec. With this patch,
the number of FETCH requests is reduced by 95% to 600 / sec.

Reviewers: Lan Ding <isDing_L@163.com>, Satish Duggana
 <satishd@apache.org>
2025-08-20 09:48:57 +05:30
..
api/src KAFKA-19523: Gracefully handle error while building remoteLogAuxState (#20201) 2025-07-23 19:29:31 +05:30
src KAFKA-19599: Reduce the frequency of ReplicaNotAvailableException thrown to clients when RLMM is not ready (#20345) 2025-08-20 09:48:57 +05:30