Ignore shardId updates from replica nodes (#13877)

Close https://github.com/redis/redis/issues/13868 This bug was introduced by https://github.com/redis/redis/pull/13468 To maintain compatibility with older versions that do not support shardid, when a replica passes a shardid, we also update the master’s shardid accordingly. However, when both the master and replica support shardid, an issue arises: in one moment, the master may pass a shardid, causing us to update both the master and all its replicas to match the master’s shardid. But if the replica later passes a different shardid, we would then update the master’s shardid again, leading to continuous changes in shardid. Regardless of the situation, we always ensure that the replica’s shardid remains consistent with the master’s shardid.
2025-03-30 03:15:04 -04:00 · 2025-03-30 03:15:04 -04:00 · 779a20058b
parent a26774cee1
commit 779a20058b
1 changed files with 11 additions and 17 deletions
--- a/src/cluster.c
+++ b/src/cluster.c
@ -964,30 +964,24 @@ clusterNode *clusterNodeGetSlave(clusterNode *node, int slave_idx) {
 static void updateShardId(clusterNode *node, const char *shard_id) {
    if (shard_id && memcmp(node->shard_id, shard_id, CLUSTER_NAMELEN) != 0) {
-        assignShardIdToNode(node, shard_id, CLUSTER_TODO_SAVE_CONFIG);
+        /* We always make our best effort to keep the shard-id consistent
-
+         * between the master and its replicas:
        /* If the replica or master does not support shard-id (old version),
         * we still need to make our best effort to keep their shard-id consistent.
         *
-         * 1. Master supports but the replica does not.
+         * 1. When updating the master's shard-id, we simultaneously update the
-         *    We might first update the replica's shard-id to the master's randomly
+         *    shard-id of all its replicas to ensure consistency.
-         *    generated shard-id. Then, when the master's shard-id arrives, we must
+         * 2. When updating replica's shard-id, if it differs from its master's shard-id,
-         *    also update all its replicas.
+         *    we discard this replica's shard-id and continue using master's shard-id.
-         * 2. If the master does not support but the replica does.
+         *    This applies even if the master does not support shard-id, in which
-         *    We also need to synchronize the master's shard-id with the replica.
+         *    case we rely on the master's randomly generated shard-id. */
         * 3. If neither of master and replica supports it.
         *    The master will have a randomly generated shard-id and will update
         *    the replica to match the master's shard-id. */
        if (node->slaveof == NULL) {
            assignShardIdToNode(node, shard_id, CLUSTER_TODO_SAVE_CONFIG);
            for (int i = 0; i < clusterNodeNumSlaves(node); i++) {
                clusterNode *slavenode = clusterNodeGetSlave(node, i);
                if (memcmp(slavenode->shard_id, shard_id, CLUSTER_NAMELEN) != 0)
                    assignShardIdToNode(slavenode, shard_id, CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_FSYNC_CONFIG);
            }
-        } else {
+        } else if (memcmp(node->slaveof->shard_id, shard_id, CLUSTER_NAMELEN) == 0) {
-            clusterNode *masternode = node->slaveof;
+            assignShardIdToNode(node, shard_id, CLUSTER_TODO_SAVE_CONFIG);
            if (memcmp(masternode->shard_id, shard_id, CLUSTER_NAMELEN) != 0)
                assignShardIdToNode(masternode, shard_id, CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_FSYNC_CONFIG);
        }
    }
 }