OCF RA: Do not consider local failures as remote node problems
In is_clustered_with(), the commands we run to check whether the node is clustered with us, or partitioned with us, may fail. When they fail, that doesn't actually tell us anything about the remote node. Until now, we treated such failures as hints that the remote node was not in a sane state with us. But doing so has a pretty negative impact: it can cause rabbitmq to be restarted on the remote node, which is quite disruptive. So instead, ignore the error (it's still logged).
There was a comment in the code wondering what the best behavior is; based on experience, I think preferring stability is the slightly more acceptable poison of the two options.
parent b3925d446d
commit 056f7ed2ec
@@ -870,8 +870,8 @@ is_clustered_with()
     rc=$?
     if [ "$rc" -ne 0 ]; then
         ocf_log err "${LH} Failed to check whether '$node_name' is considered running by us"
-        # XXX Or should we give remote node benefit of a doubt?
-        return 1
+        # We had a transient local error; that doesn't mean the remote node is
+        # not part of the cluster, so ignore this
     elif [ "$seen_as_running" != true ]; then
         ocf_log info "${LH} Node $node_name is not running, considering it not clustered with us"
         return 1
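The first hunk's new behavior can be exercised in isolation with a minimal POSIX sh sketch. It rests on a few assumptions. `ocf_log` is stubbed to print to stderr. The real cluster query, which the diff does not show, is replaced by a hypothetical `probe_running` function. That function's exit status and output come from the illustrative variables `PROBE_RC` and `PROBE_OUT`. The node name `rabbit@node-2` is made up. The final `return 0` stands in for the rest of `is_clustered_with()`, which is omitted here.

```shell
#!/bin/sh
# Minimal sketch of the post-change running check in is_clustered_with().
# ASSUMPTIONS: ocf_log is a stub, and probe_running is a hypothetical stand-in
# for the real cluster query, driven by PROBE_RC (exit status) and PROBE_OUT.

LH="sketch:"

# Stub for the OCF logging helper: print level and message to stderr.
ocf_log() {
    level=$1
    shift
    echo "$level: $*" >&2
}

# Hypothetical probe: prints PROBE_OUT and exits with status PROBE_RC.
probe_running() {
    printf '%s\n' "$PROBE_OUT"
    return "$PROBE_RC"
}

# Mirrors the new branch: a failed local probe is logged, then ignored.
check_running() {
    node_name=$1
    seen_as_running=$(probe_running)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        ocf_log err "${LH} Failed to check whether '$node_name' is considered running by us"
        # A local failure says nothing about the remote node: fall through
    elif [ "$seen_as_running" != true ]; then
        ocf_log info "${LH} Node $node_name is not running, considering it not clustered with us"
        return 1
    fi
    # Stands in for the rest of is_clustered_with(), omitted here
    return 0
}

# Example: the local probe fails, so the remote node keeps the benefit of the doubt
PROBE_RC=1
PROBE_OUT=
check_running rabbit@node-2
echo "exit status: $?"   # prints "exit status: 0"
```

Before this commit, the same failing probe made the function return 1. A successful probe that prints anything other than `true` still returns 1.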
@@ -882,8 +882,8 @@ is_clustered_with()
     rc=$?
     if [ "$rc" -ne 0 ]; then
         ocf_log err "${LH} Failed to check whether '$node_name' is partitioned with us"
-        # XXX Or should we give remote node benefit of a doubt?
-        return 1
+        # We had a transient local error; that doesn't mean the remote node is
+        # partitioned with us, so ignore this
     elif [ "$seen_as_partitioned" != false ]; then
         ocf_log info "${LH} Node $node_name is partitioned from us"
         return 1
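The second hunk makes the same change to the partition check. The sketch below uses the same assumptions: a stubbed `ocf_log`, a hypothetical `probe_partitioned` in place of the real query, illustrative `PROBE_RC`/`PROBE_OUT` variables, and a made-up node name. It also shows what the change does not relax. Only a non-zero exit status is ignored. A successful probe whose output is anything other than `false` still counts as a partition.

```shell
#!/bin/sh
# Minimal sketch of the post-change partition check in is_clustered_with().
# ASSUMPTIONS: ocf_log is a stub, and probe_partitioned is a hypothetical
# stand-in for the real query, driven by PROBE_RC and PROBE_OUT.

LH="sketch:"

# Stub for the OCF logging helper: print level and message to stderr.
ocf_log() {
    level=$1
    shift
    echo "$level: $*" >&2
}

# Hypothetical probe: prints PROBE_OUT and exits with status PROBE_RC.
probe_partitioned() {
    printf '%s\n' "$PROBE_OUT"
    return "$PROBE_RC"
}

check_partitioned() {
    node_name=$1
    seen_as_partitioned=$(probe_partitioned)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        ocf_log err "${LH} Failed to check whether '$node_name' is partitioned with us"
        # Local failure is ignored: the remote node gets the benefit of the doubt
    elif [ "$seen_as_partitioned" != false ]; then
        # Only an explicit "false" counts as healthy; any other output
        # from a successful probe is treated as a partition
        ocf_log info "${LH} Node $node_name is partitioned from us"
        return 1
    fi
    # Stands in for the rest of is_clustered_with(), omitted here
    return 0
}

# Walk through "exit-status:output" cases
for spec in "1:" "0:false" "0:true" "0:garbage"; do
    PROBE_RC=${spec%%:*}
    PROBE_OUT=${spec#*:}
    if check_partitioned rabbit@node-2 2>/dev/null; then
        echo "rc=$PROBE_RC out='$PROBE_OUT' -> clustered"
    else
        echo "rc=$PROBE_RC out='$PROBE_OUT' -> not clustered"
    fi
done
# prints:
# rc=1 out='' -> clustered
# rc=0 out='false' -> clustered
# rc=0 out='true' -> not clustered
# rc=0 out='garbage' -> not clustered
```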