OCF RA: Do not consider local failures as remote node problems

In is_clustered_with(), the commands that we run to check whether the
node is clustered with us, or partitioned from us, may fail. When they
fail, that tells us nothing about the remote node.

Until now, we treated such failures as hints that the remote node is
not in a sane state with us. But doing so has a pretty negative impact,
as it can cause rabbitmq to get restarted on the remote node, causing
quite some disruption.

So instead of doing this, ignore the error (it's still logged).

There was a comment in the code wondering what the best behavior is;
based on experience, I think preferring stability is the slightly more
acceptable poison of the two options.
Vincent Untz 2017-12-13 12:34:31 +01:00
parent b3925d446d
commit 056f7ed2ec
1 changed files with 4 additions and 4 deletions

@@ -870,8 +870,8 @@ is_clustered_with()
     rc=$?
     if [ "$rc" -ne 0 ]; then
         ocf_log err "${LH} Failed to check whether '$node_name' is considered running by us"
-        # XXX Or should we give remote node benefit of a doubt?
-        return 1
+        # We had a transient local error; that doesn't mean the remote node is
+        # not part of the cluster, so ignore this
     elif [ "$seen_as_running" != true ]; then
         ocf_log info "${LH} Node $node_name is not running, considering it not clustered with us"
         return 1
@@ -882,8 +882,8 @@ is_clustered_with()
     rc=$?
     if [ "$rc" -ne 0 ]; then
         ocf_log err "${LH} Failed to check whether '$node_name' is partitioned with us"
-        # XXX Or should we give remote node benefit of a doubt?
-        return 1
+        # We had a transient local error; that doesn't mean the remote node is
+        # partitioned with us, so ignore this
     elif [ "$seen_as_partitioned" != false ]; then
         ocf_log info "${LH} Node $node_name is partitioned from us"
         return 1
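
The behavior change in the hunks above can be tried in isolation with a minimal sketch. It is not the resource agent's code: check_running, FAKE_RC, FAKE_ANSWER and is_clustered_with_sketch are hypothetical names introduced here, and plain echo to stderr stands in for ocf_log. Only the "log the local error, but do not return 1" pattern is taken from the diff.

```shell
#!/bin/sh
# Hypothetical stand-in for the real cluster query (assumption, not RA code).
# Prints "true" or "false" and exits 0 on success; exits non-zero to simulate
# a local failure. Controlled by the FAKE_RC and FAKE_ANSWER variables.
check_running() {
    [ "${FAKE_RC:-0}" -eq 0 ] || return "$FAKE_RC"
    echo "${FAKE_ANSWER:-true}"
}

# Returns 0 if the node is treated as clustered with us, 1 otherwise.
is_clustered_with_sketch() {
    node_name=$1
    seen_as_running=$(check_running "$node_name")
    rc=$?
    if [ "$rc" -ne 0 ]; then
        # Local error: log it, but draw no conclusion about the remote node.
        echo "ERROR: failed to check whether '$node_name' is running" >&2
    elif [ "$seen_as_running" != true ]; then
        echo "INFO: node $node_name is not running" >&2
        return 1
    fi
    return 0
}
```

With FAKE_RC=2 the function logs the error and still returns 0, so the remote node is not penalized for a failure of the local check. With FAKE_RC=0 and FAKE_ANSWER=false it returns 1, as before the change.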