While errors are detected with the '-e' shell option in this script,
the diagnostic messages leave a lot to be desired.
E.g. when trying to write the pid file to a full partition, the only
message in the log is:
sh: echo: I/O error
which is definitely insufficient.
* Add ocf_run wrappers and info log messages for CIB attribute events
* Move "fast" CIB attribute updates before "heavy" operations like
start/stop/wait to keep the CIB consistent even if those operations
exceed their timeouts
* Delete the master and start time attributes from the CIB on
action_start to ensure that rabbit node uptime is evaluated correctly
in new master elections for the corresponding pacemaker resources
* For post-demote notify and action_demote() delete the master
attribute from CIB as well.
* For post-start notify, update the start time in the CIB even when
the node is already clustered. Otherwise it would remain running
in the cluster without a registered start time, which badly affects
new master elections.
* Fix the wrong log message emitted when a node joins
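
To illustrate the diagnostics problem described above, here is a minimal
sketch of wrapping a pid file write so that a failure names both the pid and
the target file. The names log_err and write_pidfile are hypothetical
stand-ins for this example only; the agent itself relies on ocf_run and the
OCF logging helpers, which are not reproduced here.

```shell
#!/bin/sh
# Hypothetical stand-in for the OCF logging helper (assumption, not the
# agent's real logger): print an error line to stderr.
log_err() {
    printf 'ERROR: %s\n' "$*" >&2
}

# write_pidfile PID FILE
# Hypothetical helper: write PID to FILE and, on failure, log which file
# could not be written instead of leaving only "echo: I/O error".
write_pidfile() {
    pid=$1
    file=$2
    # Inside an "if" condition the failure does not trigger 'set -e',
    # so there is a chance to log before returning an error.
    if ! echo "$pid" > "$file"; then
        log_err "cannot write pid ${pid} to pid file ${file}"
        return 1
    fi
    return 0
}
```

Because the write is tested in a condition, the caller can decide how to
react to the non-zero return code rather than having the shell exit on the
bare echo failure.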
Related Fuel bugs
https://bugs.launchpad.net/fuel/+bug/1530150
https://bugs.launchpad.net/fuel/+bug/1530296
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Fix get_status() unexpectedly reporting a generic error
instead of "not running"
* Add proc_stop and proc_kill functions
(TODO these shall go as external common ocf helpers, eventually)
* Rework stop_server_process()
- make it return SUCCESS/ERROR as expected
- grant "rabbitmqctl stop" a graceful termination window and only
then ensure the beam process termination and pidfile removal as well
- return the actual status with get_status()
* Rework kill_rmq_and_remove_pid()
- use proc_stop to try to kill by pgrp with -TERM, then -KILL, or
by the beam process name match, if there is no PID
- make it return SUCCESS/ERROR
* Fix action_stop()
- fail early by the stop_server_process() results without additional
rabbitmqctl invocations in the get_status() call
- rework the hard-coded sleep 10 to use the graceful stop window in
stop_server_process() instead
- ensure the rabbit-start-time removal from CIB before trying to stop
the server process
- issue the "stop: action end" log record before the actual end
* Add comments and make logs more informative
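
The TERM-then-KILL approach with a graceful window, as described for the stop
path above, could look roughly like the sketch below. It is not the agent's
proc_stop: term_then_kill is a hypothetical helper that signals a single PID
rather than a process group, and the default 5 second window is an
assumption.

```shell
#!/bin/sh
# term_then_kill PID [GRACE_SECONDS]
# Hypothetical helper (not the agent's proc_stop): send TERM, wait up to
# GRACE_SECONDS for the process to exit, then fall back to KILL.
# Returns 0 if the process is gone, 1 if it is still present.
term_then_kill() {
    pid=$1
    grace=${2:-5}   # assumed default window, in seconds

    # Nothing to do if the process is already gone.
    kill -0 "$pid" 2>/dev/null || return 0

    kill -TERM "$pid" 2>/dev/null
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Graceful window exhausted: force termination and re-check once.
    kill -KILL "$pid" 2>/dev/null
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

Polling inside the window lets the function return as soon as the process
exits, instead of always sleeping for a fixed interval.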
Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1529897
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Alex Schultz <aschultz@mirantis.com>
This is a follow-up commit to ed276656. This fixes the following crash:
=ERROR REPORT==== 18-Oct-2015::00:51:30 ===
** Generic server <0.966.0> terminating
** Last message in was {'DOWN',#Ref<0.0.3.4250>,process,<21148.716.0>,
shutdown}
** When Server state == {state,
...
{true,{shutdown,ring_shutdown}}}
** Reason for termination ==
** {function_clause,[{orddict,fetch,
[{1,<0.966.0>},[]],
[{file,"orddict.erl"},{line,80}]},
{gm,check_neighbours,1,[{file,"src/gm.erl"},{line,1243}]},
Submitted by Alvaro Videla (@videlalvaro).
Fixes #368.