Commit Graph

20 Commits

Author SHA1 Message Date
Michele Baldessari cf039f9a54 Allow rabbitmq to run in a larger cluster composed of also non-rabbitmq nodes
We introduce the OCF_RESKEY_allowed_cluster_nodes parameter, which can be used to specify
which nodes of the cluster rabbitmq is expected to run on. When this variable is not
set, the resource agent assumes that all nodes of the cluster (output of crm_node -l)
are eligible to run rabbitmq. The use case here is clusters that have a large
number of nodes, where only a specific subset is used for rabbitmq (usually this is
done with some constraints).
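
The fallback described above could be sketched roughly as follows. This is not the agent's actual code: eligible_nodes and list_all_cluster_nodes are hypothetical names, and the sketch assumes that each line of crm_node -l has the form "<id> <name> <state>".

```sh
#!/bin/sh
# Sketch only: choose the nodes rabbitmq may run on.

# Hypothetical helper: node names as reported by the cluster.
# Assumes each `crm_node -l` line looks like "<id> <name> <state>".
list_all_cluster_nodes() {
    crm_node -l | awk '{print $2}'
}

# Hypothetical helper: print the eligible nodes as one space-separated line.
eligible_nodes() {
    if [ -n "${OCF_RESKEY_allowed_cluster_nodes:-}" ]; then
        # The operator restricted rabbitmq to an explicit subset.
        echo "$OCF_RESKEY_allowed_cluster_nodes"
    else
        # Parameter unset: every cluster node is considered eligible.
        list_all_cluster_nodes | tr '\n' ' ' | sed 's/ $//'
    fi
}
```

With allowed_cluster_nodes="messaging-0 messaging-1 messaging-2", eligible_nodes would print only those three names even in a larger cluster.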

Tested in a 9-node cluster as follows:
[root@messaging-0 ~]# pcs resource config rabbitmq
 Resource: rabbitmq (class=ocf provider=rabbitmq type=rabbitmq-server-ha)
  Attributes: allowed_cluster_nodes="messaging-0 messaging-1 messaging-2" avoid_using_iptables=true
  Meta Attrs: container-attribute-target=host master-max=3 notify=true ordered=true
  Operations: demote interval=0s timeout=30 (rabbitmq-demote-interval-0s)
              monitor interval=5 timeout=30 (rabbitmq-monitor-interval-5)
              monitor interval=3 role=Master timeout=30 (rabbitmq-monitor-interval-3)
              notify interval=0s timeout=20 (rabbitmq-notify-interval-0s)
              promote interval=0s timeout=60s (rabbitmq-promote-interval-0s)
              start interval=0s timeout=200s (rabbitmq-start-interval-0s)
              stop interval=0s timeout=200s (rabbitmq-stop-interval-0s)

[root@messaging-0 ~]# pcs status |grep -e rabbitmq -e messaging
  * Online: [ controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
...
  * Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
    * rabbitmq-bundle-0 (ocf::rabbitmq:rabbitmq-server-ha):      Master messaging-0
    * rabbitmq-bundle-1 (ocf::rabbitmq:rabbitmq-server-ha):      Master messaging-1
    * rabbitmq-bundle-2 (ocf::rabbitmq:rabbitmq-server-ha):      Master messaging-2
2021-02-28 15:51:39 +01:00
Michele Baldessari 6c33da543b Allow operator to disable iptables client blocking
Currently the resource agent hard-codes iptables calls to block off
client access before the resource becomes master. This was done
historically because many libraries were fairly buggy at detecting a
not-yet-functional rabbitmq, so they were being helped by getting
a TCP RST packet and they would go on trying their next configured
server.

It makes sense to be able to disable this behaviour because
most libraries by now have gotten better at detecting timeouts when
talking to rabbit and because when you run rabbitmq inside a bundle
(pacemaker term for a container with an OCF resource inside) you
normally do not have access to iptables.

Tested by creating a three-node bundle cluster inside a container:
 Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]
   Replica[0]
      rabbitmq-bundle-podman-0  (ocf::heartbeat:podman):        Started controller-0
      rabbitmq-bundle-0 (ocf::pacemaker:remote):        Started controller-0
      rabbitmq  (ocf::rabbitmq:rabbitmq-server-ha):     Master rabbitmq-bundle-0
   Replica[1]
      rabbitmq-bundle-podman-1  (ocf::heartbeat:podman):        Started controller-1
      rabbitmq-bundle-1 (ocf::pacemaker:remote):        Started controller-1
      rabbitmq  (ocf::rabbitmq:rabbitmq-server-ha):     Master rabbitmq-bundle-1
   Replica[2]
      rabbitmq-bundle-podman-2  (ocf::heartbeat:podman):        Started controller-2
      rabbitmq-bundle-2 (ocf::pacemaker:remote):        Started controller-2
      rabbitmq  (ocf::rabbitmq:rabbitmq-server-ha):     Master rabbitmq-bundle-2

The ocf resource was created inside a bundle with:
pcs resource create rabbitmq ocf:rabbitmq:rabbitmq-server-ha avoid_using_iptables="true" \
  meta notify=true container-attribute-target=host master-max=3 ordered=true \
  op start timeout=200s stop timeout=200s promote timeout=60s bundle rabbitmq-bundle

Signed-off-by: Michele Baldessari <michele@acksyn.org>
2020-01-31 08:26:39 +01:00
Spring Operator 8bcebe2185 URL Cleanup
This commit updates URLs to prefer the https protocol. Redirects are not followed to avoid accidentally expanding intentionally shortened URLs (i.e. if using a URL shortener).

# Fixed URLs

## Fixed Success
These URLs were switched to an https URL with a 2xx status. While the status was successful, your review is still recommended.

* [ ] http://www.apache.org/licenses/LICENSE-2.0 with 1 occurrence migrated to:
  https://www.apache.org/licenses/LICENSE-2.0 ([https](https://www.apache.org/licenses/LICENSE-2.0) result 200).
2019-03-21 03:25:18 -05:00
Michele Baldessari c587ba79eb Use ocf_attribute_target instead of crm_node
Instead of calling crm_node directly it is preferable to use the
ocf_attribute_target function. This function will return crm_node -n
as usual, except when run inside a bundle (aka container in pcmk
language). Inside a bundle it will return the bundle name or, if the
meta attribute meta_container_attribute_target is set to 'host', it
will return the physical node name where the bundle is running.

Typically, when running a rabbitmq cluster inside containers, it is
desirable to set 'meta_container_attribute_target=host' on the rabbit
cluster resource so that the RA is aware of which host it is running on.

Tested both on baremetal (without containers):
 Master/Slave Set: rabbitmq-master [rabbitmq]
     Masters: [ controller-0 controller-1 controller-2 ]

And with bundles as well.

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
2018-11-19 22:06:23 +01:00
Vincent Untz 056f7ed2ec OCF RA: Do not consider local failures as remote node problems
In is_clustered_with(), the commands that we run to check whether the node
is clustered with us or partitioned from us may fail. When they fail, the
failure actually doesn't tell us anything about the remote node.

Until now, we were considering such failures as hints that the remote
node is not in a sane state with us. But doing so has a pretty negative
impact, as it can cause rabbitmq to get restarted on the remote node,
causing quite some disruption.

So instead of doing this, ignore the error (it's still logged).

There was a comment in the code wondering what the best behavior is;
based on experience, I think preferring stability is the slightly more
acceptable poison between the two options.
2017-12-20 10:24:21 +01:00
Vincent Untz ea745e62c4
OCF RA: Fix syntax error
(cherry picked from commit a9b4a4ff97a96e798de51933fc44f61aa6bc88a3)
2017-12-14 07:07:02 +03:00
Michael Klishin 7e93369f0c
Merge pull request #64 from vuntz/ocf-fix-notify-start
OCF RA: Fix various issues with start notification handler
2017-12-12 19:19:39 +03:00
Vincent Untz a6dc3f91b0 OCF RA: Fix logging in start notification handler
The "post-start end" log message was written too early (some things were
still done afterwards), and not in all cases (it was inside an if
statement).
2017-12-08 14:17:38 +01:00
Vincent Untz 2f284bf595 OCF RA: Do not start rabbitmq if notification of start is not about us
Right now, every time we get a start notification, all nodes will ensure
the rabbitmq app is started. This makes little sense, as nodes that are
already active don't need to do that.

On top of that, this had the side effect of updating the start time for
each of these nodes, which could result in the master moving to another
node.
2017-12-08 14:15:24 +01:00
Vincent Untz a8e7a62513 OCF RA: Fix test for no node in start notification handler
If there's nothing starting and nothing active, then we do a -z " ",
which doesn't have the same result as -z "". Instead, just test for
emptiness for each set of nodes.
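
A minimal illustration of the pitfall, using made-up variable names rather than the agent's own: joining two empty lists with a space yields a one-character string, so a single -z test on the joined value never reports "empty". The no_nodes helper is hypothetical.

```sh
#!/bin/sh
# Both node lists are empty in this scenario.
starting=""
active=""

# Joining them with a space gives " ", which is not an empty string.
combined="$starting $active"
if [ -z "$combined" ]; then
    echo "combined: empty"
else
    echo "combined: not empty"   # this branch runs
fi

# Hypothetical helper: test each list for emptiness on its own.
no_nodes() {
    [ -z "$1" ] && [ -z "$2" ]
}

if no_nodes "$starting" "$active"; then
    echo "separate tests: no nodes"   # this branch runs
fi
```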
2017-12-08 14:13:59 +01:00
Vincent Untz 62a4f75611 OCF RA: Avoid promoting nodes with same start time as master
It may happen that two nodes have the same start time, and one of these
is the master. When this happens, the node actually gets the same score
as the master and can get promoted. There's no reason to avoid being
stable here, so let's keep the same master in that scenario.
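
One way to picture the tie-break, as a sketch only: should_replace_master is a hypothetical helper, and it assumes that start times are epoch seconds and that an older start time is preferred for the master role. The agent's real scoring logic is not reproduced here.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the candidate should take over
# from the current master.
# $1 = candidate start time, $2 = master start time (epoch seconds).
should_replace_master() {
    candidate_start=$1
    master_start=$2
    # Strict comparison: on equal start times the current master stays.
    [ "$candidate_start" -lt "$master_start" ]
}
```

With a non-strict comparison (-le), a node that started in the same second as the master would also qualify, which is the instability described above.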
2017-12-08 13:32:45 +01:00
Michael Klishin 0da346eb88 Merge pull request #21 from vuntz/ocf-limit_nofile
OCF RA: Add new limit_nofile parameter to both OCF resource agents
2017-04-05 17:49:34 +03:00
Vincent Untz 73080ac783 OCF RA: Only set limit for open files when higher than current value
This allows the limit to be set in some other way.
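
The check could look roughly like the sketch below. needs_higher_limit and apply_limit_nofile are hypothetical names, and the sketch assumes the limit_nofile parameter reaches the agent as OCF_RESKEY_limit_nofile, following the usual OCF naming convention.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the requested limit is higher
# than the current one. $1 = current limit, $2 = requested limit.
needs_higher_limit() {
    current=$1
    requested=$2
    # An "unlimited" current value can never be exceeded.
    [ "$current" != "unlimited" ] && [ "$requested" -gt "$current" ]
}

# Hypothetical wrapper: raise the open-files limit only when needed,
# so that a higher limit configured elsewhere is left untouched.
apply_limit_nofile() {
    requested=${OCF_RESKEY_limit_nofile:-65535}
    if needs_higher_limit "$(ulimit -n)" "$requested"; then
        ulimit -n "$requested"
    fi
}
```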
2017-04-04 15:13:52 +02:00
Michael Klishin 91ffc30b66 Merge pull request #24 from vuntz/ocf-vhost
OCF RA: Add vhost parameter to rabbitmq-server-ha.ocf
2017-04-04 16:11:08 +03:00
Vincent Untz 89d65b51aa OCF RA: Add new limit_nofile parameter to rabbitmq-server-ha OCF RA
This makes it possible to change the limit of open files, as the default on
distributions is usually too low for rabbitmq. Default is 65535.
2017-04-04 15:08:51 +02:00
Vincent Untz 525eaba13a OCF RA: Add default_vhost parameter to rabbitmq-server-ha.ocf
This enables the cluster to focus on a vhost that is not /, in case the
most important vhost is something else.

For reference, other vhosts may exist in the cluster, but these are not
guaranteed to be free of data loss. This patch doesn't address
this issue.

Closes https://github.com/rabbitmq/rabbitmq-server-release/issues/22
2017-04-04 14:41:50 +02:00
Vincent Untz 9bd1b0a5f3 OCF RA: Don't hardcode primitive name in rabbitmq-server-ha.ocf
We can compute the name of the primitive automatically from environment
variables, instead of hard-coding p_rabbitmq-server; this makes the
resource agent more flexible.

Closes https://github.com/rabbitmq/rabbitmq-server-release/issues/23
2017-03-31 13:24:27 +02:00
Dmitry Mescheryakov 67cdbe3067 Correctly return exit code from stop
Panicking and returning non-success on stop often leads to the resource
becoming unmanaged on that node.

Previously we called get_status to verify that RabbitMQ is dead. But
sometimes it returns an error even though RabbitMQ is not running. There
is no reason to call it - we will just verify that there is no beam
process running.

Related fuel bug - https://bugs.launchpad.net/fuel/+bug/1626933
2016-10-17 19:43:46 +03:00
Alexey Lebedeff 1d564c8746 OCF RA: Check partitions on non-master nodes
Partitions reported by `rabbit_node_monitor:partitions/0` are not
commutative (i.e. node1 can report itself as partitioned with node2, but
not vice versa).

Given that we now have a strong notion of master in the OCF script, we can
check for those fishy situations during the master health check, and order
damaged nodes to restart.

Fuel bug: https://bugs.launchpad.net/fuel/+bug/1628487
2016-09-29 16:13:18 +03:00
Jean-Sébastien Pédron e97ca28ac7
scripts: Take package-specific files from rabbitmq-server
[#130659985]
2016-09-21 16:25:24 +02:00