From the coordinator's POV each stream has a unique id consisting of the
vhost, queuename and a high resolution timestamp even if several stream ids
relate to the same queue record.
When performing the mnesia update the coordinator now checks that the current stream id
matches that of the update_mnesia action and does not change the queue record if
the stream id is not the same.
This should avoid "old" incarnations of a stream queue updating newer ones
with incorrect information.
When the suite passes, it's about 120 seconds total, so 5 minutes per
case seems to be too much. Additionally, if the suite times out at the
bazel level, we get no logs, so the cause of the timeout is unclear.
Avoids multiple calls to `application:get_env` which can be very expensive.
Also limits filter to vhost_msg_stats, as queue_msg_stats are required for
individual queue metrics
So that a reply is sent to the caller immediately after the command has
been processed as intended. Previously it was possible if reply_to was
already set that a reply never was sent to the caller and the caller
times out. This should improve some flakyness in the rabbit_stream_queue suite
as well.
Strictly this is a change that introduces indeterminism in the coordinator
state machine as during an upgrade different members may run different code
for this command. But as this state only affects side effects (replies) and
the state for the streams affected will shortly be removed this is very
unlikely to cause any real issues.
Fixes#2941
This adds proper exception handlers in the right places. And tests
ensure that it indeed provides nice neat logs without large
stacktraces for every amqp operation.
Unnecessary checking for subscribe permissions on topic was dropped,
as `queue.bind` does exactly the same check. Topic permissions tests
were also added, and they indeed confirm that there was no change in
behaviour.
Ideally the same explicit topic permission check should be dropped for
publishing, but it's more complicated - so for now there only a
detailed comment in the source code explaining it.
A few other things were also optimized away:
- Using amqp client to test for queue existence
- Creating queues/starting consumptions too eagerly, even if not yet
requested by client
debugging of situation where messages may be stuck.
Also cancel rabbit_fifo_client timer after message resend to avoid
resending them again when the timer triggers.
Some plugins might create internal queues that should not be accounted
for the total number of messages on the system. These can now be filtered
out using a regular expression on the queue name. Individual queue stats
are still available