It uses the commercial edition of RabbitMQ and requires a valid Tanzu
Network account. Learn more: https://rabbitmq.com/tanzu
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
So that clusters with the same rabbitmq_cluster name in different K8s namespaces don't clash.
The namespace filter comes first because the order of the layers is namespace -> cluster -> node.
Tested with the latest 3.9.0 dev build
We had to account for the plugin packaging change from .ez archives to
directories, as well as the management.load_definitions deprecation,
which would prevent a node from booting (fixed in 07a0dd7438). That
commit hasn't made it through the 3.9.x pipeline yet, so there is no
3.9.0 dev build with the fix. The simplest workaround is to drop the
`management.` prefix from the load_definitions config.
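For example, a minimal rabbitmq.conf sketch of the workaround (the definitions path is just an illustration):

# before (3.8.x, handled by the management plugin)
management.load_definitions = /etc/rabbitmq/definitions.json
# after (3.9.x, handled by core)
load_definitions = /etc/rabbitmq/definitions.json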
The next manual step is to generate all dashboards using e.g. `make RabbitMQ-Overview.json > ~/Downloads/RabbitMQ-Overview.json` and upload them to https://grafana.com/orgs/rabbitmq
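One possible way to do that for every dashboard in one go (assuming each JSON file under docker/grafana/dashboards has a matching make target, like RabbitMQ-Overview.json does):

for dashboard in docker/grafana/dashboards/*.json; do
  make "$(basename "$dashboard")" > ~/Downloads/"$(basename "$dashboard")"
done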
Great contribution @ansd, thank you 👏🏻
Regardless of the value of `return_per_object_metrics`, this endpoint
always returns per-object metrics. This allows scraping both endpoints
at different intervals or scraping per-object metrics only during
debugging.
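For example, against a local node with rabbitmq_prometheus on its default port 15692:

curl -s localhost:15692/metrics            # honours prometheus.return_per_object_metrics
curl -s localhost:15692/metrics/per-object # always returns per-object metrics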
Co-authored-by: Mirah Gary <mgary@vmware.com>
In the case where there are 0 channels (and as such 0 publishers), the
dashboard reports there are actually `n` publishers in an `n`-node
cluster. This changes the calculation of publishers to be number of
channels (which is always known) minus the number of consumers (which is
always known).
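Roughly, the corrected expression ends up as something like the following
sketch (run against a local Prometheus, and assuming the stock
rabbitmq_channels and rabbitmq_channel_consumers metric names):

curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rabbitmq_channels) - sum(rabbitmq_channel_consumers)'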
Context: we want to move away from environment variables and use either
config files or env files (such as the rabbitmq-env.conf).
Since .erlang.cookie is neither, the official RabbitMQ Docker image
handles this by writing the value from the RABBITMQ_ERLANG_COOKIE env
var into the file if it does not exist. The problem is that if this file
exists, and the value is different from the RABBITMQ_ERLANG_COOKIE env
var, CLI tools will not be able to communicate with the rabbit node, as
described here: https://github.com/rabbitmq/rabbitmq-cli/issues/443
The only gotcha is that this file must be owned by the user and its
permissions must not be too open (git should have captured this). If
either condition is not met, RabbitMQ will fail to boot. This is
somewhat similar to how OpenSSH reacts when private key permissions are
too open.
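A minimal sketch of the approach (the path, user and mode are assumptions based on common defaults, not copied from the image):

COOKIE_FILE=/var/lib/rabbitmq/.erlang.cookie
if [ ! -e "$COOKIE_FILE" ]; then
  echo -n "$RABBITMQ_ERLANG_COOKIE" > "$COOKIE_FILE"
fi
chown rabbitmq:rabbitmq "$COOKIE_FILE"
chmod 600 "$COOKIE_FILE" # too-open permissions prevent the node from booting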
re https://github.com/docker-library/rabbitmq/pull/422#issuecomment-650074731
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
open http://localhost:3000/dashboards # select Erlang-Distribution
e > Metrics; General > Description # while on the 'Data buffered in the distribution links queue' panel
Save Dashboard > Export > +Export for sharing externally > Save to file
pwd
/Users/gerhard/github.com/rabbitmq/3.9.x/deps/rabbitmq_prometheus
vimdiff docker/grafana/dashboards/Erlang-Distribution.json ~/Downloads/Erlang-Distribution*.json
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
We have to force prometheus.erl to a version that does not have this
feature; otherwise the test would succeed.
pwd
/Users/gerhard/github.com/rabbitmq/3.9.x/deps/rabbitmq_prometheus
rm -fr ../prometheus.erl # remove the existing prometheus.erl checkout so the forced version gets fetched
make tests
open logs/index.html # review the Common Test results in a browser
Pull request content:
Expose & visualise distribution buffer busy limit - zdbbl
> This will be closed after TGIR S01E04 gets recorded.
> The goal is to demonstrate how to do this, and then let an external contributor have a go.
Before this patch, the **Data buffered in the distribution links queue** graph was empty.
This is what that graph looks like after this gets applied:

## References
- [RabbitMQ Runtime Tuning - Inter-node Communication Buffer Size](https://www.rabbitmq.com/runtime.html#distribution-buffer)
- [erl +zdbbl](https://erlang.org/doc/man/erl.html#+zdbbl)
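For reference, one way to raise the limit is via rabbitmq-env.conf; the 192000 kB value below is only an illustration, not a recommendation:

# rabbitmq-env.conf
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+zdbbl 192000"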
Fixes #39
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
They are the product name & version. They are added at the beginning of
`rabbitmq_build_info` if they are set. If they are not, the content of
`rabbitmq_build_info` is the same as before.
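For illustration, a scrape could then show something like this (label names and values below are assumptions, not taken from the implementation):

curl -s localhost:15692/metrics | grep '^rabbitmq_build_info'
# rabbitmq_build_info{product_name="VMware Tanzu RabbitMQ",product_version="1.x",rabbitmq_version="3.9.0",...} 1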
This will make future diffs smaller
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
(cherry picked from commit e1a08d6ae752181177cbcc411219a8dd780359d2)
Hard-coding the number of dashboards, because there is no API for
https://grafana.com/orgs/rabbitmq/dashboards, and setting up our own
JSON endpoint to return this information would be overkill.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Supporting the n-1 Erlang/OTP versions, v21 & v22, while taking into
account RabbitMQ's Erlang Version Requirements:
https://www.rabbitmq.com/which-erlang.html
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Otherwise the singlestat panel will return an 'Only queries that return
single series/table is supported' error if the node changes some
properties, e.g. the instance label after an IP change.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Given that we test Linux distributions and upgrades after this artefact
is produced, it's quicker to get the latest RabbitMQ generic-unix dev
build into a Docker image without waiting on tests that are less
relevant & useful for the audience of this image.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Hiding "all other" values stopped working since Grafana v6.6.1, need to
be explicit about which values should be hidden. Picked up a few other
changes from Grafana after Save JSON to file.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
It activates an extra graph on the RabbitMQ-Overview dashboard and,
let's be honest, why use Quorum Queues if the workload doesn't care
whether the broker received the message? They go together, seriously!
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Since metrics are now aggregated by default, it made more sense to
invert the meaning of disabling aggregation and name it as a positive,
explicit action: return_per_object_metrics.
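For operators who want the previous behaviour back, a minimal rabbitmq.conf sketch:

# rabbitmq.conf
prometheus.return_per_object_metrics = true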
Naming pair: @michaelklishin
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Having talked to @michaelklishin, we've decided to enable metrics
aggregation by default so that RabbitMQ nodes with many objects serve
the same number of metrics quickly, rather than taking many seconds and
transferring many MBs of data on every scrape.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
We want to keep the same metric type regardless of whether we aggregate
or not. If we had used a histogram type then, considering the ~12
buckets that we added, it would have meant 12 extra metrics per queue,
which would have resulted in an explosion of metrics. We keep the gauge
type instead and aggregate latencies across all members.
re https://github.com/rabbitmq/rabbitmq-prometheus/pull/28
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
This is a follow-up to https://github.com/rabbitmq/ra/pull/160
We had to introduce mf_convert/3 so that the METRICS_REQUIRING_CONVERSIONS
proplist does not clash with METRICS_RAW proplists that have the same
number of elements. This is begging to be refactored, but I know that
@dcorbacho is working on https://github.com/rabbitmq/rabbitmq-prometheus/issues/26
We also modified the RabbitMQ-Quorum-Queues-Raft dashboard.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Grafana will keep failing with the following error message otherwise:
failed to load dashboard from /dashboards/__inputs.json Dashboard title cannot be empty