Key changes:
- endpoint variable to handle scraping multiple endpoints
- message size panels (new metric in 4.1)
- panels at the top of the Overview dashboard should be more up to date
(they show the latest value)
- values should be accurate if multiple endpoints are scraped
(previously, many would be doubled)
- Nodes table shows fewer columns and includes node uptime
- Modified metric expression and legend format in State of distribution links
- Changed panel type from 'flant-statusmap-panel' to 'status-history' for Process state
- Updated metric expressions to include instance filtering with `{instance="$node"}`
for the following metrics:
  - erlang_vm_statistics_run_queues_length
  - erlang_vm_statistics_dirty_io_run_queue_length
  - erlang_vm_statistics_dirty_cpu_run_queue_length
- Added 'DS_PROMETHEUS' as a templated data source variable
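The instance filtering listed above can be illustrated with a small Python sketch that turns a bare metric name into a selector restricted to the Grafana `$node` variable. The helper `with_instance_filter` is hypothetical and is not part of the dashboard tooling. Grafana substitutes `$node` before the query reaches Prometheus.

```python
# Hypothetical helper, not part of rabbitmq_prometheus: it shows the shape
# of the {instance="$node"} selectors added to the Process state panels.
RUN_QUEUE_METRICS = [
    "erlang_vm_statistics_run_queues_length",
    "erlang_vm_statistics_dirty_io_run_queue_length",
    "erlang_vm_statistics_dirty_cpu_run_queue_length",
]


def with_instance_filter(metric: str, variable: str = "$node") -> str:
    """Return a PromQL selector limited to the instance picked in Grafana."""
    # Doubled braces are literal braces inside an f-string.
    return f'{metric}{{instance="{variable}"}}'


for name in RUN_QUEUE_METRICS:
    print(with_instance_filter(name))
```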
* Add BEAM dashboard
Also update the other dashboards by opening them in Grafana v11.2.2 and checking that they work as expected.
* Update the Erlang-Distributions-Compare dashboard
* Update the RabbitMQ-Overview dashboard
* Update the RabbitMQ-Quorum-Queues-Raft dashboard
* Update the RabbitMQ-Stream dashboard
* Update distribution link status panel
---------
Co-authored-by: Michal Kuratczyk <mkuratczyk@vmware.com>
* Fix broken dashboards if detailed metrics are used
If detailed metrics are scraped into the same Prometheus, Grafana
reports an error:
execution: many-to-many matching not allowed:
matching labels must be unique on one side
This happens because both endpoints expose `rabbit_identity_info`,
so the metric is not unique to one endpoint.
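The failure above can be reproduced with a Python sketch that mimics PromQL's rule that one side of a `group_left` join must be unique on the matching labels. The `endpoint` label name, its values and the node names are assumptions made for illustration. The commit does not show the exact query change, so filtering the info metric to a single endpoint is only one possible way to restore uniqueness.

```python
from collections import Counter


def matching_keys(series, on):
    """Project each series' labels onto the on(...) label set."""
    return [tuple(s[label] for label in on) for s in series]


def assert_one_side_unique(series, on):
    """Raise if two series share the same matching labels (PromQL would reject the join)."""
    dupes = [k for k, n in Counter(matching_keys(series, on)).items() if n > 1]
    if dupes:
        raise ValueError(f"many-to-many matching not allowed: duplicate keys {dupes}")


# Hypothetical rabbit_identity_info series: one node scraped via two endpoints.
# The "endpoint" label and its values are assumptions for this sketch.
identity_info = [
    {"instance": "rabbit-0:15692", "endpoint": "/metrics", "rabbitmq_node": "rabbit@rabbit-0"},
    {"instance": "rabbit-0:15692", "endpoint": "/metrics/detailed", "rabbitmq_node": "rabbit@rabbit-0"},
]

try:
    assert_one_side_unique(identity_info, on=["instance"])
except ValueError as err:
    print(err)

# Keeping the info metric from a single endpoint makes the join keys unique again.
per_endpoint = [s for s in identity_info if s["endpoint"] == "/metrics"]
assert_one_side_unique(per_endpoint, on=["instance"])
```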
* Add detailed metrics scraper to the Prometheus config
---------
Co-authored-by: Michal Kuratczyk <michal.kuratczyk@broadcom.com>
Before this commit, importing the dashboard via ConfigMap as seen in
1eb1dc618e
didn't work because the DS_PROMETHEUS variable was undefined in Grafana.
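One way to make sure the variable is defined is to ship a datasource template variable inside the dashboard JSON. The Python sketch below adds such an entry to the `templating.list` section of a Grafana dashboard model if it is missing. The helper name is hypothetical, and the fields shown are a minimal subset; the commit does not say exactly which fields the dashboards use.

```python
import json


def ensure_datasource_variable(dashboard: dict, name: str = "DS_PROMETHEUS") -> dict:
    """Add a Prometheus datasource template variable if the dashboard lacks one.

    Hypothetical helper: only a minimal subset of Grafana's variable fields is set.
    """
    variables = dashboard.setdefault("templating", {}).setdefault("list", [])
    if not any(v.get("name") == name for v in variables):
        variables.insert(0, {
            "name": name,
            "type": "datasource",   # Grafana's datasource-picker variable type
            "query": "prometheus",  # only offer Prometheus datasources
            "hide": 0,
        })
    return dashboard


dashboard = {"title": "RabbitMQ-Overview", "panels": []}
print(json.dumps(ensure_datasource_variable(dashboard)["templating"], indent=2))
```

Panels can then reference the datasource as `${DS_PROMETHEUS}`, and calling the helper again leaves an existing variable untouched.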
Related to https://github.com/rabbitmq/rabbitmq-server/pull/3250
Co-authored-by: Gerhard Lazu <gerhard@lazu.co.uk>
This breaks the docker-compose integration, but we need to move away
from it anyway; the whole dev flow needs revisiting now that our focus is on
K8s.
`$__rate_interval` does not work with `irate`, so drop it in favour of `60s`,
the same interval used by all other dashboards.
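`irate` computes a per-second rate from only the last two samples inside the range window, so the window mainly has to be wide enough to contain two scrapes. The simplified Python sketch below shows that behaviour. It ignores counter resets and Prometheus' exact window-boundary rules, and the sample values are made up.

```python
def irate(samples, now, window):
    """Per-second rate from the last two samples in [now - window, now].

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Simplified sketch: counter resets are ignored.
    """
    in_window = [(t, v) for t, v in samples if now - window <= t <= now]
    if len(in_window) < 2:
        return None  # fewer than two points: irate has no result
    (t1, v1), (t2, v2) = in_window[-2:]
    return (v2 - v1) / (t2 - t1)


# Made-up counter scraped every 15s.
samples = [(0, 0), (15, 30), (30, 90), (45, 120)]
print(irate(samples, now=45, window=60))  # 2.0, only (30, 90) and (45, 120) are used
print(irate(samples, now=45, window=10))  # None, a single sample in the window
```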
This is a follow-up to https://github.com/rabbitmq/rabbitmq-server/pull/3250
Thanks @ansd for pointing out the post-import issues.
It was uploaded as https://grafana.com/api/dashboards/14798/revisions/3/download
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
It uses the commercial edition of RabbitMQ, which requires a valid Tanzu
Network account. Learn more: https://rabbitmq.com/tanzu
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
So that clusters with the same rabbitmq_cluster name in different K8s namespaces don't clash
Namespace filter comes first, because the order of the layers is namespace -> cluster -> node
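Grouping by `rabbitmq_cluster` alone merges same-named clusters from different namespaces. Keying on `(namespace, rabbitmq_cluster)` keeps them apart. The Python sketch below shows this. The `namespace` label name and the node names are assumptions for illustration, not taken from the dashboards.

```python
from collections import defaultdict

# Hypothetical nodes: two clusters that share the name "rabbitmq".
nodes = [
    {"namespace": "team-a", "rabbitmq_cluster": "rabbitmq", "rabbitmq_node": "rabbit@rabbitmq-server-0"},
    {"namespace": "team-b", "rabbitmq_cluster": "rabbitmq", "rabbitmq_node": "rabbit@rabbitmq-server-0"},
]


def group_nodes(nodes, keys):
    """Group node names by the given label keys, outermost layer first."""
    groups = defaultdict(list)
    for n in nodes:
        groups[tuple(n[k] for k in keys)].append(n["rabbitmq_node"])
    return dict(groups)


by_cluster = group_nodes(nodes, ["rabbitmq_cluster"])                    # clusters clash
by_ns_cluster = group_nodes(nodes, ["namespace", "rabbitmq_cluster"])   # clusters stay apart
print(by_cluster)
print(by_ns_cluster)
```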
Tested with the latest 3.9.0 dev build
We had to account for plugins changing from .ez archives to directories, and for the management.load_definitions deprecation, which would prevent a node from booting (fixed in 07a0dd7438). That fix has not made it through the 3.9.x pipeline yet, so no 3.9.0 dev build includes it. The simplest workaround is to drop `management.` from the load_definitions config.
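Dropping `management.` from the key can be scripted. The Python sketch below rewrites a `rabbitmq.conf` string so that `management.load_definitions` becomes `load_definitions`. The helper name, the definitions path and the second config line are hypothetical examples.

```python
import re


def drop_management_prefix(conf: str) -> str:
    """Rename management.load_definitions to load_definitions (hypothetical helper)."""
    return re.sub(
        r"^([ \t]*)management\.load_definitions([ \t]*=)",
        r"\1load_definitions\2",
        conf,
        flags=re.MULTILINE,  # ^ matches at the start of every line
    )


# Example config; the definitions path is made up.
conf = (
    "loopback_users.guest = false\n"
    "management.load_definitions = /etc/rabbitmq/definitions.json\n"
)
print(drop_management_prefix(conf))
```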
The next manual step is to generate all dashboards using e.g. `make RabbitMQ-Overview.json > ~/Downloads/RabbitMQ-Overview.json` and upload them to https://grafana.com/orgs/rabbitmq
Great contribution @ansd, thank you 👏🏻
In the case where there are 0 channels (and as such 0 publishers), the
dashboard reports there are actually `n` publishers in an `n`-node
cluster. This changes the publisher calculation to the number of
channels (which is always known) minus the number of consumers (which is
always known).
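The new calculation is simple arithmetic. The Python sketch below applies it to per-node counts. The commit does not include the PromQL, so this only mirrors the description: total channels minus total consumers. The per-node numbers are made up.

```python
def publishers(channels_per_node, consumers_per_node):
    """Cluster-wide publisher count: total channels minus total consumers."""
    return sum(channels_per_node) - sum(consumers_per_node)


# An idle 3-node cluster now reports 0 publishers instead of 3.
print(publishers([0, 0, 0], [0, 0, 0]))   # 0
print(publishers([10, 4, 6], [3, 2, 5]))  # 20 channels - 10 consumers = 10
```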
open http://localhost:3000/dashboards # select Erlang-Distribution
e > Metrics; General > Description # when on Data buffered in the distribution links queue
Save Dashboard > Export > +Export for sharing externally > Save to file
pwd
/Users/gerhard/github.com/rabbitmq/3.9.x/deps/rabbitmq_prometheus
vimdiff docker/grafana/dashboards/Erlang-Distribution.json ~/Downloads/Erlang-Distribution*.json
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
This will make future diffs smaller
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
(cherry picked from commit e1a08d6ae752181177cbcc411219a8dd780359d2)
Otherwise the singlestat panel returns an 'Only queries that return
single series/table is supported' error if some of the node's
properties changed, such as the instance label after an IP change.
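When a node's IP changes, Prometheus can briefly hold two series for the same node that differ only in `instance`. Aggregating that label away, for example with `max without (instance) (...)` in PromQL, yields one series again. The commit does not say which aggregation it used. The Python sketch below mimics that collapse with made-up label values.

```python
def collapse(series, drop=("instance",)):
    """Merge series whose labels differ only in `drop`, keeping the max value."""
    merged = {}
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        merged[key] = max(value, merged.get(key, value))
    return merged


# Hypothetical series: the same node before and after an IP change.
series = [
    ({"instance": "10.0.0.5:15692", "rabbitmq_node": "rabbit@node-0"}, 1.0),
    ({"instance": "10.0.0.9:15692", "rabbitmq_node": "rabbit@node-0"}, 1.0),
]
print(len(series), "->", len(collapse(series)))  # 2 -> 1
```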
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Hiding "all other" values stopped working as of Grafana v6.6.1, so we need to
be explicit about which values should be hidden. Also picked up a few other
changes from Grafana after Save JSON to file.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
This is a follow-up to https://github.com/rabbitmq/ra/pull/160
Had to introduce mf_convert/3 so that the METRICS_REQUIRING_CONVERSIONS
proplist does not clash with the METRICS_RAW proplists that have the same
number of elements. This is begging to be refactored, but I know that
@dcorbacho is working on https://github.com/rabbitmq/rabbitmq-prometheus/issues/26
Also modified the RabbitMQ-Quorum-Queues-Raft dashboard
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Grafana will keep failing with the following error message otherwise:
failed to load dashboard from /dashboards/__inputs.json Dashboard title cannot be empty
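Grafana's file provisioning treats every JSON file in the dashboards directory as a dashboard, so a file without a title fails to load. The Python sketch below runs a pre-flight check for files whose `title` is missing or empty. The helper name and the demo file contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def dashboards_without_title(directory):
    """Return JSON files that would fail to load because the title is empty."""
    bad = []
    for path in sorted(Path(directory).glob("*.json")):
        data = json.loads(path.read_text())
        if not str(data.get("title", "")).strip():
            bad.append(path.name)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Made-up contents: an inputs file without a title next to a real dashboard.
        Path(tmp, "__inputs.json").write_text(json.dumps({"__inputs": []}))
        Path(tmp, "RabbitMQ-Overview.json").write_text(json.dumps({"title": "RabbitMQ-Overview"}))
        print(dashboards_without_title(tmp))  # ['__inputs.json']
```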
It captures the Quorum-Queues Raft, so let's be specific, especially
since we know that there will be other Raft implementations in RabbitMQ,
not just Quorum Queues.
[#166926415]