Commit Graph

205 Commits

Author SHA1 Message Date
Gerhard Lazu 62d82e1660
Break down metrics by node in all RabbitMQ-Stream pie charts
Otherwise we won't be able to see which nodes are running "hot"

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-11 13:39:30 +01:00
David Ansari 4b774db5c1 Use same threshold color for "Errors since boot" 2021-08-02 17:05:17 +02:00
David Ansari c99ee6961e Use same colorMode in all RabbitMQ-Stream panels
Co-authored-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-02 13:33:00 +02:00
David Ansari ea18c31288 Make RabbitMQ-Stream dashboard work via ConfigMap
Before this commit, importing the dashboard via ConfigMap as seen in
1eb1dc618e
didn't work because DS_PROMETHEUS variable was undefined in Grafana.

Related to https://github.com/rabbitmq/rabbitmq-server/pull/3250

Co-authored-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-02 13:12:48 +02:00
Gerhard Lazu 65afbb931b
Ensure RabbitMQ-Stream dashboard works correctly after import
This breaks the docker-compose integration, but we need to move away
from it anyway: the whole dev flow needs revisiting after our focus on
K8s.

$__rate_interval does not work with irate, dropping it in favour of 60s,
same as all other dashboards.

This is a follow-up to https://github.com/rabbitmq/rabbitmq-server/pull/3250

Thanks @ansd for mentioning the post-import issues.

It was uploaded as https://grafana.com/api/dashboards/14798/revisions/3/download

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-07-30 13:53:02 +01:00
Gerhard Lazu 35a6369327
Restart stream-perf-test on-failure
This handles the scenario where rmq2 is not available, and
stream-perf-test exits with a non-zero exit code. Good spot @ansd!

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-07-30 11:25:36 +01:00
David Ansari 47d572908d Convert string to integer for ulimits.nofile
Before this commit:

> make overview metrics
services.rmq1.ulimits.nofile.hard must be a integer
make: *** [Makefile:68: overview] Error 15

According to the docs
https://docs.docker.com/compose/compose-file/compose-file-v3/#ulimits
this must be an integer.
2021-07-30 09:46:38 +02:00
Gerhard Lazu 6f5c4118ea
Publish RabbitMQ-Stream dashboard to grafana.com
Removed the Dockerfile and slimmed down the Makefile, all of this is now
handled by https://github.com/rabbitmq/rabbitmq-server/blob/master/.github/workflows/oci.yaml
cc @Zerpet @pjk25

More details here (including the steps used to publish to grafana.com):
https://github.com/rabbitmq/release-engineering/issues/11#issuecomment-887627938

I don't want to hold up this PR, will invest in automating the
steps described in the previous link another time. Time to 🚀

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-07-29 19:34:05 +01:00
Gerhard Lazu 1e5708b0c5
Fix Grafana dashboards when importing from URL
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-03-22 19:27:13 +00:00
Gerhard Lazu c18ad7a5b6
Fix colors for node names that include digits in Grafana dashboards
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-03-08 13:19:14 +00:00
Gerhard Lazu 6adb2449b4
Add inet_tcp_metrics Grafana dashboard & cluster example
It uses the commercial edition of RabbitMQ and requires a valid Tanzu
Network account. Learn more: https://rabbitmq.com/tanzu

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-02-05 12:50:32 +00:00
Gerhard Lazu 0ce95075ef
Bump all Grafana dashboards dep versions to latest
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-01-18 18:59:11 +00:00
David Ansari 377a933f4c
Filter Grafana dashboards by namespace (#2719)
So that clusters with the same rabbitmq_cluster name in different K8s namespaces don't clash

Namespace filter comes first, because the order of the layers is namespace -> cluster -> node

Tested with the latest 3.9.0 dev build

We had to account for plugin changes from .ez to directories & the management.load_definitions deprecation which would prevent a node from booting (fixed in 07a0dd7438). This commit didn't make it through the 3.9.x pipeline yet, so there is no 3.9.0 dev build with this fix yet. The simplest fix is to drop `management.` from the load_definitions config.

The next manual step is to generate all dashboards using e.g. `make RabbitMQ-Overview.json > ~/Downloads/RabbitMQ-Overview.json` and upload them to https://grafana.com/orgs/rabbitmq

Great contribution @ansd, thank you 👏🏻
2021-01-18 18:45:05 +00:00
Gerhard Lazu 4e31a176c9 Upgrade RabbitMQ Overview dashboard to Grafana 7
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-11-13 12:55:05 +00:00
Gerhard Lazu 530de03e38 Merge pull request #61 from rabbitmq/grafana-publisher-fix
Prevent non-zero publisher count in Grafana when aggregating metrics
2020-11-13 12:45:33 +00:00
Gerhard Lazu 3f6f54eb02 Bump Grafana, Prometheus & Node Exporter to latest
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-11-13 12:15:06 +00:00
Connor Rogers 5b9f77a5f2 Prevent non-zero publisher count when aggregating metrics
In the case where there are 0 channels (and as such 0 publishers), the
dashboard reports there are actually `n` publishers in an `n`-node
cluster. This changes the calculation of publishers to be the number of
channels (which is always known) minus the number of consumers (which is
always known).
2020-11-12 15:26:39 +00:00
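
The arithmetic described in 5b9f77a5f2 above can be illustrated with a small Python sketch. The dashboard itself uses Prometheus queries, which are not reproduced here; the function names and the per-node tuple layout are hypothetical and only show that deriving publishers from two always-known totals yields 0 when there are no channels.

```python
def publisher_count(channels, consumers):
    """Derive publishers as channels minus consumers, as the commit describes."""
    return channels - consumers


def cluster_publishers(per_node):
    """Aggregate over a cluster.

    per_node is a hypothetical list of (channels, consumers) tuples,
    one tuple per node. Both totals are always known, so an idle
    cluster sums to 0 instead of reporting one publisher per node.
    """
    total_channels = sum(channels for channels, _ in per_node)
    total_consumers = sum(consumers for _, consumers in per_node)
    return publisher_count(total_channels, total_consumers)
```

For example, `cluster_publishers([(0, 0), (0, 0), (0, 0)])` gives 0 for an idle 3-node cluster.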
Gerhard Lazu 8f7953438e Fix Erlang cookie when running with Docker Compose on Windows
Context:
9452cf179b (commitcomment-40660523)

Thanks @wainwrightmark!

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-07-17 10:28:14 +01:00
Gerhard Lazu 9452cf179b Mount .erlang.cookie file
Context: we want to move away from environment variables and use either
config files or env files (such as the rabbitmq-env.conf).

Since .erlang.cookie is neither, the official RabbitMQ Docker image
handles this by writing the value from the RABBITMQ_ERLANG_COOKIE env
var into the file if it does not exist. The problem is that if this file
exists, and the value is different from the RABBITMQ_ERLANG_COOKIE env
var, CLI tools will not be able to communicate with the rabbit node, as
described here: https://github.com/rabbitmq/rabbitmq-cli/issues/443

The only gotcha is that this file must be owned by the user, and
privileges should not be too open (git should have captured this). If
not, RabbitMQ will fail to boot. This is somewhat similar to how OpenSSH
reacts when private key permissions are too open.

re https://github.com/docker-library/rabbitmq/pull/422#issuecomment-650074731

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-07-01 17:07:16 +01:00
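
The ownership and permission gotcha described in 9452cf179b above could be checked before mounting the file with something like the following Python sketch. It is POSIX-only, the function name is hypothetical, and treating any group or other permission bit as "too open" is an assumption, since the commit message does not state the exact mode required.

```python
import os
import stat


def cookie_file_problems(path):
    """Return a list of reasons why the cookie file at path looks unsafe.

    Assumption: "too open" means any permission bit is set for group
    or other users, similar to the OpenSSH private key check.
    """
    info = os.stat(path)
    problems = []
    if info.st_uid != os.getuid():
        problems.append("not owned by the current user")
    if stat.S_IMODE(info.st_mode) & 0o077:
        problems.append("group or other users have access")
    return problems
```

An empty list means both checks passed. A file with mode 0644, for instance, would be reported as accessible by group or other users.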
Gerhard Lazu b28f6e64ba Fix metric name & description in zdbbl graph, Erlang Distribution
open http://localhost:3000/dashboards # select Erlang-Distribution
    e > Metrics; General > Description # when on Data buffered in the distribution links queue
    Save Dashboard > Export > +Export for sharing externally > Save to file
    pwd
    /Users/gerhard/github.com/rabbitmq/3.9.x/deps/rabbitmq_prometheus
    vimdiff docker/grafana/dashboards/Erlang-Distribution.json ~/Downloads/Erlang-Distribution*.json

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-06-24 16:36:04 +01:00
Gerhard Lazu a6f6244c85 Build Docker image from latest 3.9 dev release + this PR
Update OTP to latest stable, 23.0.2

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-06-18 11:52:16 +01:00
Gerhard Lazu 850a30653d Use Grafana 6.7.2 schema defaults in Erlang-Distribution
This will make future diffs smaller

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
(cherry picked from commit e1a08d6ae752181177cbcc411219a8dd780359d2)
2020-04-27 15:22:01 +01:00
Gerhard Lazu eca19f7dd9 Bump versions across a number of deps
- RabbitMQ latest 3.9 dev build
- OpenSSL - https://github.com/docker-library/rabbitmq/pull/403
- OTP, PerfTest, Prometheus & Grafana latest GA

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
(cherry picked from commit b25b4e897337d97edbf6a826b0f12d20ea7cf914)
2020-04-22 18:12:21 +01:00
aakcht 729dd14f9b Color labelling grafana fix 2020-04-14 18:20:04 +04:00
Gerhard Lazu 1222018e50 Make Erlang-Memory-Allocators dashboard look better on light
Use the latest Grafana schema improvements to simplify the dashboard
definition.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-24 13:21:46 +00:00
Gerhard Lazu 8d40bf85a2 Bump Prometheus & Grafana to latest stable
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-24 12:45:13 +00:00
Gerhard Lazu 9c112b5718 Sum resident set size on Erlang-Memory-Allocators
Otherwise the singlestat panel will return a 'Only queries that return
single series/table is supported' error if the node changed some
properties, like the instance label because the IP changed.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-24 12:09:47 +00:00
Gerhard Lazu d64361658a Bump to latest unverified generic-unix 3.9 dev build
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-12 19:00:00 +00:00
Gerhard Lazu 92ef32d022 Build image with latest RabbitMQ 3.9.0 dev + local
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-11 13:29:29 +00:00
Gerhard Lazu 19683fc2c9 Clean up NODES table on RabbitMQ-Overview
Hiding "all other" values stopped working since Grafana v6.6.1, need to
be explicit about which values should be hidden. Picked up a few other
changes from Grafana after Save JSON to file.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-11 13:26:20 +00:00
Gerhard Lazu dec54306c9 Use publisher confirms for Quorum Queue workload
It activates an extra graph on the RabbitMQ-Overview dashboard and
let's be honest - why use Quorum Queues if the workload didn't care
whether the broker received the message? They go together, seriously!

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-11 13:23:26 +00:00
Gerhard Lazu e7c997744d Improve config for returning metrics per object
Since metrics are now aggregated by default, it made more sense to use
the inverse meaning of disabling aggregation, and call it a positive and
explicit action: return_per_object_metrics.

Naming pair: @michaelklishin

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-11 13:08:00 +00:00
Gerhard Lazu 4622974d1b Bump Grafana & Prometheus to latest
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-10 17:50:13 +00:00
Gerhard Lazu c079459e9c Bump Docker image to latest RabbitMQ 3.9 dev
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-10 17:49:06 +00:00
Gerhard Lazu e91e4ea32b Bump to latest RabbitMQ 3.9.0 dev build & Erlang/OTP v22.2.6
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-06 17:57:31 +00:00
Gerhard Lazu 8088a50e13 Merge pull request #28 from rabbitmq/metrics-aggregation
Option to aggregate channel, queue and connection metrics
2020-02-04 12:43:49 +00:00
Gerhard Lazu f632014e2c Bump RabbitMQ to latest dev & OTP to latest stable
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-01-15 19:02:33 +00:00
Gerhard Lazu 29c5d2e241 Fix QQ PerfTest instance name in Prometheus config
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-01-15 12:58:54 +00:00
Gerhard Lazu 89efb964d9 Convert raft_entry_commit_latency to seconds & be explicit about unit
This is a follow-up to https://github.com/rabbitmq/ra/pull/160

Had to introduce mf_convert/3 so that METRICS_REQUIRING_CONVERSIONS
proplist does not clash with METRICS_RAW proplists that have the same
number of elements. This is begging to be refactored, but I know that
@dcorbacho is working on https://github.com/rabbitmq/rabbitmq-prometheus/issues/26

Also modified the RabbitMQ-Quorum-Queues-Raft dashboard

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-01-07 16:20:59 +00:00
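
The unit conversion described in 89efb964d9 above can be illustrated in Python. The actual change is in Erlang (mf_convert/3) and is not reproduced here; the table, the function name, and the assumption that the raw value is in milliseconds are all hypothetical. The sketch only shows the idea of dividing a raw value into seconds and making the unit explicit in the metric name.

```python
# Hypothetical table: raw metric name -> divisor that turns the raw
# value into seconds. Assumes the raw latency is in milliseconds.
CONVERSIONS = {"raft_entry_commit_latency": 1000}


def to_base_unit(name, raw_value):
    """Return (exported_name, value), converting to seconds when needed.

    Metrics without an entry in CONVERSIONS pass through unchanged.
    Converted metrics get a _seconds suffix so the unit is explicit.
    """
    divisor = CONVERSIONS.get(name)
    if divisor is None:
        return name, raw_value
    return f"{name}_seconds", raw_value / divisor
```

Keeping conversions in a separate lookup mirrors the commit's point that metrics needing conversion are handled apart from raw ones.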
Gerhard Lazu 5602a9eb4c Update Docker image to latest dev
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-01-07 16:11:57 +00:00
Gerhard Lazu 1e96189826 Bump grafana version to latest stable since flant-statusmap-panel v0.2.0
Thanks @diafour & @briangann for grafana/grafana-plugin-repository#531 👍

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2019-12-20 10:09:04 +00:00
Gerhard Lazu 0af70418b9 Bump OTP to latest stable & RabbitMQ to latest dev in Dockerfile
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2019-12-20 10:07:58 +00:00
Gerhard Lazu a10c7ce6f1 Move the __inputs partial out of the Grafana dashboards dir
Grafana will keep failing with the following error message otherwise:

    failed to load dashboard from /dashboards/__inputs.json Dashboard title cannot be empty
2019-12-04 21:12:26 +00:00
Gerhard Lazu 400ebdf9f8 Publish Erlang-Distribution Grafana dashboard to grafana.com
https://grafana.com/grafana/dashboards/11352

[finishes #166355345]
2019-12-04 21:12:16 +00:00
Gerhard Lazu b51288cfc5 Drop Visualise in Erlang-Memory-Allocators dashboard description
[#169264435]
2019-12-04 12:15:35 +00:00
Gerhard Lazu ad450779ba Publish Erlang-Memory-Allocators Grafana dashboard to grafana.com
https://grafana.com/grafana/dashboards/11350

[finishes #169264435]
2019-12-04 12:11:41 +00:00
Gerhard Lazu 076c65becb Version control descriptions for all our grafana.com dashboards 2019-12-03 13:34:43 +00:00
Gerhard Lazu 35525db9df Publish RabbitMQ-Quorum-Queues-Raft Grafana dashboard to grafana.com
https://grafana.com/grafana/dashboards/11340

[finishes #166926415]
2019-12-03 13:33:31 +00:00
Gerhard Lazu 4df7e701ee Decrease load on qq deployment
It still puts a significant load on the host, but any lower and we won't
see any change in the Uncommitted log entries graph, and too little
variation in the Log entry commit latency.
2019-12-03 11:28:01 +00:00
Gerhard Lazu 79284d0b02 Bump RabbitMQ to latest alpha 2019-12-03 11:26:40 +00:00