Since metrics are now aggregated by default, it makes more sense to
invert the meaning of disabling aggregation and name the option after a
positive, explicit action: return_per_object_metrics.
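As a sketch (assuming the option is wired into the standard rabbitmq.conf
schema under the plugin's prometheus.* namespace), opting back into
per-object metrics would look like this:
# hypothetical rabbitmq.conf snippet: opt back into per-object metrics
prometheus.return_per_object_metrics = true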
Naming pair: @michaelklishin
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Having talked to @michaelklishin, we've decided to enable metrics
aggregation by default so that RabbitMQ nodes with many objects serve
the same, fixed amount of metrics quickly rather than taking many
seconds and transferring many MBs of data on every scrape.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
We want to keep the same metric type regardless of whether we aggregate or
not. If we had used a histogram type, the ~12 buckets that we added
would have meant 12 extra metrics per queue, resulting in an explosion
of metrics. We are therefore keeping the gauge type and aggregating
latencies across all members.
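For illustration only (the metric name, buckets and values below are made
up for this example), a histogram emits one series per bucket per queue,
whereas the aggregated gauge stays a single series:
# a histogram would emit ~12 series like these for every single queue:
rabbitmq_raft_commit_latency_seconds_bucket{queue="q1",le="0.001"} 3
rabbitmq_raft_commit_latency_seconds_bucket{queue="q1",le="0.005"} 17
# ...10 more buckets for q1, and the same again for every other queue
# versus a single gauge aggregated across all members:
rabbitmq_raft_commit_latency_seconds 0.004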
re https://github.com/rabbitmq/rabbitmq-prometheus/pull/28
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
This is a follow-up to https://github.com/rabbitmq/ra/pull/160
Had to introduce mf_convert/3 so that METRICS_REQUIRING_CONVERSIONS
proplist does not clash with METRICS_RAW proplists that have the same
number of elements. This is begging to be refactored, but I know that
@dcorbacho is working on https://github.com/rabbitmq/rabbitmq-prometheus/issues/26
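A minimal sketch of the idea, using hypothetical shapes rather than the
plugin's actual mf_convert/3: entries that need a unit conversion are
tagged explicitly, so they can never be confused with raw entries that
happen to have the same tuple size.
-module(mf_convert_sketch).
-export([convert/1]).

%% Hypothetical illustration only, not the plugin's real code: raw entries
%% pass through untouched, converted entries carry their own conversion fun.
convert({raw, Name, Value}) ->
    {Name, Value};
convert({convert, Name, Value, Fun}) ->
    {Name, Fun(Value)}.
For example, mf_convert_sketch:convert({convert, latency_seconds, 4000,
fun(Us) -> Us / 1000000 end}) returns {latency_seconds, 0.004}.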
Also modified the RabbitMQ-Quorum-Queues-Raft dashboard
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Some properties had queue_ appended, while others used messages_ instead
of message_. This meant that metrics such as rabbitmq_queue_consumers
were not reported correctly, as captured in https://github.com/rabbitmq/rabbitmq-prometheus/issues/9#issuecomment-558233464
The test needs fixing before this can be merged; it's currently failing with:
$ make ct-rabbit_prometheus_http t=with_metrics:metrics_test
== rabbit_prometheus_http_SUITE ==
* [with_metrics]
rabbit_prometheus_http_SUITE > with_metrics
{error,
  {shutdown,
    {gen_server,call,
      [<0.245.0>,
       {call,
         {'basic.cancel',<<"amq.ctag-uHUunE5EoozMKYG8Bf6s1Q">>,
          false},
         none,<0.252.0>},
       infinity]}}}
Closes #19
Some metrics were of type gauge when they should have been of type
counter. Thanks @brian-brazil for making the distinction clear. This is
now captured as a comment above the metric definitions.
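For illustration (the metric names and values below are made up for this
example), counters only ever increase and conventionally carry a _total
suffix, while gauges report a current value that can go up or down:
# TYPE rabbitmq_connections_opened_total counter
rabbitmq_connections_opened_total 42
# TYPE rabbitmq_connections gauge
rabbitmq_connections 7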
Because all metrics are from RabbitMQ's perspective, cached for up to 5
seconds by default (configurable), we prepend `rabbitmq_` to all metrics
emitted by this collector. While some metrics are for Erlang (erlang_),
Mnesia (schema_db_) or the system (io_), they are all observed & cached
by RabbitMQ, hence the prefix.
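For example (illustrative names and values), subsystem metrics keep their
subsystem prefix but still gain the rabbitmq_ prefix in front:
rabbitmq_io_read_ops_total 0
rabbitmq_schema_db_ram_tables 4
rabbitmq_erlang_processes_used 430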
This is the last PR which started in the context of prometheus/docs#1414
[#167846096]
As described in
https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics.
Until prometheus.erl has the prometheus_process_collector functionality
built in (this may never happen), we are exposing a subset of those
metrics via rabbitmq_core_metrics_collector, so we are going to stick to
the expected naming conventions.
This commit supersedes the thought process captured in
1e5f4de4cb
[#167846096]
While `process_open_fds` would have been the ideal name, the value is
cached within RabbitMQ and computed differently across platforms, so it
is important to keep it distinct from, say, what the kernel reports
just-in-time.
I am also capturing the Erlang context by adding `erlang_` to the
relevant metrics. The full context is: RabbitMQ observed this Erlang VM
process metric to be X, which is why some metrics are prefixed with
`rabbitmq_erlang_process_`.
There is a difference between what the RabbitMQ limits are set to,
e.g. `rabbitmq_memory_used_limit_bytes`, and what RabbitMQ reports about
the Erlang process, e.g. `rabbitmq_erlang_process_memory_used_bytes`.
This is the best that we can do while staying honest about what is being
reported. cc @brian-brazil
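Side by side, with illustrative values only, the two metrics named above
would be scraped as:
rabbitmq_memory_used_limit_bytes 419430400
rabbitmq_erlang_process_memory_used_bytes 104857600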
[#167846096]
This started in the context of prometheus/docs#1414, specifically
https://github.com/prometheus/docs/pull/1414#issuecomment-520505757
Rather than labelling all metrics with the same label, we are
introducing 2 new metrics: rabbitmq_build_info & rabbitmq_identity_info.
I suspect that we may want to revert deadtrickster/prometheus.erl#91
when we agree that the proposed alternative is better.
We have yet to follow through with changes to the Grafana dashboards. I
am most interested in what the updated queries will look like and, more
importantly, whether we will have the same panels as we do now. More
commits to follow shortly; I wanted to get this out the door first.
In summary, this commit changes output like this:
# TYPE erlang_mnesia_held_locks gauge
# HELP erlang_mnesia_held_locks Number of held locks.
erlang_mnesia_held_locks{node="rabbit@920f1e3272af",cluster="rabbit@920f1e3272af",rabbitmq_version="3.8.0-alpha.806",erlang_version="22.0.7"} 0
# TYPE erlang_mnesia_lock_queue gauge
# HELP erlang_mnesia_lock_queue Number of transactions waiting for a lock.
erlang_mnesia_lock_queue{node="rabbit@920f1e3272af",cluster="rabbit@920f1e3272af",rabbitmq_version="3.8.0-alpha.806",erlang_version="22.0.7"} 0
...
To this:
# TYPE erlang_mnesia_held_locks gauge
# HELP erlang_mnesia_held_locks Number of held locks.
erlang_mnesia_held_locks 0
# TYPE erlang_mnesia_lock_queue gauge
# HELP erlang_mnesia_lock_queue Number of transactions waiting for a lock.
erlang_mnesia_lock_queue 0
...
# TYPE rabbitmq_build_info untyped
# HELP rabbitmq_build_info RabbitMQ & Erlang/OTP version info
rabbitmq_build_info{rabbitmq_version="3.8.0-alpha.809",prometheus_plugin_version="3.8.0-alpha.809-2019.08.15",prometheus_client_version="4.4.0",erlang_version="22.0.7"} 1
# TYPE rabbitmq_identity_info untyped
# HELP rabbitmq_identity_info Node & cluster identity info
rabbitmq_identity_info{node="rabbit@bc7aeb0c2564",cluster="rabbit@bc7aeb0c2564"} 1
...
[#167846096]
Includes Erlang node-to-colour pinning
Adds a few make targets to help with repetitive docker-compose commands
& Grafana dashboard updates.
Split Overview & Distribution Docker deployments
re deadtrickster/prometheus.erl#92
[finishes #166004512]
This includes the global_labels feature introduced in deadtrickster/prometheus.erl#91
To test, run `docker-compose up` in the docker dir, then navigate to
localhost:15692/metrics & localhost:3000/dashboards (admin:admin) to see
the Grafana RabbitMQ Overview dashboard.
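A quick sketch of those steps (assuming the docker-compose files live in
the repository's docker/ directory, as described above):
$ cd docker
$ docker-compose up
# metrics:  http://localhost:15692/metrics
# Grafana:  http://localhost:3000/dashboards  (admin:admin)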
Bumping all prometheus-related deps to the latest stable versions.
Defining them in rabbitmq-components.mk, so that they can be promoted to
all deps in the umbrella.
rabbitmq_management_agent is required for alarm-related metrics to be
available.
Added the node label to most `rabbitmq_` metrics. I need help adding it
to mfa_totals; the metrics_node_label_test test currently fails. The new
unit tests ensure that label/0 behaves as expected in all cases, which
made refactoring easy. Run unit tests via:
gmake eunit EUNIT_MODS=prometheus_rabbitmq_core_metrics_collector
Updating to the latest erlang.mk makes running eunit tests much faster:
2s vs 10s. To do this, comment out the `ERLANG_MK_*` lines in the
Makefile and run `gmake erlang-mk`.