Commit Graph

87 Commits

Author SHA1 Message Date
Gerhard Lazu 9cc33c571d Print the response body by default
Makes it easier to spot why a match failed.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-04-25 00:06:32 +01:00
Jean-Sébastien Pédron 636c3f78dc Update copyright (year 2020) 2020-03-10 16:42:08 +01:00
Gerhard Lazu e7c997744d Improve config for returning metrics per object
Since metrics are now aggregated by default, it made more sense to use
the inverse meaning of disabling aggregation, and call it a positive and
explicit action: return_per_object_metrics.

Naming pair: @michaelklishin

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-11 13:08:00 +00:00
dcorbacho 253ef8e827 Fix 0/0 and enable all tests 2020-02-07 17:08:10 +01:00
Gerhard Lazu 09b29057af Aggregate metrics by default
Having talked to @michaelklishin we've decided to enable metrics
aggregation by default so that RabbitMQ nodes with many objects serve
the same amount of metrics quickly rather than taking many seconds and
transferring many MBs of data on every scrape.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-07 15:19:30 +00:00
Gerhard Lazu 11d676f3e1 Replace histogram type with gauge for raft_entry_commit_latency_seconds
We want to keep the same metric type regardless of whether we aggregate or
don't. If we had used a histogram type, considering the ~12 buckets that
we added, it would have meant 12 extra metrics per queue which would
have resulted in an explosion of metrics. Keeping the gauge type and
aggregating latencies across all members.

re https://github.com/rabbitmq/rabbitmq-prometheus/pull/28

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-02-06 17:37:37 +00:00
dcorbacho 06186065b4 Option to aggregate channel, queue and connection metrics
`prometheus.enable_metric_aggregation = true`

rabbitmq-prometheus#26
2020-01-10 16:35:50 +01:00
Gerhard Lazu 89efb964d9 Convert raft_entry_commit_latency to seconds & be explicit about unit
This is a follow-up to https://github.com/rabbitmq/ra/pull/160

Had to introduce mf_convert/3 so that METRICS_REQUIRING_CONVERSIONS
proplist does not clash with METRICS_RAW proplists that have the same
number of elements. This is begging to be refactored, but I know that
@dcorbacho is working on https://github.com/rabbitmq/rabbitmq-prometheus/issues/26

Also modified the RabbitMQ-Quorum-Queues-Raft dashboard

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2020-01-07 16:20:59 +00:00
Gerhard Lazu b8893afcde Add auto-generated test rabbitmq_management.schema 2019-12-03 11:27:26 +00:00
Michael Klishin 3aed601336 A typo 2019-11-26 12:25:08 +03:00
Michael Klishin 6e17eeb3c5 Update this test to use a consumer in a separate process 2019-11-26 12:23:40 +03:00
Gerhard Lazu f550aa0706 Fix queue_metrics references
Some properties had queue_ appended, while others used messages_ instead
of message_. This meant that metrics such as rabbitmq_queue_consumers
were not reported correctly, as captured in https://github.com/rabbitmq/rabbitmq-prometheus/issues/9#issuecomment-558233464

The test needs fixing before this can be merged, it's currently failing with:

    $ make ct-rabbit_prometheus_http t=with_metrics:metrics_test
    == rabbit_prometheus_http_SUITE ==

      * [with_metrics]

    rabbit_prometheus_http_SUITE > with_metrics
        {error,
            {shutdown,
                {gen_server,call,
                    [<0.245.0>,
                     {call,
                         {'basic.cancel',<<"amq.ctag-uHUunE5EoozMKYG8Bf6s1Q">>,
                             false},
                         none,<0.252.0>},
                     infinity]}}}

Closes #19
2019-11-26 07:47:22 +00:00
Michael Klishin b70a8da7f0 Expose endpoint path configuration, references #8 2019-09-26 13:39:21 +03:00
Michael Klishin b03dfa2dd2 New style configuration schema for listeners
Closes #8.
2019-09-26 13:08:36 +03:00
Gerhard Lazu 2b73981ab1 Fix build & identity info metrics
Improve pattern matching used in tests so that we don't match partial
metric names.

[#167846096]
2019-09-04 13:21:50 +01:00
Gerhard Lazu 5781130b61 Use the correct metric types & capture perspective when naming
Some metrics were of type gauge while they should have been of type
counter. Thanks @brian-brazil for making the distinction clear. This is
now captured as a comment above the metric definitions.

Because all metrics are from RabbitMQ's perspective, cached for up to 5
seconds by default (configurable), we prepend `rabbitmq_` to all metrics
emitted by this collector. While some metrics are for Erlang (erlang_),
Mnesia (schema_db_) or the System (io_), they are all observed & cached
by RabbitMQ, hence the prefix.

This is the last PR which started in the context of prometheus/docs#1414

[#167846096]
2019-09-04 11:49:48 +01:00
Gerhard Lazu aafc4c026b Revert erlang_uptime_seconds to gauge, not counter
We care about its value rather than the rate of change.

[#167846096]
2019-09-03 19:56:59 +01:00
Gerhard Lazu fbc945f710 Convert all time metrics to seconds
This started in the context of prometheus/docs#1414, specifically
https://github.com/prometheus/docs/pull/1414#issuecomment-524250746

[#167846096]
2019-09-03 17:17:50 +01:00
Gerhard Lazu 98e488f1c4 Use standard naming for metrics expected from the client library
As described in
https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics.

Until prometheus.erl has the prometheus_process_collector functionality
built-in - this may not happen -, we are exposing a subset of those
metrics via rabbitmq_core_metrics_collector, so we are going to stick to
the expected naming conventions.

This commit supersedes the thought process captured in
1e5f4de4cb

[#167846096]
2019-09-03 15:31:55 +01:00
Gerhard Lazu 1e5f4de4cb Rename process-related metrics to stay closer to conventions
While `process_open_fds` would have been ideal, because the value is
cached within RabbitMQ, and computed differently across platforms, it is
important to keep the distinction from, say, what the kernel reports
just-in-time.

I am also capturing the Erlang context by adding `erlang_` to the
relevant metrics. The full context is: RabbitMQ observed this Erlang VM
process metric to be X, so this is why some metrics are prefixed with
`rabbitmq_erlang_process_`

There is a difference between what RabbitMQ limits are set to,
e.g. `rabbitmq_memory_used_limit_bytes`, vs. what RabbitMQ reports about
the Erlang process, e.g. `rabbitmq_erlang_process_memory_used_bytes`.

This is the best that we can do while staying honest about what is being
reported. cc @brian-brazil

[#167846096]
2019-09-03 12:30:48 +01:00
Gerhard Lazu 2e686f1131 Continue updating RabbitMQ-Overview dashboard to use the new info metric
[#167846096]
2019-08-27 17:11:41 +01:00
Gerhard Lazu e2be7193ff Use a higher config_port when testing
Otherwise it will clash with docker-compose-overview.yml ports
2019-08-15 16:40:19 +01:00
Gerhard Lazu 052d92c74b Replace global labels with build_info & identity_info metrics
This started in the context of prometheus/docs#1414, specifically
https://github.com/prometheus/docs/pull/1414#issuecomment-520505757

Rather than labelling all metrics with the same label, we are
introducing 2 new metrics: rabbitmq_build_info & rabbitmq_identity_info.

I suspect that we may want to revert deadtrickster/prometheus.erl#91
when we agree that the proposed alternative is better.

We are yet to see through changes in Grafana dashboards. I am most
interested in what the updated queries will look like and, more
importantly, if we will have the same panels as we do now. More commits
to follow shortly, wanted to get this out the door first.

In summary, this commit changes:

    # TYPE erlang_mnesia_held_locks gauge
    # HELP erlang_mnesia_held_locks Number of held locks.
    erlang_mnesia_held_locks{node="rabbit@920f1e3272af",cluster="rabbit@920f1e3272af",rabbitmq_version="3.8.0-alpha.806",erlang_version="22.0.7"} 0
    # TYPE erlang_mnesia_lock_queue gauge
    # HELP erlang_mnesia_lock_queue Number of transactions waiting for a lock.
    erlang_mnesia_lock_queue{node="rabbit@920f1e3272af",cluster="rabbit@920f1e3272af",rabbitmq_version="3.8.0-alpha.806",erlang_version="22.0.7"} 0
    ...

To this:

    # TYPE erlang_mnesia_held_locks gauge
    # HELP erlang_mnesia_held_locks Number of held locks.
    erlang_mnesia_held_locks 0
    # TYPE erlang_mnesia_lock_queue gauge
    # HELP erlang_mnesia_lock_queue Number of transactions waiting for a lock.
    erlang_mnesia_lock_queue 0
    ...
    # TYPE rabbitmq_build_info untyped
    # HELP rabbitmq_build_info RabbitMQ & Erlang/OTP version info
    rabbitmq_build_info{rabbitmq_version="3.8.0-alpha.809",prometheus_plugin_version="3.8.0-alpha.809-2019.08.15",prometheus_client_version="4.4.0",erlang_version="22.0.7"} 1
    # TYPE rabbitmq_identity_info untyped
    # HELP rabbitmq_identity_info Node & cluster identity info
    rabbitmq_identity_info{node="rabbit@bc7aeb0c2564",cluster="rabbit@bc7aeb0c2564"} 1
    ...

[#167846096]
2019-08-15 16:00:29 +01:00
Gerhard Lazu 4aa3871194 Use different names for *_process_reductions_total metrics
It is invalid to have multiple metrics with the same name, TYPE & HELP,
but differing labels.

[#167846096]
2019-08-14 16:17:48 +01:00
Gerhard Lazu 75ecd6af1d Fix test that fails when the metric is empty 2019-08-14 12:44:29 +01:00
Gerhard Lazu e218ea5ea2 Reorder elements in metric names & improve naming
bytes / packets must come before _total

Explain element order difference in TOTALS vs the metrics above

[#167846096]
2019-08-13 19:15:12 +01:00
Gerhard Lazu f1043134f4 Fix test failure message description 2019-08-08 16:58:01 +01:00
Diana Corbacho 82d858719d Label all metrics with Erlang and RMQ version
[#166413229]
2019-08-05 09:40:51 +01:00
Gerhard Lazu 805dd5e3b2 Enable quorum queue feature flag when running e2e metrics tests
Otherwise tests will fail in CI:
https://ci.rabbitmq.com/teams/main/pipelines/server-release:v3.8.x-mixed-versions/jobs/test-rabbitmq-prometheus/builds/20

Remove unused rabbit_ct_client_helpers imports
2019-06-26 11:03:28 +01:00
Gerhard Lazu 5e280c0281 Add first version of RabbitMQ Raft metrics
Depends on https://github.com/rabbitmq/ra/tree/metrics_tweaks &
https://github.com/rabbitmq/rabbitmq-server/tree/qq_metrics_tweak

[#166819045]
2019-06-20 20:11:31 +01:00
Gerhard Lazu 2645082738 Finish Erlang Distribution Grafana dashboard
Includes Erlang node to colour pinning

Adds a few make targets to help with docker-compose repetitive commands
& Grafana dashboard updates.

Split Overview & Distribution Docker deployments

re deadtrickster/prometheus.erl#92

[finishes #166004512]
2019-05-29 18:19:09 +01:00
Gerhard Lazu f9ce43677b Review metrics with @dcorbacho
[accepts #165831668]
2019-05-15 17:06:42 +01:00
Gerhard Lazu c596efb58e Review all metrics to ETS mappings
Clarify descriptions, improve metric names, fix typos etc. Follow-up to
deadtrickster/prometheus_rabbitmq_exporter#75.

Helpful metric descriptions

* https://www.rabbitmq.com/monitoring.html
* https://docs.signalfx.com/en/latest/integrations/integrations-reference/integrations.rabbitmq.html
* https://github.com/rabbitmq/rabbitmq-common/blob/master/include/rabbit_core_metrics.hrl
* https://github.com/rabbitmq/rabbitmq-common/blob/master/src/rabbit_core_metrics.erl

Thanks for the pair-up @michaelklishin!

[finishes #165831668]
2019-05-09 17:25:17 +01:00
Gerhard Lazu a2e6687162 Bump prometheus.erl to v4.3.0
This includes the global_labels feature introduced in deadtrickster/prometheus.erl#91

To test, run `docker-compose up` in docker dir, then navigate to
localhost:15692/metrics & localhost:3000/dashboards (admin:admin) to see
the Grafana RabbitMQ Overview dashboard.
2019-05-01 12:58:35 +01:00
Gerhard Lazu e61a5efde9 Use latest deps, depend on rabbitmq_management_agent + add node label
Bumping all prometheus-related deps to latest stable. Defining them in
rabbitmq-components.mk, so that they can be promoted to all deps in
umbrella.

rabbitmq_management_agent is required for alarm-related metrics to be
available.

Added node label to most `rabbitmq_` metrics. I need help adding them to
mfa_totals - metrics_node_label_test test currently fails. The new unit
tests ensure that label/0 behaves as expected in all cases - made
refactoring easy. Run unit tests via:

    gmake eunit EUNIT_MODS=prometheus_rabbitmq_core_metrics_collector

Updating to latest erlang.mk makes running eunit tests much faster: 2s
vs 10s. To do this, comment out `ERLANG_MK_*` in Makefile and run `gmake
erlang-mk`.
2019-04-09 17:02:40 +01:00
Michael Klishin 08be918234 Change default port to 15692 2019-04-01 18:00:39 +03:00
Diana Corbacho 46431063a3 Support only text format and (optional) gzip encoding
Since Prometheus 2.0 protobuf is no longer a supported format
2019-03-13 14:36:49 +00:00