rabbitmq-server/deps/rabbitmq_prometheus
David Ansari 8c286cc680 Add Prometheus metrics for dead-lettered messages
> curl -s localhost:15692/metrics | grep rabbitmq_global_messages_dead_lettered
# TYPE rabbitmq_global_messages_dead_lettered_delivery_limit_total counter
# HELP rabbitmq_global_messages_dead_lettered_delivery_limit_total Total number of messages dead-lettered due to delivery-limit exceeded
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0
# TYPE rabbitmq_global_messages_dead_lettered_expired_total counter
# HELP rabbitmq_global_messages_dead_lettered_expired_total Total number of messages dead-lettered due to message TTL exceeded
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0
# TYPE rabbitmq_global_messages_dead_lettered_rejected_total counter
# HELP rabbitmq_global_messages_dead_lettered_rejected_total Total number of messages dead-lettered due to basic.reject or basic.nack
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0
# TYPE rabbitmq_global_messages_dead_lettered_confirmed_total counter
# HELP rabbitmq_global_messages_dead_lettered_confirmed_total Total number of messages dead-lettered and confirmed by target queues
rabbitmq_global_messages_dead_lettered_confirmed_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
# TYPE rabbitmq_global_messages_dead_lettered_maxlen_total counter
# HELP rabbitmq_global_messages_dead_lettered_maxlen_total Total number of messages dead-lettered due to overflow drop-head or reject-publish-dlx
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

A few notes:
* dead_letter_strategy 'disabled' means that either the user did not
  configure a dead-letter-exchange or the configured
  dead-letter-exchange does not exist.
* Only time series that make sense get output.
  Example 1: Combination of 'at_least_once' and 'maxlen' will always be 0.
  Hence, we omit that time series.
  Example 2: 'confirmed' only makes sense with quorum queues and
  'at_least_once'.
  Example 3: 'delivery_limit' only makes sense with quorum queues.
* Users get to know *why* messages were dead-lettered.
* Before this commit, there was no way for users to alert
  based on messages being dropped from the head of the queue when
  overflow=drop-head.
* Users can now easily create alerts:
  Example 1: Message gets silently dropped (i.e.
  dead_letter_strategy='disabled') instead of actually dead-lettered.
  Example 2: Detect dead-letter topology misconfigurations.
  Example 3: Messages expire
  Example 4: Messages overflow
  Example 5: Messages requeued too often
* Stream queues by definition do not dead-letter.
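
As a rough illustration of the first alert example, the sketch below parses scrape output in the format shown above and sums the counters whose dead_letter_strategy label is 'disabled'. The function names parse_samples and silently_dropped are hypothetical, the counter values in EXAMPLE are invented for illustration, and the regex-based parser only handles the simple label syntax seen here, not the full Prometheus exposition format.

```python
import re

# Matches lines such as: metric_name{label="value",other="value"} 42
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each labelled sample line; skip comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels")))
            yield match.group("name"), labels, float(match.group("value"))


def silently_dropped(text):
    """Sum dead-letter counters for which no dead-letter exchange was in effect."""
    return sum(
        value
        for name, labels, value in parse_samples(text)
        if name.startswith("rabbitmq_global_messages_dead_lettered_")
        and labels.get("dead_letter_strategy") == "disabled"
    )


# Illustrative scrape output; the counter values are made up for this example.
EXAMPLE = """\
# TYPE rabbitmq_global_messages_dead_lettered_expired_total counter
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 7
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 3
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 2
"""

print(silently_dropped(EXAMPLE))
```

In a real deployment this kind of check would more likely be written as an alerting rule in the monitoring system; the script only shows which label combination to look at.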
2022-02-28 16:28:02 +01:00
docker Break down metrics by node in all RabbitMQ-Stream pie charts 2021-08-11 13:39:30 +01:00
priv/schema Use own key to exclude queues 2021-11-16 16:53:17 +01:00
src Use `rabbitmq_cluster_` prefix for cluster-wide metrics 2021-11-24 16:49:43 +01:00
test Use `rabbitmq_cluster_` prefix for cluster-wide metrics 2021-11-24 16:49:43 +01:00
.autocomplete
.gitignore
BUILD.bazel Tighten up dialyzer usage 2022-02-24 11:18:41 +01:00
CODE_OF_CONDUCT.md
CONTRIBUTING.md
LICENSE
LICENSE-MPL-RabbitMQ
Makefile Publish RabbitMQ-Stream dashboard to grafana.com 2021-07-29 19:34:05 +01:00
README.md Allow selective querying for prometheus plugin 2021-09-20 14:59:17 +02:00
metrics-detailed.md Use `rabbitmq_cluster_` prefix for cluster-wide metrics 2021-11-24 16:49:43 +01:00
metrics.md Add Prometheus metrics for dead-lettered messages 2022-02-28 16:28:02 +01:00
rabbitmq-disable-metrics-collector.conf

README.md


Prometheus Exporter of Core RabbitMQ Metrics

Getting Started

This is a Prometheus exporter of core RabbitMQ metrics, developed by the RabbitMQ core team. It is largely a "clean room" design that reuses some prior work from Prometheus exporters done by the community.

Project Maturity

This plugin is new as of RabbitMQ 3.8.0.

Documentation

See Monitoring RabbitMQ with Prometheus and Grafana.

Installation

This plugin is included in RabbitMQ 3.8.x releases. Like all plugins, it has to be enabled before it can be used.

To enable it with rabbitmq-plugins:

rabbitmq-plugins enable rabbitmq_prometheus

Usage

See the documentation guide.

The default port used by the plugin is 15692 and the endpoint path is /metrics. To try it with curl:

curl -v -H "Accept:text/plain" "http://localhost:15692/metrics"
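
The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names metrics_url and fetch_metrics are hypothetical, and it assumes a node with the plugin enabled is reachable on the default port.

```python
from urllib.request import Request, urlopen


def metrics_url(host="localhost", port=15692, path="/metrics"):
    """Build the scrape URL from the plugin's default port and path."""
    return f"http://{host}:{port}{path}"


def fetch_metrics(url, timeout=5.0):
    """Return the scrape response body as text (performs a network request)."""
    request = Request(url, headers={"Accept": "text/plain"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_metrics(metrics_url()) against a running node should return the same text that the curl command prints.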

In most environments no configuration is necessary.

See the entire list of metrics exposed via the default port.

Configuration

This exporter is configured through a set of prometheus.* configuration keys.

Sample configuration snippet:

# these values are defaults
prometheus.return_per_object_metrics = false
prometheus.path = /metrics
prometheus.tcp.port = 15692

When metrics are returned per object, nodes with 80k queues have been measured to take 58 seconds to return 1.9 million metrics in a 98MB response payload. To avoid putting unnecessary pressure on your metrics system, metrics are aggregated by default.
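
To put those figures in perspective, here is a small back-of-the-envelope calculation. It assumes that "98MB" means 98 * 10^6 bytes, and the per-queue figure is only a rough average because not every returned metric belongs to a queue.

```python
# Figures quoted in the paragraph above.
queues = 80_000
metrics = 1_900_000
payload_bytes = 98 * 10**6  # assumption: "98MB" means 98 * 10^6 bytes
scrape_seconds = 58

metrics_per_queue = metrics / queues
bytes_per_metric = payload_bytes / metrics
metrics_per_second = metrics / scrape_seconds

print(f"{metrics_per_queue:.2f} metrics per queue on average")
print(f"{bytes_per_metric:.1f} bytes per metric")
print(f"{metrics_per_second:.0f} metrics per second")
```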

When debugging, it may be useful to return metrics per object (unaggregated).

This can be done by scraping the /metrics/per-object endpoint:

curl -v -H "Accept:text/plain" "http://localhost:15692/metrics/per-object"

This can also be enabled as the default behavior of the /metrics endpoint on-the-fly, without restarting or configuring RabbitMQ, using the following command:

rabbitmqctl eval 'application:set_env(rabbitmq_prometheus, return_per_object_metrics, true).'

To go back to aggregated metrics on-the-fly, run the following command:

rabbitmqctl eval 'application:set_env(rabbitmq_prometheus, return_per_object_metrics, false).'

Selective querying of per-object metrics

As mentioned in the previous section, returning a lot of per-object metrics is a computationally expensive process. One of the reasons is that /metrics/per-object returns every possible metric for every possible object, even if having them makes no sense in day-to-day monitoring.

That's why there is an additional endpoint, /metrics/detailed, that always returns per-object metrics and lets you explicitly query only the things that are relevant. By default it returns nothing at all, but the required metric groups and virtual host filters can be specified as GET parameters. Scraping /metrics/detailed?vhost=vhost-1&vhost=vhost-2&family=queue_coarse_metrics&family=queue_consumer_count will return only the requested metrics (and not, for example, channel metrics that include Erlang PIDs in labels).

This endpoint supports the following parameters:

  • Zero or more family - only the requested metric families will be returned. The full list is documented in metrics-detailed.
  • Zero or more vhost - if given, queue-related metrics (queue_coarse_metrics, queue_consumer_count and queue_metrics) will be returned only for the given vhost(s).
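
Because both parameters may be repeated, the query string is easiest to build from a list of key/value pairs. The sketch below shows one way to do that with the Python standard library; the function name detailed_metrics_url and its defaults are hypothetical.

```python
from urllib.parse import urlencode


def detailed_metrics_url(families, vhosts=(), host="localhost", port=15692):
    """Build a /metrics/detailed URL with repeated vhost and family parameters."""
    params = [("vhost", v) for v in vhosts] + [("family", f) for f in families]
    query = urlencode(params)
    base = f"http://{host}:{port}/metrics/detailed"
    return f"{base}?{query}" if query else base


url = detailed_metrics_url(
    families=["queue_coarse_metrics", "queue_consumer_count"],
    vhosts=["vhost-1", "vhost-2"],
)
print(url)
```

With no families requested the function returns the bare endpoint, which by default yields no metrics at all.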

The returned metrics use a different prefix, rabbitmq_detailed_ (instead of the plain rabbitmq_ used by other endpoints), so this endpoint can be used simultaneously with /metrics without affecting existing dashboards.

Here are the performance gains you can expect from using this endpoint. On a test system with 10k queues, 10k consumers and 10k producers, /metrics/per-object took a bit over 2 minutes. Querying /metrics/detailed?family=queue_coarse_metrics&family=queue_consumer_count provides just enough metrics to see how many messages sit in every queue and how many consumers each of these queues has. It takes only 2 seconds, a significant improvement over the indiscriminate /metrics/per-object.

Contributing

See CONTRIBUTING.md.

Makefile

This project uses erlang.mk; running make help returns the erlang.mk help.

To see all custom targets that have been documented, run make h.

For Bash shell autocompletion, run eval "$(make autocomplete)", then type make a<TAB> to see all Make targets starting with the letter a, e.g.:

$ make a<TAB>
ac               all.coverdata    app-build        apps             apps-eunit       asciidoc-guide   autocomplete
all              app              app-c_src        apps-ct          asciidoc         asciidoc-manual

(c) 2007-2020 VMware, Inc. or its affiliates.