rabbitmq-server

Commit Graph

Author	SHA1	Message	Date
Loïc Hoguin	cd35f7e7fa	Remove sockets_used/sockets_total metrics from UIs. Part of the removal of file_handle_cache. The Prometheus endpoint was updated but the Grafana dashboard was not. The FD stats use the system's state rather than file_handle_cache, so there's no need to remove them.	2024-06-24 12:07:51 +02:00
Iliia Khaprov	8925dfa916	Close #10345. Add prometheus_rabbitmq_federation_collector: rabbitmq_federation_links gauge metric with status label.	2024-03-14 09:29:01 +01:00
David Ansari	0f5fe8fadd	Add Prometheus metric for messages dropped by MQTT QoS 0 queue type. Why: A RabbitMQ operator should be able to see whether RabbitMQ drops MQTT QoS 0 messages due to overload protection. It's an indication that an MQTT subscriber does not consume fast enough. How: Use Prometheus global counters. There are 2 valid solutions: 1. Introduce a new metric called messages_dropped specifically for the rabbit_mqtt_qos0_queue type. This would work in a similar fashion to how streams extend the per protocol global counters, but requires extending the per protocol & queue type global counters for the MQTT QoS 0 queue type. The emitted metrics would look as follows: ``` rabbitmq_global_messages_dropped_total{protocol="mqtt310",queue_type="rabbit_mqtt_qos0_queue"} 0 rabbitmq_global_messages_dropped_total{protocol="mqtt311",queue_type="rabbit_mqtt_qos0_queue"} 0 rabbitmq_global_messages_dropped_total{protocol="mqtt50",queue_type="rabbit_mqtt_qos0_queue"} 0 ``` 2. Reuse the existing metric rabbitmq_global_messages_dead_lettered_maxlen_total. This commit goes for the 2nd approach because: a) there is no need to add a new metric. Even though dead lettering is not supported for the MQTT QoS 0 queue type, this metric maps nicely to what happens: the queue drops messages because its max length (mqtt.mailbox_soft_limit) is exceeded with overflow behaviour drop-head. Furthermore, the label `dead_letter_strategy="disabled"` indicates that no dead lettering takes place from this queue type. b) this metric allows supporting dead lettering for the MQTT QoS 0 queue type in the future. 
The new dead lettering metrics look as follows: ``` rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_mqtt_qos0_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0 rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0 
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0 rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_confirmed_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0 ```	2023-08-15 16:06:15 +02:00
David Ansari	c3cccf4963	Rename run_queues_length_total to run_queues_length. It's a gauge, not a counter. @deadtrickster fixed the bug in `d0feb0df58`. See #4380	2022-08-24 18:04:14 +02:00
Loïc Hoguin	499e0b9197	Remove the CQv1 disabled stats from management/Prometheus	2022-04-05 12:37:54 +02:00
David Ansari	a3905da47c	Add note about missed Prometheus counter updates. Currently, the quorum queue state machine updates counters via mod_call effects, which are not guaranteed to be executed. They are updated via mod_call effects so that only the leader increments the counter (and not the followers). In certain failure scenarios, when dead-lettering many messages at the same time, these mod_call effects might not be executed. Hence, one shouldn't rely on the counters for dead-lettered messages and dead-lettered confirmed messages matching up 100%, even if all dead-lettered messages were eventually confirmed.	2022-02-28 16:28:09 +01:00
David Ansari	8c286cc680	Add Prometheus metrics for dead-lettered messages. > curl -s localhost:15692/metrics | grep rabbitmq_global_messages_dead_lettered # TYPE rabbitmq_global_messages_dead_lettered_delivery_limit_total counter # HELP rabbitmq_global_messages_dead_lettered_delivery_limit_total Total number of messages dead-lettered due to delivery-limit exceeded rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0 rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0 # TYPE rabbitmq_global_messages_dead_lettered_expired_total counter # HELP rabbitmq_global_messages_dead_lettered_expired_total Total number of messages dead-lettered due to message TTL exceeded rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0 # TYPE rabbitmq_global_messages_dead_lettered_rejected_total counter # HELP rabbitmq_global_messages_dead_lettered_rejected_total Total number of messages dead-lettered due to basic.reject or basic.nack rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0 
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0 rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0 # TYPE rabbitmq_global_messages_dead_lettered_confirmed_total counter # HELP rabbitmq_global_messages_dead_lettered_confirmed_total Total number of messages dead-lettered and confirmed by target queues rabbitmq_global_messages_dead_lettered_confirmed_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0 # TYPE rabbitmq_global_messages_dead_lettered_maxlen_total counter # HELP rabbitmq_global_messages_dead_lettered_maxlen_total Total number of messages dead-lettered due to overflow drop-head or reject-publish-dlx rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0 rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0 rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0 A few notes: * dead_letter_strategy 'disabled' means either the user did not configure a dead-letter-exchange or the configured dead-letter-exchange does not exist. * Only time series that make sense are output. Example 1: The combination of 'at_least_once' and 'maxlen' will always be 0. Hence, we omit that time series. Example 2: 'confirmed' only makes sense with quorum queues and 'at_least_once'. Example 3: 'delivery_limit' only makes sense with quorum queues. 
* Users get to know why messages were dead-lettered. * Before this commit, there was no way for users to alert on messages being dropped from the head of the queue when overflow=drop-head. * Users can now easily create alerts: Example 1: Messages get silently dropped (i.e. dead_letter_strategy='disabled') instead of actually being dead-lettered. Example 2: Detect dead-letter topology misconfigurations. Example 3: Messages expire. Example 4: Messages overflow. Example 5: Messages are requeued too often. * Stream queues by definition do not dead-letter.	2022-02-28 16:28:02 +01:00
dcorbacho	b636ad2565	Rename protocol error counters to _total	2021-06-30 12:46:41 +02:00
dcorbacho	c9305d948a	Use number of publishing channels as global publishers in amqp091	2021-06-29 08:10:42 +01:00
Gerhard Lazu	c7971252cd	Global counters per protocol + protocol AND queue_type. This way we can show how many messages were received via a certain protocol (stream is the second real protocol besides the default amqp091 one), as well as by queue type, which many have been requesting for a really long time. The most important aspect is that we can also see them by protocol AND queue_type, which becomes very important for Streams, which have different rules from regular queues (e.g. consuming messages is non-destructive, and deep queue backlogs - think billions of messages - are normal). Alerting and consumer scaling due to deep backlogs will now work correctly, as we can distinguish between regular queues & streams. This has gone through a few cycles, with @mkuratczyk & @dcorbacho covering most of the ground. @dcorbacho had most of this in https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main branch went through a few changes in the meantime. Rather than resolving all the conflicts, and then making the necessary changes, we (@gerhard + @kjnilsson) took all learnings and started re-applying a lot of the existing code from #3045. We are confident in this approach and would like to see it through. We continued working on this with @dumbbell, and the most important changes are captured in https://github.com/rabbitmq/seshat/pull/1. We expose these global counters in rabbitmq_prometheus via a new collector. Rather than keep modifying the existing collector, which grew really complex in parts, especially since we introduced aggregation, we start with a new namespace, `rabbitmq_global_`, and continue building on top of it. The idea is to build in parallel, and slowly transition to the new metrics, because semantically the changes have been too big since streams were introduced, and we have been discussing protocol-specific metrics with @kjnilsson, which makes me think that this approach is least disruptive and... simple. 
While at it, we removed redundant empty return value handling in the channel. The function called no longer returns this. We also removed all DONE / TODO & other comments - we'll handle them when the time comes, no need to leave TODO reminders. Pairs @kjnilsson @dcorbacho @dumbbell (this is multiple commits squashed into one) Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2021-06-22 14:14:21 +01:00
Gerhard Lazu	c0f28afab1	Add erlang_vm_dist_node_queue_size_bytes to metrics.md Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-06-24 16:36:04 +01:00
Gerhard Lazu	11d676f3e1	Replace histogram type with gauge for raft_entry_commit_latency_seconds. We want to keep the same metric type regardless of whether we aggregate or not. If we had used a histogram type, considering the ~12 buckets that we added, it would have meant 12 extra metrics per queue, which would have resulted in an explosion of metrics. We keep the gauge type and aggregate latencies across all members. re https://github.com/rabbitmq/rabbitmq-prometheus/pull/28 Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-02-06 17:37:37 +00:00
Gerhard Lazu	0efb206656	Update metrics doc with new queue metrics, improve formatting. Tabularize FTW: http://vimcasts.org/episodes/aligning-text-with-tabular-vim/ Add a new preview-readme make target (pre alias) for quickly previewing md changes locally, using GitHub Markdown styling.	2019-11-26 15:50:24 +00:00
Marcial Rosales	122a9fdb67	Document all metrics exposed via the prometheus endpoint	2019-11-15 17:21:59 +01:00
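The dead-lettering commits 8c286cc680 and 0f5fe8fadd suggest alerting on series where dead_letter_strategy is "disabled", which also covers MQTT QoS 0 drops counted under rabbitmq_global_messages_dead_lettered_maxlen_total. Below is a minimal Python sketch of that check against the plain-text output of the metrics endpoint (the curl example uses localhost:15692/metrics). It assumes simplified label parsing (no escaped quotes in label values, timestamps ignored); the names parse_samples, silently_dropped and SAMPLE are hypothetical, and the sample values 42 and 3 are made up for illustration.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text exposition format, e.g.
#   metric_name{label="value",...} 0
# Simplified sketch: label values must not contain escaped quotes,
# and an optional trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

DEAD_LETTER_PREFIX = "rabbitmq_global_messages_dead_lettered_"


def parse_samples(text):
    """Yield (name, labels, value) for each sample; skip # HELP / # TYPE lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def silently_dropped(text):
    """Return non-zero dead-lettered totals whose dead_letter_strategy is
    "disabled" (messages dropped rather than dead-lettered), keyed by
    (metric name, queue_type)."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if (name.startswith(DEAD_LETTER_PREFIX)
                and labels.get("dead_letter_strategy") == "disabled"):
            totals[(name, labels.get("queue_type", ""))] += value
    return {key: total for key, total in totals.items() if total > 0}


# Hypothetical scrape: series names and labels follow the commit messages,
# but the values are invented for illustration.
SAMPLE = """
# TYPE rabbitmq_global_messages_dead_lettered_maxlen_total counter
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_mqtt_qos0_queue",dead_letter_strategy="disabled"} 42
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 3
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0
"""

if __name__ == "__main__":
    print(silently_dropped(SAMPLE))
```

These counters are cumulative since node start, so a production alert would more typically be a PromQL rule over a window, e.g. `increase(rabbitmq_global_messages_dead_lettered_maxlen_total{dead_letter_strategy="disabled"}[5m]) > 0`, rather than a check on absolute totals. Per a3905da47c, quorum queue counter updates can occasionally be missed, so exact equality between dead-lettered and confirmed totals should not be expected.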

14 Commits