Commit Graph

458 Commits

Author SHA1 Message Date
Iliia Khaprov 19f122fea4 Prometheus core metrics collector: Do not render any samples that are NaN or undefined
close #8740
2023-07-08 18:45:03 +02:00
Simon Unge 2a2af36b9c rename fix 2023-06-23 14:53:57 -07:00
Simon Unge 8b3ca4c972 See #8605. Add authentication support to prometheus. 2023-06-23 13:54:45 -07:00
Michael Klishin 55442aa914 Replace @rabbitmq.com addresses with rabbitmq-core@groups.vmware.com
Don't ask why we have to do it. Because reasons!
2023-06-20 15:40:13 +04:00
Rin Kuryloski eb94a58bc9 Add a workflow to compare the bazel/erlang.mk output
To catch any drift between the builds
2023-05-15 13:54:14 +02:00
Michael Klishin 59fe5dc01b
Prometheus: handle scenarios when no listener is configured
Start a plain TCP one with all defaults.
2023-05-06 00:19:58 +04:00
Chunyi Lyu 4ddb0c2038 Support TLS-only listener for Prometheus
- tcp listener can be turned off by setting
'prometheus.tcp.listener = none'
- config schema follows web_mqtt and web_stomp
2023-05-05 15:44:53 +01:00
Rin Kuryloski a944439fba Replace globs in bazel with explicit lists of files
As this is preferred in rules_erlang 3.9.14
2023-04-25 17:29:12 +02:00
Rin Kuryloski 854d01d9a5 Restore the original -include_lib statements from before #6466
since this broke erlang_ls

requires rules_erlang 3.9.13
2023-04-20 12:40:45 +02:00
Rin Kuryloski 8de8f59d47 Use gazelle generated bazel files
Bazel build files are now maintained primarily with `bazel run
gazelle`. This will analyze and merge changes into the build files as
necessitated by certain code changes (e.g. the introduction of new
modules).

In some cases there are hints to gazelle in the build files, such as `#
gazelle:erlang...` or `# keep` comments. xref checks on plugins that
depend on the cli are a good example.
2023-04-17 18:13:18 +02:00
Rin Kuryloski 8a7eee6a86 Ignore warnings when building plt files for dependencies
As we don't generally care if a dependency has warnings, only the
target
2023-04-17 10:09:24 +02:00
Rin Kuryloski 609171ec70 Rename the tanzu cli scope to vmware
And update other references to commercial editions
2023-02-16 13:49:54 +01:00
Ilia Kurenkov 051419e46b
Fix descriptions of auth metrics. 2023-02-11 23:36:21 +01:00
Alexey Lebedeff c7da0da8b8 Cleanup dialyzer calls
- Use the same base .plt everywhere, so there is no need to list
standard apps everywhere
- Fix typespecs: some typos and the use of not-exported types
2023-02-06 17:05:30 +01:00
Rin Kuryloski 5ef8923462 Avoid the need to pass package name to rabbitmq_integration_suite 2023-01-18 15:25:27 +01:00
Rin Kuryloski a317b30807 Use improved assert_suites2 macro from rules_erlang 3.9.0 2023-01-18 15:07:06 +01:00
Michael Klishin ba7b44df8a
Merge pull request #6879 from rabbitmq/dialyzer-warnings-rabbitmq-prometheus
Fix all dialyzer warnings in rabbitmq_prometheus
2023-01-13 12:21:15 -06:00
Alexey Lebedeff cd92258346 Fix all dialyzer warnings in rabbitmq_prometheus 2023-01-13 15:52:26 +01:00
Michal Kuratczyk 510415f8b9
Update prometheus.erl to 4.10.0
Since 4.10.0 was released specifically to address an issue we
encountered in the RabbitMQ integration with prometheus.erl, a new test was
added to validate this functionality in the future.
2023-01-13 10:24:41 +01:00
Michael Klishin 8e8def801c
Wording 2023-01-03 14:19:58 -05:00
Ilia Kurenkov 98a5e34e90 DRY: Link docs for `/metrics/detailed` endpoint to website. 2023-01-03 13:45:45 +01:00
Michael Klishin ec4f1dba7d
(c) year bump: 2022 => 2023 2023-01-01 23:17:36 -05:00
Luke Bakken 7fe159edef
Yolo-replace format strings
Replaces `~s` and `~p` with their unicode-friendly counterparts.

```
git ls-files '*.erl' | xargs sed -i.ORIG -e 's/~s\>/~ts/g' -e 's/~p\>/~tp/g'
```
2022-10-10 10:32:03 +04:00
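
Note: the substitution described in this commit can be sketched in Python as follows. This is an illustration only, not the command that was run. It assumes the intent is to rewrite `~s` and `~p` only where the control letter is not followed by another word character; the function name `unicode_friendly` is hypothetical. Directives that carry a field width (for example `~10s`) are not touched by this pattern.

```python
import re

# "~" followed by "s" or "p", where that letter ends a word.
# \b keeps longer sequences such as "~space" from being rewritten.
_DIRECTIVE = re.compile(r"~([sp])\b")


def unicode_friendly(source: str) -> str:
    """Rewrite ~s / ~p format directives to ~ts / ~tp (hypothetical helper)."""
    return _DIRECTIVE.sub(r"~t\1", source)


if __name__ == "__main__":
    print(unicode_friendly('io:format("~s: ~p~n", [Name, Value])'))
```

Directives that already use the `t` modifier are left unchanged, so applying the function twice gives the same result as applying it once.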
Loïc Hoguin 73dd0acf01
rabbit_prometheus_http_SUITE: Update tests for new CQs
CQs without consumers will have only one message in memory.
2022-09-27 12:00:10 +02:00
Michael Klishin 96b6e6c368
Merge pull request #5463 from rabbitmq/global-metrics-values
Move message rate metrics from aggregated to global counters
2022-09-12 20:25:08 +04:00
Iliia Khaprov - VMware e15d12d767
Merge pull request #5449 from rabbitmq/grafana-9-support
Update RabbitMQ Dashboards to support latest Grafana versions
2022-08-24 21:25:56 +02:00
David Ansari c3cccf4963
Rename run_queues_length_total to run_queues_length
It's a gauge, not a counter.
@deadtrickster fixed the bug in d0feb0df58
See #4380
2022-08-24 18:04:14 +02:00
Péter Gömöri bf00ee4cfc Fix a typo in a comment in prometheus_rabbitmq_core_metrics_collector 2022-08-23 00:54:35 +02:00
Connor Rogers 6ee0a318e8
Move message rate metrics from channel/queue aggregation to global counters 2022-08-08 16:19:01 +01:00
Connor Rogers c88326ef23
Add README.md for creating/updating dashboards 2022-08-05 17:41:47 +01:00
Connor Rogers e35fd65ff3
Fix overview graphs in Grafana 9
'-1' is no longer accepted as of Grafana 9, and causes a console error when rendering
2022-08-05 17:09:34 +01:00
Connor Rogers 9ac6862e06
Fix dist link graph
Both directions of the link were showing as one entry instead of two.

This is because of https://github.com/flant/grafana-statusmap/issues/277
2022-08-05 16:51:21 +01:00
Connor Rogers 42f30ba7c3
Set time series to show all series in tooltip 2022-08-05 16:14:29 +01:00
Connor Rogers 40767cdae4
Take dashboard definitions straight from exported Grafana for simplicity 2022-08-05 16:09:30 +01:00
Connor Rogers 4d28eef0f8
Migrate from deprecated panels in Grafana 2022-08-05 15:46:27 +01:00
Connor Rogers 8e404ecd04
Update to supported Grafana and Prometheus versions 2022-08-05 12:52:26 +01:00
Jean-Sébastien Pédron 6e9ee4d0da
Remove test code which depended on the `quorum_queue` feature flags
These checks are now irrelevant as the feature flag is required.
2022-08-01 12:41:30 +02:00
Iliia Khaprov 360db38db0 Add process_start_time_seconds metrics. See #4539 2022-07-13 08:35:33 +02:00
Philip Kuryloski 15a79466b1 Use the new xref2 macro from rules_erlang
That adopts the modern erlang.mk xref behaviour
2022-06-09 23:18:28 +02:00
Philip Kuryloski 327f075d57 Make rabbitmq-server work with rules_erlang 3
Also rework elixir dependency handling, so we no longer rely on mix to
fetch the rabbitmq_cli deps

Also:

- Specify ra version with a commit rather than a branch
- Fixup compilation options for erlang 23
- Add missing ra reference in MODULE.bazel
- Add missing flag in oci.yaml
- Reduce bazel rbe jobs to try to save memory
- Use bazel built erlang for erlang git master tests
- Use the same cache for all the workflows but windows
- Avoid using `mix local.hex --force` in elixir rules
  - Fetching seems blocked in CI, and this should reduce hex api usage in
    all builds, which is always nice
- Remove xref and dialyze tags since rules_erlang 3 includes them in
  the defaults
2022-06-08 14:04:53 +02:00
Loïc Hoguin dc70cbf281
Update Erlang.mk and switch to new xref code 2022-05-31 13:51:12 +02:00
Michael Klishin 018b04a1ea
Wording 2022-04-26 23:08:00 +04:00
Péter Gömöri 35b21797ae Expose head_message_timestamp via Prometheus plugin as well
It is already exposed via rabbitmqctl and the API. It is also exposed by
old or unofficial prometheus plugins and other monitoring
integrations (DataDog).
2022-04-26 15:42:57 +02:00
Michael Klishin 7c47d0925a
Revert "Correct a double quote introduced in #4603"
This reverts commit 6a44e0e2ef.

That wiped a lot of files unintentionally
2022-04-20 16:05:56 +04:00
Michael Klishin 6a44e0e2ef
Correct a double quote introduced in #4603 2022-04-20 16:01:29 +04:00
Luke Bakken dba25f6462
Replace files with symlinks
This prevents duplicated and out-of-date instructions.
2022-04-15 06:04:29 -07:00
Loïc Hoguin 499e0b9197
Remove the CQv1 disabled stats from management/Prometheus 2022-04-05 12:37:54 +02:00
Michael Klishin c38a3d697d
Bump (c) year 2022-03-21 01:21:56 +04:00
David Ansari a3905da47c Add note about missed Prometheus counter updates
Currently, the quorum queue state machine updates counters via mod_call effects
which are not guaranteed to be executed.

They are updated via mod_call effects such that only the leader
increments the counter (and not the followers).

In certain failure scenarios when dead-lettering lots of messages
at the same time, these mod_call effects might not be executed.

Hence, one shouldn't rely on the counters for dead-lettered messages
and dead-lettered confirmed messages matching up 100%, even though all
dead-lettered messages were confirmed eventually.
2022-02-28 16:28:09 +01:00
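
Note: because the two counters may drift apart, a check that compares them should probably allow some slack instead of requiring equality. The sketch below is an assumption about how such a check might look, not something taken from the plugin; the function name `confirmation_gap` and the 1% default tolerance are hypothetical.

```python
def confirmation_gap(dead_lettered: float, confirmed: float, tolerance: float = 0.01):
    """Compare two counter readings without assuming they match exactly.

    Returns (gap, alarming). `gap` is dead_lettered - confirmed and can be
    negative, since updates may be missed on either counter. `alarming` is
    True only when the absolute gap exceeds `tolerance` times the larger
    reading (with a floor of 1 so that two zero readings are never alarming).
    """
    gap = dead_lettered - confirmed
    allowed = tolerance * max(dead_lettered, confirmed, 1)
    return gap, abs(gap) > allowed


if __name__ == "__main__":
    print(confirmation_gap(1000, 998))  # small drift, within tolerance
    print(confirmation_gap(1000, 900))  # large drift, worth investigating
```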
David Ansari 8c286cc680 Add Prometheus metrics for dead-lettered messages
> curl -s localhost:15692/metrics | grep rabbitmq_global_messages_dead_lettered
# TYPE rabbitmq_global_messages_dead_lettered_delivery_limit_total counter
# HELP rabbitmq_global_messages_dead_lettered_delivery_limit_total Total number of messages dead-lettered due to delivery-limit exceeded
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0
# TYPE rabbitmq_global_messages_dead_lettered_expired_total counter
# HELP rabbitmq_global_messages_dead_lettered_expired_total Total number of messages dead-lettered due to message TTL exceeded
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0
# TYPE rabbitmq_global_messages_dead_lettered_rejected_total counter
# HELP rabbitmq_global_messages_dead_lettered_rejected_total Total number of messages dead-lettered due to basic.reject or basic.nack
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0
# TYPE rabbitmq_global_messages_dead_lettered_confirmed_total counter
# HELP rabbitmq_global_messages_dead_lettered_confirmed_total Total number of messages dead-lettered and confirmed by target queues
rabbitmq_global_messages_dead_lettered_confirmed_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
# TYPE rabbitmq_global_messages_dead_lettered_maxlen_total counter
# HELP rabbitmq_global_messages_dead_lettered_maxlen_total Total number of messages dead-lettered due to overflow drop-head or reject-publish-dlx
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

A few notes:
* dead_letter_strategy 'disabled' means that either the user did not
  configure dead-letter-exchange or the configured dead-letter-exchange
  does not exist.
* Only time series that make sense get output.
  Example 1: Combination of 'at_least_once' and 'maxlen' will always be 0.
  Hence, we omit that time series.
  Example 2: 'confirmed' only makes sense with quorum queues and
  'at_least_once'.
  Example 3: 'delivery_limit' only makes sense with quorum queues.
* Users get to know *why* messages were dead-lettered.
* Before this commit, there was no way for users to alert
  based on messages being dropped from the head of the queue when
  overflow=drop-head.
* Users can now easily create alerts:
  Example 1: Message gets silently dropped (i.e.
  dead_letter_strategy='disabled') instead of actually dead-lettered.
  Example 2: Detect dead-letter topology misconfigurations.
  Example 3: Messages expire
  Example 4: Messages overflow
  Example 5: Messages requeued too often
* Stream queues by definition do not dead-letter.
2022-02-28 16:28:02 +01:00
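
Note: the first alert idea in this commit message (messages silently dropped because dead_letter_strategy is 'disabled') can be sketched as a small parser over the scrape output. This is a minimal sketch, not part of the plugin: the function name `dropped_by_reason` is hypothetical, the non-zero sample values are invented for illustration, and a real alert would more likely be written as a Prometheus alerting rule.

```python
import re
from collections import defaultdict

# Sample lines in the Prometheus text format, shaped like the metrics this
# commit adds. The non-zero values are invented for illustration.
SAMPLE = """\
# TYPE rabbitmq_global_messages_dead_lettered_expired_total counter
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 4
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 3
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 2
rabbitmq_global_messages_dead_lettered_confirmed_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 7
"""

# Metric name with the reason captured, then the label set, then the value.
LINE_RE = re.compile(
    r"^rabbitmq_global_messages_dead_lettered_(?P<reason>\w+)_total"
    r"\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def dropped_by_reason(text: str) -> dict:
    """Sum samples whose dead_letter_strategy is 'disabled', keyed by reason."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # "# TYPE" / "# HELP" comments and unrelated metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("dead_letter_strategy") == "disabled":
            totals[match.group("reason")] += float(match.group("value"))
    return dict(totals)


if __name__ == "__main__":
    for reason, count in sorted(dropped_by_reason(SAMPLE).items()):
        print(f"{reason}: {count:.0f} messages dropped without dead-lettering")
```

In practice the text would come from the scrape endpoint shown in the commit message (`localhost:15692/metrics`) instead of the inline sample.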