
527 Commits

Author SHA1 Message Date
Michal Kuratczyk c34c803754
Remove flake in prometheus_http_SUITE (#14367)
Sometimes the metrics for streams created by `stream_pub_sub_metrics`
would be returned when the next test starts, breaking the assertions.
2025-08-12 15:03:20 +02:00
Jean-Sébastien Pédron 267445680f
rabbit_prometheus_http_SUITE: Run `stream_pub_sub_metrics` first
[Why]
I wonder if a previous test interferes with the metrics verified by this
test case. To be safer, execute it first and let's see what happens.
2025-08-08 10:12:57 +02:00
Jean-Sébastien Pédron 2bc8d117b6
rabbit_prometheus_http_SUITE: Log more details for a future failure in CI
[Why]
The `stream_pub_sub_metrics` test failed at least once in CI because the
`rabbitmq_stream_consumer_max_offset_lag` was 4 instead of the expected
3 on line 815.

I couldn't reproduce the problem so far.

[How]
The test case now logs the initial value of that metric at the beginning
of the test function. Hopefully this will give us some clue for the day
it fails again.
2025-08-08 10:12:56 +02:00
Jean-Sébastien Pédron c6729351b6
rabbit_prometheus_http_SUITE: Use another Erlang metric
[Why]
It looks like `erlang_vm_dist_node_queue_size_bytes` is not always
present, even though other Erlang-specific metrics are present.

[How]
The goal is to ensure Erlang metrics are present in the output, so just
use another one that is likely to be there.
2025-07-30 15:04:48 +02:00
Michal Kuratczyk a5106c6a61
Expose ra counters (#13895)
Switch from ra_metrics to ra_counters

* Expose many more metrics (they are also up to date)
* Bump Seshat, Ra, Osiris, Prometheus.erl
* switch from proplists to maps
2025-07-24 10:43:20 +02:00
Michal Kuratczyk 175ba70e8c
[skip ci] Remove rabbit_log and switch to LOG_ macros 2025-07-18 08:42:59 +02:00
Michael Klishin 61dcfd5fa6
Use the standard 'undefined' here 2025-06-04 12:31:27 +04:00
Michael Klishin e9fc656241
Wrap TLS options password into a function in more places
A follow-up to #13958 and #13999.

Pair: @dcorbacho.
2025-06-04 12:24:45 +04:00
Michal Kuratczyk c0368a0d24
[skip ci] Update dashboards for RabbitMQ 4.1
Key changes:
- endpoint variable to handle scraping multiple endpoints
- message size panels (new metric in 4.1)
- panels at the top of the Overview dashboard should be more up to date
  (they show the latest value)
- values should be accurate if multiple endpoints are scraped
  (previously, many would be doubled)
- Nodes table shows fewer columns and shows node uptime
2025-04-16 17:48:21 +02:00
Michal Kuratczyk f0976b48b2
queue info metric: guard against whereis returning `undefined` (#13646) 2025-03-28 12:37:42 +01:00
Michal Kuratczyk 2a93bbcebd
RMQ-1460: Emit queue_info metric (#13583)
To allow filtering on queue type or membership status,
we need an info metric for queues; see
https://grafana.com/blog/2021/08/04/how-to-use-promql-joins-for-more-effective-queries-of-prometheus-metrics-at-scale/#info-metrics

With this change, per-object metrics and the detailed metrics
(if queue-related families are requested) will contain
rabbitmq_queue_info / rabbitmq_detailed_queue_info with a value of 1
and labels including the queue name, vhost, queue type and membership
status.
2025-03-27 15:54:26 +01:00
Arnaud Cogoluègnes b8244f70f4
Pull from socket up to 10 times in stream test utils (#13588)
To make sure there is enough data to complete a command.
2025-03-24 09:13:31 +01:00
Loïc Hoguin c5d150a7ef
Use Erlang.mk's native Elixir support for CLI
This avoids using Mix while compiling, which simplifies
a number of things and lets us make further build
improvements later on.

Elixir is only enabled from within rabbitmq_cli currently.

Eunit is disabled since there are only Elixir tests.

Dialyzer will force-enable Elixir in order to process
Elixir-compiled beam files.

This commit also includes a few changes that are
related:

 * The Erlang distribution will now be started for parallel-ct

 * Many unnecessary PROJECT_MOD lines have been removed

 * `eunit_formatters` has been removed; it provides little value

 * The new `maybe_flock` Erlang.mk function is used where possible

 * Build test deps when testing rabbitmq_cli (Mix won't do it anymore)

 * rabbitmq_ct_helpers now use the early plugins to have Dialyzer
   properly set up
2025-03-18 10:02:49 +01:00
Aitor Perez 07adc3e571
Remove Bazel files 2025-03-13 13:42:34 +00:00
Tony Lewis Hiroaki URAHAMA 3c5f4d3d39
Bump Prometheus Version 2025-03-01 18:21:51 +00:00
Michal Kuratczyk 703ee8529e
Add rabbitmq_endpoint label to rabbitmq_identity_info 2025-02-07 15:51:40 +01:00
Michal Kuratczyk 16700f9f19
Better metric description 2025-01-30 11:33:53 +01:00
Arnaud Cogoluègnes b3b0940024
Fix wait-for-confirms sequence in stream test utils
And refine the implementation and its usage.
2025-01-21 17:38:58 +01:00
Michael Klishin 968eefa1bb
Bump (c) line year
There are no functional changes to this massive diff.
2025-01-01 17:54:10 -05:00
Diana Parra Corbacho 40cb4f46e8 Tests: rabbit_prometheus_http_SUITE longer wait 2024-12-16 11:58:05 +01:00
Péter Gömöri bbc902ef23
Add test for stream consumer max offset lag prometheus metric
(cherry picked from commit 0c76054a0c)
2024-11-19 19:14:12 -05:00
markus812498 085ec75253
Expose max offset lag of stream consumers via Prometheus
Supports both per stream (detailed) and aggregated (metrics) values.

(cherry picked from commit e82058e872)
2024-11-19 19:14:06 -05:00
Anh Nguyen dc9311a561 Update Erlang Distribution dashboard panel and instance filtering
- Modified metric expression and legend format in State of distribution links
- Changed panel type from 'flant-statusmap-panel' to 'status-history' for Process state
2024-11-14 11:04:07 +07:00
Anh Nguyen b9dc0ea3b4 Add instance filtering to Erlang BEAM Grafana dashboard metrics
- Updated metric expressions to include instance filtering with {instance=\"$node\"}
  for the following metrics:
  - erlang_vm_statistics_run_queues_length
  - erlang_vm_statistics_dirty_io_run_queue_length
  - erlang_vm_statistics_dirty_cpu_run_queue_length
- Added 'DS_PROMETHEUS' as a templated data source variable
2024-11-13 20:20:02 +07:00
Jean-Sébastien Pédron d6024e30f4
rabbit_prometheus_http_SUITE: Start broker once in `special_chars` group
`init_per_group/3`, which starts the broker, was already called earlier
in the function.

This fixes a bug where the node can't be stopped in `end_per_group/2`,
affecting the next group's ability to start one.
2024-10-30 10:08:56 +01:00
Luke Bakken 3d668fda46
Grafana: add a runtime/Erlang/BEAM dashboard (#12456)
* Add BEAM dashboard

Also update the other dashboards by opening in Grafana v11.2.2 and ensuring they work as expected.

* Update the Erlang-Distributions-Compare dashboard

* Update the RabbitMQ-Overview dashboard

* Update the RabbitMQ-Quorum-Queues-Raft dashboard

* Update the RabbitMQ-Stream dashboard

* Update distribution link status panel

---------

Co-authored-by: Michal Kuratczyk <mkuratczyk@vmware.com>
2024-10-17 07:10:54 -07:00
Michael Klishin 80f4797e76 Remove multiple mentions of global prefetch
As suggested by @johanrhodin in #12454.

This keeps the Prometheus plugin part but
marks it as deprecated. We can remove it in
4.1.
2024-10-04 20:47:37 -04:00
David Ansari 1e3f4e5db9 Emit histogram metric for received message sizes per protocol (#12342)
* Add global histogram metrics for received message sizes per-protocol

fixup: add new files to bazel

fixup: expose message_size_bytes as prometheus classic histogram type

`rabbit_msg_size_metrics` does not use `seshat` any more, but
`counters` directly.

fixup: add msg_size_metrics unit test

* Improve message size histogram

1.
Avoid unnecessary time series emitted for stream protocol
The stream protocol cannot observe message sizes.
This commit ensures that the following time series are omitted:
```
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="64"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="256"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="1024"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="4096"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="16384"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="65536"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="262144"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="1048576"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="4194304"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="16777216"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="67108864"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="268435456"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="+Inf"} 0
rabbitmq_global_message_size_bytes_count{protocol="stream"} 0
rabbitmq_global_message_size_bytes_sum{protocol="stream"} 0
```

This reduces the number of time series by 15.

2.
Further reduce the number of time series by reducing the number of
buckets. Instead of 13 buckets, emit only 9 buckets. Buckets are not
free: each is an extra time series stored.

Prior to this commit:
```
curl -s -u guest:guest localhost:15692/metrics | ag message_size | wc -l
      92
```

After this commit:
```
curl -s -u guest:guest localhost:15692/metrics | ag message_size | wc -l
      57
```

3.
The emitted metric should be called
`rabbitmq_message_size_bytes_bucket` instead of `rabbitmq_global_message_size_bytes_bucket`.
The latter is poor naming. There is no need to use `global` in
the metric name given that this metric doesn't exist in the old flawed
aggregated metrics.

4.
This commit simplifies module `rabbit_global_counters`.

5.
Avoid garbage collecting the 10-element list of buckets per message
being received.

---------

Co-authored-by: Péter Gömöri <peter@84codes.com>
2024-09-25 09:00:44 -04:00
Lois Soto Lopez 8377eda336 Comment added label clause to clarify need for it 2024-09-25 11:04:25 +02:00
Lois Soto Lopez 79f04c23f3 Avoid duplicate vhost label for prometh. queue-exchange metrics
Adds a specific clause on the
`prometheus_rabbitmq_core_metrics_collector:labels` function when the
associated metric item is a Queue + Exchange combo (`{Queue, Exchange}`)
2024-09-24 10:38:38 +02:00
Michael Davis 512f8838fd
Add prometheus tags for raft_cluster to non-QQ raft metrics
By default Ra will use the cluster name as the metrics key. Currently
atom values are ignored by the prometheus plugin's tag rendering
functions, so if you have a QQ and Khepri running and request the
`/metrics/per-object` or `/metrics/detailed` endpoints you'll see values
that don't have labels set for the `ra_metrics` metrics:

    # TYPE rabbitmq_raft_term_total counter
    # HELP rabbitmq_raft_term_total Current Raft term number
    rabbitmq_raft_term_total{vhost="/",queue="qq"} 9
    rabbitmq_raft_term_total 10

With this change we map the name of the Ra cluster to a "raft_cluster"
tag, so instead an example metric might be:

    # TYPE rabbitmq_raft_term_total counter
    # HELP rabbitmq_raft_term_total Current Raft term number
    rabbitmq_raft_term_total{vhost="/",queue="qq"} 9
    rabbitmq_raft_term_total{raft_cluster="rabbitmq_metadata"} 10

This affects metrics for Khepri and the stream coordinator.
2024-08-30 12:41:37 -04:00
Michal Kuratczyk 9b828c08b7
Remove HiPE 2024-08-28 09:18:28 +02:00
Michal Kuratczyk 116ab4f6fe
Remove memory_high_watermark_paging_ratio 2024-08-28 08:12:49 +02:00
Simon Unge 2766122836 Move shovel prometheus to its own plugin 2024-08-08 01:26:49 -04:00
Michael Klishin 1f1d422fa2 rabbitmq_shovel is a runtime dependency of rabbitmq_prometheus now 2024-08-02 23:15:28 -04:00
Simon Unge 4c44ebd8eb Add dynamic and static prometheus metric gauge 2024-08-02 22:19:20 +00:00
Michal Kuratczyk 618f695645
Move memory breakdown metrics to new endpoint
Collecting them on a large system (tens of thousands of processes
or more) can be time-consuming as we iterate over all processes.
By putting them on a separate endpoint, we make that opt-in.
2024-07-23 10:17:37 +02:00
Michael Klishin 0caea225c6 Assertions for #11743 2024-07-18 21:32:42 -04:00
Michael Klishin e9b5f52512 Prometheus: expose memory breakdown metrics
Closes #11743.
2024-07-18 21:32:42 -04:00
Lois Soto Lopez bb93e718c2 Prometheus: some per-exchange/per-queue metrics aggregated per-channel
Add copies of some per-object metrics that are labeled per channel,
aggregated to reduce cardinality. These metrics are valuable and
easier to process if exposed on a per-exchange and per-queue basis.
2024-07-16 14:30:25 +02:00
Michael Klishin 0700e1cdc4 Revert "Provide per-exchange/queue metrics w/out channelID"
This reverts commit 3ed2e30e3a.
2024-07-11 21:34:52 -04:00
Michael Klishin 2bd3a2d307 Revert "Update deps/rabbitmq_prometheus/src/collectors/prometheus_rabbitmq_core_metrics_collector.erl"
This reverts commit 64e0812ced.
2024-07-11 21:34:46 -04:00
Michael Klishin 6b1e003afe Revert "New metrics return on detailed only"
This reverts commit 1aec73b21c.
2024-07-11 21:34:40 -04:00
Lois Soto Lopez 18e667fc8f New metrics return on detailed only
Make new metrics return on detailed only and adjust some of the
help messages.
2024-07-11 17:34:18 -04:00
LoisSotoLopez cb2de0d9ea Update deps/rabbitmq_prometheus/src/collectors/prometheus_rabbitmq_core_metrics_collector.erl
Co-authored-by: Péter Gömöri <gomoripeti@users.noreply.github.com>
2024-07-11 17:34:18 -04:00
Lois Soto Lopez ec5e258825 Provide per-exchange/queue metrics w/out channelID 2024-07-11 17:34:18 -04:00
Loïc Hoguin bbfa066d79
Cleanup .gitignore files for the monorepo
We don't need to duplicate so many patterns in so many
files since we have a monorepo (and want to keep it).

If I managed to miss something or remove something that
should stay, please put it back. Note that monorepo-wide
patterns should go in the top-level .gitignore file.
Other .gitignore files are for application or folder-
specific patterns.
2024-06-28 12:00:52 +02:00
Loïc Hoguin cd35f7e7fa
Remove sockets_used/sockets_total metrics from UIs
Part of the removal of file_handle_cache.

The Prometheus endpoint was updated but the Grafana dashboard
was not.

The FD stats are using the system's state rather than
file_handle_cache so there's no need to remove them.
2024-06-24 12:07:51 +02:00
Michael Klishin 9e97c5d8e7 rabbitmq_prometheus.schema: wording 2024-06-21 21:58:43 -04:00
Michal Kuratczyk 141659a638 OTP27 support (#11366)
* "maybe" is now a keyword
* Bump horus to 0.2.5 and switch to hex
* Get rid of some deprecated callbacks/functions
2024-06-21 21:46:33 -04:00