Commit Graph

93 Commits

Author SHA1 Message Date
Luke Bakken ce86fb989e
Remove all usage of `cacerts` from configuration schemas
As mentioned in discussion #14426, the way that `cacerts` is handled by
cuttlefish schemas simply will not work if set.

If `cacerts` were set to a string value containing one X509 certificate,
it would eventually result in a crash because the `cacerts` ssl option
must be of [this type](https://www.erlang.org/doc/apps/ssl/ssl.html#t:client_option_cert/0):

```
{cacerts, CACerts :: [public_key:der_encoded()] | [public_key:combined_cert()]}
```

Neither of these is a string, of course.
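The Python sketch below makes the type mismatch concrete. It is not RabbitMQ code; in Erlang the equivalent step would typically use `public_key:pem_decode/1`. It shows the conversion a string value would need before it could match `[public_key:der_encoded()]`: a PEM string holding one or more certificates is split into blocks, and each block is base64-decoded into a DER byte string. The helper name `pem_string_to_der_list` is hypothetical.

```python
import re
import ssl

# One PEM certificate block, from its BEGIN marker through its END marker.
_PEM_CERT_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_string_to_der_list(pem_text: str) -> list[bytes]:
    """Split a PEM string into certificates and return each one DER-encoded.

    The result is a list of byte strings, the Python analogue of the
    [public_key:der_encoded()] list the Erlang `cacerts` option expects.
    Only the base64 payload is decoded; the certificate is not validated.
    """
    return [
        ssl.PEM_cert_to_DER_cert(block)
        for block in _PEM_CERT_BLOCK.findall(pem_text)
    ]
```

For example, `pem_string_to_der_list(ssl.DER_cert_to_PEM_cert(b"abc"))` returns `[b"abc"]`. The point is that the option needs a list of binaries, not the raw text.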

This PR removes all use of `cacerts` in cuttlefish schemas. In addition,
it filters out `cacerts` and `certs_keys` from being JSON-encoded by an
HTTP API call to `/api/overview`. It _is_ technically possible to set
`cacerts` via `advanced.config`, so, if set, it would crash this API
call, as would `certs_keys`.
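The Python analogue below pictures the filtering on the `/api/overview` side. The real change is in Erlang, and the function and constant names are hypothetical. Binary values such as DER-encoded certificates have no JSON encoding, so they are dropped before serialization instead of letting the encoder fail.

```python
import json

# ssl option keys whose values are binaries or structured terms and so have
# no JSON representation (the two keys named in this commit).
NON_JSON_SSL_KEYS = frozenset({"cacerts", "certs_keys"})


def ssl_opts_for_json(ssl_opts: dict) -> dict:
    """Return a copy of the listener ssl options without unencodable keys."""
    return {k: v for k, v in ssl_opts.items() if k not in NON_JSON_SSL_KEYS}
```

With `opts = {"certfile": "/path/to/server.pem", "cacerts": [b"\x30\x82"]}`, `json.dumps(opts)` raises `TypeError`. By contrast, `json.dumps(ssl_opts_for_json(opts))` yields `{"certfile": "/path/to/server.pem"}`.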
2025-10-01 08:14:14 -07:00
Michal Kuratczyk c34c803754
Remove flake in prometheus_http_SUITE (#14367)
Sometimes the metrics for streams created by `stream_pub_sub_metrics`
would be returned when the next test starts, breaking the assertions.
2025-08-12 15:03:20 +02:00
Jean-Sébastien Pédron 267445680f
rabbit_prometheus_http_SUITE: Run `stream_pub_sub_metrics` first
[Why]
I wonder if a previous test interferes with the metrics verified by this
test case. To be safer, execute it first and let's see what happens.
2025-08-08 10:12:57 +02:00
Jean-Sébastien Pédron 2bc8d117b6
rabbit_prometheus_http_SUITE: Log more details for a future failure in CI
[Why]
The `stream_pub_sub_metrics` test failed at least once in CI because the
`rabbitmq_stream_consumer_max_offset_lag` was 4 instead of the expected
3 on line 815.

I couldn't reproduce the problem so far.

[How]
The test case now logs the initial value of that metric at the beginning
of the test function. Hopefully this will give us some clue for the day
it fails again.
2025-08-08 10:12:56 +02:00
Jean-Sébastien Pédron c6729351b6
rabbit_prometheus_http_SUITE: Use another Erlang metric
[Why]
It looks like `erlang_vm_dist_node_queue_size_bytes` is not always
present, even though other Erlang-specific metrics are present.

[How]
The goal is to ensure Erlang metrics are present in the output, so just
use another one that is likely to be there.
2025-07-30 15:04:48 +02:00
Michal Kuratczyk a5106c6a61
Expose ra counters (#13895)
Switch from ra_metrics to ra_counters

* Expose many more metrics (they are also up to date)
* Bump Seshat, Ra, Osiris, Prometheus.erl
* switch from proplists to maps
2025-07-24 10:43:20 +02:00
Michal Kuratczyk 2a93bbcebd
RMQ-1460: Emit queue_info metric (#13583)
To allow filtering on queue type or membership status,
we need an info metric for queues; see
https://grafana.com/blog/2021/08/04/how-to-use-promql-joins-for-more-effective-queries-of-prometheus-metrics-at-scale/#info-metrics

With this change, per-object metrics and the detailed metrics
(if queue-related families are requested) will contain
rabbitmq_queue_info / rabbitmq_detailed_queue_info with a value of 1
and labels including the queue name, vhost, queue type and membership
status.
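An info metric is used by joining it onto other series. In PromQL this typically looks like `rabbitmq_queue_messages * on(vhost, queue) group_left(queue_type) rabbitmq_queue_info`. The Python sketch below mimics that `group_left` join over in-memory samples so the mechanics are visible. The label names `vhost`, `queue` and `queue_type` and the metric names are assumptions; the commit does not spell them out.

```python
def join_info(samples, info_series, on=("vhost", "queue"), copy=("queue_type",)):
    """Approximate PromQL `samples * on(<on>) group_left(<copy>) info_series`.

    samples:     list of (labels: dict, value: float)
    info_series: list of label dicts; every info sample has the value 1
    """
    # Index the info series by the join labels (the "one" side of the match).
    index = {tuple(lbls[k] for k in on): lbls for lbls in info_series}
    joined = []
    for labels, value in samples:
        info = index.get(tuple(labels.get(k) for k in on))
        if info is None:
            continue  # unmatched series are dropped, as in PromQL
        merged = dict(labels)
        merged.update({k: info[k] for k in copy})  # copy labels from the info side
        joined.append((merged, value * 1))  # multiplying by the info value 1
    return joined
```

Because the info metric's value is always 1, the multiplication leaves the original values unchanged and only adds labels. That is what makes filtering by queue type possible.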
2025-03-27 15:54:26 +01:00
Arnaud Cogoluègnes b8244f70f4
Pull from socket up to 10 times in stream test utils (#13588)
This ensures that enough data has been received to complete a command.
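The Python sketch below shows the bounded retry. It assumes frames prefixed with a 4-byte big-endian length, which is an illustration only; the actual test utilities are Erlang and may differ. It reads from the socket at most 10 times before giving up. `read_frame`, `_split_frame` and `MAX_READS` are hypothetical names.

```python
import struct

MAX_READS = 10  # "up to 10 times", as in the commit title


def _split_frame(buffer: bytes):
    """Return (frame, rest) if buffer holds a whole frame, else None.

    Assumes each frame starts with a 4-byte big-endian length prefix.
    """
    if len(buffer) < 4:
        return None
    (size,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + size:
        return None
    return buffer[4:4 + size], buffer[4 + size:]


def read_frame(sock, buffer: bytes = b""):
    """Read from sock until one complete frame is buffered, at most MAX_READS times."""
    result = _split_frame(buffer)
    reads = 0
    while result is None:
        if reads == MAX_READS:
            raise TimeoutError(f"no complete frame after {MAX_READS} reads")
        chunk = sock.recv(4096)
        reads += 1
        if not chunk:
            raise ConnectionError("socket closed before a complete frame arrived")
        buffer += chunk
        result = _split_frame(buffer)
    return result
```

Any bytes past the frame are returned as `rest`, so the caller can pass them back in as `buffer` for the next command instead of losing them.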
2025-03-24 09:13:31 +01:00
Arnaud Cogoluègnes b3b0940024
Fix wait-for-confirms sequence in stream test utils
And refine the implementation and its usage.
2025-01-21 17:38:58 +01:00
Michael Klishin 968eefa1bb
Bump (c) line year
There are no functional changes to this massive diff.
2025-01-01 17:54:10 -05:00
Diana Parra Corbacho 40cb4f46e8 Tests: rabbit_prometheus_http_SUITE longer wait 2024-12-16 11:58:05 +01:00
Péter Gömöri bbc902ef23
Add test for stream consumer max offset lag prometheus metric
(cherry picked from commit 0c76054a0c)
2024-11-19 19:14:12 -05:00
Jean-Sébastien Pédron d6024e30f4
rabbit_prometheus_http_SUITE: Start broker once in `special_chars` group
`init_per_group/3`, which starts the broker, was already called earlier
in the function.

This fixes a bug where the node can't be stopped in `end_per_group/2`,
affecting the next group's ability to start one.
2024-10-30 10:08:56 +01:00
David Ansari 960808e6b2
Emit histogram metric for received message sizes per protocol (#12342)
* Add global histogram metrics for received message sizes per-protocol

fixup: add new files to bazel

fixup: expose message_size_bytes as prometheus classic histogram type

`rabbit_msg_size_metrics` does not use `seshat` any more, but
`counters` directly.

fixup: add msg_size_metrics unit test

* Improve message size histogram

1.
Avoid unnecessary time series emitted for stream protocol
The stream protocol cannot observe message sizes.
This commit ensures that the following time series are omitted:
```
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="64"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="256"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="1024"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="4096"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="16384"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="65536"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="262144"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="1048576"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="4194304"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="16777216"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="67108864"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="268435456"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="+Inf"} 0
rabbitmq_global_message_size_bytes_count{protocol="stream"} 0
rabbitmq_global_message_size_bytes_sum{protocol="stream"} 0
```

This removes 15 time series (13 buckets plus `_count` and `_sum`).

2.
Further reduce the number of time series by reducing the number of
buckets. Instead of 13 buckets, emit only 9 buckets. Buckets are not
free: each one is an extra time series to store.

Prior to this commit:
```
curl -s -u guest:guest localhost:15692/metrics | ag message_size | wc -l
      92
```

After this commit:
```
curl -s -u guest:guest localhost:15692/metrics | ag message_size | wc -l
      57
```

3.
The emitted metric should be called
`rabbitmq_message_size_bytes_bucket` instead of `rabbitmq_global_message_size_bytes_bucket`.
The latter is poor naming. There is no need to use `global` in
the metric name given that this metric doesn't exist in the old flawed
aggregated metrics.

4.
This commit simplifies module `rabbit_global_counters`.

5.
Avoid garbage collecting the 10-element list of buckets for every message
received.
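The Python sketch below shows how a classic histogram turns observations into series, and why every bucket costs one series. The bounds reproduce the original 13-bucket layout listed in point 1 (`64 * 4**k` for k = 0..11, plus `+Inf`). The reduced 9-bucket layout is not listed in the commit, so it is not shown. The metric name follows point 3, and the function names are hypothetical; the real code is Erlang using `counters`.

```python
import bisect

# Bucket upper bounds in bytes: 64, 256, ..., 268435456 (12 finite bounds);
# the last slot in the counts list plays the role of le="+Inf".
BOUNDS = [64 * 4**k for k in range(12)]


def observe(slots: list, size: int) -> None:
    """Record one message size in its (non-cumulative) slot.

    bisect_left picks the first bound >= size, so a size equal to a bound
    lands in that bucket (Prometheus buckets are "less than or equal").
    Sizes above the last bound land in the final +Inf slot.
    """
    slots[bisect.bisect_left(BOUNDS, size)] += 1


def exposition(slots: list, total_bytes: int, protocol: str) -> list:
    """Render the classic-histogram series: cumulative buckets, count, sum."""
    lines, cumulative = [], 0
    for le, n in zip([str(b) for b in BOUNDS] + ["+Inf"], slots):
        cumulative += n
        lines.append(
            f'rabbitmq_message_size_bytes_bucket{{protocol="{protocol}",le="{le}"}} {cumulative}'
        )
    lines.append(f'rabbitmq_message_size_bytes_count{{protocol="{protocol}"}} {cumulative}')
    lines.append(f'rabbitmq_message_size_bytes_sum{{protocol="{protocol}"}} {total_bytes}')
    return lines
```

With `slots = [0] * (len(BOUNDS) + 1)`, each protocol yields 13 bucket lines plus `_count` and `_sum`. That is the 15 series per protocol counted in point 1, and dropping buckets shrinks it one series at a time.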

---------

Co-authored-by: Péter Gömöri <peter@84codes.com>
2024-09-24 18:08:24 +02:00
Simon Unge 2766122836 Move shovel prometheus to its own plugin 2024-08-08 01:26:49 -04:00
Simon Unge 4c44ebd8eb Add dynamic and static Prometheus metric gauges 2024-08-02 22:19:20 +00:00
Michal Kuratczyk 618f695645
Move memory breakdown metrics to new endpoint
Collecting them on a large system (tens of thousands of processes
or more) can be time-consuming, as we iterate over all processes.
By putting them on a separate endpoint, we make that opt-in.
2024-07-23 10:17:37 +02:00
Michael Klishin 0caea225c6 Assertions for #11743 2024-07-18 21:32:42 -04:00
Lois Soto Lopez bb93e718c2 Prometheus: some per-exchange/per-queue metrics aggregated per-channel
Add copies of some per-object metrics that are labeled per channel,
aggregated to reduce cardinality. These metrics are valuable and
easier to process when exposed on a per-exchange and per-queue basis.
2024-07-16 14:30:25 +02:00
Michael Klishin 0700e1cdc4 Revert "Provide per-exchange/queue metrics w/out channelID"
This reverts commit 3ed2e30e3a.
2024-07-11 21:34:52 -04:00
Lois Soto Lopez ec5e258825 Provide per-exchange/queue metrics w/out channelID 2024-07-11 17:34:18 -04:00
Michal Kuratczyk cfa3de4b2b
Remove unused imports (thanks elp!) 2024-05-23 16:36:08 +02:00
Iliia Khaprov 8925dfa916 Close #10345. Add prometheus_rabbitmq_federation_collector.
It exposes a `rabbitmq_federation_links` gauge metric with a `status` label.
2024-03-14 09:29:01 +01:00
Michael Klishin f414c2d512
More missed license header updates #9969 2024-02-05 11:53:50 -05:00
Michael Klishin 01092ff31f
(c) year bumps 2024-01-01 22:02:20 -05:00
Péter Gömöri fec09c0792 Escape prometheus core metric label values
For example, special characters such as double quotes are allowed in queue
names; left unescaped, they could make the detailed metrics produce
unparsable text-format output.
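In the Prometheus text exposition format, backslash, double quote and line feed in a label value must be escaped as `\\`, `\"` and `\n`. The minimal Python sketch below applies that escaping. The RabbitMQ fix itself is in Erlang, and the function names are illustrative.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format.

    Backslash is escaped first so the backslashes added for the other two
    characters are not doubled.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def sample_line(metric: str, labels: dict, value) -> str:
    """Render one sample line such as: metric{queue="q"} 3"""
    rendered = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{metric}{{{rendered}}} {value}"
```

Without escaping, a queue named `say "hi"` would end the quoted label value early and leave the rest of the line unparsable.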
2023-12-03 01:14:44 +01:00
Michael Klishin 1b642353ca
Update (c) according to [1]
1. https://investors.broadcom.com/news-releases/news-release-details/broadcom-and-vmware-intend-close-transaction-november-22-2023
2023-11-21 23:18:22 -05:00
Simon Unge 8b3ca4c972 See #8605. Add authentication support to Prometheus. 2023-06-23 13:54:45 -07:00
Chunyi Lyu 4ddb0c2038 Support TLS-only listener for Prometheus
- tcp listener can be turned off by setting
'prometheus.tcp.listener = none'
- config schema follows web_mqtt and web_stomp
2023-05-05 15:44:53 +01:00
Michal Kuratczyk 510415f8b9
Update prometheus.erl to 4.10.0
Since 4.10.0 was released specifically to address an issue we
encountered in RabbitMQ integration with prometheus.erl, a new test was
added to guard this functionality against future regressions.
2023-01-13 10:24:41 +01:00
Michael Klishin ec4f1dba7d
(c) year bump: 2022 => 2023 2023-01-01 23:17:36 -05:00
Luke Bakken 7fe159edef
Yolo-replace format strings
Replaces `~s` and `~p` with their unicode-friendly counterparts.

```
git ls-files '*.erl' | xargs sed -i.ORIG -e 's/~s\>/~ts/g' -e 's/~p\>/~tp/g'
```
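For readers less familiar with GNU sed's `\>` end-of-word anchor, the same substitution can be written as a Python sketch. The function name is hypothetical. `\b` right after the letter plays the role of `\>`: `~s` and `~p` are rewritten only when no other word character follows.

```python
import re

# "~s" or "~p" not followed by another word character (GNU sed's `\>`).
_FORMAT_DIRECTIVE = re.compile(r"~([sp])\b")


def unicode_format_directives(source: str) -> str:
    """Rewrite ~s -> ~ts and ~p -> ~tp in Erlang source text."""
    return _FORMAT_DIRECTIVE.sub(r"~t\1", source)
```

For example, `io:format("~s ~p~n", [A, B])` becomes `io:format("~ts ~tp~n", [A, B])`. Directives already in the `~ts`/`~tp` form are left alone.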
2022-10-10 10:32:03 +04:00
Loïc Hoguin 73dd0acf01
rabbit_prometheus_http_SUITE: Update tests for new CQs
CQs without consumers will have only one message in memory.
2022-09-27 12:00:10 +02:00
Jean-Sébastien Pédron 6e9ee4d0da
Remove test code which depended on the `quorum_queue` feature flag
These checks are now irrelevant as the feature flag is required.
2022-08-01 12:41:30 +02:00
Michael Klishin 7c47d0925a
Revert "Correct a double quote introduced in #4603"
This reverts commit 6a44e0e2ef.

That commit unintentionally wiped a lot of files.
2022-04-20 16:05:56 +04:00
Michael Klishin 6a44e0e2ef
Correct a double quote introduced in #4603 2022-04-20 16:01:29 +04:00
Michael Klishin c38a3d697d
Bump (c) year 2022-03-21 01:21:56 +04:00
Alexey Lebedeff 7676ed9685 Use `rabbitmq_cluster_` prefix for cluster-wide metrics 2021-11-24 16:49:43 +01:00
Alexey Lebedeff 6e3012aaf9 Add optional metrics for vhost and exchange count
These can make sense in some scenarios, e.g. when vhosts/exchanges are
created using self-service automation.
2021-11-24 11:00:41 +01:00
Alexey Lebedeff b9ebfb8980 Fix ssl port handling in prometheus plugin
All ssl options were stored in the same proplist, and the code was
then trying to determine whether an option actually belongs to ranch
ssl options or not.

Some keys landed in the wrong place, as happened in #2975: different
ports were given in the listener config (the default one at the top
level and a non-default one in `ssl_opts`), and `ranch` and
`rabbitmq_web_dispatch` treated this differently.

This change moves all ranch ssl opts into their proper place using the
schema, removing any need for guessing in code.

The only downside is that advanced config compatibility is broken.
2021-10-20 14:55:33 +02:00
Michael Klishin 3826a0df25
Compile #3561 2021-10-13 01:27:16 +03:00
Johannes Würbach 84de860b4c
feat(prom): expose cluster id in identity 2021-10-12 15:43:46 +02:00
Alexey Lebedeff 989a299720 Emit identity info in prometheus /metrics/detailed endpoint
This is needed to make filtering metrics on a cluster name possible.
2021-09-28 19:35:02 +02:00
Alexey Lebedeff 5501d07b8b Use rabbitmq_ct_helpers to allocate prometheus port
Previously this test always used the standard port 15692, which caused
conflicts with, e.g., a local `make run-broker`.
2021-09-22 15:23:35 +02:00
Alexey Lebedeff 4bb2262140 Allow selective querying for prometheus plugin 2021-09-20 14:59:17 +02:00
dcorbacho c9305d948a
Use number of publishing channels as global publishers in amqp091 2021-06-29 08:10:42 +01:00
Gerhard Lazu c7971252cd
Global counters per protocol + protocol AND queue_type
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which many have been asking for for a
really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.
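The Python sketch below illustrates the idea. It is hypothetical: RabbitMQ itself uses Erlang counters via seshat, and the names are illustrative. Counts kept per (protocol, queue_type) pair are enough to derive per-protocol and per-queue-type totals, while streams can still be told apart from other queues.

```python
from collections import Counter


def record(counters: Counter, protocol: str, queue_type: str, n: int = 1) -> None:
    """Count n messages under the (protocol, queue_type) pair."""
    counters[(protocol, queue_type)] += n


def rollup(counters: Counter, dimension: int) -> Counter:
    """Aggregate over one dimension: 0 = protocol, 1 = queue_type."""
    out = Counter()
    for key, n in counters.items():
        out[key[dimension]] += n
    return out
```

Keeping the finest-grained key and summing on demand avoids maintaining three separate sets of counters that could drift apart.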

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
https://github.com/rabbitmq/seshat/pull/1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation; instead we start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel and
slowly transition to the new metrics, because the semantic changes since
streams were introduced are too big. We have also been discussing
protocol-specific metrics with @kjnilsson, which makes me think that this
approach is the least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-06-22 14:14:21 +01:00
Gerhard Lazu f3f3e8aae9
Always show aggregated auth_attempts, add detailed when per object enabled
The metrics have different names now, so we can't end up with duplicate TYPEs.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-01-22 16:38:44 +00:00
Gerhard Lazu 5a6e3f235b
Single auth_attempts declarations when per-object metrics enabled
Closes #2740

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-01-22 11:36:42 +00:00
Michael Klishin 52479099ec
Bump (c) year 2021-01-22 09:00:14 +03:00