rabbitmq-server

Author	SHA1	Message	Date
Alexey Lebedeff	7676ed9685	Use `rabbitmq_cluster_` prefix for cluster-wide metrics	2021-11-24 16:49:43 +01:00
Alexey Lebedeff	6e3012aaf9	Add optional metrics for vhost and exchange count These can make sense in some scenarios, e.g. when vhosts/exchanges are created using self-service automation	2021-11-24 11:00:41 +01:00
Alexey Lebedeff	b9ebfb8980	Fix ssl port handling in prometheus plugin All ssl options were stored in the same proplist, and the code then tried to determine whether each option actually belonged to the ranch ssl options or not. Some keys landed in the wrong place, as happened in #2975: different ports were specified in the listener config (the default at the top level, and a non-default one in `ssl_opts`), and `ranch` and `rabbitmq_web_dispatch` treated this differently. This change moves all ranch ssl opts into their proper place using the schema, removing any need for guessing in code. The only downside is that compatibility with existing advanced config is broken.	2021-10-20 14:55:33 +02:00
Michael Klishin	3826a0df25	Compile #3561	2021-10-13 01:27:16 +03:00
Johannes Würbach	84de860b4c	feat(prom): expose cluster id in identity	2021-10-12 15:43:46 +02:00
Alexey Lebedeff	989a299720	Emit identity info in prometheus /metrics/detailed endpoint This is needed to make filtering metrics by cluster name possible.	2021-09-28 19:35:02 +02:00
Alexey Lebedeff	5501d07b8b	Use rabbitmq_ct_helpers to allocate prometheus port This test previously always used the standard port 15692, which caused conflicts with, e.g., a local `make run-broker`.	2021-09-22 15:23:35 +02:00
Alexey Lebedeff	4bb2262140	Allow selective querying for prometheus plugin	2021-09-20 14:59:17 +02:00
dcorbacho	c9305d948a	Use number of publishing channels as global publishers in amqp091	2021-06-29 08:10:42 +01:00
Gerhard Lazu	c7971252cd	Global counters per protocol + protocol AND queue_type This way we can show how many messages were received via a certain protocol (stream is the second real protocol besides the default amqp091 one), as well as by queue type, which is something that many have asked for over a really long time. The most important aspect is that we can also see them by protocol AND queue_type, which becomes very important for Streams, which have different rules from regular queues (for example, consuming messages is non-destructive, and deep queue backlogs - think billions of messages - are normal). Alerting and consumer scaling due to deep backlogs will now work correctly, as we can distinguish between regular queues & streams. This has gone through a few cycles, with @mkuratczyk & @dcorbacho covering most of the ground. @dcorbacho had most of this in https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main branch went through a few changes in the meantime. Rather than resolving all the conflicts, and then making the necessary changes, we (@gerhard + @kjnilsson) took all the learnings and started re-applying a lot of the existing code from #3045. We are confident in this approach and would like to see it through. We continued working on this with @dumbbell, and the most important changes are captured in https://github.com/rabbitmq/seshat/pull/1. We expose these global counters in rabbitmq_prometheus via a new collector. Rather than keep modifying the existing collector, which grew really complex in parts, especially since we introduced aggregation, we start with a new namespace, `rabbitmq_global_`, and continue building on top of it. The idea is to build in parallel, and slowly transition to the new metrics, because the semantic changes since streams are too big, and we have been discussing protocol-specific metrics with @kjnilsson, which makes me think that this approach is the least disruptive and... simple. 
While at it, we removed redundant empty return value handling in the channel. The function called no longer returns this. Also removed all DONE / TODO & other comments - we'll handle them when the time comes, no need to leave TODO reminders. Pairs @kjnilsson @dcorbacho @dumbbell (this is multiple commits squashed into one) Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2021-06-22 14:14:21 +01:00
Gerhard Lazu	f3f3e8aae9	Always show aggregated auth_attempts, add detailed when per object enabled The metrics have different names now, so we can't end up with duplicate TYPEs. Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2021-01-22 16:38:44 +00:00
Gerhard Lazu	5a6e3f235b	Single auth_attempts declarations when per-object metrics enabled Closes #2740 Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2021-01-22 11:36:42 +00:00
Michael Klishin	52479099ec	Bump (c) year	2021-01-22 09:00:14 +03:00
Mirah Gary	fe9881687c	Change per-object endpoint to `/metrics/per-object`. This conforms with other http endpoints.	2020-11-26 10:35:26 +01:00
Michal Kuratczyk	8b8a66cf0b	Add /metrics/per_object endpoint Regardless of the value of `return_per_object_metrics`, this endpoint always returns per-object metrics. This allows scraping both endpoints at different intervals or scraping per-object metrics only during debugging. Co-authored-by: Mirah Gary <mgary@vmware.com>	2020-11-19 18:00:42 +01:00
Michael Klishin	898a46d7bc	Switch to MPL2	2020-07-14 16:42:52 +03:00
Gerhard Lazu	cab99c29f0	Add failing test for erlang_vm_dist_node_queue_size_bytes Have to force prometheus.erl to a version that does not have this feature, otherwise the test would succeed. pwd /Users/gerhard/github.com/rabbitmq/3.9.x/deps/rabbitmq_prometheus rm -fr ../prometheus.erl make tests open logs/index.html Pull request content: Expose & visualise distribution buffer busy limit - zdbbl > This will be closed after TGIR S01E04 gets recorded. > The goal is to demonstrate how to do this, and then let an external contributor have a go. Before this patch, the Data buffered in the distribution links queue graph was empty. This is what that graph looks like after this gets applied: ![image](https://user-images.githubusercontent.com/3342/80223464-3bf28580-8640-11ea-8851-8f33f1c4fd4f.png) ## References - [RabbitMQ Runtime Tuning - Inter-node Communication Buffer Size](https://www.rabbitmq.com/runtime.html#distribution-buffer) - [erl +zdbbl](https://erlang.org/doc/man/erl.html#+zdbbl) Fixes #39 Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-06-24 16:34:48 +01:00
Gerhard Lazu	db2f70753e	Add tests for product name & version Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-06-18 11:51:47 +01:00
Gerhard Lazu	cba6aa06f4	Fix test that was made to fail on purpose Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-04-25 00:14:05 +01:00
Gerhard Lazu	9cc33c571d	Print the response body by default Makes it easier to spot why a match failed. Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-04-25 00:06:32 +01:00
Jean-Sébastien Pédron	636c3f78dc	Update copyright (year 2020)	2020-03-10 16:42:08 +01:00
Gerhard Lazu	e7c997744d	Improve config for returning metrics per object Since metrics are now aggregated by default, it made more sense to use the inverse meaning of disabling aggregation, and call it a positive and explicit action: return_per_object_metrics. Naming pair: @michaelklishin Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-02-11 13:08:00 +00:00
dcorbacho	253ef8e827	Fix 0/0 and enable all tests	2020-02-07 17:08:10 +01:00
Gerhard Lazu	09b29057af	Aggregate metrics by default Having talked to @michaelklishin we've decided to enable metrics aggregation by default so that RabbitMQ nodes with many objects serve the same amount of metrics quickly rather than taking many seconds and transferring many MBs of data on every scrape. Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-02-07 15:19:30 +00:00
Gerhard Lazu	11d676f3e1	Replace histogram type with gauge for raft_entry_commit_latency_seconds We want to keep the same metric type regardless of whether we aggregate or not. If we had used a histogram type, considering the ~12 buckets that we added, it would have meant 12 extra metrics per queue which would have resulted in an explosion of metrics. Keeping the gauge type and aggregating latencies across all members. re https://github.com/rabbitmq/rabbitmq-prometheus/pull/28 Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-02-06 17:37:37 +00:00
dcorbacho	06186065b4	Option to aggregate channel, queue and connection metrics `prometheus.enable_metric_aggregation = true` rabbitmq-prometheus#26	2020-01-10 16:35:50 +01:00
Gerhard Lazu	89efb964d9	Convert raft_entry_commit_latency to seconds & be explicit about unit This is a follow-up to https://github.com/rabbitmq/ra/pull/160 Had to introduce mf_convert/3 so that METRICS_REQUIRING_CONVERSIONS proplist does not clash with METRICS_RAW proplists that have the same number of elements. This is begging to be refactored, but I know that @dcorbacho is working on https://github.com/rabbitmq/rabbitmq-prometheus/issues/26 Also modified the RabbitMQ-Quorum-Queues-Raft dashboard Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>	2020-01-07 16:20:59 +00:00
Gerhard Lazu	b8893afcde	Add auto-generated test rabbitmq_management.schema	2019-12-03 11:27:26 +00:00
Michael Klishin	3aed601336	A typo	2019-11-26 12:25:08 +03:00
Michael Klishin	6e17eeb3c5	Update this test to use a consumer in a separate process	2019-11-26 12:23:40 +03:00
Gerhard Lazu	f550aa0706	Fix queue_metrics references Some properties had queue_ appended, while others used messages_ instead of message_. This meant that metrics such as rabbitmq_queue_consumers were not reported correctly, as captured in https://github.com/rabbitmq/rabbitmq-prometheus/issues/9#issuecomment-558233464 The test needs fixing before this can be merged, it's currently failing with: $ make ct-rabbit_prometheus_http t=with_metrics:metrics_test == rabbit_prometheus_http_SUITE == * [with_metrics] rabbit_prometheus_http_SUITE > with_metrics {error, {shutdown, {gen_server,call, [<0.245.0>, {call, {'basic.cancel',<<"amq.ctag-uHUunE5EoozMKYG8Bf6s1Q">>, false}, none,<0.252.0>}, infinity]}}} Closes #19	2019-11-26 07:47:22 +00:00
Michael Klishin	b70a8da7f0	Expose endpoint path configuration, references #8	2019-09-26 13:39:21 +03:00
Michael Klishin	b03dfa2dd2	New style configuration schema for listeners Closes #8.	2019-09-26 13:08:36 +03:00
Gerhard Lazu	2b73981ab1	Fix build & identity info metrics Improve pattern matching used in tests so that we don't match partial metric names. [#167846096]	2019-09-04 13:21:50 +01:00
Gerhard Lazu	5781130b61	Use the correct metric types & capture perspective when naming Some metrics were of type gauge while they should have been of type counter. Thanks @brian-brazil for making the distinction clear. This is now captured as a comment above the metric definitions. Because all metrics are from RabbitMQ's perspective, cached for up to 5 seconds by default (configurable), we prepend `rabbitmq_` to all metrics emitted by this collector. While some metrics are for Erlang (erlang_), Mnesia (schema_db_) or the System (io_), they are all observed & cached by RabbitMQ, hence the prefix. This is the last PR which started in the context of prometheus/docs#1414 [#167846096]	2019-09-04 11:49:48 +01:00
Gerhard Lazu	aafc4c026b	Revert erlang_uptime_seconds to gauge, not counter We care about its value rather than the rate of change. [#167846096]	2019-09-03 19:56:59 +01:00
Gerhard Lazu	fbc945f710	Convert all time metrics to seconds This started in the context of prometheus/docs#1414, specifically https://github.com/prometheus/docs/pull/1414#issuecomment-524250746 [#167846096]	2019-09-03 17:17:50 +01:00
Gerhard Lazu	98e488f1c4	Use standard naming for metrics expected from the client library As described in https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics. Until prometheus.erl has the prometheus_process_collector functionality built-in (this may not happen), we are exposing a subset of those metrics via rabbitmq_core_metrics_collector, so we are going to stick to the expected naming conventions. This commit supersedes the thought process captured in `1e5f4de4cb` [#167846096]	2019-09-03 15:31:55 +01:00
Gerhard Lazu	1e5f4de4cb	Rename process-related metrics to stay closer to conventions While `process_open_fds` would have been ideal, because the value is cached within RabbitMQ, and computed differently across platforms, it is important to keep the distinction from, say, what the kernel reports just-in-time. I am also capturing the Erlang context by adding `erlang_` to the relevant metrics. The full context is: RabbitMQ observed this Erlang VM process metric to be X, so this is why some metrics are prefixed with `rabbitmq_erlang_process_`, because there is a difference between what RabbitMQ limits are set to, e.g. `rabbitmq_memory_used_limit_bytes`, vs. what RabbitMQ reports about the Erlang process, e.g. `rabbitmq_erlang_process_memory_used_bytes`. This is the best that we can do while staying honest about what is being reported. cc @brian-brazil [#167846096]	2019-09-03 12:30:48 +01:00
Gerhard Lazu	2e686f1131	Continue updating RabbitMQ-Overview dashboard to use the new info metric [#167846096]	2019-08-27 17:11:41 +01:00
Gerhard Lazu	e2be7193ff	Use a higher config_port when testing Otherwise it will clash with docker-compose-overview.yml ports	2019-08-15 16:40:19 +01:00
Gerhard Lazu	052d92c74b	Replace global labels with build_info & identity_info metrics This started in the context of prometheus/docs#1414, specifically https://github.com/prometheus/docs/pull/1414#issuecomment-520505757 Rather than labelling all metrics with the same label, we are introducing 2 new metrics: rabbitmq_build_info & rabbitmq_identity_info. I suspect that we may want to revert deadtrickster/prometheus.erl#91 when we agree that the proposed alternative is better. We are yet to see through changes in Grafana dashboards. I am most interested in what the updated queries will look like and, more importantly, if we will have the same panels as we do now. More commits to follow shortly, wanted to get this out the door first. In summary, this commit changes: # TYPE erlang_mnesia_held_locks gauge # HELP erlang_mnesia_held_locks Number of held locks. erlang_mnesia_held_locks{node="rabbit@920f1e3272af",cluster="rabbit@920f1e3272af",rabbitmq_version="3.8.0-alpha.806",erlang_version="22.0.7"} 0 # TYPE erlang_mnesia_lock_queue gauge # HELP erlang_mnesia_lock_queue Number of transactions waiting for a lock. erlang_mnesia_lock_queue{node="rabbit@920f1e3272af",cluster="rabbit@920f1e3272af",rabbitmq_version="3.8.0-alpha.806",erlang_version="22.0.7"} 0 ... To this: # TYPE erlang_mnesia_held_locks gauge # HELP erlang_mnesia_held_locks Number of held locks. erlang_mnesia_held_locks 0 # TYPE erlang_mnesia_lock_queue gauge # HELP erlang_mnesia_lock_queue Number of transactions waiting for a lock. erlang_mnesia_lock_queue 0 ... # TYPE rabbitmq_build_info untyped # HELP rabbitmq_build_info RabbitMQ & Erlang/OTP version info rabbitmq_build_info{rabbitmq_version="3.8.0-alpha.809",prometheus_plugin_version="3.8.0-alpha.809-2019.08.15",prometheus_client_version="4.4.0",erlang_version="22.0.7"} 1 # TYPE rabbitmq_identity_info untyped # HELP rabbitmq_identity_info Node & cluster identity info rabbitmq_identity_info{node="rabbit@bc7aeb0c2564",cluster="rabbit@bc7aeb0c2564"} 1 ... 
[#167846096]	2019-08-15 16:00:29 +01:00
Gerhard Lazu	4aa3871194	Use different names for *_process_reductions_total metrics It is invalid to have multiple metrics with the same name, TYPE & HELP, but differing labels. [#167846096]	2019-08-14 16:17:48 +01:00
Gerhard Lazu	75ecd6af1d	Fix test that fails when the metric is empty	2019-08-14 12:44:29 +01:00
Gerhard Lazu	e218ea5ea2	Reorder elements in metric names & improve naming bytes / packets must come before _total Explain element order difference in TOTALS vs the metrics above [#167846096]	2019-08-13 19:15:12 +01:00
Gerhard Lazu	f1043134f4	Fix test failure message description	2019-08-08 16:58:01 +01:00
Diana Corbacho	82d858719d	Label all metrics with Erlang and RMQ version [#166413229]	2019-08-05 09:40:51 +01:00
Gerhard Lazu	805dd5e3b2	Enable quorum queue feature flag when running e2e metrics tests Otherwise tests will fail in CI: https://ci.rabbitmq.com/teams/main/pipelines/server-release:v3.8.x-mixed-versions/jobs/test-rabbitmq-prometheus/builds/20 Remove unused rabbit_ct_client_helpers imports	2019-06-26 11:03:28 +01:00
Gerhard Lazu	5e280c0281	Add first version of RabbitMQ Raft metrics Depends on https://github.com/rabbitmq/ra/tree/metrics_tweaks & https://github.com/rabbitmq/rabbitmq-server/tree/qq_metrics_tweak [#166819045]	2019-06-20 20:11:31 +01:00
Gerhard Lazu	2645082738	Finish Erlang Distribution Grafana dashboard Includes Erlang node to colour pinning Adds a few make targets to help with docker-compose repetitive commands & Grafana dashboard updates. Split Overview & Distribution Docker deployments re deadtrickster/prometheus.erl#92 [finishes #166004512]	2019-05-29 18:19:09 +01:00

56 Commits