Commit Graph

405 Commits

Author SHA1 Message Date
Alexey Lebedeff 7676ed9685 Use `rabbitmq_cluster_` prefix for cluster-wide metrics 2021-11-24 16:49:43 +01:00
Michael Klishin 38d64a54b1
Wording 2021-11-24 14:19:57 +03:00
Michael Klishin a1c0cd3785
Wording 2021-11-24 14:02:10 +03:00
Alexey Lebedeff 6e3012aaf9 Add optional metrics for vhost and exchange count
These can make sense in some scenarios, e.g. when vhost/exchanges are
+created using self-service automation
2021-11-24 11:00:41 +01:00
Luke Bakken bd2858c208
Compile the regex 2021-11-22 08:30:17 -08:00
dcorbacho a7c9b66653 Use own key to exclude queues 2021-11-16 16:53:17 +01:00
dcorbacho 242cb539b3 Exclude queues from aggregated metrics in prometheus collector
Uses same exclusion pattern as the management agent
2021-11-16 10:23:39 +01:00
Alexey Lebedeff 8598c51579 Pre-render prometheus labels
This makes per-object metrics twice as fast.

Depends on https://github.com/deadtrickster/prometheus.erl/pull/137
2021-11-09 13:04:39 +01:00
Alexey Lebedeff b9ebfb8980 Fix ssl port handling in prometheus plugin
All ssl options were stored in the same proplist, and the code was
then trying to determine whether an option actually belongs to ranch
ssl options or not.

Some keys landed in the wrong place, like it did happen in #2975 -
different ports were mentioned in listener config (default at
top-level, and non-default in `ssl_opts`). Then `ranch` and
`rabbitmq_web_dispatch` were treating this differently.

This change just moves all ranch ssl opts into proper place using
schema, removing any need for guessing in code.

The only downside is that advanced config compatibility is broken.
2021-10-20 14:55:33 +02:00
Michael Klishin 3826a0df25
Compile #3561 2021-10-13 01:27:16 +03:00
Michael Klishin 670f240537
Compile #3561 2021-10-12 20:17:51 +03:00
Johannes Würbach 84de860b4c
feat(prom): expose cluster id in identity 2021-10-12 15:43:46 +02:00
Alexey Lebedeff 989a299720 Emit identity info in prometheus /metrics/detailed endpoint
This is needed to make filtering metrics on a cluster name possible.
2021-09-28 19:35:02 +02:00
Alexey Lebedeff 5501d07b8b Use rabbitmq_ct_helpers to allocate prometheus port
This test always used standard 15692 before, which were causing
conflicts with e.g. local `make run-broker`.
2021-09-22 15:23:35 +02:00
Alexey Lebedeff 4bb2262140 Allow selective querying for prometheus plugin 2021-09-20 14:59:17 +02:00
Michael Klishin 47b20e8f7c
Prometheus: alarm-related metric naming 2021-08-17 20:58:24 +03:00
Ilya Khaprov 9fed915192
Add alarms prometheus collector.
close #2653
2021-08-16 20:32:29 +02:00
Gerhard Lazu 62d82e1660
Break down metrics by node in all RabbitMQ-Stream pie charts
Otherwise we won't be able to see which nodes are running "hot"

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-11 13:39:30 +01:00
David Ansari 4b774db5c1 Use same threshold color for "Errors since boot" 2021-08-02 17:05:17 +02:00
David Ansari c99ee6961e Use same colorMode in all RabbitMQ-Stream panels
Co-authored-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-02 13:33:00 +02:00
David Ansari ea18c31288 Make RabbitMQ-Stream dashboard work via ConfigMap
Before this commit, importing the dashboard via ConfigMap as seen in
1eb1dc618e
didn't work because DS_PROMETHEUS variable was undefined in Grafana.

Related to https://github.com/rabbitmq/rabbitmq-server/pull/3250

Co-authored-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-02 13:12:48 +02:00
Gerhard Lazu 65afbb931b
Ensure RabbitMQ-Stream dashboard works correctly after import
This breaks the docker-compose integration, but we need to move away
from it anyways, the whole dev flow needs revisiting after our focus on
K8s.

$__rate_interval does not work with irate, dropping it in favour of 60s,
same as all other dashboards.

This is a follow-up to https://github.com/rabbitmq/rabbitmq-server/pull/3250

Thanks @ansd for mentioning about the post-import issues.

It was uploaded as https://grafana.com/api/dashboards/14798/revisions/3/download

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-07-30 13:53:02 +01:00
Gerhard Lazu 35a6369327
Restart stream-perf-test on-failure
This handles the scenario where rmq2 is not available, and
stream-perf-test exits with a non-zero exit code. Good spot @ansd!

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-07-30 11:25:36 +01:00
David Ansari 47d572908d Convert string to integer for ulimits.nofile
Before this commit:

> make overview metrics
services.rmq1.ulimits.nofile.hard must be a integer
make: *** [Makefile:68: overview] Error 15

Accoring to the docs
https://docs.docker.com/compose/compose-file/compose-file-v3/#ulimits
this must be an integer.
2021-07-30 09:46:38 +02:00
Gerhard Lazu 6f5c4118ea
Publish RabbitMQ-Stream dashboard to grafana.com
Removed the Dockerfile and slimmed down the Makefile, all of this is now
handled by https://github.com/rabbitmq/rabbitmq-server/blob/master/.github/workflows/oci.yaml
cc @Zerpet @pjk25

More details here (including the steps used to publish to grafana.com):
https://github.com/rabbitmq/release-engineering/issues/11#issuecomment-887627938

I don't want to hold up this PR, will invest in automating the
steps described in the previous link another time. Time to 🚀

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-07-29 19:34:05 +01:00
Philip Kuryloski b26289cb47 Adjust rabbitmq_prometheus test suite timeouts in bazel 2021-07-22 11:00:14 +02:00
Gerhard Lazu 66ef8adfc8
Fix accept dependency in rabbitmq_prometheus
It's a runtime dependency, not a build dependency.

This is a fix and should be backported to v3.9.x, after rc.2 and just
before the final release. Would you disagree @dumbbell?

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-07-21 13:38:54 +01:00
Philip Kuryloski 8f9de08de7 Also assert no missing suites for all other deps 2021-07-12 18:05:55 +02:00
dcorbacho b636ad2565 Rename protocol error counters to _total 2021-06-30 12:46:41 +02:00
dcorbacho c9305d948a
Use number of publishing channels as global publishers in amqp091 2021-06-29 08:10:42 +01:00
Philip Kuryloski 8c7e7e0656 Revert "Default all `rabbitmq_integration_suite` to flaky in bazel"
This reverts commit 70cb8147b2.
2021-06-23 20:53:14 +02:00
Gerhard Lazu c7971252cd
Global counters per protocol + protocol AND queue_type
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many asked for a
really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (e.g. for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
https://github.com/rabbitmq/seshat/pull/1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-06-22 14:14:21 +01:00
Philip Kuryloski 70cb8147b2 Default all `rabbitmq_integration_suite` to flaky in bazel
Most tests that can start rabbitmq nodes have some chance of
flaking. Rather than chase individual flakes for now, this commit
changes the default (though it can still be overriden, as is the case
for config_scheme_SUITE in many places, since I have yet to see that
particular suite flake).
2021-06-21 16:10:38 +02:00
Philip Kuryloski 30f9a95b9f Add dialyze for remaning tier-1 plugins 2021-06-01 10:19:10 +02:00
Philip Kuryloski a3dbdecb8c Mark //deps/rabbitmq_prometheus:rabbit_prometheus_http_SUITE flaky 2021-05-21 18:32:20 +02:00
Philip Kuryloski 98e71c45d8 Perform xref checks on many tier-1 plugins 2021-05-21 12:03:22 +02:00
Philip Kuryloski e6df6615e1 Futher bazel file refactoring and deduplication 2021-05-11 16:15:33 +02:00
Philip Kuryloski b39cd342f2 buildifier formatting 2021-05-05 14:20:38 +02:00
Philip Kuryloski d61aa69039 Add rabbitmq_prometheus to bazel 2021-05-05 11:43:03 +02:00
Gerhard Lazu 1e5708b0c5
Fix Grafana dashboards when importing from URL
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-03-22 19:27:13 +00:00
Philip Kuryloski a63f169fcb Remove duplicate rabbitmq-components.mk and erlang.mk files
Also adjust the references in rabbitmq-components.mk to account for
post monorepo locations
2021-03-22 15:40:19 +01:00
kjnilsson 52f745dcde Update rabbitmq-components.mk
use v1.x branch of ra
2021-03-18 15:14:40 +00:00
Loïc Hoguin d5e3bdd623
Add ADDITIONAL_PLUGINS variable
This allows including additional applications or third party
plugins when creating a release, running the broker locally,
or just building from the top-level Makefile.

To include Looking Glass in a release, for example:

$ make package-generic-unix ADDITIONAL_PLUGINS="looking_glass"

A Docker image can then be built using this release and will
contain Looking Glass:

$ make docker-image

Beware macOS users! Applications such as Looking Glass include
NIFs. NIFs must be compiled in the right environment. If you
are building a Docker image then make sure to build the NIF
on Linux! In the two steps above, this corresponds to Step 1.

To run the broker with Looking Glass available:

$ make run-broker ADDITIONAL_PLUGINS="looking_glass"

This commit also moves Looking Glass dependency information
into rabbitmq-components.mk so it is available at all times.
2021-03-12 12:29:28 +01:00
Jean-Sébastien Pédron cdcf602749
Switch from Lager to the new Erlang Logger API for logging
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application env. variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.

The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.

The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.

The message text format slightly changed: the timestamp is more precise
(now to the microsecond) and the level can be abbreviated to always be
4-character long to align all messages and improve readability. Here is
an example:

    2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>  Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>  Licensed under the MPL 2.0. Website: https://rabbitmq.com

The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).

JSON is also supported as a message format and now for any outputs.
Indeed, it is possible to use it with e.g. syslog or the exchange. Here
is an example of a JSON-formatted message sent to syslog:

    Mar  3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}

For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variables were extended:
  * `-` still means stdout
  * `-stderr` means stderr
  * `syslog:` means syslog on localhost
  * `exchange:` means logging to `amq.rabbitmq.log`

`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON intead of plain text.

The `rabbitmqctl rotate_logs` command is deprecated. The reason is
Logger does not expose a function to force log rotation. However, it
will detect when a file was rotated by an external tool.

From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.

In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):

    ?LOG_NOTICE("Logging: switching to configured handler(s); following "
                "messages may not be visible in this log output",
                #{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),

Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.

At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. It can be done gradually when
working on a particular module or logging.

The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is calld `rabbit_logger_std_h` and is based
`logger_std_h` in Erlang 23.0.
2021-03-11 15:17:36 +01:00
Michael Klishin f6e8320fc9
Merge branch 'otp-24-ranch' 2021-03-10 07:37:51 +03:00
dcorbacho 61f7b2a723 Update to ranch 2.0 2021-03-08 23:11:05 +01:00
Gerhard Lazu c18ad7a5b6
Fix colors for node names that include digits in Grafana dashboards
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-03-08 13:19:14 +00:00
Michael Klishin b6c4831e75
Bump Lager to 3.9.1 2021-03-04 04:36:39 +03:00
Loïc Hoguin 66ac1bf5e9
Bump observer_cli to 1.6.1
More responsive when the system is overloaded with file calls.
2021-03-01 21:55:27 +03:00
Michael Klishin 8fe3df9343
Upgrade Lager to 3.9.0 for OTP 24 compatibility
`lager_util:expand_path/1` use changes are
due to erlang-lager/lager#540
2021-02-26 00:52:15 +03:00