This allows including additional applications or third party
plugins when creating a release, running the broker locally,
or just building from the top-level Makefile.
To include Looking Glass in a release, for example:
$ make package-generic-unix ADDITIONAL_PLUGINS="looking_glass"
A Docker image can then be built using this release and will
contain Looking Glass:
$ make docker-image
Beware macOS users! Applications such as Looking Glass include
NIFs. NIFs must be compiled in the right environment. If you
are building a Docker image then make sure to build the NIF
on Linux! In the two steps above, this corresponds to Step 1.
To run the broker with Looking Glass available:
$ make run-broker ADDITIONAL_PLUGINS="looking_glass"
This commit also moves Looking Glass dependency information
into rabbitmq-components.mk so it is available at all times.
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application env. variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.
The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.
The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.
The message text format slightly changed: the timestamp is more precise
(now to the microsecond) and the level can be abbreviated to always be
4-character long to align all messages and improve readability. Here is
an example:
2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Licensed under the MPL 2.0. Website: https://rabbitmq.com
The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).
JSON is also supported as a message format and now for any outputs.
Indeed, it is possible to use it with e.g. syslog or the exchange. Here
is an example of a JSON-formatted message sent to syslog:
Mar 3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}
For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variables were extended:
* `-` still means stdout
* `-stderr` means stderr
* `syslog:` means syslog on localhost
* `exchange:` means logging to `amq.rabbitmq.log`
`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON intead of plain text.
The `rabbitmqctl rotate_logs` command is deprecated. The reason is
Logger does not expose a function to force log rotation. However, it
will detect when a file was rotated by an external tool.
From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.
In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):
?LOG_NOTICE("Logging: switching to configured handler(s); following "
"messages may not be visible in this log output",
#{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),
Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.
At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. It can be done gradually when
working on a particular module or logging.
The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is calld `rabbit_logger_std_h` and is based
`logger_std_h` in Erlang 23.0.
It uses the commercial edition of RabbitMQ, requires a valid Tanzu
Network account. Learn more: https://rabbitmq.com/tanzu
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
So that clusters with the same rabbitmq_cluster name in different K8s namespaces don't clash
Namespace filter comes first, because the order of the layers is namespace -> cluster -> node
Tested with the latest 3.9.0 dev build
We had to account for plugin changes from .ez to directories & the management.load_definitions deprecation which would prevent a node from booting (fixed in 07a0dd7438). This commit didn't make it through the 3.9.x pipeline yet, so there is no 3.9.0 dev build with this fix yet. The simplest fix is to drop `management.` from the load_definitions config.
The next manual step is to generate all dashboards using e.g. `make RabbitMQ-Overview.json > ~/Downloads/RabbitMQ-Overview.json` and upload them to https://grafana.com/orgs/rabbitmq
Great contribution @ansd, thank you 👏🏻
Regardless of the value of `return_per_object_metrics`, this endpoint
always returns per-object metrics. This allows scraping both endpoints
at different intervals or scraping per-object metrics only during
debugging.
Co-authored-by: Mirah Gary <mgary@vmware.com>
In the case where there are 0 channels (and as such 0 publishers), the
dashboard reports there are actually `n` publishers in an `n`-node
cluster. This changes the calculation of publishers to be number of
channels (which is always known) minus the number of consumers (which is
always known).
Context: we want to move away from environment variables and use either
config files or env files (such as the rabbitmq-env.conf).
Since .erlang.cookie is neither, the official RabbitMQ Docker image
handles this by writing the value from the RABBITMQ_ERLANG_COOKIE env
var into the file if it does not exist. The problem is that if this file
exists, and the value is different from the RABBITMQ_ERLANG_COOKIE env
var, CLI tools will not be able to communicate with the rabbit node, as
described here: https://github.com/rabbitmq/rabbitmq-cli/issues/443
The only gotcha is that this file must be owned by the user, and
privileges should not be too open (git should have captured this). If
not, RabbitMQ will fail to boot. This is somewhat similar to how OpenSSH
reacts when private key permissions are too open.
re https://github.com/docker-library/rabbitmq/pull/422#issuecomment-650074731
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
open http://localhost:3000/dashboards # select Erlang-Distribution
e > Metrics; General > Description # when on Data buffered in the distribution links queue
Save Dashboard > Export > +Export for sharing externally > Save to file
pwd
/Users/gerhard/github.com/rabbitmq/3.9.x/deps/rabbitmq_prometheus
vimdiff docker/grafana/dashboards/Erlang-Distribution.json ~/Downloads/Erlang-Distribution*.json
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Have to force prometheus.erl to a version that does not have this
feature, otherwise the test would succeed.
pwd
/Users/gerhard/github.com/rabbitmq/3.9.x/deps/rabbitmq_prometheus
rm -fr ../prometheus.erl
make tests
open logs/index.html
Pull request content:
Expose & visualise distribution buffer busy limit - zdbbl
> This will be closed after TGIR S01E04 gets recorded.
> The goal is to demonstrate how to do this, and then let an external contributor have a go.
Before this patch, the **Data buffered in the distribution links queue** graph was empty.
This is what that graph looks like after this gets applied:

## References
- [RabbitMQ Runtime Tuning - Inter-node Communication Buffer Size](https://www.rabbitmq.com/runtime.html#distribution-buffer)
- [erl +zdbbl](https://erlang.org/doc/man/erl.html#+zdbbl)
Fixes#39
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
They are the product name & version. They are added at the beginning of
`rabbitmq_build_info` if they are set. If they are not, the content of
`rabbitmq_build_info` is the same as before.
This will make future diffs smaller
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
(cherry picked from commit e1a08d6ae752181177cbcc411219a8dd780359d2)
Hard-coding number of dashboards, because there is no API for
https://grafana.com/orgs/rabbitmq/dashboards, and setting up our own
JSON endpoint to return this information would be overkill.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Supporting n-1 Erlang/OTP versions: v21 & v22 and taking into account
RabbitMQ's Erlang Version Requirements:
https://www.rabbitmq.com/which-erlang.html
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Otherwise the singlestat panel will return a 'Only queries that return
single series/table is supported' error if the node changed some
properties, like the instance label because the IP changed.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Given that we test Linux distributions and upgrades after this artefact
is produced, it's quicker to get the latest RabbitMQ generic-unix dev
build into a Docker image without waiting on tests that are less
relevant & useful for the audience of this image.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Hiding "all other" values stopped working since Grafana v6.6.1, need to
be explicit about which values should be hidden. Picked up a few other
changes from Grafana after Save JSON to file.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
It activates and extra graph on the RabbitMQ-Overview dashboard and
let's be honest - why use Quorum Queues if the workload didn't care
whether the broker received the message? They go together, seriously!
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Since metrics are now aggregated by default, it made more sense to use
the inverse meaning of disabling aggregation, and call it a positive and
explicit action: return_per_object_metrics.
Naming pair: @michaelklishin
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Having talked to @michaelklishin we've decided to enable metrics
aggregation by default so that RabbitMQ nodes with many objects serve
the same amount of metrics quickly rather than taking many seconds and
transferring many MBs of data on every scrape.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
We want to keep the same metric type regardless whether we aggregate or
don't. If we had used a histogram type, considering the ~12 buckets that
we added, it would have meant 12 extra metrics per queue which would
have resulted in an explosion of metrics. Keeping the gauge type and
aggregating latencies across all members.
re https://github.com/rabbitmq/rabbitmq-prometheus/pull/28
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
This is a follow-up to https://github.com/rabbitmq/ra/pull/160
Had to introduce mf_convert/3 so that METRICS_REQUIRING_CONVERSIONS
proplist does not clash with METRICS_RAW proplists that have the same
number of elements. This is begging to be refactored, but I know that
@dcorbacho is working on https://github.com/rabbitmq/rabbitmq-prometheus/issues/26
Also modified the RabbitMQ-Quorum-Queues-Raft dashboard
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Grafan will keep failing with the following error message otherwise:
failed to load dashboard from /dashboards/__inputs.json Dashboard title cannot be empty
It still puts a significant load on the host, but any lower and we won't
see any change in the Uncommited log entries graph, and too little
variation in the Log entry commit latency.
Well, almost. flat-statusmap-panel v0.1.1 breaks on Grafana v6.5.0.
Since it's already been mentioned in
https://github.com/flant/grafana-statusmap/issues/76 for a different
reason, let's wait until it this is addressed.
Most keys load fine, but if one doesn't, everything fails. The package
will still verify OK even if we have just a subset of keys installed, be
more permissive...
Now:
autocomplete ac | Configure shell for autocompletion - eval "$(gmake autocomplete)"
clean-docker cd | Clean all Docker containers & volumes
cto cto | Interact with all containers via a top-like utility
dist-tls dt | Make Erlang-Distribution panels come alive - HIGH LOAD
docker-image di | Build & push Docker image to Docker Hub
docker-image-build dib | Build Docker image locally - make tests
docker-image-bump diu | Bump Docker image version across all docker-compose-* files
docker-image-push dip | Push local Docker image to Docker Hub
docker-image-run dir | Run container with local Docker image
dockerhub-login dl | Login to Docker Hub as pivotalrabbitmq
down d | Stop all containers
find-latest-otp flo | Find latest OTP version archive + sha1
metrics m | Run all metrics containers
overview o | Make RabbitMQ Overview panels come alive
preview-readme pre | Preview README & live reload on edit
qq | Make RabbitMQ-Quorum-Queues-Raft panels come alive - HIGH LOAD
Before:
-------------------------------------------------------------------------------------------------
autocomplete ac | Configure shell for autocompletion - eval "$(gmake autocomplete)"
-------------------------------------------------------------------------------------------------
clean-docker cd | Clean all Docker containers & volumes
-------------------------------------------------------------------------------------------------
cto cto | Interact with all containers via a top-like utility
-------------------------------------------------------------------------------------------------
dockerhub-login dl | Login to Docker Hub as pivotalrabbitmq
-------------------------------------------------------------------------------------------------
docker-image di | Build & push Docker image to Docker Hub
-------------------------------------------------------------------------------------------------
docker-image-build dib | Build Docker image locally - make tests
-------------------------------------------------------------------------------------------------
docker-image-bump diu | Bump Docker image version across all docker-compose-* files
-------------------------------------------------------------------------------------------------
docker-image-push dip | Push local Docker image to Docker Hub
-------------------------------------------------------------------------------------------------
docker-image-run dir | Run container with local Docker image
-------------------------------------------------------------------------------------------------
down d | Stop all containers
-------------------------------------------------------------------------------------------------
find-latest-otp flo | Find latest OTP version archive + sha1
-------------------------------------------------------------------------------------------------
metrics m | Run all metrics containers
-------------------------------------------------------------------------------------------------
overview o | Make RabbitMQ Overview panels come alive
-------------------------------------------------------------------------------------------------
dist-tls dt | Make Erlang-Distribution panels come alive - HIGH LOAD
-------------------------------------------------------------------------------------------------
qq | Make RabbitMQ-Quorum-Queues-Raft panels come alive - HIGH LOAD
-------------------------------------------------------------------------------------------------
Some properties had queue_ appended, while others used messages_ instead
of message_. This meant that metrics such as rabbitmq_queue_consumers
were not reported correctly, as captured in https://github.com/rabbitmq/rabbitmq-prometheus/issues/9#issuecomment-558233464
The test needs fixing before this can be merged, it's currently failing with:
$ make ct-rabbit_prometheus_http t=with_metrics:metrics_test
== rabbit_prometheus_http_SUITE ==
* [with_metrics]
rabbit_prometheus_http_SUITE > with_metrics
{error,
{shutdown,
{gen_server,call,
[<0.245.0>,
{call,
{'basic.cancel',<<"amq.ctag-uHUunE5EoozMKYG8Bf6s1Q">>,
false},
none,<0.252.0>},
infinity]}}}
Closes#19
It captures the Quorum-Queues Raft, so let's be specific, especially
since we know that there will be other Raft implementations in RabbitMQ,
not just Quorum Queues.
[#166926415]
It is essential to know which RabbitMQ & Erlang/OTP version the cluster
is running, as well as how many nodes there are in the cluster. We now
have a table which lists this information, right under all singlestat
panels.
The singlestat panels have been re-organized to make room for 2 new
ones: Nodes & Publishers. Classic & Quorum Queues would be great to
have, as would VHosts. The last singlestats that I would add are Alarms
& Partitions. This would bring the total number of singlestat panels to
14 (we currently have 10). While 14 feel overwhelming, it captures all
the important information that I believe is worth knowing about any
RabbitMQ cluster.
All message-related sections now display 2 graph panels instead of 3.
While 3 panels look good on 27" screens, they don't work as well on 15"
screens, which is what the majority will be using. Also the 3rd panel
would always be for anti-pattern graphs (e.g. unroutable messages,
polling operations, etc.) and would be mostly empty in the majority of
cases. Fitting fewer panels per row not only helps focusing and
understanding what is being displayed, but it also makes it easier to
compare when viewing 2 panels side-by-side, on 27" screens. Nodes &
churn sections still have 3 panels, which works well when 1 panel is
more important than the others. The compromise that we need to make is
between giving enough horizontal space to equally important panels vs
making the dashboard page too long. RabbitMQ-Overview has always been a
comprehensive dashboard which captures a lot of imformation, it was
always tough balancing the important vs the complete.
[finishes #167836027]
9.313226 GiB is a lot harder to read than 9.31 GiB, and therefore less
useful. Observing other people use this made it obvious that limiting
the precision was the human-friendly thing to do.
* explains source of metrics via row names
* makes tables slightly wider to mitigate long names line wrapping
* do not limit entries in tables, refresh resets table pagination
[finishes #168734621]
The yardstick for all Grafana dashboards should be 1920 x 1200, the
screen format most common in our team. If the dashboards look good on
our screens, they will look good on other screens too. Smalle
resolutions won't look too crammed, and bigger resolutions can be split
in half (e.g. 27" iMacs).
Some take-aways from optimising the layout of this dashboard:
* limit horizontal graph panels to 3
* limit horizontal panels to 2 if the information is dense (e.g. table + graph)
* use the same width for graph panels that need comparing, stack vertically
To get an import-friendly RabbitMQ Overview dashboard, run the following
command:
make RabbitMQ-Overview.json
On macOS, to send this output to clipboard:
make RabbitMQ-Overview.json | pbcopy
This is the preferred alternative to
9aa22e1895
See
dae49b5c08
for more context. cc @mkuratczyk
This commit introduces a few other somewhat related changes:
* BASH autocompletion for make targets - make ac
* descriptions for all custom targets - make h
* continuous feedback loops for ac & h targets - make CFac
I would really like to see some of the above features be part of
erlang.mk. What do you think @essen? Anything in particular that you
would like me to PR?
@dumbbell, my other Make partner-in-crime, may be interested in
discussing the above ; )
== LINKS
* https://medium.com/@lavieenroux20/how-to-win-friends-influence-people-and-autocomplete-makefile-targets-e6cd228d856d
* https://github.com/Bash-it/bash-it/blob/master/completion/available/makefile.completion.bash
While __inputs are required for the dashboards to work in environments
where Prometheus is not the default datasource, it breaks the local
development flow. In other words,
9aa22e1895
prevents `make metrics overview` from working as designed.
We are going to add shortly a simple way of converting the local
dashboards into a format that can be imported in Grafana and will work
when Prometheus is not the default datasource (e.g. when using
https://github.com/coreos/kube-prometheus)
Long-term, these dashboards will be available via grafana.com, which is
the preferred way of consuming them.
cc @mkuratczyk
Some metrics were of type gauge while they should have been of type
counter. Thanks @brian-brazil for making the distinction clear. This is
now captured as a comment above the metric definitions.
Because all metrics are from RabbitMQ's perspective, cached for up to 5
seconds by default (configurable), we prepend `rabbitmq_` to all metrics
emitted by this collector. While Some metrics are for Erlang (erlang_),
Mnesia (schema_db_) or the System (io_), they are all observed & cached
by RabbitMQ, hence the prefix.
This is the last PR which started in the context of prometheus/docs#1414
[#167846096]
As described in
https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics.
Until prometheus.erl has the prometheus_process_collector functionality
built-in - this may not happen -, we are exposing a subset of those
metrics via rabbitmq_core_metrics_collector, so we are going to stick to
the expected naming conventions.
This commit supercedes the thought process captured in
1e5f4de4cb
[#167846096]
While `process_open_fds` would have been ideal, because the value is
cached within RabbitMQ, and computed differently across platforms, it is
important to keep the distinction from, say, what the kernel reports
just-in-time.
I am also capturing the Erlang context by adding `erlang_` to the
relevant metrics. The full context is: RabbitMQ observed this Erlang VM
process metric to be X, so this is why some metrics are prefixed with
`rabbitmq_erlang_process_`
Because there is a difference betwen what RabbitMQ limits are set to,
e.g. `rabbitmq_memory_used_limit_bytes`, vs. what RabbitMQ reports about
the Erlang process, e.g. `rabbitmq_erlang_process_memory_used_bytes`.
This is the best that we can do while staying honest about what is being
reported. cc @brian-brazil
[#167846096]