High VM memory watermark cannot actually be set to 'infinity' (and beyond :P)
but it can be returned as a fallback value. See #2733 for some additional
context.
We format 'infinity' as "100% of available memory". This seems to be
a reasonable way to do it because the status command will try to
present a final interpreted limit value.
Generally a limit of infinity won't be returned except very early in node
boot when the monitor(s) haven't yet started.
Per discussion with @evaskova.
Adds WORKSPACE.bazel, BUILD.bazel & *.bzl files for partial build & test with Bazel. Introduces a build-time dependency on https://github.com/rabbitmq/bazel-erlang
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application env. variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.
The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.
The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.
The message text format changed slightly: the timestamp is more precise
(now to the microsecond) and the level is abbreviated to always be
4 characters long, to align all messages and improve readability. Here is
an example:
2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Licensed under the MPL 2.0. Website: https://rabbitmq.com
The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).
JSON is also supported as a message format, now for any output: it can
be used with e.g. syslog or the exchange. Here is an example of a
JSON-formatted message sent to syslog:
Mar 3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}
For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variable were extended:
* `-` still means stdout
* `-stderr` means stderr
* `syslog:` means syslog on localhost
* `exchange:` means logging to `amq.rabbitmq.log`
`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON instead of plain text.
The `rabbitmqctl rotate_logs` command is deprecated. The reason is that
Logger does not expose a function to force log rotation. However, it
will detect when a file has been rotated by an external tool.
From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.
In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):
?LOG_NOTICE("Logging: switching to configured handler(s); following "
"messages may not be visible in this log output",
#{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),
Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.
At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. This can be done gradually when
working on a particular module or on logging in general.
The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager's. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is called `rabbit_logger_std_h` and is based on
`logger_std_h` from Erlang 23.0.
Add GitHub Actions workflows for Erlang/OTP 22.3 & 23.0.
The workflows run tests for each component that is now part of this
repo, with test suite parallelization specifically for the rabbit
erlang application.
... when no Mnesia directory is specified. The reason is that the
default behavior changed: if the node is unavailable and no Mnesia
directory is configured, we use the default directory.
Despite being the implicit default, the "/" virtual host
may be required in function assertions, so an explicit
merge can still be necessary.
Let's introduce a reusable macro module for this.
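A minimal sketch of what such a macro module could look like (module and
macro names are hypothetical, not the actual implementation):

defmodule DefaultVHostHelpers do
  # Hypothetical helper: merge the implicit "/" virtual host into an
  # options map unless a vhost is already present.
  defmacro merge_default_vhost(opts) do
    quote do
      Map.merge(%{vhost: "/"}, unquote(opts))
    end
  end
end

A test module can then `require DefaultVHostHelpers` and call
`merge_default_vhost(opts)` in its assertions.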
and add a VMware copyright notice.
We did not mean to make this code Incompatible with Secondary Licenses
as defined in [1].
1. https://www.mozilla.org/en-US/MPL/2.0/FAQ/
Not directly related to rabbitmq/rabbitmq-server#2321
but useful on its own. We've had requests for and discussions
about such commands in the past and now that Ranch supports
listener suspension, there aren't many reasons not to do it.
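For reference, suspending and resuming a listener on the target node boils
down to calls like the following (`target_node` and `listener_ref` are
placeholders; suspend_listener/1 and resume_listener/1 are the Ranch
functions in question):

# Placeholders: target_node is the RabbitMQ node, listener_ref a Ranch listener ref.
:rpc.call(target_node, :ranch, :suspend_listener, [listener_ref])
:rpc.call(target_node, :ranch, :resume_listener, [listener_ref])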
For a couple of reasons:
* The Observability, Monitoring and Health Checks group has grown too large
* Some commands in it clearly have to do with exploring effective node
or CLI tool configuration, not its dynamically changing operational state
Currently we have a --auto-complete magic argument which
does not show up in `help` and requires its argument to be
explicitly -- separated from the actual flag, which is
counter-intuitive.
This introduces a new autocomplete command which is delegated
to by --auto-complete, much like --help is simply a special
way of invoking the `help` command.
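Roughly, the delegation just rewrites the magic flag into the new command
before dispatch; a sketch with a hypothetical module and function name:

defmodule ArgRewrite do
  # "--auto-complete ..." becomes "autocomplete ...", exactly like
  # "--help" is a special way of invoking `help`.
  def maybe_rewrite(["--auto-complete" | rest]), do: ["autocomplete" | rest]
  def maybe_rewrite(args), do: args
end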
Closes #439
for evaluation of scripts in environments where standard input
redirection is not an option or problematic, e.g.
certain Windows environments.
Closes #438
* `help` printed an "Error:" at the top for no reason
* `help [command]` with a non-existent command did not offer a suggestion
like an attempt to invoke a non-existent command would
* Exit codes were not consistent
* `help --list-commands` had line break issues
With this change,
* `help` is consistent with --help
* `help [command]` is consistent with `[command] --help`
* If a command is not found, either during the execution flow
or in `help [command]`, we consistently attempt a Jaro distance suggestion
(see the sketch below)
* Successful or effectively successful exits from the `help`
command do not produce any error messages at the top
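The suggestion logic amounts to a Jaro distance search over the known
command names; a minimal sketch (module name and threshold are
illustrative, String.jaro_distance/2 is standard Elixir):

defmodule CommandSuggestion do
  @threshold 0.75

  def suggest(unknown, known_commands) do
    {candidate, distance} =
      known_commands
      |> Enum.map(fn name -> {name, String.jaro_distance(unknown, name)} end)
      |> Enum.max_by(fn {_name, distance} -> distance end)

    if distance >= @threshold, do: {:ok, candidate}, else: :none
  end
end

# CommandSuggestion.suggest("lst_queues", ["list_queues", "list_bindings"])
# #=> {:ok, "list_queues"}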
HiPE has been deprecated/only partially supported in Erlang 22
and will be removed completely in Erlang 24 next year.
Part of rabbitmq/rabbitmq-server#2392
Helps with troubleshooting hostname resolution behavior
on nodes and locally for CLI tools. This is obviously not meant
to be a replacement for existing tools such as dig, only
a way to quickly spot obvious irregularities, e.g. those
in environments that use custom Erlang inetrc files.
Per discussion with @harshac.
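The kind of checks involved can be reproduced with standard :inet calls,
e.g.:

{:ok, hostname} = :inet.gethostname()
:inet.gethostbyname(hostname)
:inet.getaddr(hostname, :inet)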
The reason is that currently, the repository dispatch event only
triggers the workflow of the target repository's default branch (i.e.
master in our case).
This is OK for now, but unfortunately it prevents us from using GitHub
Actions with release branches.
* It requires a fully booted node, so it is not generally suitable for a Kubernetes readiness probe.
* It can produce false positives
* It is too intrusive and CPU-intensive to use at scale
* Most operators do not understand what it really does and when they learn about it,
consider it to be too opinionated and intrusive
Time for the One True Health Check™ to retire from duty.
Part of rabbitmq/rabbitmq-cli#426
During the 3.8.4 cycle we have backported `rabbit_env` to v3.8.x.
Instead of messing with env variable prefixing, it tries both
RABBITMQ_{VAR} and {VAR} environment variables. However,
in the CLI tools the node name is currently only picked up from
RABBITMQ_NODENAME, so environments where the node name has to be
explicitly configured via rabbitmq-env.conf:
NODENAME=rabbit@our.custom.hostname
would not have it picked up; RABBITMQ_NODENAME had to be set
as a workaround.
With this change the behavior of CLI tools and the server is closer.
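A minimal sketch of the intended fallback order, assuming plain
System.get_env/1 lookups (module and function names are illustrative; the
real resolution in rabbit_env is more involved):

defmodule NodenameGuess do
  def guess do
    case System.get_env("RABBITMQ_NODENAME") || System.get_env("NODENAME") do
      nil ->
        {:ok, hostname} = :inet.gethostname()
        "rabbit@#{hostname}"

      name ->
        name
    end
  end
end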
Note that this updates a few places which used `Config.get_option/2`
to get a "default node name" which more often than not ended up
being a node prefix ("rabbit"). Those tests had to be updated
to use `Config.default/1`.
Closes #421.
References c8e766dec7, 8a5ab87038.
It prints RabbitMQ-specific environment variables that
are set on the target node. Can be used to inspect env variable-based
configuration without access to the target host.
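Conceptually, it collects the RABBITMQ_-prefixed variables from the target
node's environment; a rough sketch using an RPC to :os.getenv/0 (module
and function names are illustrative):

defmodule RemoteEnv do
  def rabbitmq_vars(node) do
    # :os.getenv/0 returns "KEY=VALUE" charlists; a :badrpc result is not
    # handled in this sketch.
    node
    |> :rpc.call(:os, :getenv, [])
    |> Enum.map(&to_string/1)
    |> Enum.filter(&String.starts_with?(&1, "RABBITMQ_"))
  end
end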
Fail unsuccessful HTTP requests, go silent
This will be used to trigger rabbitmq-server tests, because there is a
new commit in a rabbitmq-cli release branch.
cc @dumbbell
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Before this commit, if the product name & version were not overridden,
they would default to the base product name & version ("RabbitMQ" + its
version).
Now, if they are not set/overridden, their corresponding lines are not
added to the output of `status`. Therefore, `rabbitmqctl status` on a
regular RabbitMQ will output the same thing as before.
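In other words, the product lines are only emitted when the values are
actually set; roughly (field and label names are illustrative, not the
actual status keys):

defmodule ProductLines do
  def lines(%{product_name: name, product_version: version})
      when is_binary(name) and is_binary(version) do
    ["Product name: #{name}", "Product version: #{version}"]
  end

  def lines(_), do: []
end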
@michaelklishin:
> Some test cases cannot be run in parallel since they rely on target
> node state as a shared resource.
If/when we want to improve this:
@michaelklishin:
> What we can do is make this configurable. Some tests can run in
> parallel, e.g. all rabbitmq-diagnostics tests that do not trigger/report
> alarms. But this only would be useful for interactive runs or if we
> split all tests into groups and allow parallel runs for some groups.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
This changes the mix test default, which is 2 × the number of CPUs.
We only append --trace if VERBOSE_TEST is true (enabled by default) and
don't change the number of max-cases. I'm not sure why we coupled the
number of max cases to verbose testing; they don't seem related to me.
Since @dfedotov is no longer with us, I can't ask him, so I'm just making
the change that feels right.
This is in response to GitHub Actions failing consistently with:
== Compilation error in file test/ctl/list_connections_command_test.exs ==
** (exit) exited in: :gen_server.call(#PID<32153.2990.0>, {:read_cache, -576460752303386991}, :infinity)
** (EXIT) no connection to rabbit_ctl_36@fv-az56
##[error] (stdlib 3.8) gen_server.erl:223: :gen_server.call/3
##[error] (stdlib 3.8) erl_eval.erl:680: :erl_eval.do_apply/6
##[error] (stdlib 3.8) erl_eval.erl:888: :erl_eval.expr_list/6
##[error] (stdlib 3.8) erl_eval.erl:411: :erl_eval.expr/5
##[error] (elixir 1.10.2) lib/kernel/parallel_compiler.ex:396: Kernel.ParallelCompiler.require_file/2
##[error] (elixir 1.10.2) lib/kernel/parallel_compiler.ex:306: anonymous fn/4 in Kernel.ParallelCompiler.spawn_workers/7
##[error]Makefile:114: recipe for target 'tests' failed
Our assumption is that limiting test parallelism will make
these failures go away.
cc @michaelklishin @dumbbell
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
The readiness check is similar to that of 'rabbitmq-diagnostics check_if_node_is_quorum_critical'
but this command waits for it for up to --timeout seconds.
While at it, refactor DefaultOutput to detect and support JSON formatting
of most basic return values such as :ok or {:error, map}.
Part of #408.
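The DefaultOutput refactoring roughly maps basic return values to
JSON-friendly terms; a sketch (module name, function name and exact shapes
are illustrative):

defmodule BasicJSON do
  def format(:ok), do: %{"result" => "ok"}
  def format({:error, err}) when is_map(err), do: %{"result" => "error", "details" => err}
  def format(other), do: %{"result" => inspect(other)}
end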
Compared to the regular configuration we use, this one is modified heavily to:
* start a background RabbitMQ node with the federation and STOMP plugins
enabled
* run the specific test target of the CLI
* test against the oldest and latest versions of Elixir
... down from 10% of the configured timeout.
This has a significant impact on the time it takes to start RabbitMQ in
all our testsuites. rabbitmq-ct-helpers sets a wait timeout of 180
seconds. Thus before this patch, the wait loop would sleep for 18
seconds between each check. Given it takes about 1.5 seconds to start
RabbitMQ, a lot of time is wasted here.
Here are some numbers after running testsuites with and without this
patch:
* `make ct-fast` in rabbitmq-server: 8m15s down to 4m58s
* `make ct` in rabbitmq-mqtt: 9m23s down to 6m43s
* `make ct` in rabbitmq-stomp: 4m31s down to 2m04s
[#171535484]
They are printed in addition to the underlying RabbitMQ version.
If it is unavailable, for instance because the node is old enough to
not export the product info, we use "RabbitMQ" as the name and the
underlying RabbitMQ version as the version.
[#171467799]
Now that the node removes its PID file on exit, we need to read it
before stopping the node.
Otherwise, if the PID file was already removed when
`OsPid.read_pid_from_file()` is called, it will wait for the PID file to
appear again in an infinite loop.
This was found when testing the RHEL 6 package on CentOS 6 in CI.
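The fix boils down to reordering the steps: read the OS PID first, then
stop the node, then wait on that PID. A sketch with hypothetical module
and helper names (here the node is stopped via :init.stop/0 for
simplicity, which is not what the CLI actually does):

defmodule StopAndWait do
  def run(node, pid_file) do
    # Read the PID *before* asking the node to stop, since the node now
    # removes its PID file on exit.
    {:ok, contents} = File.read(pid_file)
    os_pid = contents |> String.trim() |> String.to_integer()
    :rpc.call(node, :init, :stop, [])
    wait_for_exit(os_pid)
  end

  defp wait_for_exit(os_pid) do
    # Crude check via ps(1); loops until the OS process is gone.
    {_, status} = System.cmd("ps", ["-p", Integer.to_string(os_pid)])

    if status == 0 do
      Process.sleep(500)
      wait_for_exit(os_pid)
    else
      :ok
    end
  end
end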
The context is either initialized from the CLI's process environment or
from the remote node's own context.
This is required to e.g. find plugins or Mnesia directory.
In `code_path`, we don't append `ebin` anymore to the code path because
the `rabbit` application is now packaged as an .ez archive like plugins.
This simplifies the overall layout of the project.
They are not dependencies of the CLI, but dependencies of rabbit_common.
Unfortunately, mix(1) doesn't embed them in the final escripts. In fact,
it doesn't embed any dependencies of rabbit_common, but the CLI probably
doesn't call the code paths that would trigger a crash.
Before this patch, the command would wait for the PID file to appear,
then it would read the PID, check whether that process existed and
terminate with `no_process_running` if it didn't.
This was a problem if the PID file was still there with stale data. The
`wait` command would fail even though another node is starting but
hasn't had a chance to write its PID yet.
Now, the command will read the PID, verify the system process and try to
ping the Erlang node in a loop with the specified timeout. This helps
if a node is restarted but the new PID is not yet written to the file.
Note that this is a slight change in behavior w.r.t. crashed nodes though:
if a node crashes (or already crashed), the command will wait until the
timeout is reached. Before, the command would have exited almost immediately.
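A simplified sketch of the new flow (module name and the ps(1)-based
process check are illustrative; the actual command reports progress and
handles more error cases):

defmodule WaitSketch do
  def wait_for_node(pid_file, node, timeout_ms, interval_ms \\ 1_000) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    do_wait(pid_file, node, deadline, interval_ms)
  end

  defp do_wait(pid_file, node, deadline, interval_ms) do
    cond do
      System.monotonic_time(:millisecond) > deadline ->
        {:error, :timeout}

      node_ready?(pid_file, node) ->
        :ok

      true ->
        Process.sleep(interval_ms)
        do_wait(pid_file, node, deadline, interval_ms)
    end
  end

  defp node_ready?(pid_file, node) do
    # The PID file may still contain stale data from a previous run, so a
    # readable PID alone is not enough: the OS process must exist and the
    # Erlang node must answer a ping.
    with {:ok, contents} <- File.read(pid_file),
         {os_pid, _} <- Integer.parse(String.trim(contents)),
         true <- os_process_alive?(os_pid) do
      Node.ping(node) == :pong
    else
      _ -> false
    end
  end

  defp os_process_alive?(os_pid) do
    # Crude check via ps(1); the CLI's OsPid helper does this differently.
    {_, status} = System.cmd("ps", ["-p", Integer.to_string(os_pid)])
    status == 0
  end
end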