The exit code is useful for monitoring tools and process supervisors
when deciding what to do after the process exits: for example, we may
want to restart it or send a report. The current implementation of the
`rabbitmq-server` script does not propagate the exit code in the general
case, which makes it impossible to know whether the exit was clean and,
for example, to use the `on-failure` restart policy in Docker.
This change makes the script propagate the exit code.
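As a minimal sketch of what propagation means in a POSIX shell wrapper
(the `run_and_propagate` helper is hypothetical and is not the code from
the `rabbitmq-server` script), the wrapper waits for the child and
returns the child's status as its own:

```shell
# Hypothetical helper, for illustration only: run a command as a child
# process and hand its exit status back to the caller unchanged.
run_and_propagate() {
    # Start the command in the background so the wrapper keeps control.
    "$@" &
    child_pid=$!
    # `wait` returns the exit status of the child it waited for.
    wait "$child_pid"
    child_status=$?
    return "$child_status"
}
```

With this shape, `run_and_propagate sh -c 'exit 3'` returns 3 to the
caller, so a supervisor can tell a failure from a clean exit.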
Since Erlang/OTP 26:
```
OTP-18445
Application(s):
erts, stdlib
It is no longer necessary to enable a feature in the runtime system in order to load modules that are using it.
It is sufficient to enable the feature in the compiler when compiling it.
That means that to use feature maybe_expr in Erlang/OTP 26, it is sufficient to enable it during compilation.
In Erlang/OTP 27, feature maybe_expr will be enabled by default, but it will be possible to disable it.
```
This commit is pure refactoring making the code base more maintainable.
Replace rabbit_misc:pipeline/3 with the new OTP 25 experimental maybe
expression because
"Frequent ways in which people work with sequences of failable
operations include folds over lists of functions, and abusing list
comprehensions. Both patterns have heavy weaknesses that makes them less
than ideal."
https://www.erlang.org/eeps/eep-0049#obsoleting-messy-patterns
Additionally, this commit is more restrictive in the type spec of
rabbit_mqtt_processor state fields.
Specifically, many fields were defined to be `undefined | T` where
`undefined` was only a temporary value until the first CONNECT packet was
processed by the processor.
It's better to initialise the MQTT processor upon first CONNECT packet
because there is no point in having a processor without having received
any packet.
This allows many type specs in the processor to change from `undefined |
T` to just `T`.
Additionally, memory is saved by removing the `received_connect_packet`
field from the `rabbit_mqtt_reader` and `rabbit_web_mqtt_handler`.
This reverts commit 8070344a38.
We learnt during the last 6 days on master branch that RabbitMQ
- as of today - is not compatible with kernel parameter
`prevent_overlapping_partitions` set to `true`.
RabbitMQ explicitly disconnects nodes in at least two places:
1. rabbit_node_monitor to "promote" a partial network partition
to a full partition, and
2. rabbit_mnesia after a node reset to disconnect it from the
rest of the cluster.
There is no atomicity in the way we disconnect several nodes,
because it's a simple loop. Therefore, remote nodes may detect the
disconnection at different times. With global's new
behavior behind prevent_overlapping_partitions, our attempt to
disconnect all nodes in rabbit_mnesia creates a partial network
partition from global's point of view, leading to a complete
disconnection of the cluster.
For example, test
```
make ct-clustering_management t=cluster_size_3:join_and_part_cluster
```
was flaky and demonstrates the 2nd bullet point above, where RabbitMQ's
interference with Erlang distribution conflicts with global's
prevent_overlapping_partitions.
When RabbitMQ resets a node, its last step is to loop over
clustered nodes and disconnect from them one at a time.
In this test with a 3-node cluster where we reset node A:
1. Node A instructs node B and C to remove node A from their view
of the cluster
2. Node A disconnects from node B
3. global on node B gets a nodedown event for node A, but node C is
still connected to node A
4. global on node B concludes there is a network partition and
disconnects from node A and node C
At this point, each node is on its own.
Nothing in RabbitMQ tries to restore the connection between
nodes B and C.
The correct path forward is:
1. Get rid of Mnesia replacing it with Khepri.
2. Once mirrored classic queues are removed, get rid of rabbit_node_monitor.
3. Have a clear and consistent view of the nodes comprising a RabbitMQ cluster.
In other words, do not use different sources of truth like nodes(),
Mnesia, Ra clusters, global monitor at different places in the code.
For the time being we live with `prevent_overlapping_partitions` set to `false`
and with the workaround for global:sync/0 being stuck introduced in
9fcb31f348.
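For reference, a kernel application parameter can be set with the
standard `erl` syntax `-kernel Par Val`. The sketch below assumes extra
flags are passed through the `RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS`
environment variable; it only illustrates the setting and is not
necessarily how this commit applies it:

```shell
# Assumption: extra Erlang VM flags are supplied via this environment
# variable before starting the server. `-kernel Par Val` is the standard
# erl syntax for setting a kernel application parameter.
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="-kernel prevent_overlapping_partitions false"
export RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS
```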
This kernel parameter got introduced in Erlang 24.3.
It is set to `false` by default in Erlang 24.
It is set to `true` by default in Erlang 25.
This commit requires Erlang >= 24.3.
As described in commit message
4bf78d822d
setting this flag to `true` will prevent global:sync/0 from hanging
in the presence of network failures.
Instead of relying on our own workaround for global:sync/0 being stuck,
introduced in
9fcb31f348,
let us rely on the official Erlang fix that comes with setting
prevent_overlapping_partitions to true.
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application env. variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.
The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.
The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.
The message text format slightly changed: the timestamp is more precise
(now to the microsecond) and the level can be abbreviated to always be
4 characters long to align all messages and improve readability. Here is
an example:
    2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Licensed under the MPL 2.0. Website: https://rabbitmq.com
The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).
JSON is also supported as a message format, and now for any output.
Indeed, it is possible to use it with e.g. syslog or the exchange. Here
is an example of a JSON-formatted message sent to syslog:
    Mar 3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}
For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variable were extended:
* `-` still means stdout
* `-stderr` means stderr
* `syslog:` means syslog on localhost
* `exchange:` means logging to `amq.rabbitmq.log`
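The accepted values listed above can be summarized as a small shell
sketch. The `describe_log_target` function is hypothetical and is not
RabbitMQ's actual parsing code; treating any other value as a file path
is an assumption based on the variable's existing use:

```shell
# Hypothetical helper, for illustration only: map a $RABBITMQ_LOGS value
# to the output it selects, following the list above.
describe_log_target() {
    case "$1" in
        -)         echo "stdout" ;;
        -stderr)   echo "stderr" ;;
        syslog:)   echo "syslog" ;;
        exchange:) echo "exchange" ;;
        # Assumption: any other value is treated as a log file path.
        *)         echo "file" ;;
    esac
}
```

For instance, `describe_log_target "$RABBITMQ_LOGS"` reports which
output a given setting would select under these assumptions.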
`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON instead of plain text.
The `rabbitmqctl rotate_logs` command is deprecated. The reason is that
Logger does not expose a function to force log rotation. However, it
will detect when a file was rotated by an external tool.
From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.
In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):
    ?LOG_NOTICE("Logging: switching to configured handler(s); following "
                "messages may not be visible in this log output",
                #{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),
Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.
At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. It can be done gradually when
working on a particular module or logging.
The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is called `rabbit_logger_std_h` and is based on
`logger_std_h` in Erlang 23.0.
Without this change, using anything other than `rabbit` or the `rabbitmq-env-conf.bat` file will result in `erlang_dist_running_with_unexpected_nodename`.
Follow-up to #2673
cc @dumbbell @michaelklishin
Currently RABBITMQ_BASE is always dynamically picked up from the
environment. This change would fix its value at the time the service is
configured, allowing multiple RabbitMQ services to be configured.