The default value of ?SEGMENT_ENTRY_COUNT is 16384. Due to
the index file format the entire segment file has to be loaded
into memory whenever messages from this segment must be accessed.
This can result in hundreds of kilobytes that are read, processed
and converted to Erlang terms. This creates a lot of garbage in
memory, and this garbage unfortunately ends up in the old heap
and so cannot be reclaimed before a full GC. Even when forcing
a full GC every run (fullsweep_after=0) the process ends up
allocating a lot of memory to read segment files and this can
create issues in some scenarios.
While this is not a problem when there are only a small number
of queues, it becomes a showstopper when the number of queues
is larger (hundreds or thousands of queues). This only
applies to classic/lazy queues.
This commit allows configuring the segment file entry count
so that the size of the file, as well as the memory footprint
when reading from it, can be greatly reduced.
Experiments using a segment entry count of 1024 show no
noticeable downside other than the natural increase in the
number of segment files.
The segment entry count can only be set on nodes that have
no messages in their queue index. This is because the index
uses this value to calculate which segment file specific
messages are located in.
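For illustration, setting a lower entry count might look like this
in advanced.config; the key name used below is an assumption and
may not match the actual setting introduced by this commit:

  %% advanced.config — illustrative sketch; the key name is assumed.
  [
    {rabbit, [
        {queue_index_segment_entry_count, 1024}
    ]}
  ].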
Now that the Cuttlefish schema sets default values for the application
environment in `{rabbit, [{log, ...}]}`, the values set in the testsuite
using application:set_env() are overwritten.
By using the $RABBITMQ_LOGS environment variable, we can override those
default values.
This suite contains only one group, but is long enough to warrant
sharding. This is probably a bit of a time penalty in absolute terms
because init_per_suite and init_per_group re-run in each shard.
The implementation depends on erlang-systemd [1] which uses Unix socket
support introduced in Erlang 19. Therefore it doesn't rely on a native
library. We also don't need special handling if the host doesn't use
journald.
To enable the journald handler, add the following configuration
variable:
log.journald = true
The log level can also be set the same way it is with other handlers:
log.journald.level = debug
The log messages are communicated to journald using structured data. It
is possible to configure which fields are transmitted and how they are
named:
log.journald.fields = SYSLOG_IDENTIFIER="rabbitmq-server" syslog_timestamp syslog_pid priority ERL_PID=pid
In this example:
* the `SYSLOG_IDENTIFIER` is set to a string literal
* `syslog_timestamp` and `syslog_pid` are aliases for
SYSLOG_TIMESTAMP=time and SYSLOG_PID=os_pid
* `priority` is a special field computed from the log level
* `ERL_PID=pid` indicates `pid` should be sent as the `ERL_PID`
field.
The message itself is implicit and always sent. Otherwise, the list of
fields must be exhaustive: fields which are unset in a particular log
event's metadata are sent as an empty string, and fields not mentioned
in the list are not sent at all. The order is not important.
Here are some messages printed by `journalctl -f` during RabbitMQ
startup:
Mar 26 11:58:31 ip-172-31-43-179 rabbitmq-server[19286]: Ready to start client connection listeners
Mar 26 11:58:31 ip-172-31-43-179 rabbitmq-server[19286]: started TCP listener on [::]:5672
Mar 26 11:58:31 ip-172-31-43-179 rabbitmq-server[19286]: Server startup complete; 0 plugins started.
[1] https://github.com/rabbitmq/erlang-systemd
This is not exposed to the end user (yet) through the Cuttlefish
configuration. But this is required to make logging_SUITE timezone
agnostic (i.e. the timezone of the host running the testsuite should not
affect the formatted times).
The design of the rabbit_ct_config_schema helper makes it impossible to
do pattern matching and thus handle default values in the schema. As a
consequence, the helper explicitly removes the `{rabbit, {log, _}}`
configuration key to work around this limitation until a proper solution
is implemented and all testsuites rewritten. See
rabbitmq/rabbitmq-ct-helpers@b1f1f1ce68.
Therefore, we can't test log configuration variables anymore using this
helper. That's ok because logging_SUITE already tests many things.
In addition to the existing configuration variables to configure
logging, the following variables were added to extend the settings.
log.*.formatter = plaintext | json
Selects between the plain text (default) and JSON formatters.
log.*.formatter.time_format = rfc3339_space | rfc3339_T | epoch_usecs | epoch_secs | lager_default
Configures how the timestamp should be formatted. It has several
values to get an RFC3339 date & time, Epoch-based integers, or the
Lager default format.
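Examples (illustrative; the exact rendering may differ slightly):
  rfc3339_space: 2021-03-03 10:22:30.377392+01:00
  rfc3339_T:     2021-03-03T10:22:30.377392+01:00
  epoch_usecs:   1614763350377392
  epoch_secs:    1614763350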
log.*.formatter.level_format = lc | uc | lc3 | uc3 | lc4 | uc4
Configures how to format the level. Things like uppercase vs.
lowercase, full vs. truncated.
Examples:
lc: debug
uc: DEBUG
lc3: dbg
uc3: DBG
lc4: dbug
uc4: DBUG
log.*.formatter.single_line = on | off
Indicates if multi-line messages should be reformatted as a
single-line message. A multi-line message is converted to a
single-line message by joining all lines and separating them
with ", ".
log.*.formatter.plaintext.format
Set to a pattern to indicate the format of the entire message. The
format pattern is a string with $-based variables. Each variable
corresponds to a field in the log event. Here is a non-exhaustive list
of common fields:
time
level
msg
pid
file
line
Example:
$time [$level] $pid $msg
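With such a pattern, a formatted message could look like this
(illustrative):
  2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==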
log.*.formatter.json.field_map
Indicates if fields should be renamed or removed, and the order in
which they should appear in the final JSON object. The order is set by
the order of fields in that configuration variable.
Example:
time:ts level msg *:-
In this example, `time` is renamed to `ts`. `*:-` tells the formatter
to remove all fields not mentioned in the list. In the end the JSON object will
contain the fields in the following order: ts, level, msg.
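With that field map, a resulting object could look like this (an
illustrative sketch based on the JSON sample shown further below,
with `time` renamed to `ts` and all other fields dropped):
  {"ts":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE"}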
log.*.formatter.json.verbosity_map
Indicates if a verbosity field should be added and how it should be
derived from the level. If the verbosity map is not set, no verbosity
field is added to the JSON object.
Example:
debug:2 info:1 notice:1 *:0
In this example, debug verbosity is 2, info and notice verbosity is 1,
other levels have a verbosity of 0.
All of them work with the console, exchange, file and syslog outputs.
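For example, a combined configuration for the file output could look
like this (an illustrative sketch using only the variables described
above):
  log.file.formatter = json
  log.file.formatter.time_format = epoch_usecs
  log.file.formatter.single_line = on
  log.file.formatter.json.field_map = time:ts level msg *:-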
The console output has specific variables too:
log.console.stdio = stdout | stderr
Indicates if stdout or stderr should be used. The default is stdout.
log.console.use_colors = on | off
Indicates if colors should be used in log messages. The default
depends on the environment.
log.console.color_esc_seqs.*
Indicates how each level is mapped to a color. The value can be any
string but the idea is to use an ANSI escape sequence.
Example:
log.console.color_esc_seqs.error = \033[1;31m
V2: A custom time format pattern was introduced, first using variables,
then a reference date & time (e.g. "Mon 2 Jan 2006"), thanks to
@ansd. However, we decided to remove it for now until we have a
better implementation of the reference date & time parser.
V3: The testsuite was extended to cover new settings as well as the
syslog output. To test it, a fake syslogd server was added (Erlang
process, part of the testsuite).
V4: The dependency on cuttlefish is moved to rabbitmq_prelaunch which
actually uses the library. The version is updated to 3.0.1 because
we need Kyorai/cuttlefish#25.
Adds WORKSPACE.bazel, BUILD.bazel & *.bzl files for partial build & test with Bazel. Introduces a build-time dependency on https://github.com/rabbitmq/bazel-erlang
... instead of function arity.
I don't know when this was introduced, perhaps Erlang 23. Anyway, each
stacktrace entry now has the form:
{Mod, Fun, ArgListOrArity, Props}
where `ArgListOrArity` is either the function arity (an integer) or the
actual list of arguments passed to the function.
If we get the latter, we format the stacktrace line as usual and add
the list of arguments on the line below it.
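A minimal sketch of how both entry forms can be handled (a
hypothetical helper, not the actual formatter code):

  format_entry({Mod, Fun, Arity, _Props}) when is_integer(Arity) ->
      io_lib:format("    ~ts:~ts/~b", [Mod, Fun, Arity]);
  format_entry({Mod, Fun, Args, _Props}) when is_list(Args) ->
      %% Format as usual using the arity, then add the argument list below.
      [io_lib:format("    ~ts:~ts/~b", [Mod, Fun, length(Args)]),
       io_lib:format("~n        args: ~tp", [Args])].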
... if it is set in the configuration file.
Here is an example of that use case:
* The official Docker image sets RABBITMQ_LOGS=- in the environment
* A user of that image adds a configuration file with:
log.console.level = debug
The initial implementation, introduced in rabbitmq/rabbitmq-server#2861,
considered that if the output is overridden in the environment (through
$RABBITMQ_LOGS), any output configuration in the configuration file is
ignored.
The problem is that the output-specific configuration could also set the
log level which is not changed by $RABBITMQ_LOGS. This patch fixes that
by keeping the log level from the configuration (if it is set, obviously)
even if the output is overridden in the environment.
To make sure the PID is alive, as the mnesia record can become stale
after a failure.
Also make the local PID lookup in the stream coordinator do a consistent
query over the cluster if the PID is not alive.
Co-authored-by: Karl Nilsson <kjnilsson@users.noreply.github.com>
Sleep for 5s after a failure due to a node being down before reporting
back to the stream coordinator (which will immediately retry).
stream coordinator: correct command type spec
tidy up
fix rabbit_fifo_prop tests
stream coord: add function for member state query
This allows including additional applications or third party
plugins when creating a release, running the broker locally,
or just building from the top-level Makefile.
To include Looking Glass in a release, for example:
$ make package-generic-unix ADDITIONAL_PLUGINS="looking_glass"
A Docker image can then be built using this release and will
contain Looking Glass:
$ make docker-image
Beware, macOS users! Applications such as Looking Glass include
NIFs. NIFs must be compiled in the right environment. If you
are building a Docker image then make sure to build the NIF
on Linux! In the two steps above, this corresponds to the first
step (building the generic Unix package).
To run the broker with Looking Glass available:
$ make run-broker ADDITIONAL_PLUGINS="looking_glass"
This commit also moves Looking Glass dependency information
into rabbitmq-components.mk so it is available at all times.
by flipping the two list comprehension conditions.
If `is_local_to_node` is false, then `is_down` will not be evaluated.
This saves (R-1) * Q network calls every 2 minutes where R is the number
of replicas per quorum queue and Q is the number of quorum queues in the
RabbitMQ cluster.
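A hedged illustration of the principle with hypothetical names
(conditions in a list comprehension are evaluated left to right, so
the cheap local check now guards the expensive one):

  DownLocal = [R || R <- Replicas,
                    is_local_to_node(R, node()),  %% cheap, local check first
                    is_down(R)].                  %% skipped for non-local replicas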
Lager strips trailing newline characters but OTP logger with the default
formatter adds a newline at the end. To avoid unintentional multi-line log
messages we have to revisit most messages logged.
Some log entries are intentionally multiline, others
are printed to stdout directly: newlines are required there
for sensible formatting.
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application environment variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.
The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.
The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.
The message text format slightly changed: the timestamp is more precise
(now to the microsecond) and the level can be abbreviated to always be
4 characters long, to align all messages and improve readability. Here is
an example:
2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Licensed under the MPL 2.0. Website: https://rabbitmq.com
The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).
JSON is also supported as a message format, and now for any output.
Indeed, it is possible to use it with e.g. syslog or the exchange. Here
is an example of a JSON-formatted message sent to syslog:
Mar 3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}
For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variable were extended:
* `-` still means stdout
* `-stderr` means stderr
* `syslog:` means syslog on localhost
* `exchange:` means logging to `amq.rabbitmq.log`
`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON instead of plain text.
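For instance, a quick-test invocation could look like this
(illustrative; the exact syntax for combining a level with the `+json`
modifier may differ):

  $ RABBITMQ_LOGS=- RABBITMQ_LOG=debug,+json rabbitmq-server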
The `rabbitmqctl rotate_logs` command is deprecated. The reason is
Logger does not expose a function to force log rotation. However, it
will detect when a file was rotated by an external tool.
From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.
In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):
?LOG_NOTICE("Logging: switching to configured handler(s); following "
"messages may not be visible in this log output",
#{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),
Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.
At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. It can be done gradually when
working on a particular module or on logging in general.
The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is called `rabbit_logger_std_h` and is based on
`logger_std_h` from Erlang 23.0.
as node names grow.
Prior to this change, direct reply-to consumer channels
were encoded using term_to_binary/1, which means the result
would grow together with node name (since node name
is one of the components of an Erlang pid type).
This means that with long enough hostnames, reply-to
identifiers could overflow the 255 character limit of the
message property field type, shortstr.
With this change, the encoded value uses a hash of the node name
and then locates the actual node name from a map of
hashes to current cluster members.
In addition, instead of generating non-predictable "secure"
GUIDs the feature now generates "regular" predictable GUIDs
which compensates for some of the additional PID pre- and post-processing
outlined above.
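A hedged sketch of the idea using only standard Erlang calls (a
hypothetical helper, not the actual implementation, which may use a
different hash and a different source of cluster membership):

  %% Map a short hash of every cluster member's name to the name itself,
  %% then resolve the hash carried in the reply-to identifier.
  node_from_hash(NodeHash) ->
      Hashes = maps:from_list([{erlang:phash2(N), N} || N <- [node() | nodes()]]),
      maps:get(NodeHash, Hashes).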
It's not possible to know all applicable policies since plugins can extend
these, e.g. federation. Thus, we'll exclude the known inapplicable core
policies and allow through any other policy.
After several releases and several months of
docs saying that 22.3 is the minimum supported
version, we pull the trigger.
This is required by rabbitmq/credentials-obfuscation#10
as it uses the new crypto API available
in 22.1+ only.
(cherry picked from commit 36d6693776)
Since the validation failure message says "or isn't readable", we should
actually check whether we can read the file. This way, when configuring
TLS for example, you get early feedback if the cert files are not readable.
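A minimal sketch of such a readability check (a hypothetical helper,
not the actual validator code):

  can_read(Path) ->
      case file:open(Path, [read]) of
          {ok, Fd}         -> ok = file:close(Fd), true;
          {error, _Reason} -> false
      end.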
The time this operation can take in clusters with a lot of classic
mirrored queues (say, 10s or 100s of thousands) can be prohibitive for
upgrades.
Upgrades can use a health check to ensure that there are in-sync
replicas before entering maintenance mode, in which case
the transfer is not really necessary.
All of the above is more obvious with the recent changes in #2749.
Since we only consider nodes hosting
in-sync replicas for transfer candidates,
we can drop only one mirror instead of N,
and reduce the load caused by this operation.
This does not affect CMQ leadership transfer when performed in the
context of 'rabbitmq-queues rebalance'.
Pair: @dcorbacho, @mkuratczyk
of classic mirrored queues.
There are cases when asynchronously adding mirrors makes
a lot of sense: e.g. when a new node joins the cluster.
In this case, if we add mirrors asynchronously, this
operation will race with the step that removes mirrors.
As a result, we can end up with a queue that decided
that it had no promotable replicas => data loss
from the transfer.
Closes #2749.
Pairs: @dcorbacho, @mkuratczyk
Detect when a new stream leader is elected and make stream_queues
re-send any unconfirmed, pending messages to ensure they did not get
lost during the leader change. This is done using the osiris
deduplication feature to ensure the resend does not create duplicates of
messages in the stream.
This restores the behavior prior to the commit making `rabbit` closer to a
standard Erlang application.
Plugins are still actually started after rabbit is started (because they
depend on the `rabbit` application). Only the execution of their boot
steps was moved earlier.
With the behavior restored, it also means that a plugin's dependencies
are not started yet when its boot steps are executed.
V2: Move the maintenance mode reset before the plugin boot steps run.
V3: Add a `core_started` boot state. That state is reached at the end of
the `rabbit` app start function. It indicates when the RabbitMQ core
is started but the full service is not yet ready.
We now use this state in direct connection code to determine if
clients can open a direct connection. We have to do that because
some plugins open a direct connection as part of their own startup
(i.e. they can't wait for the `ready` boot state which comes later).
Without this change, using anything other than `rabbit` or the `rabbitmq-env-conf.bat` file will result in `erlang_dist_running_with_unexpected_nodename`.
Follow-up to #2673
cc @dumbbell @michaelklishin
Currently RABBITMQ_BASE is always dynamically picked up from the
environment. This change fixes it at the time the service is configured,
allowing multiple RabbitMQ services to be configured.
Connections to the stream plugin do not have a type, so they can
trigger some function_clause errors. This was the case when trying to
close a connection from rabbit_connection_tracking module. The function
now falls back to a simple gen_server call to the connection process for
connections without a type.
As the connection may crash during the previous declaration, a caught
error could be returned by amqp_connection:open_channel/1 that wasn't
handled previously. Exactly how things fail in this test is most likely
very timing dependent and may vary.
Also fixes an MQTT test where the process that set up a mock auth ETS
table was transient when an RPC timeout was introduced.
Otherwise an application that polled an empty quorum queue frequently
using basic.get would never result in a snapshot being taken, resulting
in unlimited log growth.