Otherwise, if the FHC runs into an exception because of an ETS
write failure, most code paths begin failing.
This may help with the condition in #8784.
[Why]
Up to RabbitMQ 3.7.x, versions were compared to determine if two nodes
were compatible. In 3.8.0, feature flags were introduced to fulfill this
check.
The current branch of RabbitMQ is no longer compatible with 3.7.x
because several feature flags are now required. This means the user must
upgrade to an intermediate version first.
Therefore, we don't need to keep comparing versions.
[How]
The check now simply calls the feature flags subsystem check.
We still keep versions comparison for plugins. However, we can get rid
of the special case for RabbitMQ 3.6.6.
Currently it is not possible for an AMQP 1.0 client to find
out that a message is unroutable, because the message is settled with the
accepted state. This is because the current implementation
relies only on the publish confirms extension of AMQP 0.9.1.
This commit changes this by settling the message with the
released state. It uses the mandatory flag mechanism from
AMQP 0.9.1 and "internally" extends it to provide not only
the message in the callback, but the publishing sequence
as well. This applies only to AMQP 1.0, not to other
cases. Publish confirms and the mandatory flag are used
in conjunction in this case.
References #7823
Allow rabbit_exchange:route_return() to return additional information
alongside the matched binding keys.
For example, some exchange types read the full #amqqueue{} record from the database
and can in future return it from the route/3 function to avoid reading the full
record again in the channel.
For MQTT 5.0 destination queues, the topic exchange has to return not
only the destination queue names, but also the matched binding keys.
This is needed to implement MQTT 5.0 subscription options No Local,
Retain As Published and Subscription Identifiers.
Prior to this commit, as the trie was walked down, we remembered the
edges being walked and assembled the final binding key with
list_to_binary/1.
list_to_binary/1 is very expensive with long lists (long topic names),
even in OTP 26.
The CPU flame graph showed ~3% of CPU usage was spent only in
list_to_binary/1.
Unfortunately and unnecessarily, the current topic exchange
implementation stores topic levels as lists.
It would be better to store topic levels as binaries:
split_topic_key/1 should ideally use binary:split/3, similarly to the following:
```
1> P = binary:compile_pattern(<<".">>).
{bm,#Ref<0.1273071188.1488322568.63736>}
2> Bin = <<"aaa.bbb..ccc">>.
<<"aaa.bbb..ccc">>
3> binary:split(Bin, P, [global]).
[<<"aaa">>,<<"bbb">>,<<>>,<<"ccc">>]
```
The compiled pattern could be placed into persistent term.
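A minimal sketch of that idea (the persistent_term key is a name of our
choosing, not from the commit):
```
%% Cache the compiled pattern in persistent_term so it is compiled only
%% once per node; all subsequent calls reuse it.
split_topic_key(Key) when is_binary(Key) ->
    Pattern = case persistent_term:get(topic_dot_pattern, undefined) of
                  undefined ->
                      P = binary:compile_pattern(<<".">>),
                      persistent_term:put(topic_dot_pattern, P),
                      P;
                  P ->
                      P
              end,
    binary:split(Key, Pattern, [global]).
```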
This commit avoids migrating the Mnesia tables to store binaries
instead of lists. Mnesia migrations are non-trivial, especially with the
current feature flag subsystem.
Furthermore the Mnesia topic tables are already getting migrated to
their Khepri counterparts in 3.13.
Adding additional migration only for Mnesia does not make sense.
So, instead of assembling the binding key as we walk down the trie and
then calling list_to_binary/1 in the leaf, it
would be better to just fetch the binding key from the database in the leaf.
As we reach the leaf of the trie, we know both source and destination.
Unfortunately, we cannot fetch the binding key efficiently with the
current rabbit_route (sorted by source exchange) and
rabbit_reverse_route (sorted by destination) tables as the key is in
the middle between source and destination.
If there is a huge number of bindings for a given source exchange (very
realistic in MQTT use cases) or a large number of bindings for a given
destination (also realistic), it would require scanning this large
number of bindings.
Therefore, this commit takes the simplest possible solution:
it leverages the fact that binding arguments are already part of
table rabbit_topic_trie_binding.
So, if we simply include the binding key into the binding arguments, we
can fetch and return it efficiently in the topic exchange
implementation.
The following patch, which omits fetching the empty-list binding argument
(the default), makes routing slower because the function
`analyze_pattern.constprop.0` requires significantly more (~2.5%) CPU time:
```
@@ -273,7 +273,11 @@ trie_bindings(X, Node) ->
node_id = Node,
destination = '$1',
arguments = '$2'}},
- mnesia:select(?MNESIA_BINDING_TABLE, [{MatchHead, [], [{{'$1', '$2'}}]}]).
+ mnesia:select(
+ ?MNESIA_BINDING_TABLE,
+ [{MatchHead, [{'andalso', {'is_list', '$2'}, {'=/=', '$2', []}}], [{{'$1', '$2'}}]},
+ {MatchHead, [], ['$1']}
+ ]).
```
Hence, this commit always fetches the binding arguments.
All MQTT 5.0 destination queues will create a binding that
contains the binding key in the binding arguments.
Not only does this solution avoid expensive list_to_binary/1 calls, but
it also means that the Erlang app rabbit (specifically the topic exchange
implementation) does not need to be aware of MQTT anymore:
it just returns the binding key when the binding arguments tell it to do so.
In future, once the Khepri migration is completed, we should be able to
remove the binding key from the binding arguments again relatively
simply, to free up some storage space.
Note that one of the advantages of a trie data structure is its space
efficiency: the same prefixes are stored only once.
However, in RabbitMQ the binding key is already stored at least N times
in various routing tables, so storing it a few more times via the
binding arguments should be acceptable.
The speed improvements are favoured over a few more MBs of ETS usage.
A recent change in OTP made the unit_SUITE fail with a case clause.
cdd7200cbe
Reverting this old commit seems to fix the problem for OTP master
without breaking OTP 25/26.
Co-authored-by: @lhoguin
This commit replaces file combining with single-file compaction
where data is moved near the beginning of the file before
updating the index entries. The file is then truncated when
all existing readers are gone. This allows removing the lock
that existed before and enables reading multiple messages at
once from the shared files. This also helps us avoid many
ets operations and simplify the code greatly.
This commit still has some issues: reading a single message
is currently slow due to the removal of FHC in the client
code. This will be resolved by implementing read buffering
in a similar way as FHC but without keeping files open
more than necessary.
The dirty recovery code also likely has a number of issues
because of the compaction changes.
Follow up of https://github.com/rabbitmq/rabbitmq-server/pull/7913
and https://github.com/rabbitmq/rabbitmq-server/pull/7921
This commit uses the approach explained in https://github.com/erlang/otp/issues/7130#issuecomment-1512808759
We cannot use the macro `?OTP_RELEASE` since macros are evaluated at
compile time. RabbitMQ can be compiled with OTP 25 and executed with OTP
26. Therefore, we use `erlang:system_info(otp_release)` instead.
As `erlang:system_info/1` might be costly, we store the send function in
persistent_term.
For OTP 25, we use the "old tcp send workaround" (i.e.
`erlang:port_command/2`) which avoids expensive selective receives.
For OTP 26, we use `gen_tcp:send/2` which uses the optimised selective
receive.
Once the minimum required version becomes OTP 26, we can just switch to
`gen_tcp:send/2` and delete the `inet_reply` handling code in the various
RabbitMQ reader and writer processes.
Note that `rabbit_net:port_command/2` is not only used by RabbitMQ server,
but also by the AMQP 0.9.1 client.
Therefore, instead of putting the OTP version (or send function) into
persistent_term within the rabbit app, we just do it the first time
`rabbit_net:port_command/2` is invoked.
(`rabbit_common` is just a library without supervision hierarchy.)
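A minimal sketch of that lazy caching (the module/key names are
assumptions; it also glosses over the differing semantics, since
`gen_tcp:send/2` returns `ok | {error, Reason}` while
`erlang:port_command/2` returns `true` and delivers an asynchronous
`inet_reply` message):
```
send_fun() ->
    try
        persistent_term:get({rabbit_net, send_fun})
    catch
        error:badarg ->
            %% First invocation: pick the send function based on the
            %% OTP release we are running on, then cache it.
            Fun = case list_to_integer(erlang:system_info(otp_release)) >= 26 of
                      true  -> fun gen_tcp:send/2;
                      false -> fun erlang:port_command/2
                  end,
            persistent_term:put({rabbit_net, send_fun}, Fun),
            Fun
    end.
```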
Revert the change introduced in https://github.com/rabbitmq/rabbitmq-server/pull/7900
RabbitMQ 3.12 will require OTP 25 (not yet OTP 26).
The use of macro `OTP_RELEASE` was wrong because RabbitMQ can be
compiled with OTP 25, but run with OTP 26. Macros are evaluated at
compile time.
An alternative runtime equivalent would have been
```
1> erlang:system_info(otp_release).
"26"
```
In OTP 25, gen_tcp:send/2 has poor performance when the Erlang mailbox
is large because the selective receive is not optimised.
https://erlang.org/download/otp_src_26.0-rc3.readme
OTP-18520 Application(s): erts
Related Id(s): GH-6455
gen_tcp:send/*, gen_udp:send/* and gen_sctp:send/* have
been optimized to use the infamous receive reference
optimization, so now sending should not have bad
performance when the calling process has a large
message queue.
These functions extend the functionality of `erlang:is_process_alive/1`
to take into account the node a process is running on and its cluster
membership.
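A minimal sketch of the idea (not the actual implementation, which also
consults cluster membership):
```
%% Extend erlang:is_process_alive/1 to processes on other nodes by
%% asking the remote node; treat an unreachable node as "not alive".
is_process_alive(Pid) when node(Pid) =:= node() ->
    erlang:is_process_alive(Pid);
is_process_alive(Pid) ->
    try
        erpc:call(node(Pid), erlang, is_process_alive, [Pid])
    catch
        _:_ -> false
    end.
```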
These functions are moved away from `rabbit_mnesia` because we don't
want `rabbit_mnesia` to be a central piece of RabbitMQ.
Classic-mirrored-queue-related modules continue to use `rabbit_mnesia`
functions, therefore relying on Mnesia, because they depend entirely on
Mnesia anyway. They will go away at the same time as our use of Mnesia.
So by keeping this code untouched, we avoid possible regressions.
This is the latest commit in the series; it fixes (almost) all the
problems with missing and circular dependencies for typing.
The only unsolved problems are:
- `lg` dependency for `rabbit` - the problem is that it's the only
dependency that contains a NIF. And there is no way to make dialyzer
ignore it - it looks like the unknown-function check cannot be
suppressed by dialyzer directives. In the future, making `lg` a proper
dependency can be a good thing anyway.
- some missing elixir functions in `rabbitmq_cli` (CSV, JSON and
logging related).
- `eetcd` dependency for `rabbitmq_peer_discovery_etcd` - this one
uses sub-directories in `src/`, which confuses dialyzer (or our bazel
machinery is not able to properly handle it). I've tried the latest
rules_erlang which flattens directory for .beam files, but it wasn't
enough for dialyzer - it wasn't able to find core erlang files. This
is a niche plugin and an unusual dependency, so probably not worth
investigating further.
This commit is pure refactoring making the code base more maintainable.
Replace rabbit_misc:pipeline/3 with the new OTP 25 experimental maybe
expression because
"Frequent ways in which people work with sequences of failable
operations include folds over lists of functions, and abusing list
comprehensions. Both patterns have heavy weaknesses that makes them less
than ideal."
https://www.erlang.org/eeps/eep-0049#obsoleting-messy-patterns
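A minimal sketch (with hypothetical helper functions) of such a
sequence written with `maybe`:
```
%% Requires OTP 25+; until the feature is enabled by default:
-feature(maybe_expr, enable).

handle_connect(Packet, State0) ->
    maybe
        ok ?= check_protocol_version(Packet),
        {ok, State1} ?= authenticate(Packet, State0),
        {ok, State2} ?= register_client(State1),
        {ok, State2}
    else
        {error, _} = Error ->
            Error
    end.
```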
Additionally, this commit is more restrictive in the type spec of
rabbit_mqtt_processor state fields.
Specifically, many fields were defined as `undefined | T` where
`undefined` applied only temporarily, until the first CONNECT packet was
processed by the processor.
It's better to initialise the MQTT processor upon first CONNECT packet
because there is no point in having a processor without having received
any packet.
This allows many type specs in the processor to change from `undefined |
T` to just `T`.
Additionally, memory is saved by removing the `received_connect_packet`
field from the `rabbit_mqtt_reader` and `rabbit_web_mqtt_handler`.
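A sketch of the resulting spec tightening (record and field names are
illustrative only, not the actual state record):
```
%% Before: fields had to allow 'undefined' until CONNECT was processed.
-record(state0, {client_id :: undefined | binary(),
                 vhost     :: undefined | binary()}).

%% After: the state is only created once CONNECT has been received.
-record(state,  {client_id :: binary(),
                 vhost     :: binary()}).
```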
- Use the same base .plt everywhere, so there is no need to list
standard apps everywhere
- Fix typespecs: some typos and the use of not-exported types
* Add rabbitmq_cli dialyze to bazel
and fix a number of warnings
Because we stop mix from recompiling rabbit_common in bazel, many
unknown functions are reported, so this dialyzer analysis is somewhat
incomplete.
* Use erlang dialyzer for rabbitmq_cli rather than mix dialyzer
Since this resolves all of the rabbit functions, there are far fewer
unknown functions.
Requires the yet-to-be-released rules_erlang 3.9.2
* Temporarily use pre-release rules_erlang
So that checks can run on this PR without a release
* Fix additional dialyzer warnings in rabbitmq_cli
* rabbitmq_cli: mix format
* Additional fixes for ignored return values
* Revert "Temporarily use pre-release rules_erlang"
This reverts commit c16b5b6815.
* Use rules_erlang 3.9.2
The MQTT protocol specs define the term "MQTT Control Packet".
The MQTT specs never talk about "frame".
Let's reflect this naming in the source code since things get confusing
otherwise:
Packets belong to MQTT.
Frames belong to AMQP 0.9.1 or web sockets.
Prior to this commit, 1 MQTT publisher publishing to 1 Million target
classic queues requires around 680 MB of process memory.
After this commit, it requires around 290 MB of process memory.
This commit requires feature flag classic_queue_type_delivery_support
and introduces a new one called no_queue_name_in_classic_queue_client.
Instead of storing the binary queue name 4 times, this commit now stores
it only once.
The monitor_registry is removed since only classic queue clients monitor
their classic queue server processes.
The classic queue client does not store the queue name anymore. Instead
the queue name is included in messages handled by the classic queue
client.
Storing the queue name in the record ctx was unnecessary.
More potential future memory optimisations:
* When routing to destination queues, looking up the queue record,
delivering to queue: Use streaming / batching instead of fetching all
at once
* Only fetch ETS columns that are necessary instead of whole queue
records
* Do not hold the same vhost binary in memory many times. Instead,
maintain a mapping.
* Remove unnecessary tuple fields.
This function returns the data directory where all subsystems should
store their files.
Historically, this was the Mnesia directory. But semantically, this
should be the reverse: RabbitMQ owns the data directory and Mnesia is
configured to put its files there too.
`rabbit_mnesia:dir/0` now calls `rabbit:data_dir/0`.
Other subsystems will be modified in a follow-up commit to call
`rabbit:data_dir/0` instead of `rabbit_mnesia:dir/0`.
The location and name of this directory remains the same for
compatibility reasons. Therefore, it still contains "mnesia" in its name.
However, semantically, we want this directory to be unrelated to Mnesia.
In the end, many subsystems write files and directories there, including
Mnesia, all Ra systems and in the future, Khepri.
This value is used internally by `rabbit_env` and usually not read by
RabbitMQ otherwise.
This patch prepares for the rename of `mnesia_dir` to `data_dir`, in
order to stop relying semantically on Mnesia configuration or usage to
locate data, whether it is stored in Mnesia or not.
Seems like we are not using it anywhere in our code base.
It's unlikely that it's used somewhere else, and even if it is,
the API is backwards compatible - we just pass 0, as if the priority_queue
was empty.
That was done in PR #3865.
The changes introduced in #3865 can cause message arrival ordering guarantees
between two logical Erlang processes (sending messages via delegate) to
be violated, as a message sent to a single destination can overtake a prior
message sent as part of a fan-out. This is because the fan-out
takes a different route via the delegate process than the direct delivery that
bypasses it.
This commit only reverts it for the `invoke_no_result/2|3` API and leaves the
optimisation in for the synchronous `invoke/` API. This means that the message
send ordering you expect between Erlang processes can still be violated when
mixing invoke and invoke_no_result invocations. As far as I can see, there are
no places where the code relies on this, and there are uses of invoke (mgmt db)
that could very well benefit from avoiding the additional copying.
This category should be unused with the decommissioning of the old
upgrade subsystem (in favor of the feature flags subsystem). It means:
1. The upgrade log file will not be created by default anymore.
2. The `$RABBITMQ_UPGRADE_LOG` environment variable is now unsupported.
The configuration variables remain to avoid breaking an existing and
working configuration.
For the following flags I see an improvement of
30k/s to 34k/s on my machine:
```
-x 1 -y 1 -A 1000 -q 1000 -c 1000 -s 1000 -f persistent
-u cqv2 --queue-args=x-queue-version=2
```
Discovered by @dumbbell
Ensure externally read strings are saved as utf-8 encoded binaries. This
is necessary since `cmd.exe` on Windows uses ISO-8859-1 encoding and
directories can have latin1 characters, like `RabbitMQ Sérvér`.
The `é` is represented by decimal `233` in the ISO-8859-1 encoding. The
unicode code point has the same decimal value, `233`, so you will see
this in the charlist data. However, when encoded using utf-8, this
becomes the two-byte sequence `C3 A9` (hexadecimal).
When reading strings from env variables and configuration, they will be
unicode charlists, with each list item representing a unicode code
point. All of Erlang's string functions can handle strings in this form.
Once these strings are written to ETS or Mnesia, they will be converted
to utf-8 encoded binaries. Prior to these changes, just
`list_to_binary/1` was used.
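A quick illustration of the difference (not from the commit itself):
```
%% é is code point 233. list_to_binary/1 yields the single latin-1
%% byte 233, whereas unicode:characters_to_binary/1 yields the utf-8
%% two-byte sequence C3 A9.
Name = [233],                                           % "é" as a charlist
<<233>> = list_to_binary(Name),                         % ISO-8859-1 bytes
<<16#C3, 16#A9>> = unicode:characters_to_binary(Name).  % utf-8 bytes
```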
Fix xref error
re:replace requires an iodata, which is not a list of unicode code points
Correctly parse unicode vhost tags
Fix many format strings to account for utf8 input. Try again to fix unicode vhost tags
More format string fixes, try to get the CONFIG_FILE var correct
Be sure to use the `unicode` option for re:replace when necessary
More unicode format strings, add unicode option to re:split
More format strings updated
Change ~s to ~ts for vhost format strings
Change ~s to ~ts for more vhost format strings
Change ~s to ~ts for more vhost format strings
Add unicode format chars to disk monitor
Quote the directory on unix
Finally figure out the correct way to pass unicode to the port
Stop sending connection_stats from protocol readers to rabbit_event.
Stop sending queue_stats from queues to rabbit_event.
Sending these stats every 5 seconds to the event manager process is
superfluous because no one handles these events.
They seem to be a relic from before the rabbit_core_metrics ETS tables
got introduced in 2016.
Delete the test head_message_timestamp_statistics because it tests that
head_message_timestamp is set correctly in queue_stats events,
although queue_stats events are used nowhere.
The functionality of head_message_timestamp itself is still tested in
deps/rabbit/test/priority_queue_SUITE.erl and
deps/rabbit/test/temp/head_message_timestamp_tests.py
in e.g. the `advanced.config` file, or manually at runtime.
This also adds tracing through use of `rabbit_event`, controllable by
use of compile time flag, e.g. TRACE_SUP2.
A couple of users reported `badmatch` crashes due to scenarios where
`inet:peername/1` does not return the expected value, most likely due to
the port closing between the time the connections are listed and when
`inet:peername/1` is called.
Fixes #5496
Discussion in #5490
Thoas is more efficient both in terms of encoding
time and peak memory footprint.
In the process we have discovered an issue:
https://github.com/lpil/thoas/issues/15
Pair: @pjk25
This avoids printing the full stacktrace when the error comes from the
sysctl invocation; the error message itself is sufficient.
In practice, when testing with bazel on macOS, sysctl is blocked by
the sandbox, so logging the stacktrace is rather noisy for tests.
This gen_statem-based process is responsible for handling concurrency
when feature flags are enabled and synchronized when a cluster is
expanded.
This clarifies and stabilizes the behavior of the feature flag subsystem
w.r.t. situations where e.g. a feature flag migration function takes
time to update data and a new node joins a cluster and synchronizes its
feature flag states with the cluster. There was a chance that the
feature flag was marked as enabled on the joining node, even though the
migration function didn't take care of that node.
With this new feature flags controller, enabling or synchronizing
feature flags blocks and delays any concurrent operations which try to
modify feature flags states too.
This change also clarifies where and when the migration function is
called: it is called at least once on each node that knows the feature
flag, and when the state goes from "disabled" to "enabled" on that node.
Note that even if the feature flag is being enabled on a subset of the
nodes (because other nodes already have it enabled), it is marked as
"state_changing" everywhere during the migration. This is to prevent
a node where it is enabled from assuming it is enabled on all nodes that
know the feature flag.
There is a new feature as well: just after a feature flag is enabled,
the migration function is called a second time for any post-enable
actions. The feature flag is marked as enabled between these "enable"
and "post-enable" steps. The success or failure of this "post-enable"
run does not affect the state of the feature flag (i.e. it is ignored).
A new migration function API is introduced to allow more advanced
things. The new API is:
```
my_migration_function(
  #ffcommand{name = ...,
             props = ...,
             command = enable | post_enable,
             extra = #{...}})
```
The record is defined in `include/feature_flags.hrl`. Here is the
meaning of each field:
* `name` and `props` are the equivalent of the `FeatureName` and
`FeatureProps` arguments of the previous migration function API.
* `command` is basically the same as the previous `Arg` arguments.
* `extra` is a map containing context-specific information. For instance, it
contains the list of nodes where the feature flag state changes.
This whole new behavior is behind a new feature flag called
`feature_flags_v2`. If a feature flag uses the new migration function
API, `feature_flags_v2` will be automatically enabled.
If many feature flags are enabled at once (like when a fresh RabbitMQ
node is started for the first time), `feature_flags_v2` will be enabled
first if it is in the list.
Use the `sys_dist` ets table to get distribution port information.
Fixes #4981
Get cluster links stats for TLS dist
Use code from prometheus.erl to get dist links info
This will be used to fix rabbitmq/osiris#78
If a RabbitMQ `advanced.config` file contains the following:
```
{customize_hostname_check, [
{match_fun, public_key:pkix_verify_hostname_match_fun(https)}
]}
```
...`file:consult/1` will fail because it does not evaluate terms in the
file.
The code in `rabbit_consult` was copied from this OTP module:
https://github.com/erlang/otp/blob/master/lib/ssl/src/ssl_dist_sup.erl
...and then modified for our use.
Add Bazel suite
Use the same license as Erlang/OTP, add link to source cc @dumbbell
Add test and ensure value returned matches file:consult/1
Add test data file
Ensure that Funs are converted to binaries before jsx:encode is called
Add a check that customize_hostname_check can be JSON encoded
Ensure that customize_hostname_check and match_fun are filtered out from listener data
When applications accidentally set an unreasonably high value for
the message TTL expiration field, e.g. 6779303336614035452,
quorum queue and classic queue processes crashed before this commit:
```
2022-05-17 13:35:26.488670+00:00 [notice] <0.1000.0> queue 'test' in vhost '/': candidate -> leader in term: 2 machine version: 2
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> crasher:
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> initial call: ra_server_proc:init/1
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> pid: <0.1000.0>
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> registered_name: '%2F_test'
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> exception error: bad argument
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> in function erlang:start_timer/4
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> called as erlang:start_timer(6779303336614035351,<0.1000.0>,
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> {timeout,expire_msgs},
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> [])
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> *** argument 1: exceeds the maximum supported time value
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> in call from gen_statem:loop_timeouts_start/16 (gen_statem.erl, line 2108)
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> ancestors: [<0.999.0>,ra_server_sup_sup,<0.250.0>,ra_systems_sup,ra_sup,
2022-05-17 13:35:26.489492+00:00 [error] <0.1000.0> <0.186.0>]
```
In this commit, we disallow expiry fields higher than 100 years.
This causes the channel to be closed which is better than crashing the
queue process.
This new validation applies to message TTLs and queue expiry.
From the docs of erlang:start_timer:
"The absolute point in time, the timer is set to expire on, must be in the interval
[erlang:convert_time_unit(erlang:system_info(start_time), native, millisecond),
erlang:convert_time_unit(erlang:system_info(end_time), native, millisecond)].
If a relative time is specified, the Time value is not allowed to be negative.
end_time:
The last Erlang monotonic time in native time unit that can be represented
internally in the current Erlang runtime system instance.
The time between the start time and the end time is at least a quarter of a millennium."
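A minimal sketch of such a guard (the constant and function name are
assumptions, not the actual code):
```
%% Reject TTL/expiry values above roughly 100 years instead of letting
%% erlang:start_timer/4 crash the queue process later.
-define(MAX_EXPIRY_MS, 100 * 365 * 24 * 60 * 60 * 1000).

check_expiry(Ms) when is_integer(Ms), Ms >= 0, Ms =< ?MAX_EXPIRY_MS ->
    ok;
check_expiry(Ms) ->
    {error, {value_out_of_range, Ms}}.
```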
In particular:
- io_file_handle_open_attempt
- queue_index_journal_write
Neither has proven to be very useful in recent years,
and with the move to the FHC-less and journal-less v2 index
they will slowly become irrelevant. This should be a
good compromise until we can switch to v2 permanently
or rework the stats module to use counters.
Most of the time, the file_handle_cache_stats ETS table is
used for writing only.
By enabling `write_concurrency` on the table, we allow different values
to be written concurrently without taking a global lock.
The only code path reading from the ETS table runs on the
`collect_statistics_interval` interval and reads the whole table,
so we can assume we are not blocking any large number of concurrent reads.
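A sketch of creating the table with that option (the other options shown
are illustrative):
```
%% write_concurrency lets concurrent writers update different keys
%% without contending on a single table lock.
ets:new(file_handle_cache_stats,
        [set, public, named_table, {write_concurrency, true}]).
```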
as an opt-in feature. The goal is to avoid re-importing definitions
from the definition file/directory/source if we know the content
has not changed. Since this feature won't be appropriate for
every environment (sometimes unconditional reimporting is expected),
the feature is opt-in.
This is still a WIP.
This is to address another memory leak on win32 reported here:
https://groups.google.com/g/rabbitmq-users/c/UE-wxXerJl8
"RabbitMQ constant memory increase (binary_alloc) in idle state"
The root cause is the Prometheus plugin making repeated calls to `rabbit_misc:otp_version/0` which then calls `file:read_file/1` and leaks memory on win32.
See https://github.com/erlang/otp/issues/5527 for the report to the Erlang team.
Turn `badmatch` into actual error
This is copied from https://github.com/rabbitmq/rabbitmq-common/pull/349
If a message is sent to only one queue (the most common application scenario), passing through the 'delegate' is pointless; it only increases the delay of the message and the possibility of 'delegate' congestion.
Here are some test data:
node1: Pentium(R) Dual-Core CPU E5300 @ 2.60GHz
node2: Pentium(R) Dual-Core CPU E5300 @ 2.60GHz
Join node1 and node2 into a cluster. Create 100 queues on node2, and start 100 consumers to receive messages from these queues.
Start 100 publishers on node1 to send messages to the queues on node2. Each publisher sends 10k messages at a rate of 100/s (theoretically 10k/s in total), so all publishers together send 1 million messages.
Before optimisation:
```
{1,[{msg_time,812312(=<1ms),177922(=<5ms),9507(=<50ms),221(=<500ms),38(=<1000ms),0,0,0,0,1061,1069,0,0}]}
```
After optimisation:
```
{1,[{msg_time,902854(=< 1ms),93993(=<5ms),3038(=<50ms),96(=<500ms),19(=<1000ms),0,0,0,0,1049,1060,0,0}]}
```
Additional information:
The time counted here is the residence time of a message in the cluster,
that is, Time(leaving from node2 at) - Time(reaching node1 at).
"812312(=<1ms)" is the number of messages with a time consumption of less than or equal to 1ms.
Overall, the optimisation is effective.
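A minimal sketch of the idea (simplified and hypothetical; the real
delegate module also batches destinations by node):
```
%% Deliver directly when there is a single destination process; only go
%% through the delegate process for fan-outs.
deliver([Pid], Fun) when is_function(Fun, 1) ->
    Fun(Pid);
deliver(Pids, Fun) when is_list(Pids) ->
    delegate:invoke_no_result(Pids, Fun).
```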
When we fail to parse the name of the cipher suite from the PROXY protocol,
just say that no ssl is used, instead of trying to fill that
in with data from the connection between the proxy and our server.
A user could already enable single-line logging (the `single_line`
option of `logger_formatter` or RabbitMQ internal formatters) from the
configuration file. For example:
log.console.formatter.single_line = on
With this patch, the option can be enabled from the `$RABBITMQ_LOG`
environment variable as well:
make run-broker RABBITMQ_LOG=+single_line
Those environment variables are unset by default. The default values are
set in the `rabbit` application environment and can be configured in the
configuration file. However, the environment variables will take
precedence over them respectively if they are set.
Unlike pg2, pg in Erlang 24 is eventually consistent. So this
reintroduces some of the same kind of locking mirrored_supervisor
used to rely on implicitly via pg2.
Per discussion with @lhoguin.
Closes #3260.
References #3132, #3154.
and assume it is a string-like value ("directory string")
because other values would not make much sense in the
username extraction context.
References #2983.
instead of specific ones since they will vary with the payload
(one of them likely indicates UTF string length).
This is still not perfect because we limit the maximum
allowed length, but it works fine with identifiers up to 100
characters long, which should be good enough for this
best-effort handling of an obscure SAN type.
References #2983.
The parser didn't handle literals of the form:
'single-quoted'unquoted'single-quoted-again'"or-even-double-quoted"
In particular, the unquoted parsing assumed that nothing else could
follow it. The testsuite is extended with the issue reporter's case.
While here, improve escaped-character handling. Previously, escape
sequences were not parsed specifically at all.
Fixes #2969.
Note that the type by definition contains arbitrary values. According
to the OTP types, they are triplets that effectively represent
a key/value pair. So we assume the pair is a string that needs a bit
of massaging, namely stripping the UTF encoding prefix the OTP
AnotherName decoder leaves in.
Kudos to @Thibi2000 for providing an example value.
Closes #2983.
for usability. It is not any different from when a float value
is used and only exists as a counterpart to '{absolute, N}'.
Also nothing changes for rabbitmq.conf users as that format performs
validation and correct value translation.
See #2694, #2965 for background.
Adds WORKSPACE.bazel, BUILD.bazel & *.bzl files for partial build & test with Bazel. Introduces a build-time dependency on https://github.com/rabbitmq/bazel-erlang
In kind version 0.10.0, when creating a 5-node RabbitMQ cluster
with the new parallel PodManagementPolicy, we observed that some
pods were restarted. Their logs included:
```
10:10:03.794 [error]
10:10:03.804 [error] BOOT FAILED
10:10:03.805 [error] ===========
BOOT FAILED
10:10:03.805 [error] ERROR: epmd error for host r1-server-0.r1-nodes.rabbitmq-system: nxdomain (non-existing domain)
10:10:03.805 [error]
===========
ERROR: epmd error for host r1-server-0.r1-nodes.rabbitmq-system: nxdomain (non-existing domain)
10:10:04.806 [error] Supervisor rabbit_prelaunch_sup had child prelaunch started with rabbit_prelaunch:run_prelaunch_first_phase() at undefined exit with reason {epmd_error,"r1-server-0.r1-nodes.rabbitmq-system",nxdomain} in context start_error
10:10:04.806 [error] CRASH REPORT Process <0.152.0> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,prelaunch,{epmd_error,"r1-server-0.r1-nodes.rabbitmq-system",nxdomain}}},{rabbit_prelaunch_app,start,[normal,[]]}} in application_master:init/4 line 138
```
Eventually, after some pods restarted up to 2 times, all pods were running and ready.
In kind, we observed that during the first couple of seconds, nslookup was failing as well for that domain
with nxdomain.
It took up to 30 seconds until nslookup succeeded.
With this commit, pods don't need to be restarted when creating a fresh
RabbitMQ cluster.
Lager strips trailing newline characters, but the OTP logger with the default
formatter adds a newline at the end. To avoid unintentional multi-line log
messages, we have to revisit most messages logged.
Some log entries are intentionally multi-line; others
are printed to stdout directly, where newlines are required
for sensible formatting.
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application env. variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.
The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.
The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.
The message text format changed slightly: the timestamp is more precise
(now to the microsecond) and the level is abbreviated to always be
4 characters long, to align all messages and improve readability. Here is
an example:
```
2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Licensed under the MPL 2.0. Website: https://rabbitmq.com
```
The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).
JSON is also supported as a message format and now for any outputs.
Indeed, it is possible to use it with e.g. syslog or the exchange. Here
is an example of a JSON-formatted message sent to syslog:
```
Mar 3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}
```
For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variable were extended:
* `-` still means stdout
* `-stderr` means stderr
* `syslog:` means syslog on localhost
* `exchange:` means logging to `amq.rabbitmq.log`
`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON instead of plain text.
The `rabbitmqctl rotate_logs` command is deprecated. The reason is
Logger does not expose a function to force log rotation. However, it
will detect when a file was rotated by an external tool.
From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.
In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):
```
?LOG_NOTICE("Logging: switching to configured handler(s); following "
            "messages may not be visible in this log output",
            #{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),
```
Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.
At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. It can be done gradually when
working on a particular module or logging.
The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is called `rabbit_logger_std_h` and is based on
`logger_std_h` from Erlang 23.0.
as node names grow.
Prior to this change, direct reply-to consumer channels
were encoded using term_to_binary/1, which means the result
would grow together with the node name (since the node name
is one of the components of an Erlang pid).
This means that with long enough hostnames, reply-to
identifiers could overflow the 255-character limit of the
message property field type, longstr.
With this change, the encoded value uses a hash of the node name
and then locates the actual node name from a map of
hashes to current cluster members.
In addition, instead of generating non-predictable "secure"
GUIDs, the feature now generates "regular" predictable GUIDs,
which compensates for some of the additional pid pre- and post-processing
outlined above.
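A minimal sketch of the hashing idea (hypothetical, not the actual
encoding):
```
%% Store a short hash of the node name instead of the full name, and
%% resolve it back via a map built from the current cluster members.
node_hash(Node) ->
    erlang:phash2(Node).

decode_node(Hash) ->
    Map = maps:from_list([{erlang:phash2(N), N} || N <- [node() | nodes()]]),
    maps:get(Hash, Map, undefined).
```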