HiPE has been deprecated and only partially supported since Erlang 22
and will be removed completely in Erlang 24 next year.
Part of rabbitmq/rabbitmq-server#2392
Helps with troubleshooting hostname resolution behavior
on nodes and locally for CLI tools. This is obviously not meant
to be a replacement for existing tools such as dig, only
a way to quickly spot obvious irregularities, e.g. those
in environments that use custom Erlang inetrc files.
Per discussion with @harshac.
The reason is that currently, the repository dispatch event only
triggers the workflow of the target repository's default branch (i.e.
master in our case).
This is ok for now, but this prevents us from using GitHub Actions with
release branches unfortunately.
* It requires a fully booted node, so not generally suitable for a Kubernetes readiness probe.
* It can produce false positives.
* It is too intrusive and CPU-intensive to use at scale.
* Most operators do not understand what it really does and, when they learn about it,
consider it to be too opinionated and intrusive.
Time for the One True Health Check™ to retire from duty.
Part of rabbitmq/rabbitmq-cli#426
During the 3.8.4 cycle we have backported `rabbit_env` to v3.8.x.
Instead of messing with env variable prefixing, it tries both
RABBITMQ_{VAR} and {VAR} environment variables. However,
in CLI tools the node name is currently only picked up from
RABBITMQ_NODENAME, so environments where the node name has to be
explicitly configured via rabbitmq-env.conf:
NODENAME=rabbit@our.custom.hostname
would not pick this node name up. RABBITMQ_NODENAME had to be added
as a workaround.
With this change the behavior of CLI tools and the server is closer.
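The lookup described above can be sketched as follows. This is a minimal Python illustration, not the `rabbit_env` implementation: the function name `lookup_env_var`, the precedence (prefixed variable first) and the handling of empty values are all assumptions made for the example.

```python
import os

PREFIX = "RABBITMQ_"


def lookup_env_var(name, env=None):
    """Return the value of RABBITMQ_{name} if set, else {name}, else None.

    Assumption: the prefixed variable wins when both are set.
    """
    env = os.environ if env is None else env
    for candidate in (PREFIX + name, name):
        value = env.get(candidate)
        if value:  # assumption: an empty string counts as unset
            return value
    return None


# NODENAME as it would be set via rabbitmq-env.conf, without the prefix.
print(lookup_env_var("NODENAME", {"NODENAME": "rabbit@our.custom.hostname"}))
```

With a lookup of this shape, a bare NODENAME is enough and RABBITMQ_NODENAME is no longer needed as a workaround.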
Note that this updates a few places which used `Config.get_option/2`
to get a "default node name" which more often than not ended up
being a node prefix ("rabbit"). Those tests had to be updated
to use `Config.default/1`.
Closes #421.
References c8e766dec7, 8a5ab87038.
It prints RabbitMQ-specific environment variables that
are set on the target node. Can be used to inspect env variable-based
configuration without access to the target host.
Fail unsuccessful HTTP requests, go silent
This will be used to trigger rabbitmq-server tests, because there is a
new commit in a rabbitmq-cli release branch.
cc @dumbbell
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Before this commit, if the product name & version were not overridden,
they would default to the base product name & version ("RabbitMQ" + its
version).
Now, if they are not set/overridden, their corresponding lines are not
added to the output of `status`. Therefore, `rabbitmqctl status` on a
regular RabbitMQ will output the same thing as before.
@michaelklishin:
> Some test cases cannot be run in parallel since they rely on target
> node state as a shared resource.
If/when we want to improve this:
@michaelklishin:
> What we can do is make this configurable. Some tests can run in
> parallel, e.g. all rabbitmq-diagnostics tests that do not trigger/report
> alarms. But this only would be useful for interactive runs or if we
> split all tests into groups and allow parallel runs for some groups.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
This changes the mix test default, which is 2 x number of CPUs.
We only append --trace if VERBOSE_TEST is true (enabled by default) and
don't change the number of max-cases. I'm not sure why we coupled the
number of max cases to verbose testing; they don't seem related to me.
Since @dfedotov is no longer with us, I can't ask him, so just making
the change which feels right.
This is in response to GitHub Actions failing consistently with:
== Compilation error in file test/ctl/list_connections_command_test.exs ==
** (exit) exited in: :gen_server.call(#PID<32153.2990.0>, {:read_cache, -576460752303386991}, :infinity)
** (EXIT) no connection to rabbit_ctl_36@fv-az56
##[error] (stdlib 3.8) gen_server.erl:223: :gen_server.call/3
##[error] (stdlib 3.8) erl_eval.erl:680: :erl_eval.do_apply/6
##[error] (stdlib 3.8) erl_eval.erl:888: :erl_eval.expr_list/6
##[error] (stdlib 3.8) erl_eval.erl:411: :erl_eval.expr/5
##[error] (elixir 1.10.2) lib/kernel/parallel_compiler.ex:396: Kernel.ParallelCompiler.require_file/2
##[error] (elixir 1.10.2) lib/kernel/parallel_compiler.ex:306: anonymous fn/4 in Kernel.ParallelCompiler.spawn_workers/7
##[error]Makefile:114: recipe for target 'tests' failed
Our assumption is that limiting test parallelism will make
these failures go away.
cc @michaelklishin @dumbbell
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
The readiness is similar to that of 'rabbitmq-diagnostics check_if_node_is_quorum_critical'
but this command waits for it up to --timeout seconds.
While at it, refactor DefaultOutput to detect and support JSON formatting
of most basic return values such as :ok or {:error, map}.
Part of #408.
Compared to the regular configuration we use, it modifies it heavily to:
* start a background RabbitMQ node with the federation and STOMP plugins
enabled
* run the specific test target of the CLI
* test against the oldest and latest versions of Elixir
... down from 10% of the configured timeout.
This has a significant impact on the time it takes to start RabbitMQ in
all our testsuites. rabbitmq-ct-helpers sets a wait timeout of 180
seconds. Thus before this patch, the wait loop would sleep for 18
seconds between each check. Given it takes about 1.5 seconds to start
RabbitMQ, a lot of time is wasted here.
Here are some numbers after running testsuites with and without this
patch:
* `make ct-fast` in rabbitmq-server: 8m15s down to 4m58s
* `make ct` in rabbitmq-mqtt: 9m23s down to 6m43s
* `make ct` in rabbitmq-stomp: 4m31s down to 2m04s
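To see why the sleep interval dominates, here is a rough Python model of a poll loop that checks immediately and then sleeps a fixed interval between checks. The 1-second interval is a hypothetical value picked for comparison only; it is not necessarily the interval this patch uses.

```python
import math


def detection_time(ready_after, interval):
    """Time at which a loop polling at t = 0, interval, 2*interval, ...
    first observes a node that becomes ready at `ready_after` seconds."""
    return math.ceil(ready_after / interval) * interval


timeout = 180   # wait timeout set by rabbitmq-ct-helpers
startup = 1.5   # approximate RabbitMQ start time quoted above

print(detection_time(startup, timeout / 10))  # old interval: 10% of timeout
print(detection_time(startup, 1.0))           # hypothetical shorter interval
```

Under this model, a node that is ready after 1.5 seconds is only noticed at the 18-second mark with the old interval, which is where the wasted time in each node start comes from.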
[#171535484]
They are printed in addition to the underlying RabbitMQ version.
If it is unavailable, for instance because the node is old enough to
not export the product info, we use "RabbitMQ" as the name and the
underlying RabbitMQ version as the version.
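The fallback can be pictured with a small Python sketch. The dict keys `product_name` and `product_version` and the function name are hypothetical stand-ins for whatever the node actually reports.

```python
def product_info(node_info, rabbitmq_version):
    """Return (name, version) for display.

    `node_info` is a hypothetical dict standing in for the node's reply.
    If it lacks product info (e.g. an older node), fall back to
    "RabbitMQ" and the underlying RabbitMQ version.
    """
    name = node_info.get("product_name")
    version = node_info.get("product_version")
    if name is None or version is None:
        return ("RabbitMQ", rabbitmq_version)
    return (name, version)


# An older node that does not export product info.
print(product_info({}, "3.8.3"))
```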
[#171467799]
Now that the node removes its PID file on exit, we need to read it
before stopping the node.
Otherwise, if the PID file was already removed when
`OsPid.read_pid_from_file()` is called, it will wait for the PID file to
appear again in an infinite loop.
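The ordering matters more than the mechanics. A minimal Python sketch of it, where `stop` is a hypothetical callable that stops the node and the node is assumed to delete its PID file on exit:

```python
import os
import tempfile


def stop_node(pid_file, stop):
    """Read the PID first, then stop the node.

    Returns the PID so the caller can wait for that OS process to exit
    without needing the PID file again.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    stop()  # after this call the PID file may no longer exist
    return pid


# Demo with a temporary PID file and a fake "stop" that removes it.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "rabbit.pid")
    with open(path, "w") as f:
        f.write("12345\n")
    pid = stop_node(path, lambda: os.remove(path))
    print(pid, os.path.exists(path))
```

Reading the file after `stop()` instead would hit the missing file, which is the situation that led to the infinite wait.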
This was found when testing the RHEL 6 package on CentOS 6 in CI.
The context is either initialized from the CLI's process environment or
from the remote node's own context.
This is required to e.g. find plugins or the Mnesia directory.
In `code_path`, we don't append `ebin` anymore to the code path because
the `rabbit` application is now packaged as an .ez archive like plugins.
This simplifies the overall layout of the project.
They are not dependencies of the CLI, but dependencies of rabbit_common.
Unfortunately, mix(1) doesn't embed them in the final escripts. In fact,
it doesn't embed any dependencies of rabbit_common, but the CLI probably
doesn't call the code which would trigger a crash.
Before this patch, the command would wait for the PID file to appear,
then it would read the PID, check if that process exists and terminate
with `no_process_running` if it didn't.
This was a problem if the PID file was still there with stale data. The
`wait` command would fail even though another node is starting but
hasn't had a chance to write its PID yet.
Now, the command will read the PID, verify the system process and try to
ping the Erlang node in a loop with the specified timeout. This helps
if a node is restarted but the new PID is not yet written to the file.
Note that it's a slight change in behavior w.r.t. crashed nodes though:
if a node crashes (or has already crashed), the command will wait until
the timeout. Before, the command would have exited almost immediately.
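The new loop can be sketched roughly as below. This is a Python illustration, not the CLI code: `read_pid`, `process_exists` and `ping` are hypothetical callables standing in for reading the PID file, checking the OS process and pinging the Erlang node.

```python
import time


def wait_for_node(read_pid, process_exists, ping, timeout, interval=0.1,
                  clock=time.monotonic, sleep=time.sleep):
    """Retry until the PID can be read, the OS process exists and the node
    answers a ping, or until `timeout` seconds have passed.

    Hypothetical callables:
      read_pid()          -> int, or None if the PID file is missing
      process_exists(pid) -> bool
      ping()              -> bool
    Returns True on success, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        pid = read_pid()
        if pid is not None and process_exists(pid) and ping():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# A stale PID (111) is in the file at first; the restarted node writes 222.
pids = iter([111, 111, 222])
print(wait_for_node(lambda: next(pids), lambda pid: pid == 222,
                    lambda: True, timeout=5, interval=0.01))
```

A stale PID no longer fails the command on the first check. The flip side is that a node that never comes back keeps the loop running until the timeout.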
Some streaming commands with a duration argument can send the empty
string as the output (along with a finishing marker). This case was not
handled properly and would result in a stack trace when the command
returned once the duration had elapsed.
... now we depend on Elixir 1.7+. We can use the new syntax or, in this
case, simply call `Exception.format_stacktrace()` without any argument:
it will take care of querying the stacktrace.
This fixes a warning reported by elixirc.
It makes a lot of assumptions about Lager's log flush
timing and can be tripped by the peak rate protection
mechanism. This test module has a high rate of false
positives on Concourse.
There is another test that asserts over a "folded" stream, so
code coverage is kept about the same.
Per discussion with @lukebakken.
It serves no purpose and to make scripting with stream
redirection work we had to make validation changes that make
that flag irrelevant and even confusing.
The only downside of this behavior is that something like
rabbitmqctl add_user --silent "a-username"
(without a password or redirected stream, with suppressed output)
would "hang" waiting for stdin input. If --silent is omitted
there would be an input prompt, making it clearer what's going on.
Closes #365 with a different behavior from the one originally suggested.
This reverts commit 0a68e5944a.
More QQ operations have a default timeout of 5s. This has to be addressed
in a more fundamental way (or not at all unless we have evidence of
false positives).