In order to retain deterministic results of state machine applications
during upgrades, we need to make the stream coordinator versioned such
that the new logic is only used once the coordinator switches to
machine version 1.
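A minimal sketch of what such version gating can look like for a Ra
state machine, assuming the effective machine version is available in
the metadata Ra passes to apply/3; the module name, command, and
selection helpers are placeholders, not the actual stream coordinator
code:

```erlang
-module(coordinator_versioning_sketch).
-behaviour(ra_machine).
-export([init/1, apply/3, version/0, which_module/1]).

%% version/0 and which_module/1 are the ra_machine versioning callbacks.
version() -> 1.

%% All machine versions are implemented by this module.
which_module(_Version) -> ?MODULE.

init(_Conf) -> #{}.

%% Only run the new logic once the cluster has switched to machine
%% version 1; older versions keep the previous behaviour so every
%% member applies commands identically during a rolling upgrade.
apply(#{machine_version := MacVer}, {select_leader, Candidates}, State)
  when MacVer >= 1 ->
    {State, select_leader_v1(Candidates), []};
apply(_Meta, {select_leader, Candidates}, State) ->
    {State, select_leader_v0(Candidates), []};
apply(_Meta, _Cmd, State) ->
    {State, ok, []}.

%% Placeholders standing in for the old and new selection logic.
select_leader_v0([Candidate | _]) -> Candidate.
select_leader_v1([Candidate | _]) -> Candidate.
```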
The candidate list consists of {node, tail} tuples, where the tail is {epoch, offset}.
However, 'select_leader' treats the tail as {offset, epoch}.
Suppose there are two candidates:
[{node1,{1,100}},{node2,{2,99}}]
As a result, it selects node1 as the leader instead of node2, which has the larger epoch.
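An illustration of the mismatch (not the real select_leader code):
interpreting the tails as {offset, epoch} when they are really
{epoch, offset} makes the candidate with the larger epoch lose:

```erlang
-module(select_leader_mismatch).
-export([demo/0]).

%% Tails are {Epoch, Offset}. Destructuring them as {Offset, Epoch} and
%% picking the maximum by epoch-then-offset makes node1 win even though
%% node2 has the larger epoch.
demo() ->
    Candidates = [{node1, {1, 100}}, {node2, {2, 99}}],

    %% Buggy interpretation: treat the tail as {Offset, Epoch}.
    {node1, _} = max_by(fun({_Node, {Offset, Epoch}}) -> {Epoch, Offset} end,
                        Candidates),

    %% Correct interpretation: the tail already sorts as {Epoch, Offset}.
    {node2, _} = max_by(fun({_Node, Tail}) -> Tail end, Candidates),
    ok.

%% Return the element of the list with the largest key.
max_by(KeyFun, [H | T]) ->
    lists:foldl(fun(X, Acc) ->
                        case KeyFun(X) > KeyFun(Acc) of
                            true  -> X;
                            false -> Acc
                        end
                end, H, T).
```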
Add an item to the configuration file (/etc/rabbitmq/rabbitmq.config):
{kernel, [{inet_dist_use_interface, {8193,291,0,0,0,0,0,1}}]}
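For reference, this tuple is the Erlang representation of the IPv6
address 2001:123::1, which can be checked in an Erlang shell:

```erlang
1> inet:ntoa({8193,291,0,0,0,0,0,1}).
"2001:123::1"
```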
Use the netstat command to check the IP address of the distribution port (25672):
netstat -anp | grep 25672
tcp6 0 0 2001:123::1:25672 :::* LISTEN 2075/beam.smp
However, 'rabbitmqctl status' shows:
...
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
...
This is to address another memory leak on win32 reported here:
https://groups.google.com/g/rabbitmq-users/c/UE-wxXerJl8
"RabbitMQ constant memory increase (binary_alloc) in idle state"
The root cause is the Prometheus plugin making repeated calls to `rabbit_misc:otp_version/0`, which then calls `file:read_file/1` and leaks memory on win32.
See https://github.com/erlang/otp/issues/5527 for the report to the Erlang team.
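One way to avoid the repeated reads is to cache the version per node;
a minimal sketch (module name hypothetical, not necessarily the
approach taken in this commit):

```erlang
-module(otp_version_cache).
-export([otp_version/0]).

%% Cache the OTP version in a persistent_term so rabbit_misc:otp_version/0
%% (and its file:read_file/1 call) runs at most once per node instead of
%% on every Prometheus scrape.
otp_version() ->
    case persistent_term:get(?MODULE, undefined) of
        undefined ->
            Vsn = rabbit_misc:otp_version(),
            persistent_term:put(?MODULE, Vsn),
            Vsn;
        Vsn ->
            Vsn
    end.
```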
Turn `badmatch` into an actual error
Related to VESC-1015
* Remove `infinity` timeouts
* Improve free disk space retrieval on win32
* Run commands with a timeout
This PR fixes an issue I observed while reproducing VESC-1015 on Windows
10. Within an hour or so of running a 3-node cluster with health checks
running against it, one or more nodes' memory use would spike. I could
see that the rabbit_disk_monitor process was stuck executing os:cmd to
retrieve free disk space information, so gen_server:call requests to the
process never returned, especially since they used infinity timeouts.
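A rough sketch of the "run commands with a timeout" idea, illustrative
only and not the code in this PR:

```erlang
-module(cmd_with_timeout).
-export([run/2]).

%% Run an external command in a helper process and give up after
%% TimeoutMs, so a hung command cannot wedge the caller the way the
%% stuck os:cmd call wedged rabbit_disk_monitor. Note that killing the
%% helper does not kill the underlying OS process; this only protects
%% the caller.
run(Cmd, TimeoutMs) ->
    Caller = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Caller ! {Ref, os:cmd(Cmd)} end),
    receive
        {Ref, Output} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Output};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    after TimeoutMs ->
            erlang:demonitor(MRef, [flush]),
            exit(Pid, kill),
            {error, timeout}
    end.
```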
Do something with timeout
Fix unit_disk_monitor_mocks_SUITE
If a delete happens shortly after a declare or another stream change,
there is a chance that the spawned mnesia update process will crash
when the amqqueue record cannot be recovered from durable storage.
This isn't harmful but does pollute the logs.
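A hedged sketch of one way to make such an update benign when the
record has already gone away; the function name and logging are
illustrative, not the actual change:

```erlang
-module(stream_mnesia_update_sketch).
-export([update_mnesia/3]).

%% Run the spawned metadata update, but treat a missing amqqueue record
%% as a benign race (e.g. the stream was deleted right after a declare)
%% and log at debug level rather than crashing the updater process with
%% a noisy error.
update_mnesia(StreamId, QName, UpdateFun) ->
    try
        UpdateFun(QName)
    catch
        _:Reason ->
            logger:debug("stream ~ts: skipping mnesia update for ~tp: ~tp",
                         [StreamId, QName, Reason]),
            ok
    end.
```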
For booleans, we can prefer the operator policy value
unconditionally, without any safety implications.
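A small sketch of that merge rule; the module and function names are
illustrative, not the actual rabbit_policy code:

```erlang
-module(policy_merge_sketch).
-export([merge_operator_value/3]).

%% For boolean-valued keys the operator policy value is preferred
%% unconditionally; other keys keep their existing conflict resolution,
%% represented here by a trivial placeholder.
merge_operator_value(_Key, OperatorValue, _UserValue)
  when is_boolean(OperatorValue) ->
    OperatorValue;
merge_operator_value(_Key, OperatorValue, UserValue) ->
    resolve_conflict(OperatorValue, UserValue).

%% Placeholder: e.g. numeric limits often take the safer (smaller) value.
resolve_conflict(OperatorValue, UserValue) when is_number(OperatorValue),
                                                is_number(UserValue) ->
    min(OperatorValue, UserValue);
resolve_conflict(OperatorValue, _UserValue) ->
    OperatorValue.
```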
Per discussion with @binarin @pjk25
(cherry picked from commit 6edb7396fd)
A channel that first sends a mandatory publish before enabling
confirms mode may not receive confirms for messages published
after that. This is because publish_seqno was also increased
for mandatory publishes even when confirms were disabled,
yet the mandatory feature has nothing to do with publish_seqno.
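A hedged sketch of the intended behaviour; the state record and
function are illustrative, not the actual rabbit_channel code:

```erlang
-module(confirm_seqno_sketch).
-export([maybe_incr_publish_seqno/1]).

%% Illustrative channel state only; the real record has many more fields.
-record(ch, {confirm_enabled = false :: boolean(),
             publish_seqno   = 1     :: pos_integer()}).

%% publish_seqno only advances when confirm mode is enabled; publishing
%% with mandatory=true alone must not bump it, otherwise the delivery
%% tags the client tracks drift ahead of the ones the broker confirms.
maybe_incr_publish_seqno(#ch{confirm_enabled = true,
                             publish_seqno   = Seq} = State) ->
    State#ch{publish_seqno = Seq + 1};
maybe_incr_publish_seqno(#ch{} = State) ->
    State.
```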
The issue exists since at least
38e5b687de
The test case introduced focuses on multiple=false. The issue
also exists for multiple=true but with a different impact:
sending multiple=true,delivery_tag=2 results in both messages
1 and 2 being acked, even if message 2 doesn't exist as far
as the client is concerned. If the message does exist,
it might get confirmed earlier than it should be. The more
mandatory messages were sent before enabling confirms mode,
the bigger the problem.