Commit Graph

1149 Commits

Author SHA1 Message Date
Loïc Hoguin cc8892df3b
CQv2 prop: Tweak command weights 2022-02-09 14:24:39 +01:00
Loïc Hoguin ae51c6fbaf
CQv2 prop: Refactor duplicate code
Now that the test suite is solid some of the duplicated code
can be merged together.
2022-02-09 12:46:02 +01:00
David Ansari d6a81a0bbc Add lqueue:get/2
as a less-garbage alternative to lqueue:peek/1
2022-02-04 17:43:16 +01:00
David Ansari 6b9e501516 Add lqueue:get/1 and lqueue:get_r/1 2022-02-04 17:43:03 +01:00
David Ansari 676d0dfd85 Allocate 2 bytes less per queue operation
by changing the lqueue state.

This results in less memory usage and hence less garbage collection
when queue functions are called many thousands of times per second.

The subset of functions lqueue supports is copied from
OTP's queue module and extended to update the length.

lqueue's foldr/3 is deleted since there is no usage.
lqueue's foldl/3 is renamed to fold/3 to match OTP's queue naming.

lqueue accepts both old and new state, but returns only new state.
2022-02-04 17:42:42 +01:00
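The lqueue commits above describe a smaller state and the new get functions. A minimal sketch of the idea (a hypothetical, simplified module, not RabbitMQ's actual lqueue): the state is a plain {Length, Queue} pair so len/1 is O(1), and get/2 returns a default on an empty queue instead of allocating a {value, V} wrapper the way peek/1 does.

```erlang
%% Hypothetical sketch of the lqueue idea: a {Length, Queue} pair.
-module(lqueue_sketch).
%% get/1 shadows the auto-imported BIF erlang:get/1.
-compile({no_auto_import, [get/1]}).
-export([new/0, in/2, out/1, len/1, get/1, get/2]).

new() -> {0, queue:new()}.

%% Every operation keeps the length in sync with the queue.
in(Item, {Len, Q}) -> {Len + 1, queue:in(Item, Q)}.

out({0, _} = LQ) -> {empty, LQ};
out({Len, Q}) ->
    {{value, V}, Q2} = queue:out(Q),
    {{value, V}, {Len - 1, Q2}}.

%% O(1), no traversal of the queue.
len({Len, _}) -> Len.

%% Like queue:get/1, crashes on an empty queue.
get({_, Q}) -> queue:get(Q).

%% Returns Default on an empty queue, producing no garbage,
%% unlike peek/1 which wraps the value in a {value, V} tuple.
get({0, _}, Default) -> Default;
get({_, Q}, _Default) -> queue:get(Q).
```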
Loïc Hoguin 4a7cce831c
CQv2 prop: Clean restarts don't drop transient messages 2022-02-04 15:22:25 +01:00
Loïc Hoguin 9e59c6a698
CQv2 prop: Properly handle messages sent before enabling confirms 2022-02-04 14:34:31 +01:00
Loïc Hoguin c6092cacaa
CQv2 prop: Keep track of confirms for more accuracy
We now know that we will not be expecting transient messages
after clean restarts, and we can preserve the order of
persistent messages that were confirmed.
2022-02-04 10:45:01 +01:00
Loïc Hoguin e042660393
CQv2 prop: Add commands to discard one/many messages
Uses basic.reject with requeue=false.
2022-02-03 12:28:32 +01:00
Loïc Hoguin b77bc8e67a
CQv2 prop: Be more specific: reject is really requeue 2022-02-03 11:27:47 +01:00
Loïc Hoguin 36b2466ef7
CQv2 prop: Use a single limiter pid per test case
Instead of one per rabbit_amqqueue:basic_get.
2022-02-03 10:42:56 +01:00
Loïc Hoguin fd8380ebdc
CQv2 prop: Don't make cmd_publish_msg use confirms
The test suite was not checking them anyway, and channels are
a better fit for this kind of test.
2022-02-03 10:21:40 +01:00
Loïc Hoguin c3ded33872
CQv2 prop: Ensure we can fill up entire segments
This is done by changing cmd_channel_publish_many to send up
to 512 messages and reducing the segment_entry_count of both
v1 and v2 to 512.
2022-02-03 09:01:23 +01:00
Loïc Hoguin 6916c1bb7e
CQv2 prop: Send transient messages as well 2022-02-02 16:40:57 +01:00
Loïc Hoguin e93e0a4c57
CQv2 prop: Only use the "many" commands for 2+ messages 2022-02-02 16:24:09 +01:00
Loïc Hoguin 02d0cbd1de
CQv2 property suite: Rename to classic_queue_prop_SUITE 2022-02-02 15:11:06 +01:00
Loïc Hoguin a76e3294fd
CQv2 property suite: Add acks/rejects out of order
Also adds a cmd_channel_publish_many command to make the
scenarios more likely.
2022-02-02 14:56:01 +01:00
Loïc Hoguin 895c18e563
CQv2 property suite: Add command to disable v2 CRC32 check 2022-02-01 15:41:58 +01:00
Loïc Hoguin 52d577dd8e
CQv2 property suite: Enable cmd_purge again 2022-02-01 13:27:18 +01:00
Loïc Hoguin 6197f54fa5
CQv2 property suite: Fix vhost restart cancelling consumers 2022-02-01 13:26:38 +01:00
Loïc Hoguin 02bc9f9a32
CQv2 property suite: Enable cmd_restart_queue_dirty 2022-02-01 11:00:49 +01:00
Loïc Hoguin ab559b4714
CQv2 property suite: Register a default consumer
This gets around race-condition-related problems caused by
the client sending a basic.cancel while the server sends
a basic.cancel itself. It also solves a similar issue,
which I have not investigated thoroughly, where the server
delivers a message.

Details about the basic.cancel race condition issue can
be found in https://github.com/rabbitmq/rabbitmq-server/issues/4070
2022-02-01 10:54:21 +01:00
Loïc Hoguin af24b4b4c4
CQv2 property suite: Handle shutdown exits on teardown 2022-02-01 10:52:31 +01:00
Michael Klishin b0a60cc779
Don't skip definition import if target node count has not been reached
Otherwise delayed definition import [of queues and bindings] would
never run in a scenario where the target node count is greater than one.
2022-01-31 10:52:49 +03:00
Michael Klishin 3837b252b0
New tests for definitions.skip_if_unchanged 2022-01-30 00:32:53 +03:00
Loïc Hoguin 8c9f417959
Account for state differences in channel_operation_timeout_test_queue
The state has differences across versions and the test was no
longer compatible with previous versions in mixed version testing.
2022-01-28 14:10:56 +01:00
Loïc Hoguin a74f0a354e
CQ property suite: Fix cmd_basic_get_msg
I mistakenly only edited cmd_channel_basic_get before. This
brings the other command up to speed.
2022-01-25 12:47:34 +01:00
Loïc Hoguin 777e0fd6ea
Fix a consistency issue in the v1 index after dirty restarts
The issue was found via classic_queue_SUITE using a currently
disabled command that kills the queue. A function has also
been added to convert the set of commands PropEr returns as
a counterexample into Erlang code that can be put in the
`do_manual` function. Some tips have also been added.

The Erlang code that could reproduce the issue follows.
The issue never needed a loop on my machine for what it's
worth, but it might on other machines. Commands that were
not necessary were commented out. The timer:sleep(1) calls
were added as the issue did not seem to trigger without them.

do_manual(Config) ->
    St0 = #cq{name=prop_classic_queue_v1, mode=lazy, version=1,
              config=minimal_config(Config)},

    Res1 = cmd_setup_queue(St0),
    St1 = St0#cq{amq=Res1},

    do_manual_loop(St1).

do_manual_loop(St1) ->

%    Res2 = cmd_set_mode(St1, lazy),
%    true = postcondition(St1, {call, undefined, cmd_set_mode, [St1, lazy]}, Res2),
%    St2 = next_state(St1, Res2, {call, undefined, cmd_set_mode, [St1, lazy]}),
    St2 = St1,

timer:sleep(1),

%    Res3 = cmd_basic_get_msg(St2),
%    true = postcondition(St2, {call, undefined, cmd_basic_get_msg, [St2]}, Res3),
%    St3 = next_state(St2, Res3, {call, undefined, cmd_basic_get_msg, [St2]}),
    St3 = St2,

timer:sleep(1),

    Res4 = cmd_channel_open(St3),
    true = postcondition(St3, {call, undefined, cmd_channel_open, [St3]}, Res4),
    St4 = next_state(St3, Res4, {call, undefined, cmd_channel_open, [St3]}),

timer:sleep(1),

    Res5 = cmd_channel_publish(St4, Res4, 22, false, undefined),
    true = postcondition(St4, {call, undefined, cmd_channel_publish, [St4, Res4, 22, false, undefined]}, Res5),
    St5 = next_state(St4, Res5, {call, undefined, cmd_channel_publish, [St4, Res4, 22, false, undefined]}),

timer:sleep(1),

    Res6 = cmd_restart_vhost_clean(St5),
    true = postcondition(St5, {call, undefined, cmd_restart_vhost_clean, [St5]}, Res6),
    St6 = next_state(St5, Res6, {call, undefined, cmd_restart_vhost_clean, [St5]}),

timer:sleep(1),

%    Res7 = cmd_channel_publish(St6, Res4, 13, false, 71),
%    true = postcondition(St6, {call, undefined, cmd_channel_publish, [St6, Res4, 13, false, 71]}, Res7),
%    St7 = next_state(St6, Res7, {call, undefined, cmd_channel_publish, [St6, Res4, 13, false, 71]}),
    St7 = St6,

timer:sleep(1),

%    Res8 = cmd_channel_open(St7),
%    true = postcondition(St7, {call, undefined, cmd_channel_open, [St7]}, Res8),
%    St8 = next_state(St7, Res8, {call, undefined, cmd_channel_open, [St7]}),
    St8 = St7,

timer:sleep(1),

%    Res9 = cmd_channel_close(Res8),
%    true = postcondition(St8, {call, undefined, cmd_channel_close, [Res8]}, Res9),
%    St9 = next_state(St8, Res9, {call, undefined, cmd_channel_close, [Res8]}),
    St9 = St8,

timer:sleep(1),

%    Res10 = cmd_channel_close(Res4),
%    true = postcondition(St9, {call, undefined, cmd_channel_close, [Res4]}, Res10),
%    St10 = next_state(St9, Res10, {call, undefined, cmd_channel_close, [Res4]}),
    St10 = St9,

timer:sleep(1),

    Res11 = cmd_restart_queue_dirty(St10),
    true = postcondition(St10, {call, undefined, cmd_restart_queue_dirty, [St10]}, Res11),
    St11 = next_state(St10, Res11, {call, undefined, cmd_restart_queue_dirty, [St10]}),

timer:sleep(1),

    Res12 = cmd_restart_vhost_clean(St11),
    true = postcondition(St11, {call, undefined, cmd_restart_vhost_clean, [St11]}, Res12),
    St12 = next_state(St11, Res12, {call, undefined, cmd_restart_vhost_clean, [St11]}),

timer:sleep(1),

%    Res13 = cmd_set_version(St12, 1),
%    true = postcondition(St12, {call, undefined, cmd_set_version, [St12, 1]}, Res13),
%    St13 = next_state(St12, Res13, {call, undefined, cmd_set_version, [St12, 1]}),
    St13 = St12,

timer:sleep(1),

    Res14 = cmd_restart_vhost_clean(St13),
    true = postcondition(St13, {call, undefined, cmd_restart_vhost_clean, [St13]}, Res14),
    St14 = next_state(St13, Res14, {call, undefined, cmd_restart_vhost_clean, [St13]}),

timer:sleep(1),
logger:error("loop~n"),

    do_manual_loop(St14).
2022-01-25 11:23:24 +01:00
Luke Bakken c352525e0c
Rename `variable_queue_default_version` to `classic_queue_default_version` 2022-01-25 11:23:23 +01:00
Luke Bakken 5da7396bf3
Add rabbit.variable_queue_default_version to the cuttlefish schema 2022-01-25 11:23:23 +01:00
Loïc Hoguin 087045e319
CQ property suite: Temporarily disable dirty queue restart 2022-01-25 11:23:23 +01:00
Loïc Hoguin 61f2c972eb
CQ property suite: Fix race condition after closing consumers 2022-01-25 11:23:22 +01:00
Loïc Hoguin 08d78f0885
CQ property suite: Add queue crash
The added command resulted in the following additional changes:

- Allow more queue restarts for test purposes

- Fix crashes that may happen following a queue crash+restart

- Fix reading from the index after crash+restart where messages
  may be both in q1 and read from the index

- Fix a race condition when stopping a node while a queue
  crashes and restarts: make message store lookup use pg
2022-01-25 11:23:22 +01:00
Loïc Hoguin 673bdecea2
CQ property suite: Add vhost restart
This helps us test cases where the queue restarts cleanly.
Because it is not completely deterministic, there is some
clever handling of messages: we accept messages that were
acked by the client but whose acks the server did not process
before the restart, and messages published without confirms
that never made it to the server before it restarted.

There is potential to improve confirms handling for that
scenario but that is left as an exercise for a later time.
2022-01-25 11:23:22 +01:00
Loïc Hoguin 678ae3c83a
Cleanup the CQ property suite 2022-01-25 11:23:21 +01:00
Loïc Hoguin eb69bc04ff
Set backtrace_depth to 16 in CQ property suite
I had this locally for the longest time; it's time to commit it.
It helps figure out exactly what triggers crashes in the logs.
2022-01-25 11:23:21 +01:00
Loïc Hoguin 8cceeb1248
CQ property suite: queue:all/2 and queue:delete/2 not in OTP 23 2022-01-25 11:23:21 +01:00
Loïc Hoguin 3587c1756d
CQ property suite: queue:fold/3 not available in OTP 23 2022-01-25 11:23:21 +01:00
Loïc Hoguin 16f1843725
CQ property suite: Add publisher confirms 2022-01-25 11:23:20 +01:00
Loïc Hoguin 47f1198cb1
CQ property: test messages with expiration 2022-01-25 11:23:20 +01:00
Loïc Hoguin a32e2f053f
CQ property: Set "mandatory" with channel publish too 2022-01-25 11:23:20 +01:00
Loïc Hoguin 3de6b8a73d
Tweak command weights in CQ property suite 2022-01-25 11:23:20 +01:00
Loïc Hoguin 59ca114b61
Add channel_receive_and_reject to CQ property suite 2022-01-25 11:23:19 +01:00
Loïc Hoguin a357bd4fa8
Add channel_receive_and_ack to CQ property suite
Also improves queue name handling (now has a random element
that lets us do shrinking properly) and clean up channels
when tearing down a queue.
2022-01-25 11:23:19 +01:00
Loïc Hoguin bc23a05d60
Add basic.cancel to CQ property test suite 2022-01-25 11:23:19 +01:00
Loïc Hoguin aafd4c6e14
Add channel consume to the CQ property suite
The messages are not being received/acked at the moment.
2022-01-25 11:23:18 +01:00
Loïc Hoguin 44fd112e6d
Implement resuming v2->v1 conversion during dirty recovery 2022-01-25 11:23:16 +01:00
Loïc Hoguin 390bffb4cd
Fix classic_queue_SUITE for OTP < 24
It doesn't have rand:bytes/1, so we call
crypto:strong_rand_bytes/1 instead, until such a time comes
that we can drop support for those OTP versions.
2022-01-25 11:23:15 +01:00
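The fallback described in this commit can be sketched as a small compatibility wrapper (hypothetical helper name; the suite presumably branches on the OTP version rather than probing at runtime):

```erlang
%% Hypothetical sketch: use rand:bytes/1 where it exists (OTP 24+),
%% otherwise fall back to crypto:strong_rand_bytes/1.
-module(otp_compat_sketch).
-export([rand_bytes/1]).

rand_bytes(N) ->
    %% function_exported/3 also returns false if the module is not
    %% yet loaded; for a sketch the crypto fallback is then harmless.
    case erlang:function_exported(rand, bytes, 1) of
        true  -> rand:bytes(N);
        false -> crypto:strong_rand_bytes(N)
    end.
```

Either branch returns a binary of N random bytes; only the quality/cost of the randomness differs.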
Loïc Hoguin 92d95bf5c7
Fix unused vars warning in classic_queue_SUITE 2022-01-25 11:23:14 +01:00
Loïc Hoguin 469788a820
Ensure index files with holes get removed
On dirty recovery the count in the segment file was already
accurate. It was not accurate otherwise as it assumed that
all messages would be written to the index, which is not
the case in the current implementation.
2022-01-25 11:23:14 +01:00
Loïc Hoguin e5f51e960c
Add missing callback in channel_operation_timeout_test_queue 2022-01-25 11:23:14 +01:00
Loïc Hoguin 467309418c
Reenable some checks in backing_queue_SUITE 2022-01-25 11:23:13 +01:00
Loïc Hoguin ffe3be34d1
Update channel_operation_timeout_test_queue 2022-01-25 11:23:13 +01:00
Loïc Hoguin 4231143900
Remove commented-out code, todos, unused _Vars 2022-01-25 11:23:12 +01:00
Loïc Hoguin c63a75ca66
Reenable internal publish/basic_get 2022-01-25 11:23:11 +01:00
Loïc Hoguin 0f9d36a73b
Test using concurrent channels
Queue purge and direct publish/get no longer work for the time
being as a result.
2022-01-25 11:23:11 +01:00
Loïc Hoguin 216025f192
Test queue purge 2022-01-25 11:23:11 +01:00
Loïc Hoguin 94cb595b50
Set mandatory/confirm flags in proper test suite 2022-01-25 11:23:11 +01:00
Loïc Hoguin 3595cbd34d
Do a basic_get even when the queue is empty 2022-01-25 11:23:11 +01:00
Loïc Hoguin dff6e2ad82
Test changing both mode and version at the same time 2022-01-25 11:23:10 +01:00
Loïc Hoguin 97a7e8128c
Tweak the property-based test suite efficacy
Removed the now-pointless command checking for process liveness.

Increased the number of checks to 500 (5x the default).

Added weights for the commands: publish/get at 900 and
set mode/version at 100.
2022-01-25 11:23:10 +01:00
Loïc Hoguin 9f15f86252
CQ version switch via policies + proper test for this 2022-01-25 11:23:10 +01:00
Loïc Hoguin 006996014c
Small cleanup of CQ property suite 2022-01-25 11:23:10 +01:00
Loïc Hoguin 3ac5a678fc
Switch queue mode 2022-01-25 11:23:10 +01:00
Loïc Hoguin fcaefd786b
Test all combinations of classic/lazy v1/v2
Confirmed manually that the right mode/version are picked.
2022-01-25 11:23:10 +01:00
Loïc Hoguin d7925af1eb
Initial classic queue property based suite
Much remains to be added but there's some publish/basic_get
going on now. It is starting to look good.
2022-01-25 11:23:09 +01:00
Loïc Hoguin 3fc1eb14de
Fix remaining tests for CQ v1 2022-01-25 11:23:09 +01:00
Loïc Hoguin c4672b6f2c
Test both indexes 2022-01-25 11:23:09 +01:00
Loïc Hoguin 6dfe6a7be8
Test both CQ v1 and v2 2022-01-25 11:23:09 +01:00
Loïc Hoguin ad67f787ab
Reenable embed 0/1024 groups and fix embed 0 recovery 2022-01-25 11:23:06 +01:00
Loïc Hoguin d1b8f623dc
Fix channel_operation_timeout test
The records have to be kept in sync.
2022-01-25 11:23:05 +01:00
Loïc Hoguin 2473ff7328
Reenable some tests that were commented out 2022-01-25 11:23:05 +01:00
Loïc Hoguin b0b9b46313
Fix remaining tests 2022-01-25 11:23:04 +01:00
Loïc Hoguin c02de4d252
Some cleanup and fix most tests
Still need to improve recovery and do some sort of check in the
store so we know the file isn't corrupted.
2022-01-25 11:23:04 +01:00
Loïc Hoguin fc9846d01d
Fix obvious mistakes in previous commit 2022-01-25 11:23:04 +01:00
Loïc Hoguin 33fada8847
Track delivers per-queue rather than per-message
Because queues deliver messages sequentially, we do not need to
keep track of delivers per message; we just need to keep track
of the highest message that was delivered, via its seq_id().

This allows us to avoid updating the index and storing data
unnecessarily and can help simplify the code (not seen in this
WIP commit because the code was left there or commented out
for the time being).

Includes a few small bug fixes.
2022-01-25 11:23:03 +01:00
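The per-queue tracking described above can be sketched in a few lines (hypothetical names; the real state lives in the classic queue implementation): a single "highest delivered" seq_id() high-water mark replaces a per-message delivered flag.

```erlang
%% Hypothetical sketch of per-queue deliver tracking via a
%% seq_id() high-water mark.
-module(deliver_tracking_sketch).
-export([mark_delivered/2, is_delivered/2]).

%% Raise the high-water mark when a message is delivered.
mark_delivered(SeqId, HighestDelivered) ->
    max(SeqId, HighestDelivered).

%% Because delivery is sequential, any seq_id() at or below the
%% mark has been delivered.
is_delivered(SeqId, HighestDelivered) ->
    SeqId =< HighestDelivered.
```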
Loïc Hoguin b439d2e6bb
Per-queue classic queue store
This currently works both with and without confirms.
It currently always writes to the per-queue store,
it would be good to write fan-out messages to the
shared store though.

It would be good to remove the usage of MsgId except
when the shared store is needed.
2022-01-25 11:23:02 +01:00
Loïc Hoguin cf080b9937
Add file_handle_cache FD reservations 2022-01-25 11:23:02 +01:00
Loïc Hoguin 0102191e2b
Rename to rabbit_classic_queue_index_v2 2022-01-25 11:23:00 +01:00
Loïc Hoguin 0f431876f2
No longer tests with different embed settings
Since messages are no longer embedded, those settings are ignored.
2022-01-25 11:22:58 +01:00
Loïc Hoguin 98f64f2fa8
Replace classic queue index with a modern implementation 2022-01-25 11:22:56 +01:00
Philip Kuryloski efcd881658 Use rules_erlang v2
bazel-erlang has been renamed rules_erlang. v2 is a substantial
refactor that brings Windows support. While this alone isn't enough to
run all rabbitmq-server suites on Windows, one can at least now start
the broker (bazel run broker) and run the tests that do not start a
background broker process.
2022-01-18 13:43:46 +01:00
Karl Nilsson 9a5d0f9d85 Make stream coordinator machine versioned
In order to retain deterministic results of state machine applications
during upgrades we need to make the stream coordinator versioned such
that we only use the new logic once the stream coordinator switches to
machine version 1.
2022-01-07 12:11:11 +00:00
dcorbacho 0bd8d41b72 Skip new import testcase on mixed environments 2022-01-03 17:37:06 +01:00
Michael Klishin 19ae35aa14
#3925 follow-up: don't include Erlang client headers 2021-12-28 01:24:32 +03:00
Michael Klishin b569ab5d74
Rename two newly introduced test modules 2021-12-28 00:35:55 +03:00
dcorbacho c88605aab4
Import definitions: support user limits 2021-12-26 04:32:00 +03:00
Luke Bakken d1496a2c7c
Fix tests 2021-12-26 04:32:00 +03:00
Luke Bakken 043641c99f
Use protected ets so that data can be read quickly 2021-12-26 04:31:59 +03:00
Thuan Duong Ba dc6fb24761 Minor fix to the condition that stops batching when the total batch size is large 2021-12-20 17:39:06 -08:00
Thuan Duong Ba 1ab485b44c Minor update for batching messages when sync throughput is 0 2021-12-20 17:39:06 -08:00
Thuan Duong Ba 157bffa332 Support configure max sync throughput in CMQs 2021-12-20 17:39:06 -08:00
polaris-alioth 6431584a10 Prevent creating unnamed policy when loading definition 2021-12-19 12:52:26 +08:00
Philip Kuryloski 249e8c853c Adjust the way rabbit_fifo.hrl is referenced in rabbit_fifo_SUITE
For erlang_ls convenience
2021-12-16 16:41:15 +01:00
Michael Klishin ebd79836c1 Revisit operator policy merging rules for boolean fields
For booleans, we can prefer the operator policy value
unconditionally, without any safety implications.

Per discussion with @binarin @pjk25

(cherry picked from commit 6edb7396fd)
2021-12-10 19:48:16 +00:00
Loïc Hoguin 1b0eb9a4a3
Fix case where confirms may not be sent
A channel that first sends a mandatory publish before enabling
confirms mode may not receive confirms for messages published
after that. This is because the publish_seqno was increased
also for mandatory publishes even if confirms were disabled.
But the mandatory feature has nothing to do with publish_seqno.

The issue exists since at least
38e5b687de

The test case introduced focuses on multiple=false. The issue
also exists for multiple=true but it has a different impact:
sending multiple=true,delivery_tag=2 results in both messages
1 and 2 being acked, even if message 2 doesn't exist as far
as the client is concerned. If the message does exist
it might get confirmed earlier than it should have been. The
issue is a bigger problem the more mandatory messages were
sent before enabling confirms mode.
2021-12-08 15:53:47 +01:00
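The fix described in this commit can be sketched as follows (hypothetical module and state shape; the real code lives in the channel implementation): the publish sequence number only advances when confirm mode is enabled, and the mandatory flag has no effect on it.

```erlang
%% Hypothetical sketch of the publish_seqno fix: the sequence
%% number must only advance in confirm mode; "mandatory" is an
%% unrelated routing feature and must not touch it.
-module(confirm_seqno_sketch).
-export([maybe_next_seqno/2]).

%% State is {ConfirmEnabled, PublishSeqno}.
maybe_next_seqno({true, Seqno}, _Mandatory) ->
    %% Confirms on: every publish consumes a sequence number.
    {Seqno, {true, Seqno + 1}};
maybe_next_seqno({false, Seqno}, _Mandatory) ->
    %% Confirms off: no seqno is assigned, even for mandatory
    %% publishes (the buggy code incremented it here as well,
    %% shifting later confirms out of step).
    {undefined, {false, Seqno}}.
```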
Luke Bakken 9ff201c3ab
Remove flaky assertion
Thanks @kjnilsson
2021-12-01 06:57:25 -08:00
dcorbacho 5e9664f9e7 Query total number of messages on stream leader on queue.declare 2021-11-30 15:09:30 +01:00
David Ansari 45f69f8829 Add missing Ra commands to the log
Before this commit, the tests were not including any settle, return, or
discard Ra commands.

Do not pattern match against 'ra_event' because nowadays:
_Opts = [local, ra_event]
2021-11-26 16:16:45 +01:00
Michael Klishin 4f09fd109c
quorum_queue_SUITE: bump some timeouts 2021-11-24 18:04:35 +03:00
Michael Klishin 6a08e143e9
quorum_queue_SUITE: drop a debug line 2021-11-24 16:47:20 +03:00
Luke Bakken 6d545447b9
Fix quorum queue crash during consumer cancel with return
Fixes #3729
2021-11-23 08:59:47 -08:00
Michael Klishin e22e667a10
Do not count unroutable message in global totals 2021-11-23 16:37:46 +03:00
Luke Bakken 6aaf7ec597
Merge pull request #3740 from rabbitmq/rabbitmq-server-3739
Distribution listener settings support in rabbitmq.conf
2021-11-16 06:36:48 -08:00
Michael Klishin 8a30cf1c86
Distribution listener settings support in rabbitmq.conf
* distribution.listener.interface
* distribution.listener.port_range.min
* distribution.listener.port_range.max

Closes #3739
2021-11-16 16:37:28 +03:00
Karl Nilsson bc7b339e7a Stream coordinator: only update amqqueue record if stream id matches
From the coordinator's POV each stream has a unique id consisting of the
vhost, queue name and a high-resolution timestamp, even if several stream ids
relate to the same queue record.

When performing the mnesia update the coordinator now checks that the current stream id
matches that of the update_mnesia action and does not change the queue record if
the stream id is not the same.

This should avoid "old" incarnations of a stream queue updating newer ones
with incorrect information.
2021-11-16 12:32:33 +00:00
Karl Nilsson 1c6e45257d QQ: set better timeouts for commands
Refactor how the single active consumer check is performed when consuming.

Improve timeouts in rabbit_fifo_client.
2021-11-08 11:07:41 +00:00
Michael Klishin 686dccf410 Introduce a target cluster size hint setting
This is meant to be used by deployment tools,
core features and plugins
that expect a certain minimum
number of cluster nodes
to be present.

For example, certain setup steps
in distributed plugins might require
at least three nodes to be available.

This is just a hint, not an enforced
requirement. The default value is 1
so that for single node clusters,
there would be no behavior changes.
2021-11-03 08:42:58 +00:00
Karl Nilsson 691de2bea4 Take all clustered nodes into account when declaring stream.
Deriving a max-cluster-size only from running nodes would create situations
where, in a three-node cluster with only two nodes running, it would select
a non-running node as follower.
2021-10-18 15:44:53 +01:00
Karl Nilsson 5520c6cafe Stream queue: handle unsupported header value types
As AMQP 0.9.1 headers are translated into AMQP 1.0 application properties
they are not able to contain complex values such as arrays or tables.

RabbitMQ federation does use array and table values so to avoid crashing when
delivering a federated message to a stream queue we drop them. These header values
should be considered internal however so dropping them before a final queue deliver should not be a huge problem.
2021-10-13 10:27:00 +01:00
Philip Kuryloski 9c9fb7ffb0 Shard cluster_management_SUITE by testcase to better manage timeouts
The suite level timeout the .erl I've learned is actually per
case. By sharding bu testcase, we can better match the common test
level and bazel level timeouts, such that we can get logs from remote
test run failures.
2021-09-30 10:38:39 +02:00
Philip Kuryloski 860653c97a Adjust the clustering_management_SUITE timeout at the ct level
Previously the bazel timeout and common test timeout were equal, which
meant that in practice the bazel timeout was often reached first, in
which case we would not receive the test logs.
2021-09-23 13:55:18 +02:00
Philip Kuryloski 7dc0c29227 Use only 3 nodes for feature_flags_with_unpriveleged_user_SUITE
The test does not appear reliable when it runs in GitHub Actions. This
is currently the only test that does so; other tests run on BuildBuddy workers.
2021-09-22 17:22:49 +02:00
Philip Kuryloski 6e6279eb2b Reduce a test timeout
The original value of 15 minutes was inherited from a larger suite. 5
minutes should be sufficient, as a passing run is typically around 2 minutes.
2021-09-21 10:16:38 +02:00
Karl Nilsson eaa216da82 QQ: emit release cursors after consumer cancel
If this is not done, apps that consume/cancel from empty queues in a loop
will grow the raft log in an unbounded manner. This could also be the
case for the garbage_collect command.
2021-09-17 17:09:30 +01:00
Karl Nilsson 5779059bd5 QQ: fix memory leak when cancelling consumer
If the queue is empty when a consumer is cancelled, it would leave the
consumer id inside the service queue. If an application subscribes/unsubscribes
in a loop from an empty queue, this would cause the service queue to never be
cleaned up.

NB: whenever we make a change to how the quorum queue state machine is
calculated, we need to consider how this affects determinism, as during an
upgrade different members may calculate a different service queue state.
In this case it should be ok as they will eventually converge on the same
state once all "dead" consumer ids have been removed from the queue.

In any case it should not affect how messages are assigned to consumers.
2021-09-17 14:53:33 +01:00
Philip Kuryloski eea99e1cd5 Split the feature_flags_SUITE into two parts for CI/Bazel
Two testcases in the original suite fail if the test is run as the
root user. Currently under remote execution with bazel this is the
only working option. There is a workaround in place, but the entire
suite when run that way takes around 12 minutes. This splits the suite
so that the minimal set of cases is executed using the slower workaround.
2021-09-17 11:08:48 +02:00
Michal Kuratczyk 624767281f Enable metrics collection in run_tests
The proposed `min-masters` implementation relies on metrics, so they need
to be collected during queue_master_location tests.
2021-09-10 14:51:11 +02:00
Gerhard Lazu 6a1faa6fd6
Keep checking that replica recovered in rabbit_stream_queue
Rather than sleeping for 6 seconds, we want to check multiple times
within 30 seconds that the replica recovered, and either eventually
succeed, or fail if it does not recover within 30 seconds, the default
await_condition time interval.

Pair: @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-31 17:02:21 +01:00
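The polling approach described in the commit above can be sketched like this (hypothetical helper; the suite presumably uses the common test helpers' await_condition rather than this loop): retry a boolean condition every 100 ms until it holds or the deadline passes, instead of one fixed sleep.

```erlang
%% Hypothetical sketch of an await_condition-style helper.
-module(await_sketch).
-export([await_condition/2]).

%% Deadline passed without the condition holding: fail loudly.
await_condition(_Fun, TimeoutMs) when TimeoutMs =< 0 ->
    error(await_condition_timeout);
await_condition(Fun, TimeoutMs) ->
    case Fun() of
        true  -> ok;
        false ->
            %% Poll again after a short pause.
            timer:sleep(100),
            await_condition(Fun, TimeoutMs - 100)
    end.
```

Compared with a fixed 6-second sleep, this succeeds as soon as the replica recovers and still bounds the total wait.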
Philip Kuryloski 09fb5c5321 Skip additional tests in mixed versions
The tests in question won't pass consistently as they are at the mercy
of how the quorum queue is placed across the mixed version nodes
2021-08-30 17:17:25 +02:00
Michael Klishin 83f007be54
Merge pull request #3341 from rabbitmq/local-exclusive-queues
Always place exclusive queues on the local node
2021-08-28 09:54:49 +03:00
Michael Klishin 3b4b4dc222
Exclude roundtrip definition import cases from mixed version runs
References #3333
2021-08-26 19:10:11 +03:00
Michael Klishin 54f7b6d77c
Re-format two definition import input files 2021-08-26 19:03:14 +03:00
Michael Klishin 42a3dfa81b
Exclude the #3333 test case from mixed version runs 2021-08-26 17:25:07 +03:00
Michal Kuratczyk d3dcd48ea5 Always place exclusive queues on the local node
Prior to this change, exclusive queues have been subject to the queue
location process, just like other queues. Therefore, if
queue_master_locator was not client-local and x-queue-master-locator was
not set to client-local, an exclusive queue was likely to be located on
a different node than the connection it is exclusive to.  This is
suboptimal and may lead to inconsistencies when the queue's node goes
down while the connection's node is still up.
2021-08-26 13:05:55 +02:00
Michael Klishin 2e61f51773
Commit definition import case16 file 2021-08-24 04:41:51 +03:00
Michael Klishin 6f97707dac
Definition import: correctly import vhost metadata 2021-08-24 04:41:04 +03:00
Michael Klishin 6a0058fe7c
Introduce TLS-related rabbitmq.conf settings for definition import
currently only used by the HTTPS mechanism but can be used by
any other.
2021-08-17 20:42:53 +03:00
Michael Klishin f3a5235408
Refactor definition import to allow for arbitrary sources
The classic local filesystem source is still supported
using the same traditional configuration key, load_definitions.

Configuration schema follows peer discovery in spirit:

 * definitions.import_backend configures the mechanism to use,
   which can be a module provided by a plugin
 * definitions.* keys can be defined by plugins and contain any
   keys a specific mechanism needs

For example, the classic local filesystem source can now be
configured like this:

``` ini
definitions.import_backend = local_filesystem
definitions.local.path = /path/to/definitions.d/definition.json
```

``` ini
definitions.import_backend = https
definitions.https.url = https://hostname/path/to/definitions.json
```

HTTPS may require additional configuration keys related to TLS/x.509
peer verification. Such extra keys will be added as the need for them
becomes evident.

References #3249
2021-08-14 14:53:45 +03:00
Loïc Hoguin 24c25ab3cc
Add tests for the regression introduced in #3041 2021-08-11 12:50:04 +02:00
Jean-Sébastien Pédron 6c8cf4c510
Logging: Fix crash when Epoch-based timestamps are used with JSON
The code was passing a number (the timestamp) to
unicode:characters_to_binary/1 which expects an iolist to convert to
UTF-8.

We now verify if we have a number before calling that function. If this
is a number (integer or float), we keep it as is because JSON supports
that type.
2021-08-10 12:34:11 +02:00
Michael Klishin 2efc3d22fa
Merge pull request #3176 from rabbitmq/stream-error-handling
Better error handling for streams
2021-07-27 22:25:06 +03:00
David Ansari 6d968718c8 Fix function_clause error in tracking_status/2
Before this commit:

> ./sbin/rabbitmq-streams stream_status --tracking s1
Status of stream s1 on node rabbit@localhost ...
Error:
{:function_clause,
[{:rabbit_stream_queue, :"-tracking_status/2-fun-0-",
[:offsets, %{"s1-1" => 5}, []],
[file: 'src/rabbit_stream_queue.erl', line: 608]},
{:maps, :fold_1, 3, [file: 'maps.erl', line: 410]},
{:rabbit_stream_queue, :tracking_status, 2, []}]}

After this commit:

> ./sbin/rabbitmq-streams stream_status --tracking s1
Status of stream s1 on node rabbit@localhost ...
┌────────┬───────────┬───────┐
│ type   │ reference │ value │
├────────┼───────────┼───────┤
│ offset │ s1-1      │ 51    │
└────────┴───────────┴───────┘
2021-07-23 19:34:31 +02:00
Michael Klishin c1e3710140
Squash a compiler warning 2021-07-20 00:55:40 +03:00
Philip Kuryloski d6399bbb5b
Mixed version testing in bazel (#3200)
Unlike with GNU Make, mixed version testing with bazel uses a package-generic-unix for the secondary umbrella rather than the source. This brings the benefit of being able to mixed-version test releases built with older Erlang versions (even though all nodes will run under the single version given to bazel).

This introduces new test labels, adding a `-mixed` suffix for every existing test. They can be skipped if necessary with `--test_tag_filters` (see the github actions workflow for an example)

As part of the change, it is now possible to run an old release of rabbit with rabbitmq_run rule, such as:

`bazel run @rabbitmq-server-generic-unix-3.8.17//:rabbitmq-run run-broker`
2021-07-19 14:33:25 +02:00
Philip Kuryloski 0f4cf2755d Increase a timeout for flakiness' sake 2021-07-19 14:24:46 +02:00
Philip Kuryloski 0a78484999 Make things a little more consistent between per_*_limit suites 2021-07-16 14:40:51 +02:00
Philip Kuryloski 4f514f435b Try to reduce flakes in per_user_connection_channel_limit_partitions_SUITE 2021-07-16 14:32:35 +02:00
Philip Kuryloski 97e8037b80 Replace some static sleeps in tests with dynamic waits
This should help with flakiness
2021-07-15 16:42:14 +02:00
dcorbacho e65ba8347c Fix delete_replica bug
It caused a lot of flakiness on the rabbit_stream_queue_SUITE, both on `delete_replica`
and `delete_last_replica` test cases.
2021-07-14 17:18:20 +02:00
dcorbacho 6052ecdc9c Split cluster_size_3_parallel in two groups
Makes it faster to test the flaky tests locally and to isolate them
2021-07-14 17:18:20 +02:00
Philip Kuryloski 923d87f847 Avoid using a duplicate group name in rabbit_stream_queue_SUITE
Since bazel-erlang doesn't support this with sharding
2021-07-09 16:26:56 +02:00
Karl Nilsson 284809e750
Merge pull request #3170 from rabbitmq/stream-flaky
Fix restart of stream coordinator when there are no stream queues
2021-07-05 15:03:45 +01:00
Philip Kuryloski da6da8d6c7 Clear memory alarms on all nodes in the memory_alarm_rolls_wal test
If the alarm is triggered directly with `rabbit_alarm` it has to be
cleared on all nodes
2021-07-05 15:57:56 +02:00
dcorbacho deaa42ecac Fix restart of stream coordinator when there are no stream queues
Recovering from an existing queue is fine but if a node is restarted when
there are no longer stream queues on the system, the recovery process won't
restart the pre-existing coordinator as that's only performed on queue recovery.
The first attempt to declare a new stream queue on this cluster will crash with
`coordinator unavailable` error, as it only restarts the local coordinator
and not the whole ra cluster, thus lacking quorum.

Recovering the coordinator during the boot process ensures that a pre-existing
coordinator cluster is restarted in any case, and does nothing if there was
never a coordinator on the node.
2021-07-05 15:34:05 +02:00
Philip Kuryloski 390a00b828 Handle feature flag enablement failure more gracefully in test setup 2021-07-05 11:22:38 +02:00
Philip Kuryloski 1b92fadd80 Skip additional quorum_queue_SUITE cases under mixed versions 2021-06-30 12:41:59 +02:00
Philip Kuryloski d086af8070 Reduce test case flakiness in quorum_queue_SUITE
for the clustered/cluster_size_3/confirm_availability_on_leader_change case
2021-06-30 10:05:23 +02:00
Philip Kuryloski ef9647671f Introduce dynamic wait in parts of the quorum_queue_SUITE
to help with test flakes
2021-06-29 18:32:53 +02:00
Philip Kuryloski a8ae32e2f7 Skip an additional quorum_queue_SUITE case in mixed versions 2021-06-29 16:43:19 +02:00
Michael Klishin cf147ebfe5
Merge branch 'master' into mk-stricter-stop-start-assertions-in-quorum-queue-suite 2021-06-29 12:49:31 +03:00
Michael Klishin a1ab7452ef
Improve assertions in a QQ suite test 2021-06-29 12:16:23 +03:00
Philip Kuryloski 3cb8ff1ab9 Mixed version testing skip updates 2021-06-29 10:49:06 +02:00
dcorbacho c9305d948a
Use number of publishing channels as global publishers in amqp091 2021-06-29 08:10:42 +01:00
Michael Klishin bed64f2cc9
Reduce priority_queue_SUITE to single node tests
Other tests (the ones that produce flakes) arguably test classic mirrored
queues, a deprecated feature already reasonably well
covered in other suites.

Per discussion with @gerhard.
2021-06-28 21:59:16 +03:00
Michael Klishin a19a0f924a
quorum_queue_SUITE: don't unconditionally skip node_removal_is_not_quorum_critical
Unintentionally introduced in a3c97d491f
2021-06-28 13:02:55 +03:00
Philip Kuryloski a3c97d491f Update additional test skipping for 3.8/3.9 mixed versions 2021-06-25 11:17:46 +02:00
Philip Kuryloski dca208abce Additional skipping of unsupported tests in mixed version clusters
Also consolidate the mixed version check on
rabbit_ct_helpers:is_mixed_versions/1 as much as possible
2021-06-23 14:27:41 +02:00
Gerhard Lazu c7971252cd
Global counters per protocol + protocol AND queue_type
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something many have asked for
for a really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
https://github.com/rabbitmq/seshat/pull/1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at it, we removed redundant empty return value handling in the
channel: the function called no longer returns such a value.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-06-22 14:14:21 +01:00
Philip Kuryloski 2d06a67921 Fix test for rabbitmq-ct-helpers change
rabbit_ct_broker_helpers:rpc now raises an exception if the remote
call fails. The assertion is updated to match the new behavior
2021-06-22 11:12:17 +02:00
Michael Klishin 0e6298d1ae
Explain
(cherry picked from commit 2387022e8c)
2021-06-21 14:43:54 +08:00
Michael Klishin 6550cd1752
Reduce flakiness in rabbitmq_queues_cli_integration_SUITE
In case removed node hosts a leader, it takes a moment for
the QQ to elect a new one and begin accepting cluster
membership change operations again.

(cherry picked from commit a9d8816c6a)
2021-06-21 14:41:10 +08:00
Philip Kuryloski 40a7a1c24c Bring rabbit:logger_SUITE online in bazel and bump mismatched deps 2021-06-18 14:41:14 +02:00
Philip Kuryloski cff7516317 Skip some tests that are not mixed version compatible
Mark per_user_connection_channel_tracking_SUITE:cluster_size_2_network
as not mixed version compatible.

In a mixed 3.8/3.9 cluster, changes to rabbit_core_ff.erl imply that
some feature flag related migrations cannot occur, and therefore
user_limits cannot be enabled as required by the test
2021-06-17 15:36:22 +02:00
Karl Nilsson 8a4f4c6d45 Ignore dynamic_qq test that isn't mixed version compatible
quorum_unaffected_after_vhost_failure isn't mixed-version compatible as
it tries to declare a queue in a mixed cluster from a node running Ra 1.x while all other
nodes are running Ra 2.0.
2021-06-17 11:52:34 +01:00
Karl Nilsson d8ac46d745 Mark quorum queue test as non-mixed-version compatible
simple_confirm_availability_on_leader_change can't be made forwards compatible
as when running in mixed mode the queue declaration happens on an old node in
a cluster of mostly new nodes. As new nodes run Ra 2.0 and Ra 1.x does not know
how to create members on Ra 2.0 nodes, this test fails. This is an acceptable limitation
for a transient mixed versions cluster.
2021-06-17 10:48:20 +01:00
Michal Kuratczyk 437d8aa8c5 Don't run policy tests in parallel
Now that a policy overwrites queue arguments, running policy tests in
parallel with other tests leads to non-deterministic test results with
some tests randomly failing.
2021-06-07 16:46:14 +02:00
David Ansari 0876746d5f Remove randomized startup delays
On initial cluster formation, only one node in a multi node cluster
should initialize the Mnesia database schema (i.e. form the cluster).
To ensure that for nodes starting up in parallel,
RabbitMQ peer discovery backends have used
either locks or randomized startup delays.

Locks work great: When a node holds the lock, it either starts a new
blank node (if there is no other node in the cluster), or it joins
an existing node. This makes it impossible to have two nodes forming
the cluster at the same time.
Consul and etcd peer discovery backends use locks. The lock is acquired
in the consul and etcd infrastructure, respectively.

For other peer discovery backends (classic, DNS, AWS), randomized
startup delays were used. They work well enough in most cases.
However, in https://github.com/rabbitmq/cluster-operator/issues/662 we
observed that in 1% - 10% of the cases (the more nodes or the
smaller the randomized startup delay range, the higher the chances), two
nodes decide to form the cluster. That's bad since it ends up with a
single Erlang cluster but two RabbitMQ clusters. Even worse, no
obvious alert got triggered or error message logged.

To solve this issue, one could increase the randomized startup delay
range from e.g. 0m - 1m to 0m - 3m. However, this makes initial cluster
formation very slow since it will take up to 3 minutes until
every node is ready. In rare cases, we still end up with two nodes
forming the cluster.

Another way to solve the problem is to name a dedicated node to be the
seed node (forming the cluster). This was explored in
https://github.com/rabbitmq/cluster-operator/pull/689 and works well.
Two minor downsides to this approach are: 1. If the seed node never
becomes available, the whole cluster won't be formed (which is okay),
and 2. it doesn't integrate with existing dynamic peer discovery backends
(e.g. K8s, AWS) since nodes are not yet known at deploy time.

In this commit, we take a better approach: We remove randomized startup
delays altogether. We replace them with locks. However, instead of
implementing our own lock implementation in an external system (e.g. in K8s),
we re-use Erlang's locking mechanism global:set_lock/3.

global:set_lock/3 has some convenient properties:
1. It accepts a list of nodes to set the lock on.
2. The nodes in that list connect to each other (i.e. create an Erlang
cluster).
3. The method is synchronous with a timeout (number of retries). It
blocks until the lock becomes available.
4. If a process that holds a lock dies, or the node goes down, the lock
held by the process is deleted.

The list of nodes passed to global:set_lock/3 corresponds to the nodes
the peer discovery backend discovers (lists).

Two special cases worth mentioning:

1. That list can be all desired nodes in the cluster
(e.g. in classic peer discovery where nodes are known at
deploy time) while only a subset of nodes is available.
In that case, global:set_lock/3 still sets the lock without
blocking until all nodes can be connected to. This is good since
nodes might start sequentially (non-parallel).

2. In dynamic peer discovery backends (e.g. K8s, AWS), this
list can be just a subset of desired nodes since nodes might not startup
in parallel. That's also not a problem as long as the following
requirement is met: "The peer discovery backend does not list two disjoint
sets of nodes (on different nodes) at the same time."
For example, in a 2-node cluster, the peer discovery backend must not
list only node 1 on node 1 and only node 2 on node 2.

Existing peer discovery backends fulfil that requirement because the
resource the nodes are discovered from is global.
For example, in K8s, once node 1 is part of the Endpoints object, it
will be returned on both node 1 and node 2.
Likewise, in AWS, once node 1 started, the described list of instances
with a specific tag will include node 1 when the AWS peer discovery backend
runs on node 1 or node 2.

Removing randomized startup delays also makes cluster formation
considerably faster (up to 1 minute faster if that was the
upper bound in the range).
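
The locking approach described above can be sketched with
global:set_lock/3 (a minimal sketch, not RabbitMQ's actual
implementation; the module, function names and retry count are made up
for illustration):

```erlang
%% Minimal sketch: hold a cluster-wide lock across the nodes returned by
%% peer discovery while joining or seeding the cluster. If the holder
%% dies or its node goes down, global releases the lock automatically.
-module(cluster_formation_lock).
-export([with_lock/2]).

with_lock(DiscoveredNodes, Fun) ->
    LockId = {rabbitmq_cluster_formation, self()},
    Retries = 60,
    %% Blocks (retrying) until the lock is available on the reachable
    %% subset of DiscoveredNodes; connecting to them forms the Erlang
    %% cluster as a side effect.
    case global:set_lock(LockId, DiscoveredNodes, Retries) of
        true ->
            try
                Fun() %% join an existing node, or start a blank one
            after
                global:del_lock(LockId, DiscoveredNodes)
            end;
        false ->
            {error, could_not_acquire_lock}
    end.
```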
2021-06-03 08:01:28 +02:00
Michael Klishin 12253d2fb4
Merge pull request #2954 from rabbitmq/new-segment-entry-count-default
Set segment_entry_count per vhost and use a better default
2021-05-27 01:56:02 +03:00
Karl Nilsson 1ea7bf5519 quorum_queue_SUITE restructure tests
Run more tests with 3 node cluster and have only one group definition
for cluster_size_2
2021-05-21 15:13:18 +01:00
Karl Nilsson 355b1cbe21 quorum_queue_SUITE only configure dist proxy when needed
Only configure the dist proxy for groups that require it.
2021-05-21 12:39:07 +01:00
Arnaud Cogoluègnes c30e013d7a
Rename max-segment-size to stream-max-segment-size-bytes 2021-05-20 10:16:19 +02:00
Michael Klishin 040f8cc912
Replace a few more leftover MPLv1.1 license headers
Most files have been using the MPLv2 headers for months now.
These were detected by the OSL process.
2021-05-19 21:20:47 +03:00
Michael Klishin 09a4ad411e
Merge pull request #3046 from rabbitmq/mk-extra-bcc-routing-target-in-queue-metadata
Make it possible for queues to have extra BCC targets specified as options
2021-05-19 17:56:39 +03:00
Karl Nilsson a96670b6c6 Fix stream x-stream-offset regression
x-stream-offset supports "friendly" relative timebase specifications
such as 100s. A recent change introduced a validation of the x-stream-offset
that disallowed such specs.
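
As a hedged illustration of the kind of spec involved (the commit's
actual parser is not shown here, and the accepted unit letters are an
assumption), a relative timebase spec like `100s` could be recognized
along these lines:

```erlang
%% Hedged sketch: recognize "friendly" relative specs such as <<"100s">>.
%% The set of unit letters accepted here is illustrative only.
-module(offset_spec).
-export([is_relative_spec/1]).

is_relative_spec(Bin) when is_binary(Bin) ->
    case re:run(Bin, <<"^[0-9]+[YMDhms]$">>) of
        {match, _} -> true;
        nomatch -> false
    end.
```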
2021-05-19 11:23:37 +01:00
Michael Klishin 38c15d691d
Make it possible for queues to have extra BCC targets specified as options
This introduces a backup mechanism that can be controlled
by plugins via policies.

Benchmarks suggest the cost of this change on
Erlang 24 is well under 1%. With a stream target, it is less
than routing to one extra queue of the same type (e.g. a quorum queue).
2021-05-18 22:22:16 +03:00
dcorbacho 733f5fb367 Report stream coordinator unavailable as an amqp error
Uses code 506: resource_error
2021-05-12 17:12:09 +01:00
Karl Nilsson 94e943692b
Merge pull request #3022 from rabbitmq/relative-time-offset
Support relative time based offset specs
2021-05-11 13:50:00 +01:00
Loïc Hoguin d9344b2b58
Set segment_entry_count per vhost and use a better default
The new default of 2048 was chosen based on various scenarios.
It provides much better memory usage when many queues are used
(allowing one host to go from 500 queues to 800+ queues) and
there seems to be none or negligible performance cost (< 1%)
for single queues.
2021-05-11 10:45:28 +02:00
dcorbacho 464bf69cc4 Support relative time based offset specs 2021-05-03 17:55:43 +02:00
dcorbacho bcac37d442 Disallow removal of the last stream member 2021-04-30 17:25:06 +02:00
Michael Klishin 62df3b7ebc
Reduce log output 2021-04-28 00:24:06 +03:00
Karl Nilsson 63e33aef6d
Merge pull request #2996 from rabbitmq/stream-add-replica-check
Streams: safer replica addition
2021-04-27 10:37:57 +01:00
kjnilsson a827275a43 Streams: safer replica addition
Disallow replica additions if any of the existing replicas are more than
10 seconds out of date.
2021-04-27 09:38:39 +01:00
Ayanda-D d78e14ad3b Allow #amqp_error{} responses in channel interceptors 2021-04-20 14:57:55 +01:00
Michael Klishin d147a08aee
Correct parse tags provided as a list
Discovered while testing a PR for rabbit-hole
2021-04-16 18:35:47 +03:00
kjnilsson b35c29d7b2 QQ: ensure that messages are delivered in order
In the case where some messages are kept in memory mixed with
some that are not, it is possible that messages are delivered to the
consuming channel with gaps/out of order, which would in some cases cause
the channel to treat them as re-sends it has already seen and just
discard them. When this happens, the messages get stuck in the consumer
state inside the queue and are never seen by the client consumer and
thus never acked. When this happens, the release cursors can't be emitted
as the smallest raft index will be one of the stuck messages.
2021-04-15 15:01:22 +01:00
Michael Klishin 6a4ee16b79
Merge pull request #2968 from rabbitmq/longer-qq-names
Allow quorum queue names to exceed atom max chars
2021-04-12 18:45:22 +03:00
kjnilsson 432edb11fc Allow quorum queue names to exceed atom max chars
If the concatenation of the vhost and the queue name exceeds 255 chars,
we generate an arbitrary atom name instead of throwing an
exception.
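
The naming rule described above could be sketched like this (a
hypothetical illustration, not the commit's code; the function name and
the fallback scheme are assumptions):

```erlang
%% Hypothetical sketch: use the "VHost_QueueName" atom when it fits the
%% 255-character atom limit, otherwise generate an arbitrary unique atom.
queue_name_to_atom(VHost, QName) ->
    Name = <<VHost/binary, "_", QName/binary>>,
    case byte_size(Name) of
        N when N =< 255 -> binary_to_atom(Name, utf8);
        _ -> binary_to_atom(base64:encode(crypto:strong_rand_bytes(12)), utf8)
    end.
```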
2021-04-12 14:14:26 +01:00
Michael Klishin 7f98bc3d1c
Add more VM memory monitor tests, pass Dialyzer
(cherry picked from commit 57ec1f8768)
2021-04-11 11:36:30 +03:00
Michael Klishin 30cbbba167
High VM watermark: support {relative, N} values set via advanced.config
for usability. It is not any different from when a float value
is used and only exists as a counterpart to '{absolute, N}'.

Also nothing changes for rabbitmq.conf users as that format performs
validation and correct value translation.

See #2694, #2965 for background.
2021-04-11 10:28:35 +03:00
Philip Kuryloski 3644ed58ee Test sharding and flaky annotations
Also rename a nested common test group in quorum_queue_SUITE to avoid
a name collision that prevented running the duplicates individually
2021-04-08 15:33:19 +02:00
kjnilsson e2fd14b996 Bump timeouts for peer discovery suite 2021-04-07 10:00:07 +01:00
kjnilsson b576242952 Increase rabbit_stream_queue_SUITE timetrap
And set the default of make start-cluster to 3 nodes.
2021-04-06 15:50:22 +01:00
Jean-Sébastien Pédron 95f9e92caa
unit_log_management_SUITE: Use $RABBITMQ_LOGS to configure logging
Now that the Cuttlefish schema sets default values for the application
environment in `{rabbit, [{log, ...}]}`, the values set in the testsuite
using application:setenv() are overwritten.

By using the $RABBITMQ_LOGS environment variable, we can override those
default values.
2021-04-06 11:52:55 +02:00
Philip Kuryloski 0caeb65d04 Shard the eager_sync_SUITE by case
This suite contains only one group, but is long enough to warrant
sharding. This is probably a bit of a time penalty in absolute terms
because init_per_suite and init_per_group re-run in each shard.
2021-03-31 15:47:36 +02:00
Jean-Sébastien Pédron 571b97513f
Logging: Allow to set timezone in rfc3339- and format-string-based time formats
This is not exposed to the end user (yet) through the Cuttlefish
configuration. But this is required to make logging_SUITE timezone
agnostic (i.e. the timezone of the host running the testsuite should not
affect the formatted times).
2021-03-31 14:13:40 +02:00
Carl Hörberg 330b820a0f Update proxy protocol test cases 2021-03-30 16:55:36 +02:00
Jean-Sébastien Pédron 2f648da118
config_schema_SUITE: Stop testing log configuration
The design of the rabbit_ct_config_schema helper makes it impossible to
do pattern matching and thus handle default values in the schema. As a
consequence, the helper explicitly removes the `{rabbit, {log, _}}`
configuration key to work around this limitation until a proper solution
is implemented and all testsuites rewritten. See
rabbitmq/rabbitmq-ct-helpers@b1f1f1ce68.

Therefore, we can't test log configuration variables anymore using this
helper. That's ok because logging_SUITE already tests many things.
2021-03-30 10:21:26 +02:00
Jean-Sébastien Pédron aca638abbb
Logging: Add configuration variables to set various formats
In addition to the existing configuration variables to configure
logging, the following variables were added to extend the settings.

log.*.formatter = plaintext | json
  Selects between the plain text (default) and JSON formatters.

log.*.formatter.time_format = rfc3339_space | rfc3339_T | epoch_usecs | epoch_secs | lager_default
  Configures how the timestamp should be formatted. It has several
  values to get RFC3339 date & time, Epoch-based integers and Lager
  default format.

log.*.formatter.level_format = lc | uc | lc3 | uc3 | lc4 | uc4
  Configures how to format the level. Things like uppercase vs.
  lowercase, full vs. truncated.
  Examples:
    lc: debug
    uc: DEBUG
    lc3: dbg
    uc3: DBG
    lc4: dbug
    uc4: DBUG

log.*.formatter.single_line = on | off
  Indicates if multi-line messages should be reformatted as a
  single-line message. A multi-line message is converted to a
  single-line message by joining all lines and separating them
  with ", ".

log.*.formatter.plaintext.format
  Set to a pattern to indicate the format of the entire message. The
  format pattern is a string with $-based variables. Each variable
  corresponds to a field in the log event. Here is a non-exhaustive list
  of common fields:
    time
    level
    msg
    pid
    file
    line
  Example:
    $time [$level] $pid $msg

log.*.formatter.json.field_map
  Indicates if fields should be renamed or removed, and the ordering
  which they should appear in the final JSON object. The order is set by
  the order of fields in that configuration variable.
  Example:
    time:ts level msg *:-
  In this example, `time` is renamed to `ts`. `*:-` says to remove all
  fields not mentioned in the list. In the end, the JSON object will
  contain the fields in the following order: ts, level, msg.

log.*.formatter.json.verbosity_map
  Indicates if a verbosity field should be added and how it should be
  derived from the level. If the verbosity map is not set, no verbosity
  field is added to the JSON object.
  Example:
    debug:2 info:1 notice:1 *:0
  In this example, debug verbosity is 2, info and notice verbosity is 1,
  other levels have a verbosity of 0.

All of them work with the console, exchange, file and syslog outputs.

The console output has specific variables too:

log.console.stdio = stdout | stderr
  Indicates if stdout or stderr should be used. The default is stdout.

log.console.use_colors = on | off
  Indicates if colors should be used in log messages. The default
  depends on the environment.

log.console.color_esc_seqs.*
  Indicates how each level is mapped to a color. The value can be any
  string but the idea is to use an ANSI escape sequence.
  Example:
    log.console.color_esc_seqs.error = \033[1;31m

V2: A custom time format pattern was introduced, first using variables,
    then a reference date & time (e.g. "Mon 2 Jan 2006"), thanks to
    @ansd. However, we decided to remove it for now until we have a
    better implementation of the reference date & time parser.

V3: The testsuite was extended to cover new settings as well as the
    syslog output. To test it, a fake syslogd server was added (Erlang
    process, part of the testsuite).

V4: The dependency to cuttlefish is moved to rabbitmq_prelaunch which
    actually uses the library. The version is updated to 3.0.1 because
    we need Kyorai/cuttlefish#25.
2021-03-29 17:39:50 +02:00
Philip Kuryloski 388654c542
Add a partial Bazel build (#2938)
Adds WORKSPACE.bazel, BUILD.bazel & *.bzl files for partial build & test with Bazel. Introduces a build-time dependency on https://github.com/rabbitmq/bazel-erlang
2021-03-29 11:01:43 +02:00
Philip Kuryloski 09e85d2e3d
Merge pull request #2935 from rabbitmq/rabbitmq-queue-int-tests
Fix integration tests to wait until ra cluster is ready
2021-03-26 17:28:06 +01:00
dcorbacho a1caff2a86 Fix integration tests to wait until ra cluster is ready
Publishing with confirms before growing/shrinking members is enough
2021-03-26 17:04:50 +01:00
Philip Kuryloski 1ead01081a Increase startup delay range in peer_discovery_classic_config_SUITE
I suspect the second ra system for coordination requires a bit more
time in boot, as this seems to flake more often since the merge
2021-03-26 14:11:36 +01:00
Philip Kuryloski 3c0c0901b1 Restore retry in peer_discovery_classic_config_SUITE
It was accidentally left commented out
2021-03-25 20:05:36 +01:00
Philip Kuryloski c313f36b57 Fix Makefile for feature_flags_SUITE_data/my_plugin
It was not updated for the rabbitmq-components.mk consolidation
2021-03-25 19:43:48 +01:00
Philip Kuryloski 008e47ef3c Fixup the behavior of rabbit_mnesia:is_virgin_node/0
Given the addition of the Coord ra system (and additional files on disk)
2021-03-25 10:49:17 +01:00
kjnilsson 8d8b67bb34 fix rabbit_fifo_int_SUITE 2021-03-24 14:17:34 +00:00
Michael Klishin 8eac876bc8
Use "quorum_queues" for QQ Ra system
"quorum" and "coordination" are not very distinctive
2021-03-22 21:44:19 +03:00
kjnilsson 75cea78415
fixes 2021-03-22 21:44:19 +03:00
kjnilsson f6f02a5d2d
ra systems wip 2021-03-22 21:44:15 +03:00
Philip Kuryloski a63f169fcb Remove duplicate rabbitmq-components.mk and erlang.mk files
Also adjust the references in rabbitmq-components.mk to account for
post monorepo locations
2021-03-22 15:40:19 +01:00
Michael Klishin 373285093e
Merge pull request #2899 from rabbitmq/parallel-stream-suite
Run most stream tests in parallel
2021-03-19 22:21:18 +03:00
Jean-Sébastien Pédron 9fd2d68e7a
rabbit_prelaunch_logging: $RABBITMQ_LOGS doesn't override log level
... if it is set in the configuration file.

Here is an example of that use case:
* The official Docker image sets RABBITMQ_LOGS=- in the environment
* A user of that image adds a configuration file with:
      log.console.level = debug

The initial implementation, introduced in rabbitmq/rabbitmq-server#2861,
considered that if the output is overriden in the environment (through
$RABBITMQ_LOGS), any output configuration in the configuration file is
ignored.

The problem is that the output-specific configuration could also set the
log level which is not changed by $RABBITMQ_LOGS. This patch fixes that
by keeping the log level from the configuration (if it is set obviously)
even if the output is overridden in the environment.
2021-03-19 15:43:28 +01:00
dcorbacho 9b3b5d48ec Run most stream tests in parallel
The test suite isn't faster (I suspect some contention on the coordinator),
but it is finding some bugs.
2021-03-17 21:32:42 +01:00
kjnilsson cbf0107605 Stream coordinator bug fix
Fix issue where a deleted replica could be restarted if the leader went
down whilst the replica was still running its start phase.
2021-03-17 13:54:28 +00:00
kjnilsson 9d83e0c5d9 Add logging to config decryption test
To possibly get a bit more information on failure reasons on GH Actions.
2021-03-16 16:28:41 +00:00
kjnilsson 3a26cf8654 Stream coordinator: handle commands for unknown streams
To avoid crashing.
2021-03-12 15:04:40 +00:00
kjnilsson 1709208105 Throw resource error when no local stream member
As well as some additional tests
2021-03-12 15:04:40 +00:00
dcorbacho e19aca8075 Use right map fields to compute streams info 2021-03-12 15:04:40 +00:00
kjnilsson 7fa3f6b6e1 Stream Coordinator: primitive backoff
Sleep for 5s after a failure due to a node being down before reporting
back to stream coordinator (which will immediately retry).

stream coordinator: correct command type spec

tidy up

fix rabbit_fifo_prop tests

stream coord: add function for member state query
2021-03-12 15:03:47 +00:00
kjnilsson bb3e0a7674 Move stream coordinator unit tests into ct suite 2021-03-12 15:03:10 +00:00
kjnilsson 9fb2e6d2dd Stream Coordinator refactor 2021-03-12 15:03:08 +00:00
Jean-Sébastien Pédron cdcf602749
Switch from Lager to the new Erlang Logger API for logging
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application env. variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.

The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.

The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.

The message text format slightly changed: the timestamp is more precise
(now to the microsecond) and the level can be abbreviated to always be
4-character long to align all messages and improve readability. Here is
an example:

    2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>  Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>  Licensed under the MPL 2.0. Website: https://rabbitmq.com

The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).

JSON is also supported as a message format and now for any outputs.
Indeed, it is possible to use it with e.g. syslog or the exchange. Here
is an example of a JSON-formatted message sent to syslog:

    Mar  3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}

For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variables were extended:
  * `-` still means stdout
  * `-stderr` means stderr
  * `syslog:` means syslog on localhost
  * `exchange:` means logging to `amq.rabbitmq.log`

`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON instead of plain text.

The `rabbitmqctl rotate_logs` command is deprecated. The reason is
Logger does not expose a function to force log rotation. However, it
will detect when a file was rotated by an external tool.

From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.

In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):

    ?LOG_NOTICE("Logging: switching to configured handler(s); following "
                "messages may not be visible in this log output",
                #{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),

Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.

At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. It can be done gradually when
working on a particular module or logging.

The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is called `rabbit_logger_std_h` and is based on
`logger_std_h` in Erlang 23.0.
2021-03-11 15:17:36 +01:00
Michael Klishin d77609bba4
Merge pull request #2846 from rabbitmq/cleanup-rabbit-fifo-usage
Clean up rabbit_fifo_usage table on queue.delete
2021-03-03 18:33:33 +03:00
Michael Klishin a2f98f25e9
Merge pull request #2804 from rabbitmq/rabbitmq-server-2756
Add federation support for quorum queues
2021-02-25 19:10:15 +03:00
dcorbacho a147cc4877 Clean up rabbit_fifo_usage table on queue.delete 2021-02-25 16:57:43 +01:00
Michael Klishin cd1a271499
As of Lager 3.8.2, Lager has a log_root default
so override it unconditionally.
2021-02-25 00:43:02 +03:00
dcorbacho 699cd1ab29 Add federation support for quorum queues 2021-02-18 17:15:47 +01:00
Carl Hörberg 413bfe7b37 Disable Erlang busy wait by default
By disabling the Erlang busy wait threshold, CPU usage with 5000 idle connections
drops from 110% to 14%. Throughput does not seem to be affected at all;
if anything it actually goes up a bit when you have 5000 idle connections
(because fewer CPU cycles are wasted polling idle connections).

rabbitmq-perf-test-2.13.0/bin/runjava com.rabbitmq.perf.PerfTest -s 8000 -z 15

With default erlang busy wait threshold:
id: test-115706-497, sending rate avg: 39589 msg/s
id: test-115706-497, receiving rate avg: 39570 msg/s

With busy wait disabled:
id: test-115807-719, sending rate avg: 40340 msg/s
id: test-115807-719, receiving rate avg: 40301 msg/s

rabbitmq-diagnostics runtime_thread_stats output while running the
PerfTest:

with default busy wait threshold:

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.01%    0.00%    0.00%    0.00%    0.00%    0.00%   99.98%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.03%    0.05%    0.00%   99.92%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%   99.99%
          poll    0.00%    0.67%    0.00%    0.00%    0.00%    0.00%   99.33%
     scheduler    0.69%    0.18%   28.41%    5.49%    9.50%    7.43%   48.29%

without busy wait threshold:

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.01%    0.00%    0.00%    0.00%    0.01%    0.00%   99.98%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
          poll    0.00%    0.77%    0.00%    0.00%    0.00%    0.00%   99.23%
     scheduler    0.70%    0.14%   28.29%    5.41%    0.86%    7.22%   57.38%
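Disabling busy waiting amounts to passing scheduler busy-wait emulator flags to the runtime. One possible way to do this for RabbitMQ (assuming the usual `RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS` mechanism for extra emulator flags) is:

```shell
# Disable busy waiting for regular, dirty CPU and dirty IO schedulers
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none"
```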
2021-02-10 12:35:12 +01:00
Michael Klishin ad20bfbc40
Use new crypto API cipher name here
References rabbitmq/credentials-obfuscation#10
2021-02-09 11:22:48 +03:00
Michael Klishin 0939cec51a
Exclude aes_ige256 in one more test suite 2021-02-08 11:21:16 +03:00
Michael Klishin c7b9c39352
Don't perform CMQ leadership transfer when entering maintenance mode
The time this operation can take in clusters with a lot of classic
mirrored queues (say, tens or hundreds of thousands) can be prohibitive
for upgrades.

Upgrades can instead use a health check to ensure that there are in-sync
replicas before entering maintenance mode, in which case
the transfer is not really necessary.

All of the above is more obvious with the recent changes in #2749.
2021-01-27 19:11:26 +03:00
Michael Klishin 52479099ec
Bump (c) year 2021-01-22 09:00:14 +03:00
kjnilsson f2418cfe4c Fix crash bug in QQ state conversion
When there are consumers in the service queue.
2021-01-20 14:19:33 +00:00
kjnilsson 2f0dba45d8 Stream: Channel resend on leader change
Detect when a new stream leader is elected and make stream_queues
re-send any unconfirmed, pending messages to ensure they did not get
lost during the leader change. This is done using the osiris
deduplication feature to ensure the resend does not create duplicates of
messages in the stream.
2021-01-13 12:09:44 +00:00
dcorbacho 9ef9dde6ce Apply retention policy in all osiris members 2021-01-12 12:18:13 +00:00
dcorbacho e5a2eaaa0d Update retention when only stream retention policy has changed
In any other case, the worker needs to be restarted
2021-01-12 12:18:13 +00:00
Michal Kuratczyk 6a81589c11 Expose `bypass_pem_cache` through rabbitmq.conf
Bypassing PEM cache may speed up TLS handshakes in some cases as described
here:
https://blog.heroku.com/how-we-sped-up-sni-tls-handshakes-by-5x
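`bypass_pem_cache` is an application parameter of Erlang/OTP's `ssl` application, so independent of the new rabbitmq.conf key it can be sketched in classic `advanced.config` terms like this:

```erlang
%% advanced.config sketch: sets the ssl application's bypass_pem_cache
%% parameter so PEM files are read directly instead of being served
%% from the PEM cache.
[
  {ssl, [{bypass_pem_cache, true}]}
].
```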
2020-12-17 16:53:14 +01:00
Michael Klishin 4ea9ce1c0b
Clarify what version will be the first to use this format 2020-12-09 12:48:56 +03:00
Michael Klishin e4c37db689
Support importing users with arrays of tags
as opposed to a comma-separated binary.

Part of #2667.
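A sketch of the two tag shapes a definitions import would now accept (field set trimmed for illustration; `password_hash` values elided):

```json
{
  "users": [
    {"name": "user1", "password_hash": "…", "tags": "administrator,management"},
    {"name": "user2", "password_hash": "…", "tags": ["administrator", "management"]}
  ]
}
```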
2020-12-08 18:22:56 +03:00
kjnilsson 6fdb7d29ec Handle errors in crashing_queues_SUITE
The connection may crash during the previous declaration, in which case
a caught error is returned from amqp_connection:open_channel/1 that
wasn't handled previously. Exactly how things fail in this test is most
likely very timing-dependent and may vary.

Also fixes an MQTT test where the process that set up a mock auth ETS
table was transient once an rpc timeout was introduced
2020-12-03 13:56:09 +00:00
Luke Bakken ccf624211a
Add test that fails prior to the change for #2668 2020-12-02 12:33:02 -08:00
Arnaud Cogoluègnes ffd66027af
Merge pull request #2506 from rabbitmq/stream-timestamp-offset
Support timestamp offsets for stream consumers
2020-11-27 14:49:38 +01:00
Arnaud Cogoluègnes 43cfb45a74
Convert AMQP 0-9-1 timestamps to milliseconds
For start offset in stream queue.
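The conversion itself is a factor of 1000, since AMQP 0-9-1 timestamps carry POSIX seconds while stream offsets are expressed in milliseconds. A hypothetical sketch (the function name and the `{timestamp, Millis}` offset shape are assumptions, not the actual code):

```erlang
%% Hypothetical helper: AMQP 0-9-1 timestamps are POSIX seconds,
%% stream start offsets are expressed in milliseconds.
amqp_timestamp_to_stream_offset(Seconds) when is_integer(Seconds) ->
    {timestamp, Seconds * 1000}.
```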
2020-11-27 14:47:36 +01:00
kjnilsson ea7c9e9b61 QQ: Emit release cursor for empty basic gets
Otherwise an application that polled an empty quorum queue frequently
using basic.get would never cause a snapshot to be taken, resulting in
unlimited log growth.
2020-11-19 15:59:51 +00:00
dcorbacho f23a51261d Merge remote-tracking branch 'origin/master' into stream-timestamp-offset 2020-11-18 14:27:41 +00:00
kjnilsson d88b623c18 Use correct credit mode x-credit
When the x-credit consumer arg is defined, Quorum Queues should use
credit mode `credited` and not `simple_prefetch`.
2020-11-16 10:45:10 +01:00
Philip Kuryloski a1fe3ab061 Change repo "root" to deps/rabbit
rabbit must not be the monorepo root application, as other applications depend on it
2020-11-13 14:34:42 +01:00