rabbitmq-server

Commit Graph

Author	SHA1	Message	Date
David Ansari	e2a113605d	Disallow transient entities in RabbitMQ AMQP 1.0 Erlang client Transient (i.e. `durable=false`) exchanges and queues are deprecated. Khepri will store all entities durably. (Even exclusive queues will be stored durably. Exclusive queues are still deleted when the declaring connection is closed.) Similar to how the RabbitMQ AMQP 1.0 Java client already disallows the creation of transient exchanges and queues, this commit will prohibit the declaration of transient exchanges and queues in the RabbitMQ AMQP 1.0 Erlang client starting with RabbitMQ 4.1.	2024-12-16 16:17:55 +01:00
David Ansari	b1eb354385	Strictly validate annotations	2024-09-18 12:42:27 +02:00
David Ansari	cdc5b886f8	Fix crash in consistent hash exchange Prior to this commit, a crash occurred when a consistent hash exchange got declared with a `hash-header` argument, but the publishing client didn't set that header on the message. This bug is present in RabbitMQ 3.13.0 - 3.13.6. Fixes https://github.com/rabbitmq/rabbitmq-server/discussions/11671	2024-07-24 11:42:59 +02:00
Michael Davis	da83358a4a	Respect RABBITMQ_METADATA_STORE in consistent hash exchange suite	2024-07-10 13:46:22 -04:00
David Ansari	1d02ea9e55	Fix crashes when message gets dead lettered Fix crashes when message is originally sent via AMQP and stored within a classic or quorum queue and subsequently dead lettered where the dead letter exchange needs access to message annotations or properties or application-properties.	2024-05-02 07:56:00 +00:00
Michael Klishin	01092ff31f	(c) year bumps	2024-01-01 22:02:20 -05:00
Michael Klishin	1b642353ca	Update (c) according to [1] 1. https://investors.broadcom.com/news-releases/news-release-details/broadcom-and-vmware-intend-close-transaction-november-22-2023	2023-11-21 23:18:22 -05:00
Diana Parra Corbacho	5f0981c5a3	Allow to use Khepri database to store metadata instead of Mnesia [Why] Mnesia is a very powerful and convenient tool for Erlang applications: it is a persistent disc-based database, it handles replication across multiple Erlang nodes and it is available out-of-the-box from the Erlang/OTP distribution. RabbitMQ relies on Mnesia to manage all its metadata: * virtual hosts' properties * internal users * queue, exchange and binding declarations (not queues data) * runtime parameters and policies * ... Unfortunately Mnesia makes it difficult to handle network partitions and, as a consequence, the merge conflicts between Erlang nodes once the network partition is resolved. RabbitMQ provides several partition handling strategies but they are not bullet-proof. Users still hit situations where it is a pain to repair a cluster following a network partition. [How] @kjnilsson created Ra [1], a Raft consensus library that RabbitMQ already uses successfully to implement quorum queues and streams for instance. Those queues do not suffer from network partitions. We created Khepri [2], a new persistent and replicated database engine based on Ra and we want to use it in place of Mnesia in RabbitMQ to solve the problems with network partitions. This patch integrates Khepri as an experimental feature. When enabled, RabbitMQ will store all its metadata in Khepri instead of Mnesia. This change comes with behavior changes. While Khepri remains disabled, you should see no changes to the behavior of RabbitMQ. If there are changes, it is a bug. After Khepri is enabled, there are significant changes of behavior that you should be aware of. Because it is based on the Raft consensus algorithm, when there is a network partition, only the cluster members that are in the partition with at least `(Number of nodes in the cluster ÷ 2) + 1` number of nodes can "make progress".
In other words, only those nodes may write to the Khepri database and read from the database and expect a consistent result. For instance in a cluster of 5 RabbitMQ nodes: * If there are two partitions, one with 3 nodes, one with 2 nodes, only the group of 3 nodes will be able to write to the database. * If there are three partitions, two with 2 nodes, one with 1 node, none of the groups can write to the database. Because the Khepri database will be used for all kinds of metadata, it means that RabbitMQ nodes that can't write to the database will be unable to perform some operations. A list of operations and what to expect is documented in the associated pull request and the RabbitMQ website. This requirement from Raft also affects the startup of RabbitMQ nodes in a cluster. Indeed, at least a quorum number of nodes must be started at once to allow nodes to become ready. To enable Khepri, you need to enable the `khepri_db` feature flag: rabbitmqctl enable_feature_flag khepri_db When the `khepri_db` feature flag is enabled, the migration code performs the following two tasks: 1. It synchronizes the Khepri cluster membership from the Mnesia cluster. It uses `mnesia_to_khepri:sync_cluster_membership/1` from the `khepri_mnesia_migration` application [3]. 2. It copies data from relevant Mnesia tables to Khepri, doing some conversion if necessary on the way. Again, it uses `mnesia_to_khepri:copy_tables/4` from `khepri_mnesia_migration` to do it. This can be performed on a running standalone RabbitMQ node or cluster. Data will be migrated from Mnesia to Khepri without any service interruption. Note that during the migration, the performance may decrease and the memory footprint may go up. Because this feature flag is considered experimental, it is not enabled by default even on a brand new RabbitMQ deployment. More about the implementation details below: In the past months, all accesses to Mnesia were isolated in a collection of `rabbit_db` modules.
This is where the integration of Khepri mostly takes place: we use a function called `rabbit_khepri:handle_fallback/1` which selects the database and performs the query or the transaction. Here is an example from `rabbit_db_vhost`: * Up until RabbitMQ 3.12.x: get(VHostName) when is_binary(VHostName) -> get_in_mnesia(VHostName). * Starting with RabbitMQ 3.13.0: get(VHostName) when is_binary(VHostName) -> rabbit_khepri:handle_fallback( #{mnesia => fun() -> get_in_mnesia(VHostName) end, khepri => fun() -> get_in_khepri(VHostName) end}). This `rabbit_khepri:handle_fallback/1` function relies on two things: 1. the fact that the `khepri_db` feature flag is enabled, in which case it always executes the Khepri-based variant. 2. the ability or not to read and write to Mnesia tables otherwise. Before the feature flag is enabled, or during the migration, the function will try to execute the Mnesia-based variant. If it succeeds, then it returns the result. If it fails because one or more Mnesia tables can't be used, it restarts from scratch: it means the feature flag is being enabled and depending on the outcome, either the Mnesia-based variant will succeed (the feature flag couldn't be enabled) or the feature flag will be marked as enabled and it will call the Khepri-based variant. The meat of this function really lives in the `khepri_mnesia_migration` application [3] and `rabbit_khepri:handle_fallback/1` is a wrapper on top of it that knows about the feature flag. However, some calls to the database do not depend on the existence of Mnesia tables, such as functions where we need to learn about the members of a cluster. For those, we can't rely on exceptions from Mnesia. Therefore, we just look at the state of the feature flag to determine which database to use. There are two situations though: * Sometimes, we need the feature flag state query to block because the function interested in it can't return a valid answer during the migration.
Here is an example: case rabbit_khepri:is_enabled(RemoteNode) of true -> can_join_using_khepri(RemoteNode); false -> can_join_using_mnesia(RemoteNode) end * Sometimes, we need the feature flag state query to NOT block (for instance because it would cause a deadlock). Here is an example: case rabbit_khepri:get_feature_state() of enabled -> members_using_khepri(); _ -> members_using_mnesia() end Direct accesses to Mnesia still exist. They are limited to code that is specific to Mnesia such as classic queue mirroring or network partitions handling strategies. Now, to discover the Mnesia tables to migrate and how to migrate them, we use an Erlang module attribute called `rabbit_mnesia_tables_to_khepri_db` which indicates a list of Mnesia tables and an associated converter module. Here is an example in the `rabbitmq_recent_history_exchange` plugin: -rabbit_mnesia_tables_to_khepri_db( [{?RH_TABLE, rabbit_db_rh_exchange_m2k_converter}]). The converter module (`rabbit_db_rh_exchange_m2k_converter` in this example) is in fact a "sub" converter module called by `rabbit_db_m2k_converter`. See the documentation of a `mnesia_to_khepri` converter module to learn more about these modules. [1] https://github.com/rabbitmq/ra [2] https://github.com/rabbitmq/khepri [3] https://github.com/rabbitmq/khepri_mnesia_migration See #7206. Co-authored-by: Jean-Sébastien Pédron <jean-sebastien@rabbitmq.com> Co-authored-by: Diana Parra Corbacho <dparracorbac@vmware.com> Co-authored-by: Michael Davis <mcarsondavis@gmail.com>	2023-09-29 16:00:11 +02:00
Michael Klishin	ec4f1dba7d	(c) year bump: 2022 => 2023	2023-01-01 23:17:36 -05:00
Luke Bakken	7fe159edef	Yolo-replace format strings Replaces `~s` and `~p` with their unicode-friendly counterparts. ``` git ls-files *.erl | xargs sed -i.ORIG -e s/~s>/~ts/g -e s/~p>/~tp/g ```	2022-10-10 10:32:03 +04:00
David Ansari	878f369b7a	Make adding bindings idempotent First binding wins. Duplicate bindings, i.e. bindings with the same source exchange and same destination queue / exchange but possibly different routing key (weight) are ignored from now on by the consistent hash exchange. This applies only to bindings being added. For bindings being deleted, any duplicate binding (independent of its routing key) will delete all buckets for the given source and destination. (This is to ensure that buckets for a given source and destination can be deleted when upgrading from a version prior to this commit. This was also the behaviour prior to this commit, so nothing changes in that regard.) Note that duplicate bindings continue to be created in RabbitMQ. (They are only ignored by the consistent hash exchange.) Adding a binding will perform linear search in the bucket map. This is already stated in the README: "These two operations use linear algorithms to update the ring." The linear search when adding a binding could be optimised by adding another Mnesia table field which will require a new migration and feature flag. Hence, such an optimization is left out in this commit. Fixes #3386.	2022-06-30 09:24:02 +00:00
Michael Klishin	c38a3d697d	Bump (c) year	2022-03-21 01:21:56 +04:00
Philip Kuryloski	078321cce5	Attempt to reduce test flakes in consistent_hash_exchange suite	2022-03-11 15:37:00 +01:00
Falcon Taylor-Carter	4dab02289d	Add tests for duplicate binding scenarios	2021-10-18 23:25:06 -04:00
Michael Klishin	52479099ec	Bump (c) year	2021-01-22 09:00:14 +03:00
Jean-Sébastien Pédron	1cfb526a48	rabbit_exchange_type_consistent_hash_SUITE: Bump wait_for_confirms to 5 minutes We still get failures in CI. Let's see how it goes with a very large timeout value.	2020-11-04 17:29:49 +01:00
Jean-Sébastien Pédron	de1cccff7a	rabbit_exchange_type_consistent_hash_SUITE: Use ?assertEqual instead of matching The reported error will provide more information.	2020-11-04 17:28:42 +01:00
Jean-Sébastien Pédron	cd6c8e25cf	rabbit_exchange_type_consistent_hash_SUITE: Remove trailing whitespaces	2020-11-04 17:28:18 +01:00
Jean-Sébastien Pédron	c7354f0f45	rabbit_exchange_type_consistent_hash_SUITE: Wait for confirms for 60 seconds Switching from 5000 seconds to 5 seconds, after we discovered that this API expects seconds instead of milliseconds, made the wait too short.	2020-11-04 11:06:13 +01:00
Luke Bakken	868bd77859	wait_for_confirms timeout is in seconds References rabbitmq/rabbitmq-erlang-client#138 cc @dumbbell	2020-11-02 10:49:50 -08:00
dcorbacho	5d348bd3a1	Switch to Mozilla Public License 2.0 (MPL 2.0)	2020-07-11 19:45:03 +01:00
Michael Klishin	2d0adc176b	Integration tests for #45	2020-06-12 13:00:21 +03:00
Jean-Sébastien Pédron	f73587775a	Update copyright (year 2020)	2020-03-10 16:08:09 +01:00
Michael Klishin	8c28dff573	(c) bump	2019-12-29 05:50:26 +03:00
Spring Operator	f1ac305a24	URL Cleanup This commit updates URLs to prefer the https protocol. Redirects are not followed to avoid accidentally expanding intentionally shortened URLs (i.e. if using a URL shortener). # HTTP URLs that Could Not Be Fixed These URLs were unable to be fixed. Please review them to see if they can be manually resolved. * http://blog.listincomprehension.com/search/label/procket (200) with 1 occurrences could not be migrated: ([https](https://blog.listincomprehension.com/search/label/procket) result ClosedChannelException). * http://dozzie.jarowit.net/trac/wiki/TOML (200) with 1 occurrences could not be migrated: ([https](https://dozzie.jarowit.net/trac/wiki/TOML) result SSLHandshakeException). * http://dozzie.jarowit.net/trac/wiki/subproc (200) with 1 occurrences could not be migrated: ([https](https://dozzie.jarowit.net/trac/wiki/subproc) result SSLHandshakeException). * http://e2project.org (200) with 1 occurrences could not be migrated: ([https](https://e2project.org) result AnnotatedConnectException). * http://michaelnielsen.org/blog/consistent-hashing/ (200) with 1 occurrences could not be migrated: ([https](https://michaelnielsen.org/blog/consistent-hashing/) result SSLHandshakeException). * http://nitrogenproject.com/ (200) with 2 occurrences could not be migrated: ([https](https://nitrogenproject.com/) result ConnectTimeoutException). * http://proper.softlab.ntua.gr (200) with 1 occurrences could not be migrated: ([https](https://proper.softlab.ntua.gr) result SSLHandshakeException). * http://rubybunny.info (200) with 1 occurrences could not be migrated: ([https](https://rubybunny.info) result AnnotatedConnectException). * http://www.martinbroadhurst.com/Consistent-Hash-Ring.html (200) with 1 occurrences could not be migrated: ([https](https://www.martinbroadhurst.com/Consistent-Hash-Ring.html) result SSLHandshakeException). 
* http://yaws.hyber.org (200) with 1 occurrences could not be migrated: ([https](https://yaws.hyber.org) result AnnotatedConnectException). * http://choven.ca (503) with 1 occurrences could not be migrated: ([https](https://choven.ca) result ConnectTimeoutException). # Fixed URLs ## Fixed But Review Recommended These URLs were fixed, but the https status was not OK. However, the https status was the same as the http request or http redirected to an https URL, so they were migrated. Your review is recommended. * http://fixprotocol.org/ (301) with 1 occurrences migrated to: https://fixtrading.org ([https](https://fixprotocol.org/) result SSLHandshakeException). * http://erldb.org (UnknownHostException) with 1 occurrences migrated to: https://erldb.org ([https](https://erldb.org) result UnknownHostException). ## Fixed Success These URLs were switched to an https URL with a 2xx status. While the status was successful, your review is still recommended. * http://cloudi.org/ with 27 occurrences migrated to: https://cloudi.org/ ([https](https://cloudi.org/) result 200). * http://en.wikipedia.org/wiki/Consistent_hashing with 1 occurrences migrated to: https://en.wikipedia.org/wiki/Consistent_hashing ([https](https://en.wikipedia.org/wiki/Consistent_hashing) result 200). * http://erlware.org/ with 1 occurrences migrated to: https://erlware.org/ ([https](https://erlware.org/) result 200). * http://inaka.github.io/cowboy-trails/ with 1 occurrences migrated to: https://inaka.github.io/cowboy-trails/ ([https](https://inaka.github.io/cowboy-trails/) result 200). * http://ninenines.eu with 6 occurrences migrated to: https://ninenines.eu ([https](https://ninenines.eu) result 200). * http://www.actordb.com/ with 2 occurrences migrated to: https://www.actordb.com/ ([https](https://www.actordb.com/) result 200). 
* http://www.cs.kent.ac.uk/projects/wrangler/Home.html with 1 occurrences migrated to: https://www.cs.kent.ac.uk/projects/wrangler/Home.html ([https](https://www.cs.kent.ac.uk/projects/wrangler/Home.html) result 200). * http://www.rabbitmq.com/plugins.html with 1 occurrences migrated to: https://www.rabbitmq.com/plugins.html ([https](https://www.rabbitmq.com/plugins.html) result 200). * http://www.rebar3.org with 1 occurrences migrated to: https://www.rebar3.org ([https](https://www.rebar3.org) result 200). * http://contributor-covenant.org with 1 occurrences migrated to: https://contributor-covenant.org ([https](https://contributor-covenant.org) result 301). * http://contributor-covenant.org/version/1/3/0/ with 1 occurrences migrated to: https://contributor-covenant.org/version/1/3/0/ ([https](https://contributor-covenant.org/version/1/3/0/) result 301). * http://inaka.github.com/apns4erl with 1 occurrences migrated to: https://inaka.github.com/apns4erl ([https](https://inaka.github.com/apns4erl) result 301). * http://inaka.github.com/edis/ with 1 occurrences migrated to: https://inaka.github.com/edis/ ([https](https://inaka.github.com/edis/) result 301). * http://lasp-lang.org/ with 1 occurrences migrated to: https://lasp-lang.org/ ([https](https://lasp-lang.org/) result 301). * http://saleyn.github.com/erlexec with 1 occurrences migrated to: https://saleyn.github.com/erlexec ([https](https://saleyn.github.com/erlexec) result 301). * http://www.mozilla.org/MPL/ with 3 occurrences migrated to: https://www.mozilla.org/MPL/ ([https](https://www.mozilla.org/MPL/) result 301). * http://zhongwencool.github.io/observer_cli with 1 occurrences migrated to: https://zhongwencool.github.io/observer_cli ([https](https://zhongwencool.github.io/observer_cli) result 301).	2019-03-20 03:13:58 -05:00
Diana Corbacho	1b75f7beec	More tests for #40 Use server-named queues	2019-01-07 14:03:19 +00:00
Diana Corbacho	81836b3531	Refactor test code References #40.	2019-01-07 13:20:23 +00:00
Michael Klishin	235bacb3b4	More tests for #40	2019-01-07 15:56:17 +03:00
Michael Klishin	93626ee0c8	Add a failing test for the scenario outlined in #40	2019-01-07 15:24:12 +03:00
Michael Klishin	0638c70552	Make chi squared test an observation we log, not an assertion Due to randomness of the inputs and other characteristics that vary between environments it doesn't always end up being < the expected value but there's plenty of evidence that in most environments the resulting distribution is very uniform (for all intents and purposes of this plugin anyway). References #37, #39.	2018-08-31 23:51:36 +02:00
Michael Klishin	6ace19d972	Use only a subset of queues in routing tests	2018-08-28 20:01:25 +03:00
Michael Klishin	0b1776d59d	More tests, more idempotent binding management operations [#159822323]	2018-08-28 19:53:52 +03:00
Michael Klishin	b368ee922e	Increase sample count to pass Chi squared test in more environments, reorganise tests We still depend on the PRNG to provide a reasonably uniform distribution of inputs (e.g. routing keys) but things pass in at least 3 different environments reliably with 150K iterations. Pair: @dcorbacho. References #37, #38.	2018-08-21 16:40:21 +03:00
Michael Klishin	ab5f54ee8f	Bring back the Chi squared test assertion, bump the number of samples	2018-08-21 16:23:10 +03:00
Michael Klishin	67fe821b79	Fix a warning	2018-08-21 16:03:18 +03:00
Michael Klishin	05e7cc756f	Don't assert on Chi squared test value In some environments, namely our Concourse containers, with some iterations of the test the value exceeds the reference value of p-value = 0.01. This may be specific to OTP 19.3 or certain platforms. This is not something that I can reproduce in a number of OTP 21 environments. References #37, #38.	2018-08-21 06:50:04 +03:00
Michael Klishin	d6e9fd9b9e	Test suite improvements * Use publisher confirms, that's what the test really needs * Clean up exchanges before setting up topology to make sure failing tests do not leave anything behind	2018-08-20 19:47:43 +03:00
Michael Klishin	e132a0a865	A typo	2018-08-20 18:31:56 +03:00
Michael Klishin	b887efdcf3	Extract a few test helpers	2018-08-20 15:05:01 +03:00
Diana Corbacho	f319c84343	Test different bucket sizes	2018-08-20 11:07:12 +01:00
Diana Corbacho	02c5be2d54	Test - and fix - binding cleanup	2018-08-20 08:57:35 +01:00
Diana Corbacho	d82a77cecc	Verify distribution using chi-square test	2018-08-17 12:12:21 +01:00
Michael Klishin	72623501a4	Merge branch 'stable'	2017-04-02 21:56:37 +03:00
Michael Klishin	c30520abbd	(c) year	2017-04-02 21:47:35 +03:00
Jean-Sébastien Pédron	a5323b7379	Use `rand` directly in master because we require Erlang 18.3 References rabbitmq/rabbitmq-server#860. [#122335241]	2016-06-30 17:30:20 +02:00
Jean-Sébastien Pédron	9aa6728140	Use the new `rand_compat` module to transition from `random` to `rand` References #860. [#122335241]	2016-06-29 13:25:19 +02:00
Michael Klishin	54d88af579	Not really probabilistic	2016-06-23 00:17:31 +03:00
Michael Klishin	619160c4b4	Switch test suite to Common Test Fixes #21.	2016-06-20 16:30:26 +03:00
Michael Klishin	3e06644577	Update (c) info	2016-01-01 12:59:16 +03:00
Alvaro Videla	b506dad8c0	fixes connection leaking on tests	2015-09-16 14:38:26 +03:00
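
The quorum rule described in commit 5f0981c5a3 (a partition needs at least `(Number of nodes in the cluster ÷ 2) + 1` nodes to make progress) can be checked with a few lines of arithmetic. This is only an illustrative sketch, not RabbitMQ or Khepri code: the function names `quorum` and `can_make_progress` are hypothetical, and the division is assumed to be integer (floor) division, which matches the 5-node examples in the commit message.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority (assumes floor division)."""
    return cluster_size // 2 + 1


def can_make_progress(partition_size: int, cluster_size: int) -> bool:
    """True if a partition of this size holds a majority of the cluster."""
    return partition_size >= quorum(cluster_size)


# The two 5-node scenarios from the commit message:
# partitions of 3 + 2 nodes, and partitions of 2 + 2 + 1 nodes.
for sizes in ([3, 2], [2, 2, 1]):
    writable = [n for n in sizes if can_make_progress(n, 5)]
    print(sizes, "->", writable)
```

With a cluster of 5 nodes the quorum is 3, so the first scenario leaves one writable group (the 3-node partition) and the second leaves none, as the commit message states.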


64 Commits