Commit Graph

204 Commits

Author SHA1 Message Date
Chunyi Lyu 17c5dffe7a Set common global counters for mqtt
- protocols are set to mqtt310 or mqtt311, depending
on the protocol version set by the client connection
- added a boolean isPublisher to the proc state to track
whether the connection was ever used to publish. This is used to
set the publisher_create and publisher_delete global counters.
- tests added in integration_SUITE
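A minimal sketch of the version-to-counter-label mapping described above (module and function names are illustrative, not the plugin's actual code):
```
%% Illustrative sketch: map the protocol level from the MQTT CONNECT
%% packet to the global counter label.
-module(mqtt_proto_label).
-export([proto_atom/1]).

proto_atom(3) -> mqtt310;  %% MQTT 3.1   (protocol level 3)
proto_atom(4) -> mqtt311.  %% MQTT 3.1.1 (protocol level 4)
```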
2023-01-24 17:29:07 +00:00
David Ansari 319af3872e Handle duplicate packet IDs
"If a Client re-sends a particular Control Packet, then it MUST use the
same Packet Identifier in subsequent re-sends of that packet."

A client can re-send a PUBLISH packet with the same packet ID.
If the MQTT connection process already received the original packet and
sent it to destination queues, it will ignore this re-send.

The packet ID will be acked to the publisher once a confirmation from
all target queues is received.

There should be no risk of "stuck" messages within the MQTT connection
process because quorum and stream queue clients re-send the message and
classic queues will send a monitor DOWN message in case they are down.
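A sketch of the dedup step, assuming the process state tracks unacknowledged packet IDs in a map (all names are illustrative):
```
-module(mqtt_dedup_sketch).
-export([handle_publish/3]).

-record(state, {unacked = #{}}).

%% Drop a re-sent PUBLISH whose packet ID is still awaiting confirms.
handle_publish(PacketId, Msg, #state{unacked = Unacked} = State) ->
    case maps:is_key(PacketId, Unacked) of
        true ->
            %% Original packet already forwarded to the destination
            %% queues; the ack goes out once all confirms arrive.
            {ok, State};
        false ->
            ok = deliver_to_queues(Msg),  %% hypothetical helper
            {ok, State#state{unacked = Unacked#{PacketId => Msg}}}
    end.

deliver_to_queues(_Msg) -> ok.
```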
2023-01-24 17:29:07 +00:00
Chunyi Lyu 645531bc95 Register mqtt connections in case of event refresh 2023-01-24 17:29:07 +00:00
David Ansari 14f59f1380 Handle soft limit exceeded as queue action
Instead of performing credit_flow within quorum queue and stream queue
clients, return new {block | unblock, QueueName} actions.

The queue client process can then decide what to do.

For example, the channel continues to use credit_flow such that the
channel gets blocked sending any more credits to rabbit_reader.

However, the MQTT connection process does not use credit_flow. It
instead blocks its reader directly.
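A sketch of how a queue client process might interpret the new actions; the MQTT branch blocks its reader directly (helper names are hypothetical):
```
-module(mqtt_actions_sketch).
-export([handle_queue_actions/2]).

%% The MQTT connection process reacts to the new queue type actions by
%% (un)blocking its reader directly; no credit_flow involved.
handle_queue_actions([{block, _QName} | Rest], State) ->
    ok = block_reader(State),
    handle_queue_actions(Rest, State);
handle_queue_actions([{unblock, _QName} | Rest], State) ->
    ok = unblock_reader(State),
    handle_queue_actions(Rest, State);
handle_queue_actions([], State) ->
    State.

%% Hypothetical helpers standing in for the real reader control.
block_reader(_State) -> ok.
unblock_reader(_State) -> ok.
```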
2023-01-24 17:29:07 +00:00
David Ansari 816fedf080 Enable flow control to target classic queue 2023-01-24 17:29:07 +00:00
Chunyi Lyu 8126925617 Implement format_status for mqtt reader
- truncate the queue type state from the mqtt proc_state, which
could be huge with many destination queues. Instead, format_status
now returns the number of destination queues.
2023-01-24 17:29:07 +00:00
David Ansari 627ea8588a Add rabbit_event tests for MQTT
Add tests that MQTT plugin sends correct events to rabbit_event.

Add event connection_closed.
2023-01-24 17:29:07 +00:00
David Ansari 07ad410d81 Skip queue when MQTT QoS 0
This commit allows for huge fanouts if the MQTT subscriber connects with
clean_session = true and QoS 0. Messages are not sent to a conventional queue.
Instead, messages are forwarded directly from the MQTT publisher connection
process or channel to the MQTT subscriber connection process.
So, the queue process is skipped.

The MQTT subscriber connection process acts as the queue process.
Its mailbox is a superset of the queue. This new queue type is called
rabbit_mqtt_qos0_queue.
Given that the only current use case is MQTT, this queue type is
currently defined in the MQTT plugin.
The rabbit app is not aware that this new queue type exists.

The new queue gets persisted like any other queue such that routing via
the topic exchange continues to work as usual. This allows routing
across different protocols without any additional changes, e.g. huge
fanout from AMQP client (or management UI) to all MQTT devices.

The main benefit is that memory usage of the publishing process is kept at
0 MB once garbage collection kicks in (when hibernating the gen_server).
This is achieved by having this queue type's client not maintain any
state. Previously, without this new queue type, the publisher process
maintained 500 MB of state for all 1 million destination queues even
long after it stopped sending messages to these queues.

Another big benefit is that no queue processes need to be created.
Prior to this commit, with 1 million MQTT subscribers, 3 million Erlang
processes got created: 1 million MQTT connection processes, 1 million classic
queue processes, and 1 million classic queue supervisor processes.
After this commit, only the 1 million MQTT connection processes get
created. Hence, a few GBs of process memory will be saved.

Yet another big benefit is that because the new queue type's client
auto-settles the delivery when sending, the publishing process only
awaits confirmation from queues that potentially have at-least-once
consumers. So the publishing process is not blocked on sending the
confirm back to the publisher if, say, 1 message is routed to 1
million MQTT QoS 0 subscribers while 1 copy is routed to an important
quorum queue or stream, and while a single one of the million MQTT
connection processes is down.

Other benefits include:
* Lower publisher confirm latency
* Reduced inter-node network traffic

In a certain sense, this commit allows RabbitMQ to act as a high-scale,
high-throughput MQTT router (that obviously can lose messages at any
time given the QoS is 0).
For example, it enables use cases such as using RabbitMQ to send messages cheaply
and quickly to 1 million devices that happen to be online at the given
time, e.g. sending a notification to any online mobile device.
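A sketch of the core delivery path: the publisher-side client of rabbit_mqtt_qos0_queue keeps no state, sends the message straight to the subscriber's connection process, and treats it as settled immediately (message shape and names are illustrative):
```
-module(mqtt_qos0_sketch).
-export([deliver/2]).

%% Stateless "queue" client: the queue pid is the subscriber's MQTT
%% connection process, and its mailbox is the queue. The send is
%% auto-settled, so there is nothing to track and nothing to confirm.
deliver(QPid, Msg) ->
    QPid ! {queue_event, rabbit_mqtt_qos0_queue, Msg},
    ok.
```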
2023-01-24 17:29:07 +00:00
David Ansari af68fb4484 Decrease memory usage of queue_type state
Prior to this commit, 1 MQTT publisher publishing to 1 million target
classic queues required around 680 MB of process memory.

After this commit, it requires around 290 MB of process memory.

This commit requires feature flag classic_queue_type_delivery_support
and introduces a new one called no_queue_name_in_classic_queue_client.

Instead of storing the binary queue name 4 times, this commit now stores
it only once.

The monitor_registry is removed since only classic queue clients monitor
their classic queue server processes.

The classic queue client does not store the queue name anymore. Instead
the queue name is included in messages handled by the classic queue
client.

Storing the queue name in the record ctx was unnecessary.

More potential future memory optimisations:
* When routing to destination queues, looking up the queue record,
  delivering to queue: Use streaming / batching instead of fetching all
  at once
* Only fetch ETS columns that are necessary instead of whole queue
  records
* Do not hold the same vhost binary in memory many times. Instead,
  maintain a mapping.
* Remove unnecessary tuple fields.
2023-01-24 17:29:07 +00:00
David Ansari 199238d76e Use pg to track MQTT client IDs
Instead of tracking {Vhost, ClientId} to ConnectionPid mappings in our
custom process registry, i.e. custom local ETS table with a custom
gen_server process managing that ETS table, this commit uses the pg module
because pg is better tested.

To save memory with millions of MQTT client connections, we want to save
the mappings only locally on the node where the connection resides and
therefore not replicate them across all nodes.

According to Maxim Fedorov:
"The easy way to provide per-node unique pg scope is to start it like
pg:start_link(node()). At least that's what we've been doing to have
node-local scopes. It will still try to discover scopes on nodeup from
nodes joining the cluster, but since you cannot have nodes with the
same name in one cluster, using node() for local-only scopes worked
well for us."

So that's what we're doing in this commit.
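A sketch of the node-local pg scope described in the quote (the pg calls are real OTP API; the wrapper names are illustrative):
```
-module(mqtt_pg_sketch).
-export([start/0, register_client/3, lookup_client/2]).

%% A pg scope named after the local node keeps registrations node-local.
start() ->
    pg:start_link(node()).

register_client(VHost, ClientId, ConnPid) ->
    pg:join(node(), {VHost, ClientId}, ConnPid).

lookup_client(VHost, ClientId) ->
    pg:get_local_members(node(), {VHost, ClientId}).
```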
2023-01-24 17:29:07 +00:00
David Ansari ab8957ba9c Use best-effort client ID tracking
"Each Client connecting to the Server has a unique ClientId"

"If the ClientId represents a Client already connected to
the Server then the Server MUST disconnect the existing
Client [MQTT-3.1.4-2]."

Instead of tracking client IDs via Raft, we use local ETS tables in this
commit.

Previous tracking of client IDs via Raft:
(+) consistency (does the right thing)
(-) the state of the Ra process becomes large (> 1 GB) with many (> 1 million) MQTT clients
(-) the Ra process becomes a bottleneck when many MQTT clients (e.g. 300k)
    disconnect at the same time, because monitor (DOWN) Ra commands get
    written, resulting in Ra machine timeouts.
(-) if we need consistency, we ideally want a single source of truth,
    e.g. only Mnesia, or only Khepri (but not Mnesia + MQTT ra process)

While the above downsides could be fixed (e.g. avoiding DOWN commands by
instead doing periodic cleanups of client ID entries using the session interval
in MQTT 5 or the subscription_ttl parameter in the current RabbitMQ MQTT config),
in this case we do not necessarily need the consistency guarantees Raft provides.

In this commit, we try to comply with [MQTT-3.1.4-2] on a best-effort
basis: If there are no network failures and no messages get lost,
existing clients with duplicate client IDs get disconnected.

In the presence of network failures / lost messages, two clients with
the same client ID can end up publishing or receiving from the same
queue. Arguably, that's acceptable and less bad than the scaling
issues we experience when we want stronger consistency.

Note that it is also the responsibility of the client to not connect
twice with the same client ID.

This commit also ensures that the client ID is a binary to save memory.

A new feature flag is introduced which, when enabled, deletes the Ra
cluster named 'mqtt_node'.

Independent of that feature flag, client IDs are tracked locally in ETS
tables.
If that feature flag is disabled, client IDs are additionally tracked in
Ra.

The feature flag is required such that clients can continue to connect
to all nodes except for the node being updated in a rolling update.

This commit also fixes a bug where previously all MQTT connections were
cluster-wide closed when one RabbitMQ node was put into maintenance
mode.
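A sketch of the best-effort check on connect: look up the {VHost, ClientId} key in a node-local ETS table and ask any existing holder to disconnect (table name and message shape are illustrative):
```
-define(TABLE, mqtt_client_id_sketch).

%% Best-effort enforcement of [MQTT-3.1.4-2]: no consensus, so duplicate
%% client IDs can slip through under network failures or lost messages.
register_client_id(VHost, ClientId) ->
    Key = {VHost, ClientId},
    case ets:lookup(?TABLE, Key) of
        [{Key, OldPid}] ->
            %% Ask the existing connection to disconnect itself.
            OldPid ! {duplicate_id, self()};
        [] ->
            ok
    end,
    true = ets:insert(?TABLE, {Key, self()}),
    ok.
```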
2023-01-24 17:29:07 +00:00
David Ansari 43bd548dcc Handle deprecated classic queue delivery
when feature flag classic_queue_type_delivery_support is disabled.
2023-01-24 17:29:07 +00:00
David Ansari 5710a9474a Support MQTT Keepalive in WebMQTT
Share the same MQTT keepalive code between rabbit_mqtt_reader and
rabbit_web_mqtt_handler.

Add MQTT keepalive test in both plugins rabbitmq_mqtt and
rabbitmq_web_mqtt.
2023-01-24 17:29:07 +00:00
David Ansari 6f00ccb3ad Get all existing rabbitmq_web_mqtt tests green 2023-01-24 17:29:07 +00:00
David Ansari a02cbb73a1 Get all existing rabbitmq_mqtt tests green 2023-01-24 17:29:07 +00:00
David Ansari 23dac495ad Support QoS 1 for sending and receiving 2023-01-24 17:29:07 +00:00
David Ansari cdd253ee87 Receive many messages from classic queue
Before this commit, a consumer from a classic queue received at most
200 messages:
bb5d6263c9/deps/rabbit/src/rabbit_queue_consumers.erl (L24)

The MQTT consumer process must give credit to the classic queue process
due to internal flow control.
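A sketch of where the credit grant happens, assuming deliveries arrive as Erlang messages from the classic queue process (the delivery shape and helpers are illustrative; credit_flow is the module the channel also uses):
```
%% gen_server callback clause (sketch): ack credit back to the classic
%% queue process for every delivery, so the queue keeps sending beyond
%% its initial credit.
handle_info({deliver, _ConsumerTag, _AckRequired, Msg}, State) ->
    credit_flow:ack(queue_pid(State)),        %% hypothetical accessor
    ok = forward_to_mqtt_client(Msg, State),  %% hypothetical helper
    {noreply, State}.
```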
2023-01-24 17:29:07 +00:00
David Ansari 99337b84d3 Emit stats
The 'connection' field is not needed anymore because it previously held
the internal AMQP connection PID.
2023-01-24 17:29:07 +00:00
David Ansari 218ee196c4 Make proxy_protocol tests green 2023-01-24 17:29:07 +00:00
David Ansari 77da78f478 Get most auth_SUITE tests green
Some tests which require clean_start=false
or QoS1 are skipped for now.

Differentiate between v3 and v4:
v4 allows for an error code in the SUBACK frame.
2023-01-24 17:29:07 +00:00
David Ansari 73ad3bafe7 Revert maybe expression
rabbit_misc:pipeline looks better and doesn't require an experimental
feature.
2023-01-24 17:29:07 +00:00
David Ansari f4d1f68212 Move authn / authz into rabbitmq_mqtt 2023-01-24 17:29:07 +00:00
David Ansari eac0622f37 Consume with QoS0 via queue_type interface 2023-01-24 17:29:07 +00:00
David Ansari 24b0a6bcb2 Publish with QoS0 via queue_type interface 2023-01-24 17:29:07 +00:00
David Ansari 8710565b2a Use 1 instead of 22 Erlang processes per MQTT connection
* Create MQTT connections without proxying via AMQP
* Do authn / authz in rabbitmq_mqtt instead of rabbit_direct:connect/5
* Remove rabbit_heartbeat process and per connection supervisors

Current status:

Creating 10k MQTT connections with clean session succeeds:
./emqtt_bench conn -V 4 -C true -c 10000 -R 500
2023-01-24 17:29:07 +00:00
Alexey Lebedeff b6cd708a08 Fix all dialyzer warnings in rabbitmq_web_mqtt 2023-01-19 17:23:23 +01:00
Michael Klishin ec4f1dba7d
(c) year bump: 2022 => 2023 2023-01-01 23:17:36 -05:00
Luke Bakken 7fe159edef
Yolo-replace format strings
Replaces `~s` and `~p` with their unicode-friendly counterparts.

```
git ls-files *.erl | xargs sed -i.ORIG -e s/~s\>/~ts/g -e s/~p\>/~tp/g
```
2022-10-10 10:32:03 +04:00
David Ansari 49ed70900e Fix failing proxy_protocol test
Prior to this commit, test
```
make -C deps/rabbitmq_web_mqtt ct-proxy_protocol t=http_tests:proxy_protocol
```

was failing with reason
```
exception error: no function clause matching
                 rabbit_net:sockname({rabbit_proxy_socket,#Port<0.96>,
```
2022-08-25 20:00:49 +02:00
David Ansari 28db862d56 Avoid crash when client disconnects before server handles MQTT CONNECT
In case of a resource alarm, the server accepts incoming TCP
connections, but does not read from the socket.
When a client connects during a resource alarm, the MQTT CONNECT frame
is therefore not processed.

While the resource alarm is ongoing, the client might time out waiting
on a CONNACK MQTT packet.

When the resource alarm clears on the server, the MQTT CONNECT frame
gets processed.

Prior to this commit, this results in the following crash on the server:
```
** Reason for termination ==
** {{badmatch,{error,einval}},
    [{rabbit_mqtt_processor,process_login,4,
                            [{file,"rabbit_mqtt_processor.erl"},{line,585}]},
     {rabbit_mqtt_processor,process_request,3,
                            [{file,"rabbit_mqtt_processor.erl"},{line,143}]},
     {rabbit_mqtt_processor,process_frame,2,
                            [{file,"rabbit_mqtt_processor.erl"},{line,69}]},
     {rabbit_mqtt_reader,process_received_bytes,2,
                         [{file,"src/rabbit_mqtt_reader.erl"},{line,307}]},
```

After this commit, the server just logs:
```
[error] <0.887.0> MQTT protocol error on connection 127.0.0.1:55725 -> 127.0.0.1:1883: peername_not_known
```

In case the client already disconnected, we want the server to bail out
early, i.e. not authenticate and register the client at all,
since that can be expensive when many clients connected while the
resource alarm was ongoing.

To detect whether the client disconnected, we rely on inet:peername/1
which will return an error when the peer is not connected anymore.

Ideally we could use some better mechanism for detecting whether the
client disconnected.

The MQTT reader does receive a {tcp_closed, Socket} message once the
socket becomes active. However, we don't really want to read frames
ahead (i.e. ahead of the received CONNECT frame), one reason being that:
"Clients are allowed to send further Control Packets immediately
after sending a CONNECT Packet; Clients need not wait for a CONNACK Packet
to arrive from the Server."

Setting socket option `show_econnreset` does not help either because the client
closes the connection normally.
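A sketch of the early bail-out, with inet:peername/1 as the liveness probe (the surrounding function shape is illustrative; process_login is the function in the stacktrace above):
```
%% Sketch: probe the peer before the expensive path.
process_connect(Socket, Frame, State) ->
    case inet:peername(Socket) of
        {ok, _AddrPort} ->
            process_login(Frame, State);   %% authenticate + register
        {error, _} ->
            {error, peername_not_known}    %% logged, no crash
    end.
```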

Co-authored-by: Péter Gömöri @gomoripeti
2022-08-25 18:42:37 +02:00
David Ansari 20677395cd Check queue and exchange existence with ets:member/2
This reduces memory usage and improves code readability.
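For example, a hypothetical existence check:
```
%% ets:lookup/2 copies the whole queue record out of the table;
%% ets:member/2 only tests for the key.
queue_exists(QName) ->
    ets:member(rabbit_queue, QName).
```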
2022-05-10 10:16:40 +00:00
Michael Klishin 0ae3f19698
mqtt.queue_type => mqtt.durable_queue_type 2022-03-31 19:48:00 +04:00
Gabriele Santomaggio 2c49748c70
Add quorum queues support for MQTT
Enable the quorum queue for MQTT only if CleanSession is False.
QQs don't support the auto-delete flag, so when CleanSession is True
the queue will be a classic queue.

Add another test group, non_parallel_tests_quorum.
For mixed-version tests the quorum_queue feature flag must be enabled.

Add a log message.
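A sketch of the resulting queue type decision (names are illustrative):
```
%% Quorum queues don't support auto-delete, so a clean session always
%% gets a classic queue; otherwise use the configured durable type.
queue_type(true = _CleanSession, _Configured) ->
    rabbit_classic_queue;
queue_type(false = _CleanSession, Configured) ->
    Configured.  %% e.g. rabbit_quorum_queue when configured
```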
2022-03-30 08:49:17 -07:00
Michael Klishin c38a3d697d
Bump (c) year 2022-03-21 01:21:56 +04:00
Alexey Lebedeff e0723d5e66 Prevent crash logs when mqtt user is missing permissions
Fixes #2941

This adds proper exception handlers in the right places, and tests
ensure that it indeed provides nice, neat logs without large
stacktraces for every amqp operation.

Unnecessary checking for subscribe permissions on topic was dropped,
as `queue.bind` does exactly the same check. Topic permissions tests
were also added, and they indeed confirm that there was no change in
behaviour.

Ideally the same explicit topic permission check should be dropped for
publishing, but it's more complicated - so for now there is only a
detailed comment in the source code explaining it.

A few other things were also optimized away:
- Using amqp client to test for queue existence
- Creating queues / starting consumption too eagerly, even if not yet
  requested by the client
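A sketch of the shape of those handlers: catch the access_refused exit an AMQP-level operation raises and log one line (the error tuple shape and names are illustrative):
```
%% Convert an access_refused exit into a compact log line plus an error
%% return, instead of a crash report with a large stacktrace.
check_permitted(Fun) ->
    try
        Fun()
    catch
        exit:{amqp_error, access_refused, Explanation, _Method} ->
            rabbit_log:error("MQTT access refused: ~ts", [Explanation]),
            {error, access_refused}
    end.
```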
2021-11-12 18:03:05 +01:00
Michal Kuratczyk 41922b96cf
Change a log line from INFO to DEBUG
This line is printed on every new MQTT connection, which leads to very chatty logs when there are a lot of connections. Given that the way MQTT uses vhosts is generally static (once set up, always the same for all connections), I think this can be a debug message instead.
2021-07-12 16:50:25 +02:00
Michael Klishin 97ff62d3b2
Drop trailing newlines from logged messages where possible
Lager strips trailing newline characters but OTP logger with the default
formatter adds a newline at the end. To avoid unintentional multi-line log
messages we have to revisit most messages logged.

Some log entries are intentionally multiline, others
are printed to stdout directly: newlines are required there
for sensible formatting.
2021-03-11 15:17:37 +01:00
Michael Klishin 52479099ec
Bump (c) year 2021-01-22 09:00:14 +03:00
dcorbacho d80e8e1bec Add protocol to auth attempt metrics 2020-09-23 11:16:13 +01:00
dcorbacho b138241b52 Add auth attempt metrics 2020-08-28 13:19:05 +01:00
dcorbacho 119eb99e8d Switch to Mozilla Public License 2.0 (MPL 2.0) 2020-07-13 17:39:36 +01:00
Jean-Sébastien Pédron dcc5f7b553 Update copyright (year 2020) 2020-03-10 16:39:48 +01:00
Michael Klishin e6a8d93bb5 Inject a delay before joining client ID tracking cluster
We have considered multiple options in preventing a split cluster
scenario when N nodes are started in parallel and are initially unaware of
each other. They all are fairly involved and run various risks, e.g.
of losing consistency for cluster members that need to rejoin a newly
discovered set of members.

A simple delay to see if there may be any peers seems to be a straightforward
solution that would make a practical difference.

In the future, consistent client ID tracking should be a feature the user
can opt out of, because it potentially tilts the MQTT plugin too far towards
C on the consistency/availability spectrum.

Pair: @kjnilsson
2020-02-24 17:58:03 +03:00
Michael Klishin 377752d003 Ignore client ID tracker timeouts on connection closure
There isn't much to do about those at this stage in the connection
lifecycle anyway.
2020-02-21 21:42:39 +03:00
kjnilsson eadf5f7094 Make interactions with Ra async
Avoid blocking when registering or unregistering a client id. This is
ok as informing the current connection holder of the client id is
already async. This should be more scalable and provide much better MQTT
connection setup latency.
2020-02-10 17:28:18 +00:00
Michael Klishin 2927f473ce (c) bump 2019-12-29 05:50:32 +03:00
Grigory Starinkin a337839983 limit topic permission cache size 2019-11-07 14:40:10 +00:00
Grigory Starinkin 8c29181b7b cache topic permission access
performance optimisation
2019-11-07 14:39:43 +00:00
Michael Klishin 35a99a24a2 Downgrade QoS 2 to QoS 1 when sending Last Will
Closes #214.
2019-11-05 16:54:20 +00:00
Luke Bakken d0c0ec33ff Use new translation funs in library 2019-09-04 08:07:33 -07:00